Merge ZFS version 15 and almost all OpenSolaris bugfixes referenced
in Solaris 10 updates 141445-09 and 142901-14.

Detailed information:
(OpenSolaris revisions and Bug IDs, Solaris 10 patch numbers)

7844:effed23820ae
6755435	zfs_open() and zfs_close() needs to use ZFS_ENTER/ZFS_VERIFY_ZP (141445-01)

7897:e520d8258820
6748436	inconsistent zpool.cache in boot_archive could panic a zfs root filesystem upon boot-up (141445-01)

7965:b795da521357
6740164	zpool attach can create an illegal root pool (141909-02)

8084:b811cc60d650
6769612	zpool_import() will continue to write to cachefile even if altroot is set (N/A)

8121:7fd09d4ebd9c
6757430	want an option for zdb to disable space map loading and leak tracking (141445-01)

8129:e4f45a0bfbb0
6542860	ASSERT: reason != VDEV_LABEL_REMOVE||vdev_inuse(vd, crtxg, reason, 0) (141445-01)

8188:fd00c0a81e80
6761100	want zdb option to select older uberblocks (141445-01)

8190:6eeea43ced42
6774886	zfs_setattr() won't allow ndmp to restore SUNWattr_rw (141445-01)

8225:59a9961c2aeb
6737463	panic while trying to write out config file if root pool import fails (141445-01)

8227:f7d7be9b1f56
6765294	Refactor replay (141445-01)

8228:51e9ca9ee3a5
6572357	libzfs should do more to avoid mnttab lookups (141909-01)
6572376	zfs_iter_filesystems and zfs_iter_snapshots get objset stats twice (141909-01)

8241:5a60f16123ba
6328632	zpool offline is a bit too conservative (141445-01)
6739487	ASSERT: txg <= spa_final_txg due to scrub/export race (141445-01)
6767129	ASSERT: cvd->vdev_isspare, in spa_vdev_detach() (141445-01)
6747698	checksum failures after offline -t / export / import / scrub (141445-01)
6745863	ZFS writes to disk after it has been offlined (141445-01)
6722540	50% slowdown on scrub/resilver with certain vdev configurations (141445-01)
6759999	resilver logic rewrites ditto blocks on both source and destination (141445-01)
6758107	I/O should never suspend during spa_load() (141445-01)
6776548	codereview(1) runs off the page when faced with multi-line comments (N/A)
6761406	AMD errata 91 workaround doesn't work on 64-bit systems (141445-01)

8242:e46e4b2f0a03
6770866	GRUB/ZFS should require physical path or devid, but not both (141445-01)

8269:03a7e9050cfd
6674216	"zfs share" doesn't work, but "zfs set sharenfs=on" does (141445-01)
6621164	$SRC/cmd/zfs/zfs_main.c seems to have a syntax error in the translation note (141445-01)
6635482	i18n problems in libzfs_dataset.c and zfs_main.c (141445-01)
6595194	"zfs get" VALUE column is as wide as NAME (141445-01)
6722991	vdev_disk.c: error checking for ddi_pathname_to_dev_t() must test for NODEV (141445-01)
6396518	ASSERT strings shouldn't be pre-processed (141445-01)

8274:846b39508aff
6713916	scrub/resilver needlessly decompress data (141445-01)

8343:655db2375fed
6739553	libzfs_status msgid table is out of sync (141445-01)
6784104	libzfs unfairly rejects numerical values greater than 2^63 (141445-01)
6784108	zfs_realloc() should not free original memory on failure (141445-01)

8525:e0e0e525d0f8
6788830	set large value to reservation cause core dump (141445-01)
6791064	want sysevents for ZFS scrub (141445-01)
6791066	need to be able to set cachefile on faulted pools (141445-01)
6791071	zpool_do_import() should not enable datasets on faulted pools (141445-01)
6792134	getting multiple properties on a faulted pool leads to confusion (141445-01)

8547:bcc7b46e5ff7
6792884	Vista clients cannot access .zfs (141445-01)

8632:36ef517870a3
6798384	It can take a village to raise a zio (141445-01)

8636:7e4ce9158df3
6551866	deadlock between zfs_write(), zfs_freesp(), and zfs_putapage() (141909-01)
6504953	zfs_getpage() misunderstands VOP_GETPAGE() interface (141909-01)
6702206	ZFS read/writer lock contention throttles sendfile() benchmark (141445-01)
6780491	Zone on a ZFS filesystem has poor fork/exec performance (141445-01)
6747596	assertion failed: DVA_EQUAL(BP_IDENTITY(&zio->io_bp_orig), BP_IDENTITY(zio->io_bp))); (141445-01)

8692:692d4668b40d
6801507	ZFS read aggregation should not mind the gap (141445-01)

8697:e62d2612c14d
6633095	creating a filesystem with many properties set is slow (141445-01)

8768:dfecfdbb27ed
6775697	oracle crashes when overwriting after hitting quota on zfs (141909-01)

8811:f8deccf701cf
6790687	libzfs mnttab caching ignores external changes (141445-01)
6791101	memory leak from libzfs_mnttab_init (141445-01)

8845:91af0d9c0790
6800942	smb_session_create() incorrectly stores IP addresses (N/A)
6582163	Access Control List (ACL) for shares (141445-01)
6804954	smb_search - shortname field should be space padded following the NULL terminator (N/A)
6800184	Panic at smb_oplock_conflict+0x35() (N/A)

8876:59d2e67b4b65
6803822	Reboot after replacement of system disk in a ZFS mirror drops to grub> prompt (141445-01)

8924:5af812f84759
6789318	coredump when issue zdb -uuuu poolname/ (141445-01)
6790345 zdb -dddd -e poolname coredump (141445-01)
6797109 zdb: 'zdb -dddddd pool_name/fs_name inode' coredump if the file with inode was deleted (141445-01)
6797118 zdb: 'zdb -dddddd poolname inum' coredump if I miss the fs name (141445-01)
6803343 shareiscsi=on failed, iscsitgtd failed request to share (141445-01)

9030:243fd360d81f
6815893	hang mounting a dataset after booting into a new boot environment (141445-01)

9056:826e1858a846
6809691	'zpool create -f' no longer overwrites ufs information (141445-06)

9179:d8fbd96b79b3
6790064	zfs needs to determine uid and gid earlier in create process (141445-01)

9214:8d350e5d04aa
6604992	forced unmount + being in .zfs/snapshot/<snap1> = not happy (141909-01)
6810367	assertion failed: dvp->v_flag & VROOT, file: ../../common/fs/gfs.c, line: 426 (141909-01)

9229:e3f8b41e5db4
6807765	ztest_dsl_dataset_promote_busy needs to clean up after ENOSPC (141445-01)

9230:e4561e3eb1ef
6821169	offlining a device results in checksum errors (141445-01)
6821170	ZFS should not increment error stats for unavailable devices (141445-01)
6824006	need to increase issue and interrupt taskqs threads in zfs (141445-01)

9234:bffdc4fc05c4
6792139	recovering from a suspended pool needs some work (141445-01)
6794830	reboot command hangs on a failed zfs pool (141445-01)

9246:67c03c93c071
6824062	System panicked in zfs_mount due to NULL pointer dereference when running btts and svvs tests (141909-01)

9276:a8a7fc849933
6816124	System crash running zpool destroy on broken zpool (141445-03)

9355:09928982c591
6818183	zfs snapshot -r is slow due to set_snap_props() doing txg_wait_synced() for each new snapshot (141445-03)

9391:413d0661ef33
6710376	log device can show incorrect status when other parts of pool are degraded (141445-03)

9396:f41cf682d0d3 (part already merged)
6501037	want user/group quotas on ZFS (141445-03)
6827260	assertion failed in arc_read(): hdr == pbuf->b_hdr (141445-03)
6815592	panic: No such hold X on refcount Y from zfs_znode_move (141445-03)
6759986	zfs list shows temporary %clone when doing online zfs recv (141445-03)

9404:319573cd93f8
6774713	zfs ignores canmount=noauto when sharenfs property != off (141445-03)

9412:4aefd8704ce0
6717022	ZFS DMU needs zero-copy support (141445-03)

9425:e7ffacaec3a8
6799895	spa_add_spares() needs to be protected by config lock (141445-03)
6826466	want to post sysevents on hot spare activation (141445-03)
6826468	spa 'allowfaulted' needs some work (141445-03)
6826469	kernel support for storing vdev FRU information (141445-03)
6826470	skip posting checksum errors from DTL regions of leaf vdevs (141445-03)
6826471	I/O errors after device remove probe can confuse FMA (141445-03)
6826472	spares should enjoy some of the benefits of cache devices (141445-03)

9443:2a96d8478e95
6833711	gang leaders shouldn't have to be logical (141445-03)

9463:d0bd231c7518
6764124	want zdb to be able to checksum metadata blocks only (141445-03)

9465:8372081b8019
6830237	zfs panic in zfs_groupmember() (141445-03)

9466:1fdfd1fed9c4
6833162	phantom log device in zpool status (141445-03)

9469:4f68f041ddcd
6824968	add ZFS userquota support to rquotad (141445-03)

9470:6d827468d7b5
6834217	godfather I/O should reexecute (141445-03)

9480:fcff33da767f
6596237	Stop looking and start ganging (141909-02)

9493:9933d599bc93
6623978	lwb->lwb_buf != NULL, file ../../../uts/common/fs/zfs/zil.c, line 787, function zil_lwb_commit (141445-06)

9512:64cafcbcc337
6801810	Commit of aligned streaming rewrites to ZIL device causes unwanted disk reads (N/A)

9515:d3b739d9d043
6586537	async zio taskqs can block out userland commands (142901-09)

9554:787363635b6a
6836768	zfs_userspace() callback has no way to indicate failure (N/A)

9574:1eb6a6ab2c57
6838062	zfs panics when an error is encountered in space_map_load() (141909-02)

9583:b0696cd037cc
6794136	Panic BAD TRAP: type=e when importing degraded zraid pool. (141909-03)

9630:e25a03f552e0
6776104	"zfs import" deadlock between spa_unload() and spa_async_thread() (141445-06)

9653:a70048a304d1
6664765	Unable to remove files when using fat-zap and quota exceeded on ZFS filesystem (141445-06)

9688:127be1845343
6841321	zfs userspace / zfs get userused@ doesn't work on mounted snapshot (N/A)
6843069	zfs get userused@S-1-... doesn't work (N/A)

9873:8ddc892eca6e
6847229	assertion failed: refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite in dmu_tx.c (141445-06)

9904:d260bd3fd47c
6838344	kernel heap corruption detected on zil while stress testing (141445-06)

9951:a4895b3dd543
6844900	zfs_ioc_userspace_upgrade leaks (N/A)

10040:38b25aeeaf7a
6857012	zfs panics on zpool import (141445-06)

10000:241a51d8720c
6848242	zdb -e no longer works as expected (N/A)

10100:4a6965f6bef8
6856634	snv_117 not booting: zfs_parse_bootfs: error2 (141445-07)

10160:a45b03783d44
6861983	zfs should use new name <-> SID interfaces (N/A)
6862984	userquota commands can hang (141445-06)

10299:80845694147f
6696858	zfs receive of incremental replication stream can dereference NULL pointer and crash (N/A)

10302:a9e3d1987706
6696858	zfs receive of incremental replication stream can dereference NULL pointer and crash (fix lint) (N/A)

10575:2a8816c5173b (partial merge)
6882227 spa_async_remove() shouldn't do a full clear (142901-14)

10800:469478b180d9
6880764	fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached (142901-09)
6793430 zdb -ivvvv assertion failure: bp->blk_cksum.zc_word[2] == dmu_objset_id(zilog->zl_os) (N/A)

10801:e0bf032e8673 (partial merge)
6822816 assertion failed: zap_remove_int(ds_next_clones_obj) returns ENOENT (142901-09)

10810:b6b161a6ae4a
6892298 buf->b_hdr->b_state != arc_anon, file: ../../common/fs/zfs/arc.c, line: 2849 (142901-09)

10890:499786962772
6807339	spurious checksum errors when replacing a vdev (142901-13)

11249:6c30f7dfc97b
6906110 bad trap panic in zil_replay_log_record (142901-13)
6906946 zfs replay isn't handling uid/gid correctly (142901-13)

11454:6e69bacc1a5a
6898245 suspended zpool should not cause rest of the zfs/zpool commands to hang (142901-10)

11546:42ea6be8961b (partial merge)
6833999 3-way deadlock in dsl_dataset_hold_ref() and dsl_sync_task_group_sync() (142901-09)

Discussed with:	pjd
Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (multiple Bug IDs)
MFC after:	2 months
Commit 8fc257994d (parent fd435d5c06) by Martin Matuska, 2010-07-12 23:49:04 +00:00
Notes (svn2git, 2020-12-20 02:59:44 +00:00): svn path=/head/; revision=209962
129 changed files with 10788 additions and 4785 deletions

@ -23,6 +23,13 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 9.x IS SLOW:
ln -s aj /etc/malloc.conf.)
20100713:
A new version of ZFS (version 15) has been merged to -HEAD.
This version uses a python library for the following subcommands:
zfs allow, zfs unallow, zfs groupspace, zfs userspace.
For full functionality of these commands, the following port must
be installed: sysutils/py-zfs
20100429:
'vm_page's are now hashed by physical address to an array of mutexes.
Currently this is only used to serialize access to hold_count. Over
@ -964,6 +971,22 @@ COMMON ITEMS:
path, and has the highest probability of being successful. Please try
this approach before reporting problems with a major version upgrade.
ZFS notes
---------
When upgrading the boot ZFS pool to a new version, always follow
these two steps:
1.) recompile and reinstall the ZFS boot loader and boot block
(this is part of "make buildworld" and "make installworld")
2.) update the ZFS boot block on your boot drive
The following example updates the ZFS boot block on the first
partition (freebsd-boot) of a GPT partitioned drive ad0:
"gpart bootcode -p /boot/gptzfsboot -i 1 ad0"
Non-boot pools do not need these updates.
To build a kernel
-----------------
If you are updating from a prior version of FreeBSD (even one just

@ -3,10 +3,13 @@
#ifndef _OPENSOLARIS_MNTTAB_H_
#define _OPENSOLARIS_MNTTAB_H_
#include <sys/param.h>
#include <sys/mount.h>
#include <stdio.h>
#include <paths.h>
#define MNTTAB _PATH_DEVNULL
#define MNTTAB _PATH_DEVZERO
#define MNT_LINE_MAX 1024
#define umount2(p, f) unmount(p, f)
@ -17,7 +20,12 @@ struct mnttab {
char *mnt_fstype;
char *mnt_mntopts;
};
#define extmnttab mnttab
int getmntany(FILE *fd, struct mnttab *mgetp, struct mnttab *mrefp);
int getmntent(FILE *fp, struct mnttab *mp);
char *hasmntopt(struct mnttab *mnt, char *opt);
void statfs2mnttab(struct statfs *sfs, struct mnttab *mp);
#endif /* !_OPENSOLARIS_MNTTAB_H_ */

@ -36,6 +36,9 @@ __FBSDID("$FreeBSD$");
#include <sys/mount.h>
#include <sys/mntent.h>
#include <sys/mnttab.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
@ -88,75 +91,126 @@ optadd(char *mntopts, size_t size, const char *opt)
strlcat(mntopts, opt, size);
}
void
statfs2mnttab(struct statfs *sfs, struct mnttab *mp)
{
static char mntopts[MNTMAXSTR];
long flags;
mntopts[0] = '\0';
flags = sfs->f_flags;
#define OPTADD(opt) optadd(mntopts, sizeof(mntopts), (opt))
if (flags & MNT_RDONLY)
OPTADD(MNTOPT_RO);
else
OPTADD(MNTOPT_RW);
if (flags & MNT_NOSUID)
OPTADD(MNTOPT_NOSUID);
else
OPTADD(MNTOPT_SETUID);
if (flags & MNT_UPDATE)
OPTADD(MNTOPT_REMOUNT);
if (flags & MNT_NOATIME)
OPTADD(MNTOPT_NOATIME);
else
OPTADD(MNTOPT_ATIME);
OPTADD(MNTOPT_NOXATTR);
if (flags & MNT_NOEXEC)
OPTADD(MNTOPT_NOEXEC);
else
OPTADD(MNTOPT_EXEC);
#undef OPTADD
mp->mnt_special = sfs->f_mntfromname;
mp->mnt_mountp = sfs->f_mntonname;
mp->mnt_fstype = sfs->f_fstypename;
mp->mnt_mntopts = mntopts;
}
static struct statfs *gsfs = NULL;
static int allfs = 0;
static int
statfs_init(void)
{
struct statfs *sfs;
int error;
if (gsfs != NULL) {
free(gsfs);
gsfs = NULL;
}
allfs = getfsstat(NULL, 0, MNT_WAIT);
if (allfs == -1)
goto fail;
gsfs = malloc(sizeof(gsfs[0]) * allfs * 2);
if (gsfs == NULL)
goto fail;
allfs = getfsstat(gsfs, (long)(sizeof(gsfs[0]) * allfs * 2),
MNT_WAIT);
if (allfs == -1)
goto fail;
sfs = realloc(gsfs, allfs * sizeof(gsfs[0]));
if (sfs != NULL)
gsfs = sfs;
return (0);
fail:
error = errno;
if (gsfs != NULL)
free(gsfs);
gsfs = NULL;
allfs = 0;
return (error);
}
int
getmntany(FILE *fd __unused, struct mnttab *mgetp, struct mnttab *mrefp)
{
static struct statfs *sfs = NULL;
static char mntopts[MNTMAXSTR];
struct opt *o;
long i, n, flags;
struct statfs *sfs;
int i, error;
if (sfs != NULL) {
free(sfs);
sfs = NULL;
}
mntopts[0] = '\0';
error = statfs_init();
if (error != 0)
return (error);
n = getfsstat(NULL, 0, MNT_NOWAIT);
if (n == -1)
return (-1);
n = sizeof(*sfs) * (n + 8);
sfs = malloc(n);
if (sfs == NULL)
return (-1);
n = getfsstat(sfs, n, MNT_WAIT);
if (n == -1) {
free(sfs);
sfs = NULL;
return (-1);
}
for (i = 0; i < n; i++) {
for (i = 0; i < allfs; i++) {
if (mrefp->mnt_special != NULL &&
strcmp(mrefp->mnt_special, sfs[i].f_mntfromname) != 0) {
strcmp(mrefp->mnt_special, gsfs[i].f_mntfromname) != 0) {
continue;
}
if (mrefp->mnt_mountp != NULL &&
strcmp(mrefp->mnt_mountp, sfs[i].f_mntonname) != 0) {
strcmp(mrefp->mnt_mountp, gsfs[i].f_mntonname) != 0) {
continue;
}
if (mrefp->mnt_fstype != NULL &&
strcmp(mrefp->mnt_fstype, sfs[i].f_fstypename) != 0) {
strcmp(mrefp->mnt_fstype, gsfs[i].f_fstypename) != 0) {
continue;
}
flags = sfs[i].f_flags;
#define OPTADD(opt) optadd(mntopts, sizeof(mntopts), (opt))
if (flags & MNT_RDONLY)
OPTADD(MNTOPT_RO);
else
OPTADD(MNTOPT_RW);
if (flags & MNT_NOSUID)
OPTADD(MNTOPT_NOSUID);
else
OPTADD(MNTOPT_SETUID);
if (flags & MNT_UPDATE)
OPTADD(MNTOPT_REMOUNT);
if (flags & MNT_NOATIME)
OPTADD(MNTOPT_NOATIME);
else
OPTADD(MNTOPT_ATIME);
OPTADD(MNTOPT_NOXATTR);
if (flags & MNT_NOEXEC)
OPTADD(MNTOPT_NOEXEC);
else
OPTADD(MNTOPT_EXEC);
#undef OPTADD
mgetp->mnt_special = sfs[i].f_mntfromname;
mgetp->mnt_mountp = sfs[i].f_mntonname;
mgetp->mnt_fstype = sfs[i].f_fstypename;
mgetp->mnt_mntopts = mntopts;
statfs2mnttab(&gsfs[i], mgetp);
return (0);
}
free(sfs);
sfs = NULL;
return (-1);
}
int
getmntent(FILE *fp, struct mnttab *mp)
{
struct statfs *sfs;
int error, nfs;
nfs = (int)lseek(fileno(fp), 0, SEEK_CUR);
if (nfs == -1)
return (errno);
/* If nfs is 0, we want to refresh our cache. */
if (nfs == 0 || gsfs == NULL) {
error = statfs_init();
if (error != 0)
return (error);
}
if (nfs >= allfs)
return (-1);
statfs2mnttab(&gsfs[nfs], mp);
if (lseek(fileno(fp), 1, SEEK_CUR) == -1)
return (errno);
return (0);
}

View File

@ -0,0 +1,79 @@
#! /usr/bin/python2.4 -S
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
# Note, we want SIGINT (control-c) to exit the process quietly, to mimic
# the standard behavior of C programs. The best we can do with pure
# Python is to run with -S (to disable "import site"), and start our
# program with a "try" statement. Hopefully nobody hits ^C before our
# try statement is executed.
try:
import site
import gettext
import zfs.util
import zfs.ioctl
import sys
import errno
"""This is the main script for doing zfs subcommands. It doesn't know
what subcommands there are, it just looks for a module zfs.<subcommand>
that implements that subcommand."""
_ = gettext.translation("SUNW_OST_OSCMD", "/usr/lib/locale",
fallback=True).gettext
if len(sys.argv) < 2:
sys.exit(_("missing subcommand argument"))
zfs.ioctl.set_cmdstr(" ".join(["zfs"] + sys.argv[1:]))
try:
# import zfs.<subcommand>
# subfunc = zfs.<subcommand>.do_<subcommand>
subcmd = sys.argv[1]
__import__("zfs." + subcmd)
submod = getattr(zfs, subcmd)
subfunc = getattr(submod, "do_" + subcmd)
except (ImportError, AttributeError):
sys.exit(_("invalid subcommand"))
try:
subfunc()
except zfs.util.ZFSError, e:
print(e)
sys.exit(1)
except IOError, e:
import errno
import sys
if e.errno == errno.EPIPE:
sys.exit(1)
raise
except KeyboardInterrupt:
import sys
sys.exit(1)
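The wrapper's dispatch — `__import__("zfs." + subcmd)` followed by `getattr(submod, "do_" + subcmd)` — is ordinary dynamic import. A minimal self-contained sketch of the same idiom in modern Python (stdlib module and function names stand in for the zfs package; they are illustrative, not part of the commit):

```python
import importlib


def dispatch(modname, funcname, *args):
    # Import the module by name and look up the named handler in it,
    # mirroring the __import__/getattr dispatch in the zfs wrapper.
    try:
        mod = importlib.import_module(modname)
        func = getattr(mod, funcname)
    except (ImportError, AttributeError):
        raise SystemExit("invalid subcommand")
    return func(*args)


# "json"/"dumps" stand in for "zfs.<subcmd>"/"do_<subcmd>".
print(dispatch("json", "dumps", {"pool": "tank"}))
```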

@ -1,23 +1,8 @@
'\" te
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\" Copyright (c) 2004, Sun Microsystems, Inc. All Rights Reserved.
.\" The contents of this file are subject to the terms of the Common Development and Distribution License (the "License"). You may not use this file except in compliance with the License.
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. See the License for the specific language governing permissions and limitations under the License.
.\" When distributing Covered Code, include this CDDL HEADER in each file and include the License file at usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your own identifying information: Portions Copyright [yyyy] [name of copyright owner]
.TH zdb 1M "31 Oct 2005" "SunOS 5.11" "System Administration Commands"
.SH NAME
zdb \- ZFS debugger

@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@ -87,8 +87,8 @@ static void
usage(void)
{
(void) fprintf(stderr,
"Usage: %s [-udibcsv] [-U cachefile_path] "
"[-S user:cksumalg] "
"Usage: %s [-udibcsvL] [-U cachefile_path] [-t txg]\n"
"\t [-S user:cksumalg] "
"dataset [object...]\n"
" %s -C [pool]\n"
" %s -l dev\n"
@ -102,12 +102,16 @@ usage(void)
(void) fprintf(stderr, " -C cached pool configuration\n");
(void) fprintf(stderr, " -i intent logs\n");
(void) fprintf(stderr, " -b block statistics\n");
(void) fprintf(stderr, " -c checksum all data blocks\n");
(void) fprintf(stderr, " -m metaslabs\n");
(void) fprintf(stderr, " -c checksum all metadata (twice for "
"all data) blocks\n");
(void) fprintf(stderr, " -s report stats on zdb's I/O\n");
(void) fprintf(stderr, " -S <user|all>:<cksum_alg|all> -- "
"dump blkptr signatures\n");
(void) fprintf(stderr, " -v verbose (applies to all others)\n");
(void) fprintf(stderr, " -l dump label contents\n");
(void) fprintf(stderr, " -L disable leak tracking (do not "
"load spacemaps)\n");
(void) fprintf(stderr, " -U cachefile_path -- use alternate "
"cachefile\n");
(void) fprintf(stderr, " -R read and display block from a "
@ -115,12 +119,19 @@ usage(void)
(void) fprintf(stderr, " -e Pool is exported/destroyed/"
"has altroot\n");
(void) fprintf(stderr, " -p <Path to vdev dir> (use with -e)\n");
(void) fprintf(stderr, " -t <txg> highest txg to use when "
"searching for uberblocks\n");
(void) fprintf(stderr, "Specify an option more than once (e.g. -bb) "
"to make only that option verbose\n");
(void) fprintf(stderr, "Default is to dump everything non-verbosely\n");
exit(1);
}
/*
* Called for usage errors that are discovered after a call to spa_open(),
* dmu_bonus_hold(), or pool_match(). abort() is called for other errors.
*/
static void
fatal(const char *fmt, ...)
{
@ -132,7 +143,7 @@ fatal(const char *fmt, ...)
va_end(ap);
(void) fprintf(stderr, "\n");
abort();
exit(1);
}
static void
@ -205,7 +216,7 @@ dump_packed_nvlist(objset_t *os, uint64_t object, void *data, size_t size)
size_t nvsize = *(uint64_t *)data;
char *packed = umem_alloc(nvsize, UMEM_NOFAIL);
VERIFY(0 == dmu_read(os, object, 0, nvsize, packed));
VERIFY(0 == dmu_read(os, object, 0, nvsize, packed, DMU_READ_PREFETCH));
VERIFY(nvlist_unpack(packed, nvsize, &nv, 0) == 0);
@ -431,7 +442,7 @@ dump_spacemap(objset_t *os, space_map_obj_t *smo, space_map_t *sm)
alloc = 0;
for (offset = 0; offset < smo->smo_objsize; offset += sizeof (entry)) {
VERIFY(0 == dmu_read(os, smo->smo_object, offset,
sizeof (entry), &entry));
sizeof (entry), &entry, DMU_READ_PREFETCH));
if (SM_DEBUG_DECODE(entry)) {
(void) printf("\t\t[%4llu] %s: txg %llu, pass %llu\n",
(u_longlong_t)(offset / sizeof (entry)),
@ -462,6 +473,21 @@ dump_spacemap(objset_t *os, space_map_obj_t *smo, space_map_t *sm)
}
}
static void
dump_metaslab_stats(metaslab_t *msp)
{
char maxbuf[5];
space_map_t *sm = &msp->ms_map;
avl_tree_t *t = sm->sm_pp_root;
int free_pct = sm->sm_space * 100 / sm->sm_size;
nicenum(space_map_maxsize(sm), maxbuf);
(void) printf("\t %20s %10lu %7s %6s %4s %4d%%\n",
"segments", avl_numnodes(t), "maxsize", maxbuf,
"freepct", free_pct);
}
static void
dump_metaslab(metaslab_t *msp)
{
@ -472,22 +498,28 @@ dump_metaslab(metaslab_t *msp)
nicenum(msp->ms_map.sm_size - smo->smo_alloc, freebuf);
if (dump_opt['d'] <= 5) {
(void) printf("\t%10llx %10llu %5s\n",
(u_longlong_t)msp->ms_map.sm_start,
(u_longlong_t)smo->smo_object,
freebuf);
return;
}
(void) printf(
"\tvdev %llu offset %08llx spacemap %4llu free %5s\n",
"\tvdev %5llu offset %12llx spacemap %6llu free %5s\n",
(u_longlong_t)vd->vdev_id, (u_longlong_t)msp->ms_map.sm_start,
(u_longlong_t)smo->smo_object, freebuf);
ASSERT(msp->ms_map.sm_size == (1ULL << vd->vdev_ms_shift));
if (dump_opt['m'] > 1) {
mutex_enter(&msp->ms_lock);
VERIFY(space_map_load(&msp->ms_map, zfs_metaslab_ops,
SM_FREE, &msp->ms_smo, spa->spa_meta_objset) == 0);
dump_metaslab_stats(msp);
space_map_unload(&msp->ms_map);
mutex_exit(&msp->ms_lock);
}
if (dump_opt['d'] > 5 || dump_opt['m'] > 2) {
ASSERT(msp->ms_map.sm_size == (1ULL << vd->vdev_ms_shift));
mutex_enter(&msp->ms_lock);
dump_spacemap(spa->spa_meta_objset, smo, &msp->ms_map);
mutex_exit(&msp->ms_lock);
}
dump_spacemap(spa->spa_meta_objset, smo, &msp->ms_map);
}
static void
@ -502,59 +534,65 @@ dump_metaslabs(spa_t *spa)
for (c = 0; c < rvd->vdev_children; c++) {
vd = rvd->vdev_child[c];
(void) printf("\n vdev %llu\n\n", (u_longlong_t)vd->vdev_id);
(void) printf("\t%-10s %-19s %-15s %-10s\n",
"vdev", "offset", "spacemap", "free");
(void) printf("\t%10s %19s %15s %10s\n",
"----------", "-------------------",
"---------------", "-------------");
if (dump_opt['d'] <= 5) {
(void) printf("\t%10s %10s %5s\n",
"offset", "spacemap", "free");
(void) printf("\t%10s %10s %5s\n",
"------", "--------", "----");
}
for (m = 0; m < vd->vdev_ms_count; m++)
dump_metaslab(vd->vdev_ms[m]);
(void) printf("\n");
}
}
static void
dump_dtl_seg(space_map_t *sm, uint64_t start, uint64_t size)
{
char *prefix = (void *)sm;
(void) printf("%s [%llu,%llu) length %llu\n",
prefix,
(u_longlong_t)start,
(u_longlong_t)(start + size),
(u_longlong_t)(size));
}
static void
dump_dtl(vdev_t *vd, int indent)
{
avl_tree_t *t = &vd->vdev_dtl_map.sm_root;
space_seg_t *ss;
vdev_t *pvd;
int c;
spa_t *spa = vd->vdev_spa;
boolean_t required;
char *name[DTL_TYPES] = { "missing", "partial", "scrub", "outage" };
char prefix[256];
spa_vdev_state_enter(spa);
required = vdev_dtl_required(vd);
(void) spa_vdev_state_exit(spa, NULL, 0);
if (indent == 0)
(void) printf("\nDirty time logs:\n\n");
(void) printf("\t%*s%s\n", indent, "",
(void) printf("\t%*s%s [%s]\n", indent, "",
vd->vdev_path ? vd->vdev_path :
vd->vdev_parent ? vd->vdev_ops->vdev_op_type :
spa_name(vd->vdev_spa));
vd->vdev_parent ? vd->vdev_ops->vdev_op_type : spa_name(spa),
required ? "DTL-required" : "DTL-expendable");
for (ss = avl_first(t); ss; ss = AVL_NEXT(t, ss)) {
/*
* Everything in this DTL must appear in all parent DTL unions.
*/
for (pvd = vd; pvd; pvd = pvd->vdev_parent)
ASSERT(vdev_dtl_contains(&pvd->vdev_dtl_map,
ss->ss_start, ss->ss_end - ss->ss_start));
(void) printf("\t%*soutage [%llu,%llu] length %llu\n",
indent, "",
(u_longlong_t)ss->ss_start,
(u_longlong_t)ss->ss_end - 1,
(u_longlong_t)(ss->ss_end - ss->ss_start));
for (int t = 0; t < DTL_TYPES; t++) {
space_map_t *sm = &vd->vdev_dtl[t];
if (sm->sm_space == 0)
continue;
(void) snprintf(prefix, sizeof (prefix), "\t%*s%s",
indent + 2, "", name[t]);
mutex_enter(sm->sm_lock);
space_map_walk(sm, dump_dtl_seg, (void *)prefix);
mutex_exit(sm->sm_lock);
if (dump_opt['d'] > 5 && vd->vdev_children == 0)
dump_spacemap(spa->spa_meta_objset,
&vd->vdev_dtl_smo, sm);
}
(void) printf("\n");
if (dump_opt['d'] > 5 && vd->vdev_children == 0) {
dump_spacemap(vd->vdev_spa->spa_meta_objset, &vd->vdev_dtl,
&vd->vdev_dtl_map);
(void) printf("\n");
}
for (c = 0; c < vd->vdev_children; c++)
for (int c = 0; c < vd->vdev_children; c++)
dump_dtl(vd->vdev_child[c], indent + 4);
}
@ -668,7 +706,8 @@ visit_indirect(spa_t *spa, const dnode_phys_t *dnp,
break;
fill += cbp->blk_fill;
}
ASSERT3U(fill, ==, bp->blk_fill);
if (!err)
ASSERT3U(fill, ==, bp->blk_fill);
(void) arc_buf_remove_ref(buf, &buf);
}
@ -904,6 +943,7 @@ dump_uidgid(objset_t *os, znode_phys_t *zp)
/* first find the fuid object. It lives in the master node */
VERIFY(zap_lookup(os, MASTER_NODE_OBJ, ZFS_FUID_TABLES,
8, 1, &fuid_obj) == 0);
zfs_fuid_avl_tree_create(&idx_tree, &domain_tree);
(void) zfs_fuid_table_load(os, fuid_obj,
&idx_tree, &domain_tree);
fuid_table_loaded = B_TRUE;
@ -1007,6 +1047,8 @@ static object_viewer_t *object_viewer[DMU_OT_NUMTYPES] = {
dump_packed_nvlist, /* FUID nvlist size */
dump_zap, /* DSL dataset next clones */
dump_zap, /* DSL scrub queue */
dump_zap, /* ZFS user/group used */
dump_zap, /* ZFS user/group quota */
};
static void
@ -1070,6 +1112,14 @@ dump_object(objset_t *os, uint64_t object, int verbosity, int *print_header)
}
if (verbosity >= 4) {
(void) printf("\tdnode flags: %s%s\n",
(dn->dn_phys->dn_flags & DNODE_FLAG_USED_BYTES) ?
"USED_BYTES " : "",
(dn->dn_phys->dn_flags & DNODE_FLAG_USERUSED_ACCOUNTED) ?
"USERUSED_ACCOUNTED " : "");
(void) printf("\tdnode maxblkid: %llu\n",
(longlong_t)dn->dn_phys->dn_maxblkid);
object_viewer[doi.doi_bonus_type](os, object, bonus, bsize);
object_viewer[doi.doi_type](os, object, NULL, 0);
*print_header = 1;
@ -1124,7 +1174,7 @@ dump_dir(objset_t *os)
uint64_t object, object_count;
uint64_t refdbytes, usedobjs, scratch;
char numbuf[8];
char blkbuf[BP_SPRINTF_LEN];
char blkbuf[BP_SPRINTF_LEN + 20];
char osname[MAXNAMELEN];
char *type = "UNKNOWN";
int verbosity = dump_opt['d'];
@ -1150,8 +1200,8 @@ dump_dir(objset_t *os)
nicenum(refdbytes, numbuf);
if (verbosity >= 4) {
(void) strcpy(blkbuf, ", rootbp ");
sprintf_blkptr(blkbuf + strlen(blkbuf),
(void) sprintf(blkbuf + strlen(blkbuf), ", rootbp ");
(void) sprintf_blkptr(blkbuf + strlen(blkbuf),
BP_SPRINTF_LEN - strlen(blkbuf), os->os->os_rootbp);
} else {
blkbuf[0] = '\0';
@ -1186,7 +1236,12 @@ dump_dir(objset_t *os)
}
dump_object(os, 0, verbosity, &print_header);
object_count = 1;
object_count = 0;
if (os->os->os_userused_dnode &&
os->os->os_userused_dnode->dn_type != 0) {
dump_object(os, DMU_USERUSED_OBJECT, verbosity, &print_header);
dump_object(os, DMU_GROUPUSED_OBJECT, verbosity, &print_header);
}
object = 0;
while ((error = dmu_object_next(os, &object, B_FALSE, 0)) == 0) {
@ -1198,8 +1253,10 @@ dump_dir(objset_t *os)
(void) printf("\n");
if (error != ESRCH)
fatal("dmu_object_next() = %d", error);
if (error != ESRCH) {
(void) fprintf(stderr, "dmu_object_next() = %d\n", error);
abort();
}
}
static void
@ -1390,7 +1447,8 @@ static space_map_ops_t zdb_space_map_ops = {
zdb_space_map_unload,
NULL, /* alloc */
zdb_space_map_claim,
NULL /* free */
NULL, /* free */
NULL /* maxsize */
};
static void
@ -1489,8 +1547,9 @@ zdb_count_block(spa_t *spa, zdb_cb_t *zcb, blkptr_t *bp, dmu_object_type_t type)
}
}
VERIFY(zio_wait(zio_claim(NULL, spa, spa_first_txg(spa), bp,
NULL, NULL, ZIO_FLAG_MUSTSUCCEED)) == 0);
if (!dump_opt['L'])
VERIFY(zio_wait(zio_claim(NULL, spa, spa_first_txg(spa), bp,
NULL, NULL, ZIO_FLAG_MUSTSUCCEED)) == 0);
}
static int
@ -1499,13 +1558,25 @@ zdb_blkptr_cb(spa_t *spa, blkptr_t *bp, const zbookmark_t *zb,
{
zdb_cb_t *zcb = arg;
char blkbuf[BP_SPRINTF_LEN];
dmu_object_type_t type;
boolean_t is_l0_metadata;
if (bp == NULL)
return (0);
zdb_count_block(spa, zcb, bp, BP_GET_TYPE(bp));
type = BP_GET_TYPE(bp);
if (dump_opt['c'] || dump_opt['S']) {
zdb_count_block(spa, zcb, bp, type);
/*
* if we do metadata-only checksumming there's no need to checksum
* indirect blocks here because it is done during traverse
*/
is_l0_metadata = (BP_GET_LEVEL(bp) == 0 && type < DMU_OT_NUMTYPES &&
dmu_ot[type].ot_metadata);
if (dump_opt['c'] > 1 || dump_opt['S'] ||
(dump_opt['c'] && is_l0_metadata)) {
int ioerr, size;
void *data;
@ -1517,7 +1588,7 @@ zdb_blkptr_cb(spa_t *spa, blkptr_t *bp, const zbookmark_t *zb,
free(data);
/* We expect io errors on intent log */
if (ioerr && BP_GET_TYPE(bp) != DMU_OT_INTENT_LOG) {
if (ioerr && type != DMU_OT_INTENT_LOG) {
zcb->zcb_haderrors = 1;
zcb->zcb_errors[ioerr]++;
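The `-c`/`-S` hunk above changes which blocks zdb reads back for checksumming. A minimal sketch of that selection rule, with the `dmu_ot[]` metadata lookup reduced to a plain parameter (an assumption for illustration, not the real zdb code):

```c
#include <assert.h>

/*
 * Sketch of the rule zdb_blkptr_cb() gains in the diff: a single -c
 * checksums only level-0 metadata blocks (indirect blocks are already
 * verified by the traversal), while -cc or -S checksums every block.
 * The dmu_ot[] metadata flag is passed in as an int instead of being
 * looked up from the block pointer type.
 */
static int
should_checksum(int c_count, int s_flag, int level, int is_metadata)
{
	int is_l0_metadata = (level == 0 && is_metadata);

	return (c_count > 1 || s_flag || (c_count && is_l0_metadata));
}
```

With `-c` given once, an indirect (level > 0) block is skipped here because the traversal itself will have read it.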
@ -1565,9 +1636,12 @@ dump_block_stats(spa_t *spa)
int c, e;
if (!dump_opt['S']) {
(void) printf("\nTraversing all blocks to %sverify"
" nothing leaked ...\n",
dump_opt['c'] ? "verify checksums and " : "");
(void) printf("\nTraversing all blocks %s%s%s%s%s...\n",
(dump_opt['c'] || !dump_opt['L']) ? "to verify " : "",
(dump_opt['c'] == 1) ? "metadata " : "",
dump_opt['c'] ? "checksums " : "",
(dump_opt['c'] && !dump_opt['L']) ? "and verify " : "",
!dump_opt['L'] ? "nothing leaked " : "");
}
/*
@ -1578,7 +1652,8 @@ dump_block_stats(spa_t *spa)
* it's not part of any space map) is a double allocation,
* reference to a freed block, or an unclaimed log block.
*/
zdb_leak_init(spa);
if (!dump_opt['L'])
zdb_leak_init(spa);
/*
* If there's a deferred-free bplist, process that first.
@ -1620,7 +1695,8 @@ dump_block_stats(spa_t *spa)
/*
* Report any leaked segments.
*/
zdb_leak_fini(spa);
if (!dump_opt['L'])
zdb_leak_fini(spa);
/*
* If we're interested in printing out the blkptr signatures,
@ -1646,14 +1722,16 @@ dump_block_stats(spa_t *spa)
tzb = &zcb.zcb_type[ZB_TOTAL][DMU_OT_TOTAL];
if (tzb->zb_asize == alloc + logalloc) {
(void) printf("\n\tNo leaks (block sum matches space"
" maps exactly)\n");
if (!dump_opt['L'])
(void) printf("\n\tNo leaks (block sum matches space"
" maps exactly)\n");
} else {
(void) printf("block traversal size %llu != alloc %llu "
"(leaked %lld)\n",
"(%s %lld)\n",
(u_longlong_t)tzb->zb_asize,
(u_longlong_t)alloc + logalloc,
(u_longlong_t)(alloc + logalloc - tzb->zb_asize));
(dump_opt['L']) ? "unreachable" : "leaked",
(longlong_t)(alloc + logalloc - tzb->zb_asize));
leaks = 1;
}
@ -1760,14 +1838,17 @@ dump_zpool(spa_t *spa)
if (dump_opt['u'])
dump_uberblock(&spa->spa_uberblock);
if (dump_opt['d'] || dump_opt['i']) {
if (dump_opt['d'] || dump_opt['i'] || dump_opt['m']) {
dump_dir(dp->dp_meta_objset);
if (dump_opt['d'] >= 3) {
dump_bplist(dp->dp_meta_objset,
spa->spa_sync_bplist_obj, "Deferred frees");
dump_dtl(spa->spa_root_vdev, 0);
dump_metaslabs(spa);
}
if (dump_opt['d'] >= 3 || dump_opt['m'])
dump_metaslabs(spa);
(void) dmu_objset_find(spa_name(spa), dump_one_dir, NULL,
DS_FIND_SNAPSHOTS | DS_FIND_CHILDREN);
}
@ -2243,13 +2324,14 @@ main(int argc, char **argv)
dprintf_setup(&argc, argv);
while ((c = getopt(argc, argv, "udibcsvCS:U:lRep:")) != -1) {
while ((c = getopt(argc, argv, "udibcmsvCLS:U:lRep:t:")) != -1) {
switch (c) {
case 'u':
case 'd':
case 'i':
case 'b':
case 'c':
case 'm':
case 's':
case 'C':
case 'l':
@ -2257,6 +2339,9 @@ main(int argc, char **argv)
dump_opt[c]++;
dump_all = 0;
break;
case 'L':
dump_opt[c]++;
break;
case 'v':
verbose++;
break;
@ -2287,6 +2372,14 @@ main(int argc, char **argv)
else
usage();
break;
case 't':
ub_max_txg = strtoull(optarg, NULL, 0);
if (ub_max_txg < TXG_INITIAL) {
(void) fprintf(stderr, "incorrect txg "
"specified: %s\n", optarg);
usage();
}
break;
default:
usage();
break;
@ -2374,7 +2467,7 @@ main(int argc, char **argv)
}
if (error == 0)
error = spa_import_faulted(argv[0],
error = spa_import_verbatim(argv[0],
exported_conf, nvl);
nvlist_free(nvl);
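The zdb changes above add `-m`, `-L`, and `-t <txg>` to the option string, with repeatable flags counted in `dump_opt[]`. A standalone sketch of that counting behaviour, hand-rolled rather than using the real getopt(3) loop; the value of `TXG_INITIAL` is assumed here, not taken from the diff:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define	TXG_INITIAL	4	/* first valid txg; assumed value */

static int dump_opt[256];
static unsigned long long ub_max_txg;

/*
 * Sketch of zdb's flag handling after the diff: repeatable letters
 * bump a counter (so -cc requests full checksumming where -c alone is
 * metadata-only), -L merely toggles leak tracking off, and -t sets an
 * uberblock txg ceiling that must be at least TXG_INITIAL.
 * Returns 0 on success, -1 on a bad flag or txg.
 */
static int
parse_zdb_opts(int argc, char **argv)
{
	for (int i = 1; i < argc; i++) {
		const char *a = argv[i];

		if (a[0] != '-' || a[1] == '\0')
			return (-1);
		if (strcmp(a, "-t") == 0) {
			if (++i >= argc)
				return (-1);
			ub_max_txg = strtoull(argv[i], NULL, 0);
			if (ub_max_txg < TXG_INITIAL)
				return (-1);
			continue;
		}
		for (const char *p = a + 1; *p != '\0'; p++) {
			if (strchr("udibcmsvCLlRe", *p) == NULL)
				return (-1);
			dump_opt[(unsigned char)*p]++;
		}
	}
	return (0);
}
```

So `zdb -cc -bL -t 100` leaves `dump_opt['c'] == 2` and rejects any txg below `TXG_INITIAL`, matching the `usage()` bail-out in the hunk above.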


@ -115,7 +115,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write_t *lr)
(u_longlong_t)lr->lr_foid, (longlong_t)lr->lr_offset,
(u_longlong_t)lr->lr_length, (u_longlong_t)lr->lr_blkoff);
if (verbose < 5)
if (txtype == TX_WRITE2 || verbose < 5)
return;
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
@ -123,18 +123,19 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write_t *lr)
bp->blk_birth >= spa_first_txg(zilog->zl_spa) ?
"will claim" : "won't claim");
print_log_bp(bp, "\t\t\t");
if (BP_IS_HOLE(bp)) {
(void) printf("\t\t\tLSIZE 0x%llx\n",
(u_longlong_t)BP_GET_LSIZE(bp));
}
if (bp->blk_birth == 0) {
bzero(buf, sizeof (buf));
} else {
zbookmark_t zb;
ASSERT3U(bp->blk_cksum.zc_word[ZIL_ZC_OBJSET], ==,
dmu_objset_id(zilog->zl_os));
zb.zb_objset = bp->blk_cksum.zc_word[ZIL_ZC_OBJSET];
zb.zb_object = 0;
zb.zb_level = -1;
zb.zb_blkid = bp->blk_cksum.zc_word[ZIL_ZC_SEQ];
zb.zb_objset = dmu_objset_id(zilog->zl_os);
zb.zb_object = lr->lr_foid;
zb.zb_level = 0;
zb.zb_blkid = -1; /* unknown */
error = zio_wait(zio_read(NULL, zilog->zl_spa,
bp, buf, BP_GET_LSIZE(bp), NULL, NULL,
@ -251,6 +252,7 @@ static zil_rec_info_t zil_rec_info[TX_MAX_TYPE] = {
{ zil_prt_rec_create, "TX_MKDIR_ACL " },
{ zil_prt_rec_create, "TX_MKDIR_ATTR " },
{ zil_prt_rec_create, "TX_MKDIR_ACL_ATTR " },
{ zil_prt_rec_write, "TX_WRITE2 " },
};
/* ARGSUSED */

File diff suppressed because it is too large


@ -39,11 +39,13 @@
#include <unistd.h>
#include <fcntl.h>
#include <zone.h>
#include <grp.h>
#include <pwd.h>
#include <sys/mntent.h>
#include <sys/mnttab.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/avl.h>
#include <sys/fs/zfs.h>
#include <libzfs.h>
#include <libuutil.h>
@ -55,6 +57,7 @@ libzfs_handle_t *g_zfs;
static FILE *mnttab_file;
static char history_str[HIS_MAX_RECORD_LEN];
const char *pypath = "/usr/lib/zfs/pyzfs.py";
static int zfs_do_clone(int argc, char **argv);
static int zfs_do_create(int argc, char **argv);
@ -74,8 +77,8 @@ static int zfs_do_unshare(int argc, char **argv);
static int zfs_do_send(int argc, char **argv);
static int zfs_do_receive(int argc, char **argv);
static int zfs_do_promote(int argc, char **argv);
static int zfs_do_allow(int argc, char **argv);
static int zfs_do_unallow(int argc, char **argv);
static int zfs_do_userspace(int argc, char **argv);
static int zfs_do_python(int argc, char **argv);
static int zfs_do_jail(int argc, char **argv);
static int zfs_do_unjail(int argc, char **argv);
@ -119,7 +122,9 @@ typedef enum {
HELP_UNMOUNT,
HELP_UNSHARE,
HELP_ALLOW,
HELP_UNALLOW
HELP_UNALLOW,
HELP_USERSPACE,
HELP_GROUPSPACE
} zfs_help_t;
typedef struct zfs_command {
@ -153,6 +158,8 @@ static zfs_command_t command_table[] = {
{ "get", zfs_do_get, HELP_GET },
{ "inherit", zfs_do_inherit, HELP_INHERIT },
{ "upgrade", zfs_do_upgrade, HELP_UPGRADE },
{ "userspace", zfs_do_userspace, HELP_USERSPACE },
{ "groupspace", zfs_do_userspace, HELP_GROUPSPACE },
{ NULL },
{ "mount", zfs_do_mount, HELP_MOUNT },
{ "unmount", zfs_do_unmount, HELP_UNMOUNT },
@ -162,9 +169,9 @@ static zfs_command_t command_table[] = {
{ "send", zfs_do_send, HELP_SEND },
{ "receive", zfs_do_receive, HELP_RECEIVE },
{ NULL },
{ "allow", zfs_do_allow, HELP_ALLOW },
{ "allow", zfs_do_python, HELP_ALLOW },
{ NULL },
{ "unallow", zfs_do_unallow, HELP_UNALLOW },
{ "unallow", zfs_do_python, HELP_UNALLOW },
{ NULL },
{ "jail", zfs_do_jail, HELP_JAIL },
{ "unjail", zfs_do_unjail, HELP_UNJAIL },
@ -260,6 +267,14 @@ get_usage(zfs_help_t idx)
"<filesystem|volume>\n"
"\tunallow [-r] -s @setname [<perm|@setname>[,...]] "
"<filesystem|volume>\n"));
case HELP_USERSPACE:
return (gettext("\tuserspace [-hniHp] [-o field[,...]] "
"[-sS field] ... [-t type[,...]]\n"
"\t <filesystem|snapshot>\n"));
case HELP_GROUPSPACE:
return (gettext("\tgroupspace [-hniHpU] [-o field[,...]] "
"[-sS field] ... [-t type[,...]]\n"
"\t <filesystem|snapshot>\n"));
}
abort();
@ -321,7 +336,6 @@ usage(boolean_t requested)
{
int i;
boolean_t show_properties = B_FALSE;
boolean_t show_permissions = B_FALSE;
FILE *fp = requested ? stdout : stderr;
if (current_command == NULL) {
@ -352,13 +366,7 @@ usage(boolean_t requested)
strcmp(current_command->name, "list") == 0))
show_properties = B_TRUE;
if (current_command != NULL &&
(strcmp(current_command->name, "allow") == 0 ||
strcmp(current_command->name, "unallow") == 0))
show_permissions = B_TRUE;
if (show_properties) {
(void) fprintf(fp,
gettext("\nThe following properties are supported:\n"));
@ -369,29 +377,33 @@ usage(boolean_t requested)
(void) zprop_iter(usage_prop_cb, fp, B_FALSE, B_TRUE,
ZFS_TYPE_DATASET);
(void) fprintf(fp, "\t%-15s ", "userused@...");
(void) fprintf(fp, " NO NO <size>\n");
(void) fprintf(fp, "\t%-15s ", "groupused@...");
(void) fprintf(fp, " NO NO <size>\n");
(void) fprintf(fp, "\t%-15s ", "userquota@...");
(void) fprintf(fp, "YES NO <size> | none\n");
(void) fprintf(fp, "\t%-15s ", "groupquota@...");
(void) fprintf(fp, "YES NO <size> | none\n");
(void) fprintf(fp, gettext("\nSizes are specified in bytes "
"with standard units such as K, M, G, etc.\n"));
(void) fprintf(fp, gettext("\nUser-defined properties can "
"be specified by using a name containing a colon (:).\n"));
} else if (show_permissions) {
(void) fprintf(fp,
gettext("\nThe following permissions are supported:\n"));
zfs_deleg_permissions();
(void) fprintf(fp, gettext("\nThe {user|group}{used|quota}@ "
"properties must be appended with\n"
"a user or group specifier of one of these forms:\n"
" POSIX name (eg: \"matt\")\n"
" POSIX id (eg: \"126829\")\n"
" SMB name@domain (eg: \"matt@sun\")\n"
" SMB SID (eg: \"S-1-234-567-89\")\n"));
} else {
/*
* TRANSLATION NOTE:
* "zfs set|get" must not be localised this is the
* command name and arguments.
*/
(void) fprintf(fp,
gettext("\nFor the property list, run: zfs set|get\n"));
gettext("\nFor the property list, run: %s\n"),
"zfs set|get");
(void) fprintf(fp,
gettext("\nFor the delegated permission list, run:"
" zfs allow|unallow\n"));
gettext("\nFor the delegated permission list, run: %s\n"),
"zfs allow|unallow");
}
/*
@ -429,7 +441,6 @@ parseprop(nvlist_t *props)
return (-1);
}
return (0);
}
static int
@ -1101,6 +1112,17 @@ get_callback(zfs_handle_t *zhp, void *data)
zprop_print_one_property(zfs_get_name(zhp), cbp,
zfs_prop_to_name(pl->pl_prop),
buf, sourcetype, source);
} else if (zfs_prop_userquota(pl->pl_user_prop)) {
sourcetype = ZPROP_SRC_LOCAL;
if (zfs_prop_get_userquota(zhp, pl->pl_user_prop,
buf, sizeof (buf), cbp->cb_literal) != 0) {
sourcetype = ZPROP_SRC_NONE;
(void) strlcpy(buf, "-", sizeof (buf));
}
zprop_print_one_property(zfs_get_name(zhp), cbp,
pl->pl_user_prop, buf, sourcetype, source);
} else {
if (nvlist_lookup_nvlist(userprop,
pl->pl_user_prop, &propval) != 0) {
@ -1477,21 +1499,30 @@ upgrade_set_callback(zfs_handle_t *zhp, void *data)
{
upgrade_cbdata_t *cb = data;
int version = zfs_prop_get_int(zhp, ZFS_PROP_VERSION);
int i;
static struct { int zplver; int spaver; } table[] = {
{ZPL_VERSION_FUID, SPA_VERSION_FUID},
{ZPL_VERSION_USERSPACE, SPA_VERSION_USERSPACE},
{0, 0}
};
if (cb->cb_version >= ZPL_VERSION_FUID) {
int spa_version;
if (zfs_spa_version(zhp, &spa_version) < 0)
return (-1);
for (i = 0; table[i].zplver; i++) {
if (cb->cb_version >= table[i].zplver) {
int spa_version;
if (spa_version < SPA_VERSION_FUID) {
/* can't upgrade */
(void) printf(gettext("%s: can not be upgraded; "
"the pool version needs to first be upgraded\nto "
"version %d\n\n"),
zfs_get_name(zhp), SPA_VERSION_FUID);
cb->cb_numfailed++;
return (0);
if (zfs_spa_version(zhp, &spa_version) < 0)
return (-1);
if (spa_version < table[i].spaver) {
/* can't upgrade */
(void) printf(gettext("%s: can not be "
"upgraded; the pool version needs to first "
"be upgraded\nto version %d\n\n"),
zfs_get_name(zhp), table[i].spaver);
cb->cb_numfailed++;
return (0);
}
}
}
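The refactor above replaces the hard-coded FUID check with a `{zplver, spaver}` table so each filesystem version can name the pool version it requires. A reduced sketch of that lookup; the numeric constants (ZPL 3 / SPA 9 for FUID, ZPL 4 / SPA 15 for user space accounting) are assumed values, not definitions pulled from the headers:

```c
#include <assert.h>

/*
 * Table-driven version gating as in the refactored
 * upgrade_set_callback(): each ZPL (filesystem) version entry names
 * the minimum SPA (pool) version it needs.  Numbers mirror the
 * ZPL_VERSION_FUID/SPA_VERSION_FUID and *_USERSPACE pairs but are
 * assumptions for this sketch.
 */
static const struct {
	int zplver;
	int spaver;
} ver_table[] = {
	{ 3, 9 },	/* FUID */
	{ 4, 15 },	/* user/group space accounting */
	{ 0, 0 }
};

/*
 * Return 0 if a pool at spa_version can host target_zplver, otherwise
 * the first missing minimum SPA version (the "pool version needs to
 * first be upgraded" case the command reports).
 */
static int
missing_spa_version(int target_zplver, int spa_version)
{
	for (int i = 0; ver_table[i].zplver != 0; i++) {
		if (target_zplver >= ver_table[i].zplver &&
		    spa_version < ver_table[i].spaver)
			return (ver_table[i].spaver);
	}
	return (0);
}
```

The table ends in a `{0, 0}` sentinel, the same convention the diff uses, so adding a future version pair needs no other change.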
@ -1592,6 +1623,8 @@ zfs_do_upgrade(int argc, char **argv)
(void) printf(gettext(" 2 Enhanced directory entries\n"));
(void) printf(gettext("	 3	Case insensitive and File system "
"unique identifier (FUID)\n"));
(void) printf(gettext(" 4 userquota, groupquota "
"properties\n"));
(void) printf(gettext("\nFor more information on a particular "
"version, including supported releases, see:\n\n"));
(void) printf("http://www.opensolaris.org/os/community/zfs/"
@ -1639,6 +1672,84 @@ zfs_do_upgrade(int argc, char **argv)
return (ret);
}
/*
* zfs userspace
*/
static int
userspace_cb(void *arg, const char *domain, uid_t rid, uint64_t space)
{
zfs_userquota_prop_t *typep = arg;
zfs_userquota_prop_t p = *typep;
char *name = NULL;
char *ug, *propname;
char namebuf[32];
char sizebuf[32];
if (domain == NULL || domain[0] == '\0') {
if (p == ZFS_PROP_GROUPUSED || p == ZFS_PROP_GROUPQUOTA) {
struct group *g = getgrgid(rid);
if (g)
name = g->gr_name;
} else {
struct passwd *p = getpwuid(rid);
if (p)
name = p->pw_name;
}
}
if (p == ZFS_PROP_GROUPUSED || p == ZFS_PROP_GROUPQUOTA)
ug = "group";
else
ug = "user";
if (p == ZFS_PROP_USERUSED || p == ZFS_PROP_GROUPUSED)
propname = "used";
else
propname = "quota";
if (name == NULL) {
(void) snprintf(namebuf, sizeof (namebuf),
"%llu", (longlong_t)rid);
name = namebuf;
}
zfs_nicenum(space, sizebuf, sizeof (sizebuf));
(void) printf("%s %s %s%c%s %s\n", propname, ug, domain,
domain[0] ? '-' : ' ', name, sizebuf);
return (0);
}
static int
zfs_do_userspace(int argc, char **argv)
{
zfs_handle_t *zhp;
zfs_userquota_prop_t p;
int error;
/*
* Try the python version. If the execv fails, we'll continue
* and do a simplistic implementation.
*/
(void) execv(pypath, argv-1);
(void) printf("internal error: %s not found\n"
"falling back on built-in implementation, "
"some features will not work\n", pypath);
if ((zhp = zfs_open(g_zfs, argv[argc-1], ZFS_TYPE_DATASET)) == NULL)
return (1);
(void) printf("PROP TYPE NAME VALUE\n");
for (p = 0; p < ZFS_NUM_USERQUOTA_PROPS; p++) {
error = zfs_userspace(zhp, p, userspace_cb, &p);
if (error)
break;
}
return (error);
}
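The built-in fallback above prints one `PROP TYPE NAME VALUE` row per user or group. A small sketch of just the row formatting from `userspace_cb()`, written to a caller-supplied buffer via a hypothetical helper (added only so the result is easy to inspect without capturing stdout):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Mirrors the printf() at the end of userspace_cb() in the diff:
 * "<prop> <user|group> [<domain>-]<name> <size>".  When there is no
 * SMB domain the '-' separator degrades to a space, which is why the
 * domainless form carries a double space before the name.
 */
static void
format_space_row(char *out, size_t outlen, const char *propname,
    const char *ug, const char *domain, const char *name,
    const char *sizebuf)
{
	(void) snprintf(out, outlen, "%s %s %s%c%s %s", propname, ug,
	    domain, domain[0] != '\0' ? '-' : ' ', name, sizebuf);
}
```

For a POSIX user this yields e.g. `used user  matt 10K`, and for an SMB identity `quota group sun-staff 1M`.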
/*
* list [-r][-d max] [-H] [-o property[,property]...] [-t type[,type]...]
* [-s property [-s property]...] [-S property [-S property]...]
@ -1728,7 +1839,6 @@ print_dataset(zfs_handle_t *zhp, zprop_list_t *pl, boolean_t scripted)
first = B_FALSE;
}
right_justify = B_FALSE;
if (pl->pl_prop != ZPROP_INVAL) {
if (zfs_prop_get(zhp, pl->pl_prop, property,
sizeof (property), NULL, NULL, 0, B_FALSE) != 0)
@ -1737,6 +1847,13 @@ print_dataset(zfs_handle_t *zhp, zprop_list_t *pl, boolean_t scripted)
propstr = property;
right_justify = zfs_prop_align_right(pl->pl_prop);
} else if (zfs_prop_userquota(pl->pl_user_prop)) {
if (zfs_prop_get_userquota(zhp, pl->pl_user_prop,
property, sizeof (property), B_FALSE) != 0)
propstr = "-";
else
propstr = property;
right_justify = B_TRUE;
} else {
if (nvlist_lookup_nvlist(userprops,
pl->pl_user_prop, &propval) != 0)
@ -1744,6 +1861,7 @@ print_dataset(zfs_handle_t *zhp, zprop_list_t *pl, boolean_t scripted)
else
verify(nvlist_lookup_string(propval,
ZPROP_VALUE, &propstr) == 0);
right_justify = B_FALSE;
}
width = pl->pl_width;
@ -2281,7 +2399,7 @@ zfs_do_set(int argc, char **argv)
usage(B_FALSE);
}
ret = zfs_for_each(argc - 2, argv + 2, NULL,
ret = zfs_for_each(argc - 2, argv + 2, 0,
ZFS_TYPE_DATASET, NULL, NULL, 0, set_callback, &cb);
return (ret);
@ -2542,388 +2660,6 @@ zfs_do_receive(int argc, char **argv)
return (err != 0);
}
typedef struct allow_cb {
int a_permcnt;
size_t a_treeoffset;
} allow_cb_t;
static void
zfs_print_perms(avl_tree_t *tree)
{
zfs_perm_node_t *permnode;
permnode = avl_first(tree);
while (permnode != NULL) {
(void) printf("%s", permnode->z_pname);
permnode = AVL_NEXT(tree, permnode);
if (permnode)
(void) printf(",");
else
(void) printf("\n");
}
}
/*
* Iterate over users/groups/everyone/... and call the perm_iter
* function to print the actual permissions when the tree has >0 nodes.
*/
static void
zfs_iter_perms(avl_tree_t *tree, const char *banner, allow_cb_t *cb)
{
zfs_allow_node_t *item;
avl_tree_t *ptree;
item = avl_first(tree);
while (item) {
ptree = (void *)((char *)item + cb->a_treeoffset);
if (avl_numnodes(ptree)) {
if (cb->a_permcnt++ == 0)
(void) printf("%s\n", banner);
(void) printf("\t%s", item->z_key);
/*
* Avoid an extra space being printed
* for "everyone" which is keyed with a null
* string
*/
if (item->z_key[0] != '\0')
(void) printf(" ");
zfs_print_perms(ptree);
}
item = AVL_NEXT(tree, item);
}
}
#define LINES "-------------------------------------------------------------\n"
static int
zfs_print_allows(char *ds)
{
zfs_allow_t *curperms, *perms;
zfs_handle_t *zhp;
allow_cb_t allowcb = { 0 };
char banner[MAXPATHLEN];
if (ds[0] == '-')
usage(B_FALSE);
if (strrchr(ds, '@')) {
(void) fprintf(stderr, gettext("Snapshots don't have 'allow'"
" permissions\n"));
return (1);
}
if ((zhp = zfs_open(g_zfs, ds, ZFS_TYPE_DATASET)) == NULL)
return (1);
if (zfs_perm_get(zhp, &perms)) {
(void) fprintf(stderr,
gettext("Failed to retrieve 'allows' on %s\n"), ds);
zfs_close(zhp);
return (1);
}
zfs_close(zhp);
if (perms != NULL)
(void) printf("%s", LINES);
for (curperms = perms; curperms; curperms = curperms->z_next) {
(void) snprintf(banner, sizeof (banner),
"Permission sets on (%s)", curperms->z_setpoint);
allowcb.a_treeoffset =
offsetof(zfs_allow_node_t, z_localdescend);
allowcb.a_permcnt = 0;
zfs_iter_perms(&curperms->z_sets, banner, &allowcb);
(void) snprintf(banner, sizeof (banner),
"Create time permissions on (%s)", curperms->z_setpoint);
allowcb.a_treeoffset =
offsetof(zfs_allow_node_t, z_localdescend);
allowcb.a_permcnt = 0;
zfs_iter_perms(&curperms->z_crperms, banner, &allowcb);
(void) snprintf(banner, sizeof (banner),
"Local permissions on (%s)", curperms->z_setpoint);
allowcb.a_treeoffset = offsetof(zfs_allow_node_t, z_local);
allowcb.a_permcnt = 0;
zfs_iter_perms(&curperms->z_user, banner, &allowcb);
zfs_iter_perms(&curperms->z_group, banner, &allowcb);
zfs_iter_perms(&curperms->z_everyone, banner, &allowcb);
(void) snprintf(banner, sizeof (banner),
"Descendent permissions on (%s)", curperms->z_setpoint);
allowcb.a_treeoffset = offsetof(zfs_allow_node_t, z_descend);
allowcb.a_permcnt = 0;
zfs_iter_perms(&curperms->z_user, banner, &allowcb);
zfs_iter_perms(&curperms->z_group, banner, &allowcb);
zfs_iter_perms(&curperms->z_everyone, banner, &allowcb);
(void) snprintf(banner, sizeof (banner),
"Local+Descendent permissions on (%s)",
curperms->z_setpoint);
allowcb.a_treeoffset =
offsetof(zfs_allow_node_t, z_localdescend);
allowcb.a_permcnt = 0;
zfs_iter_perms(&curperms->z_user, banner, &allowcb);
zfs_iter_perms(&curperms->z_group, banner, &allowcb);
zfs_iter_perms(&curperms->z_everyone, banner, &allowcb);
(void) printf("%s", LINES);
}
zfs_free_allows(perms);
return (0);
}
#define ALLOWOPTIONS "ldcsu:g:e"
#define UNALLOWOPTIONS "ldcsu:g:er"
/*
* Validate options, and build necessary datastructure to display/remove/add
* permissions.
* Returns 0 - If permissions should be added/removed
* Returns 1 - If permissions should be displayed.
* Returns -1 - on failure
*/
int
parse_allow_args(int *argc, char **argv[], boolean_t unallow,
char **ds, int *recurse, nvlist_t **zperms)
{
int c;
char *options = unallow ? UNALLOWOPTIONS : ALLOWOPTIONS;
zfs_deleg_inherit_t deleg_type = ZFS_DELEG_NONE;
zfs_deleg_who_type_t who_type = ZFS_DELEG_WHO_UNKNOWN;
char *who = NULL;
char *perms = NULL;
zfs_handle_t *zhp;
while ((c = getopt(*argc, *argv, options)) != -1) {
switch (c) {
case 'l':
if (who_type == ZFS_DELEG_CREATE ||
who_type == ZFS_DELEG_NAMED_SET)
usage(B_FALSE);
deleg_type |= ZFS_DELEG_PERM_LOCAL;
break;
case 'd':
if (who_type == ZFS_DELEG_CREATE ||
who_type == ZFS_DELEG_NAMED_SET)
usage(B_FALSE);
deleg_type |= ZFS_DELEG_PERM_DESCENDENT;
break;
case 'r':
*recurse = B_TRUE;
break;
case 'c':
if (who_type != ZFS_DELEG_WHO_UNKNOWN)
usage(B_FALSE);
if (deleg_type)
usage(B_FALSE);
who_type = ZFS_DELEG_CREATE;
break;
case 's':
if (who_type != ZFS_DELEG_WHO_UNKNOWN)
usage(B_FALSE);
if (deleg_type)
usage(B_FALSE);
who_type = ZFS_DELEG_NAMED_SET;
break;
case 'u':
if (who_type != ZFS_DELEG_WHO_UNKNOWN)
usage(B_FALSE);
who_type = ZFS_DELEG_USER;
who = optarg;
break;
case 'g':
if (who_type != ZFS_DELEG_WHO_UNKNOWN)
usage(B_FALSE);
who_type = ZFS_DELEG_GROUP;
who = optarg;
break;
case 'e':
if (who_type != ZFS_DELEG_WHO_UNKNOWN)
usage(B_FALSE);
who_type = ZFS_DELEG_EVERYONE;
break;
default:
usage(B_FALSE);
break;
}
}
if (deleg_type == 0)
deleg_type = ZFS_DELEG_PERM_LOCALDESCENDENT;
*argc -= optind;
*argv += optind;
if (unallow == B_FALSE && *argc == 1) {
/*
* Only print permissions if no options were processed
*/
if (optind == 1)
return (1);
else
usage(B_FALSE);
}
/*
* initialize variables for zfs_build_perms based on number
* of arguments.
* 3 arguments ==> zfs [un]allow joe perm,perm,perm <dataset> or
* zfs [un]allow -s @set1 perm,perm <dataset>
* 2 arguments ==> zfs [un]allow -c perm,perm <dataset> or
* zfs [un]allow -u|-g <name> perm <dataset> or
* zfs [un]allow -e perm,perm <dataset>
* zfs unallow joe <dataset>
* zfs unallow -s @set1 <dataset>
* 1 argument ==> zfs [un]allow -e <dataset> or
* zfs [un]allow -c <dataset>
*/
switch (*argc) {
case 3:
perms = (*argv)[1];
who = (*argv)[0];
*ds = (*argv)[2];
/*
* advance argc/argv for do_allow cases.
* for the do_allow case, make sure who has a known who type
* and is not a permission set.
*/
if (unallow == B_TRUE) {
*argc -= 2;
*argv += 2;
} else if (who_type != ZFS_DELEG_WHO_UNKNOWN &&
who_type != ZFS_DELEG_NAMED_SET)
usage(B_FALSE);
break;
case 2:
if (unallow == B_TRUE && (who_type == ZFS_DELEG_EVERYONE ||
who_type == ZFS_DELEG_CREATE || who != NULL)) {
perms = (*argv)[0];
*ds = (*argv)[1];
} else {
if (unallow == B_FALSE &&
(who_type == ZFS_DELEG_WHO_UNKNOWN ||
who_type == ZFS_DELEG_NAMED_SET))
usage(B_FALSE);
else if (who_type == ZFS_DELEG_WHO_UNKNOWN ||
who_type == ZFS_DELEG_NAMED_SET)
who = (*argv)[0];
else if (who_type != ZFS_DELEG_NAMED_SET)
perms = (*argv)[0];
*ds = (*argv)[1];
}
if (unallow == B_TRUE) {
(*argc)--;
(*argv)++;
}
break;
case 1:
if (unallow == B_FALSE)
usage(B_FALSE);
if (who == NULL && who_type != ZFS_DELEG_CREATE &&
who_type != ZFS_DELEG_EVERYONE)
usage(B_FALSE);
*ds = (*argv)[0];
break;
default:
usage(B_FALSE);
}
if (strrchr(*ds, '@')) {
(void) fprintf(stderr,
gettext("Can't set or remove 'allow' permissions "
"on snapshots.\n"));
return (-1);
}
if ((zhp = zfs_open(g_zfs, *ds, ZFS_TYPE_DATASET)) == NULL)
return (-1);
if ((zfs_build_perms(zhp, who, perms,
who_type, deleg_type, zperms)) != 0) {
zfs_close(zhp);
return (-1);
}
zfs_close(zhp);
return (0);
}
static int
zfs_do_allow(int argc, char **argv)
{
char *ds;
nvlist_t *zperms = NULL;
zfs_handle_t *zhp;
int unused;
int ret;
if ((ret = parse_allow_args(&argc, &argv, B_FALSE, &ds,
&unused, &zperms)) == -1)
return (1);
if (ret == 1)
return (zfs_print_allows(argv[0]));
if ((zhp = zfs_open(g_zfs, ds, ZFS_TYPE_DATASET)) == NULL)
return (1);
if (zfs_perm_set(zhp, zperms)) {
zfs_close(zhp);
nvlist_free(zperms);
return (1);
}
nvlist_free(zperms);
zfs_close(zhp);
return (0);
}
static int
unallow_callback(zfs_handle_t *zhp, void *data)
{
nvlist_t *nvp = (nvlist_t *)data;
int error;
error = zfs_perm_remove(zhp, nvp);
if (error) {
(void) fprintf(stderr, gettext("Failed to remove permissions "
"on %s\n"), zfs_get_name(zhp));
}
return (error);
}
static int
zfs_do_unallow(int argc, char **argv)
{
int recurse = B_FALSE;
char *ds;
int error;
nvlist_t *zperms = NULL;
int flags = 0;
if (parse_allow_args(&argc, &argv, B_TRUE,
&ds, &recurse, &zperms) == -1)
return (1);
if (recurse)
flags |= ZFS_ITER_RECURSE;
error = zfs_for_each(argc, argv, flags,
ZFS_TYPE_FILESYSTEM|ZFS_TYPE_VOLUME, NULL,
NULL, 0, unallow_callback, (void *)zperms);
if (zperms)
nvlist_free(zperms);
return (error);
}
typedef struct get_all_cbdata {
zfs_handle_t **cb_handles;
size_t cb_alloc;
@ -3114,7 +2850,6 @@ share_mount_one(zfs_handle_t *zhp, int op, int flags, char *protocol,
sizeof (shareopts), NULL, NULL, 0, B_FALSE) == 0);
verify(zfs_prop_get(zhp, ZFS_PROP_SHARESMB, smbshareopts,
sizeof (smbshareopts), NULL, NULL, 0, B_FALSE) == 0);
canmount = zfs_prop_get_int(zhp, ZFS_PROP_CANMOUNT);
if (op == OP_SHARE && strcmp(shareopts, "off") == 0 &&
strcmp(smbshareopts, "off") == 0) {
@ -3124,7 +2859,8 @@ share_mount_one(zfs_handle_t *zhp, int op, int flags, char *protocol,
(void) fprintf(stderr, gettext("cannot share '%s': "
"legacy share\n"), zfs_get_name(zhp));
(void) fprintf(stderr, gettext("use share(1M) to "
"share this filesystem\n"));
"share this filesystem, or set "
"sharenfs property on\n"));
return (1);
}
@ -3162,6 +2898,7 @@ share_mount_one(zfs_handle_t *zhp, int op, int flags, char *protocol,
* noauto no return 0
* noauto yes pass through
*/
canmount = zfs_prop_get_int(zhp, ZFS_PROP_CANMOUNT);
if (canmount == ZFS_CANMOUNT_OFF) {
if (!explicit)
return (0);
@ -4055,6 +3792,15 @@ zfs_do_unjail(int argc, char **argv)
return (do_jail(argc, argv, 0));
}
/* ARGSUSED */
static int
zfs_do_python(int argc, char **argv)
{
(void) execv(pypath, argv-1);
(void) printf("internal error: %s not found\n", pypath);
return (-1);
}
/*
* Called when invoked as /etc/fs/zfs/mount. Do the mount if the mountpoint is
* 'legacy'. Otherwise, complain that use should be using 'zfs mount'.
@ -4312,6 +4058,7 @@ main(int argc, char **argv)
/*
* Run the appropriate command.
*/
libzfs_mnttab_cache(g_zfs, B_TRUE);
if (find_command_idx(cmdname, &i) == 0) {
current_command = &command_table[i];
ret = command_table[i].func(argc - 1, argv + 1);
@ -4324,6 +4071,7 @@ main(int argc, char **argv)
"command '%s'\n"), cmdname);
usage(B_FALSE);
}
libzfs_mnttab_cache(g_zfs, B_FALSE);
}
(void) fclose(mnttab_file);


@ -20,7 +20,7 @@
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@ -378,12 +378,11 @@ add_prop_list(const char *propname, char *propval, nvlist_t **props,
}
normnm = zpool_prop_to_name(prop);
} else {
if ((fprop = zfs_name_to_prop(propname)) == ZPROP_INVAL) {
(void) fprintf(stderr, gettext("property '%s' is "
"not a valid file system property\n"), propname);
return (2);
if ((fprop = zfs_name_to_prop(propname)) != ZPROP_INVAL) {
normnm = zfs_prop_to_name(fprop);
} else {
normnm = propname;
}
normnm = zfs_prop_to_name(fprop);
}
if (nvlist_lookup_string(proplist, normnm, &strval) == 0 &&
@ -1263,7 +1262,7 @@ show_import(nvlist_t *config)
*/
static int
do_import(nvlist_t *config, const char *newname, const char *mntopts,
int force, nvlist_t *props, boolean_t allowfaulted)
int force, nvlist_t *props, boolean_t do_verbatim)
{
zpool_handle_t *zhp;
char *name;
@ -1316,16 +1315,17 @@ do_import(nvlist_t *config, const char *newname, const char *mntopts,
}
}
if (zpool_import_props(g_zfs, config, newname, props,
allowfaulted) != 0)
if (zpool_import_props(g_zfs, config, newname, props, do_verbatim) != 0)
return (1);
if (newname != NULL)
name = (char *)newname;
verify((zhp = zpool_open_canfail(g_zfs, name)) != NULL);
if ((zhp = zpool_open_canfail(g_zfs, name)) == NULL)
return (1);
if (zpool_enable_datasets(zhp, mntopts, 0) != 0) {
if (zpool_get_state(zhp) != POOL_STATE_UNAVAIL &&
zpool_enable_datasets(zhp, mntopts, 0) != 0) {
zpool_close(zhp);
return (1);
}
@ -1359,7 +1359,8 @@ do_import(nvlist_t *config, const char *newname, const char *mntopts,
* -F Import even in the presence of faulted vdevs. This is an
* intentionally undocumented option for testing purposes, and
* treats the pool configuration as complete, leaving any bad
* vdevs in the FAULTED state.
* vdevs in the FAULTED state. In other words, it does verbatim
* import.
*
* -a Import all pools found.
*
@ -1388,7 +1389,7 @@ zpool_do_import(int argc, char **argv)
nvlist_t *found_config;
nvlist_t *props = NULL;
boolean_t first;
boolean_t allow_faulted = B_FALSE;
boolean_t do_verbatim = B_FALSE;
uint64_t pool_state;
char *cachefile = NULL;
@ -1421,7 +1422,7 @@ zpool_do_import(int argc, char **argv)
do_force = B_TRUE;
break;
case 'F':
allow_faulted = B_TRUE;
do_verbatim = B_TRUE;
break;
case 'o':
if ((propval = strchr(optarg, '=')) != NULL) {
@ -1571,7 +1572,7 @@ zpool_do_import(int argc, char **argv)
if (do_all)
err |= do_import(config, NULL, mntopts,
do_force, props, allow_faulted);
do_force, props, do_verbatim);
else
show_import(config);
} else if (searchname != NULL) {
@ -1619,7 +1620,7 @@ zpool_do_import(int argc, char **argv)
err = B_TRUE;
} else {
err |= do_import(found_config, argc == 1 ? NULL :
argv[1], mntopts, do_force, props, allow_faulted);
argv[1], mntopts, do_force, props, do_verbatim);
}
}
@ -2766,7 +2767,7 @@ find_spare(zpool_handle_t *zhp, void *data)
*/
void
print_status_config(zpool_handle_t *zhp, const char *name, nvlist_t *nv,
int namewidth, int depth, boolean_t isspare, boolean_t print_logs)
int namewidth, int depth, boolean_t isspare)
{
nvlist_t **child;
uint_t c, children;
@ -2880,13 +2881,14 @@ print_status_config(zpool_handle_t *zhp, const char *name, nvlist_t *nv,
for (c = 0; c < children; c++) {
uint64_t is_log = B_FALSE;
/* Don't print logs here */
(void) nvlist_lookup_uint64(child[c], ZPOOL_CONFIG_IS_LOG,
&is_log);
if ((is_log && !print_logs) || (!is_log && print_logs))
if (is_log)
continue;
vname = zpool_vdev_name(g_zfs, zhp, child[c]);
print_status_config(zhp, vname, child[c],
namewidth, depth + 2, isspare, B_FALSE);
namewidth, depth + 2, isspare);
free(vname);
}
}
@ -2941,7 +2943,7 @@ print_spares(zpool_handle_t *zhp, nvlist_t **spares, uint_t nspares,
for (i = 0; i < nspares; i++) {
name = zpool_vdev_name(g_zfs, zhp, spares[i]);
print_status_config(zhp, name, spares[i],
namewidth, 2, B_TRUE, B_FALSE);
namewidth, 2, B_TRUE);
free(name);
}
}
@ -2961,7 +2963,40 @@ print_l2cache(zpool_handle_t *zhp, nvlist_t **l2cache, uint_t nl2cache,
for (i = 0; i < nl2cache; i++) {
name = zpool_vdev_name(g_zfs, zhp, l2cache[i]);
print_status_config(zhp, name, l2cache[i],
namewidth, 2, B_FALSE, B_FALSE);
namewidth, 2, B_FALSE);
free(name);
}
}
/*
* Print log vdevs.
* Logs are recorded as top level vdevs in the main pool child array but with
* "is_log" set to 1. We use print_status_config() to print the top level logs
* then any log children (eg mirrored slogs) are printed recursively - which
* works because only the top level vdev is marked "is_log"
*/
static void
print_logs(zpool_handle_t *zhp, nvlist_t *nv, int namewidth)
{
uint_t c, children;
nvlist_t **child;
if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_CHILDREN, &child,
&children) != 0)
return;
(void) printf(gettext("\tlogs\n"));
for (c = 0; c < children; c++) {
uint64_t is_log = B_FALSE;
char *name;
(void) nvlist_lookup_uint64(child[c], ZPOOL_CONFIG_IS_LOG,
&is_log);
if (!is_log)
continue;
name = zpool_vdev_name(g_zfs, zhp, child[c]);
print_status_config(zhp, name, child[c], namewidth, 2, B_FALSE);
free(name);
}
}
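With the new `print_logs()` above, top-level children flagged `is_log` are printed in their own section while `print_status_config()` now skips them, so each vdev appears exactly once. The split can be sketched with the nvlist child array replaced by a plain struct (purely illustrative, not the libzfs types):

```c
#include <assert.h>

/*
 * Illustration of the is_log split: print_status_config() skips
 * children flagged is_log, and print_logs() prints exactly those,
 * so every top-level vdev lands in exactly one section.  A toy
 * struct stands in for the ZPOOL_CONFIG_CHILDREN nvlist array.
 */
struct toy_vdev {
	const char *name;
	int is_log;
};

/* Count the children the "logs" section would own. */
static int
count_log_children(const struct toy_vdev *child, int children)
{
	int c, n = 0;

	for (c = 0; c < children; c++)
		if (child[c].is_log)
			n++;
	return (n);
}

/* Count what the main pool listing prints (everything else). */
static int
count_main_children(const struct toy_vdev *child, int children)
{
	return (children - count_log_children(child, children));
}
```

Mirrored slogs still print correctly because only the top-level vdev carries `is_log`; the recursion into its children hits no further log flags.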
@ -3191,11 +3226,10 @@ status_callback(zpool_handle_t *zhp, void *data)
(void) printf(gettext("\t%-*s %-8s %5s %5s %5s\n"), namewidth,
"NAME", "STATE", "READ", "WRITE", "CKSUM");
print_status_config(zhp, zpool_get_name(zhp), nvroot,
namewidth, 0, B_FALSE, B_FALSE);
if (num_logs(nvroot) > 0)
print_status_config(zhp, "logs", nvroot, namewidth, 0,
B_FALSE, B_TRUE);
namewidth, 0, B_FALSE);
if (num_logs(nvroot) > 0)
print_logs(zhp, nvroot, namewidth);
if (nvlist_lookup_nvlist_array(nvroot, ZPOOL_CONFIG_L2CACHE,
&l2cache, &nl2cache) == 0)
print_l2cache(zhp, l2cache, nl2cache, namewidth);
@ -3496,8 +3530,8 @@ zpool_do_upgrade(int argc, char **argv)
(void) printf(gettext(" 11 Improved scrub performance\n"));
(void) printf(gettext(" 12 Snapshot properties\n"));
(void) printf(gettext(" 13 snapused property\n"));
(void) printf(gettext(" 14 passthrough-x aclinherit "
"support\n"));
(void) printf(gettext(" 14 passthrough-x aclinherit\n"));
(void) printf(gettext(" 15 user/group space accounting\n"));
(void) printf(gettext("For more information on a particular "
"version, including supported releases, see:\n\n"));
(void) printf("http://www.opensolaris.org/os/community/zfs/"

File diff suppressed because it is too large


@ -29,6 +29,7 @@
#include <assert.h>
#include <libnvpair.h>
#include <sys/mnttab.h>
#include <sys/param.h>
#include <sys/types.h>
#include <sys/varargs.h>
@ -175,6 +176,14 @@ extern void libzfs_print_on_error(libzfs_handle_t *, boolean_t);
extern int libzfs_errno(libzfs_handle_t *);
extern const char *libzfs_error_action(libzfs_handle_t *);
extern const char *libzfs_error_description(libzfs_handle_t *);
extern void libzfs_mnttab_init(libzfs_handle_t *);
extern void libzfs_mnttab_fini(libzfs_handle_t *);
extern void libzfs_mnttab_cache(libzfs_handle_t *, boolean_t);
extern int libzfs_mnttab_find(libzfs_handle_t *, const char *,
struct mnttab *);
extern void libzfs_mnttab_add(libzfs_handle_t *, const char *,
const char *, const char *);
extern void libzfs_mnttab_remove(libzfs_handle_t *, const char *);
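The new libzfs_mnttab_* entry points above (from bug 6572357) replace repeated getmntany() scans of /etc/mnttab with an in-memory cache keyed by dataset name. A minimal Python sketch of the caching idea — the class and method names here are illustrative, not the real C API:

```python
class MnttabCache:
    """Toy model of the libzfs mnttab cache: entries are keyed by the
    dataset ("special") name, so mount-state lookups avoid rescanning
    /etc/mnttab on every query."""

    def __init__(self):
        self._entries = {}  # special -> (mountpoint, options)

    def add(self, special, mountp, opts):
        # analogous to libzfs_mnttab_add() after a successful mount
        self._entries[special] = (mountp, opts)

    def find(self, special):
        # returns None on a cache miss, where the C code reports ENOENT
        return self._entries.get(special)

    def remove(self, special):
        # analogous to libzfs_mnttab_remove() after an unmount
        self._entries.pop(special, None)
```

This mirrors the pairing visible later in the diff, where zfs_mount() adds an entry on success and zfs_unmount() removes it.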
/*
* Basic handle functions
@@ -256,9 +265,15 @@ typedef enum {
ZPOOL_STATUS_HOSTID_MISMATCH, /* last accessed by another system */
ZPOOL_STATUS_IO_FAILURE_WAIT, /* failed I/O, failmode 'wait' */
ZPOOL_STATUS_IO_FAILURE_CONTINUE, /* failed I/O, failmode 'continue' */
ZPOOL_STATUS_BAD_LOG, /* cannot read log chain(s) */
/*
* These faults have no corresponding message ID. At the time we are
* checking the status, the original reason for the FMA fault (I/O or
* checksum errors) has been lost.
*/
ZPOOL_STATUS_FAULTED_DEV_R, /* faulted device with replicas */
ZPOOL_STATUS_FAULTED_DEV_NR, /* faulted device with no replicas */
ZPOOL_STATUS_BAD_LOG, /* cannot read log chain(s) */
/*
* The following are not faults per se, but still an error possibly
@@ -354,6 +369,10 @@ extern int zfs_prop_get(zfs_handle_t *, zfs_prop_t, char *, size_t,
zprop_source_t *, char *, size_t, boolean_t);
extern int zfs_prop_get_numeric(zfs_handle_t *, zfs_prop_t, uint64_t *,
zprop_source_t *, char *, size_t);
extern int zfs_prop_get_userquota_int(zfs_handle_t *zhp, const char *propname,
uint64_t *propvalue);
extern int zfs_prop_get_userquota(zfs_handle_t *zhp, const char *propname,
char *propbuf, int proplen, boolean_t literal);
extern uint64_t zfs_prop_get_int(zfs_handle_t *, zfs_prop_t);
extern int zfs_prop_inherit(zfs_handle_t *, const char *);
extern const char *zfs_prop_values(zfs_prop_t);
@@ -441,6 +460,12 @@ extern int zfs_send(zfs_handle_t *, const char *, const char *,
boolean_t, boolean_t, boolean_t, boolean_t, int);
extern int zfs_promote(zfs_handle_t *);
typedef int (*zfs_userspace_cb_t)(void *arg, const char *domain,
uid_t rid, uint64_t space);
extern int zfs_userspace(zfs_handle_t *zhp, zfs_userquota_prop_t type,
zfs_userspace_cb_t func, void *arg);
typedef struct recvflags {
/* print informational messages (ie, -v was specified) */
int verbose : 1;
@@ -478,17 +503,6 @@ extern boolean_t zfs_dataset_exists(libzfs_handle_t *, const char *,
zfs_type_t);
extern int zfs_spa_version(zfs_handle_t *, int *);
/*
* dataset permission functions.
*/
extern int zfs_perm_set(zfs_handle_t *, nvlist_t *);
extern int zfs_perm_remove(zfs_handle_t *, nvlist_t *);
extern int zfs_build_perms(zfs_handle_t *, char *, char *,
zfs_deleg_who_type_t, zfs_deleg_inherit_t, nvlist_t **);
extern int zfs_perm_get(zfs_handle_t *, zfs_allow_t **);
extern void zfs_free_allows(zfs_allow_t *);
extern void zfs_deleg_permissions(void);
/*
* Mount support functions.
*/
@@ -525,7 +539,7 @@ extern int zfs_unshare_iscsi(zfs_handle_t *);
#ifdef TODO
extern int zfs_iscsi_perm_check(libzfs_handle_t *, char *, ucred_t *);
#endif
extern int zfs_deleg_share_nfs(libzfs_handle_t *, char *, char *,
extern int zfs_deleg_share_nfs(libzfs_handle_t *, char *, char *, char *,
void *, void *, int, zfs_share_op_t);
/*
@@ -570,6 +584,15 @@ extern int zpool_remove_zvol_links(zpool_handle_t *);
/* is this zvol valid for use as a dump device? */
extern int zvol_check_dump_config(char *);
/*
* Management interfaces for SMB ACL files
*/
int zfs_smb_acl_add(libzfs_handle_t *, char *, char *, char *);
int zfs_smb_acl_remove(libzfs_handle_t *, char *, char *, char *);
int zfs_smb_acl_purge(libzfs_handle_t *, char *, char *);
int zfs_smb_acl_rename(libzfs_handle_t *, char *, char *, char *, char *);
/*
* Enable and disable datasets within a pool by mounting/unmounting and
* sharing/unsharing them.


@@ -20,7 +20,7 @@
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*
* Portions Copyright 2007 Ramprakash Jelari
@@ -218,6 +218,7 @@ changelist_postfix(prop_changelist_t *clp)
boolean_t sharenfs;
boolean_t sharesmb;
boolean_t mounted;
/*
* If we are in the global zone, but this dataset is exported
@@ -272,20 +273,29 @@ changelist_postfix(prop_changelist_t *clp)
shareopts, sizeof (shareopts), NULL, NULL, 0,
B_FALSE) == 0) && (strcmp(shareopts, "off") != 0));
if ((cn->cn_mounted || clp->cl_waslegacy || sharenfs ||
sharesmb) && !zfs_is_mounted(cn->cn_handle, NULL) &&
zfs_mount(cn->cn_handle, NULL, 0) != 0)
errors++;
mounted = zfs_is_mounted(cn->cn_handle, NULL);
if (!mounted && (cn->cn_mounted ||
((sharenfs || sharesmb || clp->cl_waslegacy) &&
(zfs_prop_get_int(cn->cn_handle,
ZFS_PROP_CANMOUNT) == ZFS_CANMOUNT_ON)))) {
if (zfs_mount(cn->cn_handle, NULL, 0) != 0)
errors++;
else
mounted = TRUE;
}
/*
* We always re-share even if the filesystem is currently
* shared, so that we can adopt any new options.
* If the file system is mounted we always re-share even
* if the filesystem is currently shared, so that we can
* adopt any new options.
*/
if (sharenfs)
if (sharenfs && mounted)
errors += zfs_share_nfs(cn->cn_handle);
else if (cn->cn_shared || clp->cl_waslegacy)
errors += zfs_unshare_nfs(cn->cn_handle, NULL);
if (sharesmb)
if (sharesmb && mounted)
errors += zfs_share_smb(cn->cn_handle);
else if (cn->cn_shared || clp->cl_waslegacy)
errors += zfs_unshare_smb(cn->cn_handle, NULL);
@@ -621,8 +631,6 @@ changelist_gather(zfs_handle_t *zhp, zfs_prop_t prop, int gather_flags,
clp->cl_prop = ZFS_PROP_MOUNTPOINT;
} else if (prop == ZFS_PROP_VOLSIZE) {
clp->cl_prop = ZFS_PROP_MOUNTPOINT;
} else if (prop == ZFS_PROP_VERSION) {
clp->cl_prop = ZFS_PROP_MOUNTPOINT;
} else {
clp->cl_prop = prop;
}

File diff suppressed because it is too large


@@ -19,12 +19,10 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#pragma ident "%Z%%M% %I% %E% SMI"
/*
* Iterate over all children of the current object. This includes the normal
* dataset hierarchy, but also arbitrary hierarchies due to clones. We want to
@@ -399,13 +397,6 @@ iterate_children(libzfs_handle_t *hdl, zfs_graph_t *zgp, const char *dataset)
for ((void) strlcpy(zc.zc_name, dataset, sizeof (zc.zc_name));
ioctl(hdl->libzfs_fd, ZFS_IOC_DATASET_LIST_NEXT, &zc) == 0;
(void) strlcpy(zc.zc_name, dataset, sizeof (zc.zc_name))) {
/*
* Ignore private dataset names.
*/
if (dataset_name_hidden(zc.zc_name))
continue;
/*
* Get statistics for this dataset, to determine the type of the
* dataset and clone statistics. If this fails, the dataset has


@@ -63,6 +63,8 @@ struct libzfs_handle {
int libzfs_printerr;
void *libzfs_sharehdl; /* libshare handle */
uint_t libzfs_shareflags;
boolean_t libzfs_mnttab_enable;
avl_tree_t libzfs_mnttab_cache;
};
#define ZFSSHARE_MISS 0x01 /* Didn't find entry in cache */
@@ -185,7 +187,7 @@ extern int zfs_init_libshare(libzfs_handle_t *, int);
extern void zfs_uninit_libshare(libzfs_handle_t *);
extern int zfs_parse_options(char *, zfs_share_proto_t);
extern int zfs_unshare_proto(zfs_handle_t *zhp,
extern int zfs_unshare_proto(zfs_handle_t *,
const char *, zfs_share_proto_t *);
#ifdef __FreeBSD__


@@ -74,7 +74,6 @@
#include <unistd.h>
#include <zone.h>
#include <sys/mntent.h>
#include <sys/mnttab.h>
#include <sys/mount.h>
#include <sys/stat.h>
@@ -243,18 +242,9 @@ dir_is_empty(const char *dirname)
boolean_t
is_mounted(libzfs_handle_t *zfs_hdl, const char *special, char **where)
{
struct mnttab search = { 0 }, entry;
struct mnttab entry;
/*
* Search for the entry in /etc/mnttab. We don't bother getting the
* mountpoint, as we can just search for the special device. This will
* also let us find mounts when the mountpoint is 'legacy'.
*/
search.mnt_special = (char *)special;
search.mnt_fstype = MNTTYPE_ZFS;
rewind(zfs_hdl->libzfs_mnttab);
if (getmntany(zfs_hdl->libzfs_mnttab, &entry, &search) != 0)
if (libzfs_mnttab_find(zfs_hdl, special, &entry) != 0)
return (B_FALSE);
if (where != NULL)
@@ -367,12 +357,14 @@ zfs_mount(zfs_handle_t *zhp, const char *options, int flags)
} else {
zfs_error_aux(hdl, strerror(errno));
}
return (zfs_error_fmt(hdl, EZFS_MOUNTFAILED,
dgettext(TEXT_DOMAIN, "cannot mount '%s'"),
zhp->zfs_name));
}
/* add the mounted entry into our cache */
libzfs_mnttab_add(hdl, zfs_get_name(zhp), mountpoint,
mntopts);
return (0);
}
@@ -398,26 +390,23 @@ unmount_one(libzfs_handle_t *hdl, const char *mountpoint, int flags)
int
zfs_unmount(zfs_handle_t *zhp, const char *mountpoint, int flags)
{
struct mnttab search = { 0 }, entry;
libzfs_handle_t *hdl = zhp->zfs_hdl;
struct mnttab entry;
char *mntpt = NULL;
/* check to see if need to unmount the filesystem */
search.mnt_special = zhp->zfs_name;
search.mnt_fstype = MNTTYPE_ZFS;
rewind(zhp->zfs_hdl->libzfs_mnttab);
/* check to see if we need to unmount the filesystem */
if (mountpoint != NULL || ((zfs_get_type(zhp) == ZFS_TYPE_FILESYSTEM) &&
getmntany(zhp->zfs_hdl->libzfs_mnttab, &entry, &search) == 0)) {
libzfs_mnttab_find(hdl, zhp->zfs_name, &entry) == 0)) {
/*
* mountpoint may have come from a call to
* getmnt/getmntany if it isn't NULL. If it is NULL,
* we know it comes from getmntany which can then get
* overwritten later. We strdup it to play it safe.
* we know it comes from libzfs_mnttab_find which can
* then get freed later. We strdup it to play it safe.
*/
if (mountpoint == NULL)
mntpt = zfs_strdup(zhp->zfs_hdl, entry.mnt_mountp);
mntpt = zfs_strdup(hdl, entry.mnt_mountp);
else
mntpt = zfs_strdup(zhp->zfs_hdl, mountpoint);
mntpt = zfs_strdup(hdl, mountpoint);
/*
* Unshare and unmount the filesystem
@@ -425,11 +414,12 @@ zfs_unmount(zfs_handle_t *zhp, const char *mountpoint, int flags)
if (zfs_unshare_proto(zhp, mntpt, share_all_proto) != 0)
return (-1);
if (unmount_one(zhp->zfs_hdl, mntpt, flags) != 0) {
if (unmount_one(hdl, mntpt, flags) != 0) {
free(mntpt);
(void) zfs_shareall(zhp);
return (-1);
}
libzfs_mnttab_remove(hdl, zhp->zfs_name);
free(mntpt);
}
@@ -899,18 +889,17 @@ int
zfs_unshare_proto(zfs_handle_t *zhp, const char *mountpoint,
zfs_share_proto_t *proto)
{
struct mnttab search = { 0 }, entry;
libzfs_handle_t *hdl = zhp->zfs_hdl;
struct mnttab entry;
char *mntpt = NULL;
/* check to see if need to unmount the filesystem */
search.mnt_special = (char *)zfs_get_name(zhp);
search.mnt_fstype = MNTTYPE_ZFS;
rewind(zhp->zfs_hdl->libzfs_mnttab);
if (mountpoint != NULL)
mntpt = zfs_strdup(zhp->zfs_hdl, mountpoint);
mountpoint = mntpt = zfs_strdup(hdl, mountpoint);
if (mountpoint != NULL || ((zfs_get_type(zhp) == ZFS_TYPE_FILESYSTEM) &&
getmntany(zhp->zfs_hdl->libzfs_mnttab, &entry, &search) == 0)) {
libzfs_mnttab_find(hdl, zfs_get_name(zhp), &entry) == 0)) {
zfs_share_proto_t *curr_proto;
if (mountpoint == NULL)
@@ -919,8 +908,8 @@ zfs_unshare_proto(zfs_handle_t *zhp, const char *mountpoint,
for (curr_proto = proto; *curr_proto != PROTO_END;
curr_proto++) {
if (is_shared(zhp->zfs_hdl, mntpt, *curr_proto) &&
unshare_one(zhp->zfs_hdl, zhp->zfs_name,
if (is_shared(hdl, mntpt, *curr_proto) &&
unshare_one(hdl, zhp->zfs_name,
mntpt, *curr_proto) != 0) {
if (mntpt != NULL)
free(mntpt);


@@ -20,7 +20,7 @@
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -49,6 +49,12 @@
static int read_efi_label(nvlist_t *config, diskaddr_t *sb);
#if defined(__i386) || defined(__amd64)
#define BOOTCMD "installgrub(1M)"
#else
#define BOOTCMD "installboot(1M)"
#endif
/*
* ====================================================================
* zpool property functions
@@ -211,12 +217,39 @@ zpool_get_prop(zpool_handle_t *zhp, zpool_prop_t prop, char *buf, size_t len,
uint_t vsc;
if (zpool_get_state(zhp) == POOL_STATE_UNAVAIL) {
if (prop == ZPOOL_PROP_NAME)
switch (prop) {
case ZPOOL_PROP_NAME:
(void) strlcpy(buf, zpool_get_name(zhp), len);
else if (prop == ZPOOL_PROP_HEALTH)
break;
case ZPOOL_PROP_HEALTH:
(void) strlcpy(buf, "FAULTED", len);
else
break;
case ZPOOL_PROP_GUID:
intval = zpool_get_prop_int(zhp, prop, &src);
(void) snprintf(buf, len, "%llu", intval);
break;
case ZPOOL_PROP_ALTROOT:
case ZPOOL_PROP_CACHEFILE:
if (zhp->zpool_props != NULL ||
zpool_get_all_props(zhp) == 0) {
(void) strlcpy(buf,
zpool_get_prop_string(zhp, prop, &src),
len);
if (srctype != NULL)
*srctype = src;
return (0);
}
/* FALLTHROUGH */
default:
(void) strlcpy(buf, "-", len);
break;
}
if (srctype != NULL)
*srctype = src;
return (0);
}
@@ -277,6 +310,17 @@ zpool_get_prop(zpool_handle_t *zhp, zpool_prop_t prop, char *buf, size_t len,
return (0);
}
static boolean_t
pool_is_bootable(zpool_handle_t *zhp)
{
char bootfs[ZPOOL_MAXNAMELEN];
return (zpool_get_prop(zhp, ZPOOL_PROP_BOOTFS, bootfs,
sizeof (bootfs), NULL) == 0 && strncmp(bootfs, "-",
sizeof (bootfs)) != 0);
}
/*
* Check if the bootfs name has the same pool name as it is set to.
* Assuming bootfs is a valid dataset name.
@@ -296,7 +340,6 @@ bootfs_name_valid(const char *pool, char *bootfs)
return (B_FALSE);
}
#if defined(sun)
/*
* Inspect the configuration to determine if any of the devices contain
* an EFI label.
@@ -304,6 +347,7 @@ bootfs_name_valid(const char *pool, char *bootfs)
static boolean_t
pool_uses_efi(nvlist_t *config)
{
#ifdef sun
nvlist_t **child;
uint_t c, children;
@@ -315,9 +359,9 @@ pool_uses_efi(nvlist_t *config)
if (pool_uses_efi(child[c]))
return (B_TRUE);
}
#endif /* sun */
return (B_FALSE);
}
#endif
/*
* Given an nvlist of zpool properties to be set, validate that they are
@@ -519,9 +563,6 @@ zpool_set_prop(zpool_handle_t *zhp, const char *propname, const char *propval)
dgettext(TEXT_DOMAIN, "cannot set property for '%s'"),
zhp->zpool_name);
if (zhp->zpool_props == NULL && zpool_get_all_props(zhp))
return (zfs_error(zhp->zpool_hdl, EZFS_POOLPROPS, errbuf));
if (nvlist_alloc(&nvl, NV_UNIQUE_NAME, 0) != 0)
return (no_memory(zhp->zpool_hdl));
@@ -1012,6 +1053,24 @@ zpool_add(zpool_handle_t *zhp, nvlist_t *nvroot)
return (zfs_error(hdl, EZFS_BADVERSION, msg));
}
if (pool_is_bootable(zhp) && nvlist_lookup_nvlist_array(nvroot,
ZPOOL_CONFIG_SPARES, &spares, &nspares) == 0) {
uint64_t s;
for (s = 0; s < nspares; s++) {
char *path;
if (nvlist_lookup_string(spares[s], ZPOOL_CONFIG_PATH,
&path) == 0 && pool_uses_efi(spares[s])) {
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"device '%s' contains an EFI label and "
"cannot be used on root pools."),
zpool_vdev_name(hdl, NULL, spares[s]));
return (zfs_error(hdl, EZFS_POOL_NOTSUP, msg));
}
}
}
if (zpool_get_prop_int(zhp, ZPOOL_PROP_VERSION, NULL) <
SPA_VERSION_L2CACHE &&
nvlist_lookup_nvlist_array(nvroot, ZPOOL_CONFIG_L2CACHE,
@@ -1164,7 +1223,9 @@ zpool_import(libzfs_handle_t *hdl, nvlist_t *config, const char *newname,
}
if (nvlist_add_string(props,
zpool_prop_to_name(ZPOOL_PROP_ALTROOT), altroot) != 0) {
zpool_prop_to_name(ZPOOL_PROP_ALTROOT), altroot) != 0 ||
nvlist_add_string(props,
zpool_prop_to_name(ZPOOL_PROP_CACHEFILE), "none") != 0) {
nvlist_free(props);
return (zfs_error_fmt(hdl, EZFS_NOMEM,
dgettext(TEXT_DOMAIN, "cannot import '%s'"),
@@ -1453,7 +1514,6 @@ vdev_online(nvlist_t *nv)
int
zpool_get_physpath(zpool_handle_t *zhp, char *physpath)
{
char bootfs[ZPOOL_MAXNAMELEN];
nvlist_t *vdev_root;
nvlist_t **child;
uint_t count;
@@ -1463,8 +1523,7 @@ zpool_get_physpath(zpool_handle_t *zhp, char *physpath)
* Make sure this is a root pool, as phys_path doesn't mean
* anything to a non-root pool.
*/
if (zpool_get_prop(zhp, ZPOOL_PROP_BOOTFS, bootfs,
sizeof (bootfs), NULL) != 0)
if (!pool_is_bootable(zhp))
return (-1);
verify(nvlist_lookup_nvlist(zhp->zpool_config,
@@ -1738,6 +1797,7 @@ zpool_vdev_attach(zpool_handle_t *zhp,
uint_t children;
nvlist_t *config_root;
libzfs_handle_t *hdl = zhp->zpool_hdl;
boolean_t rootpool = pool_is_bootable(zhp);
if (replacing)
(void) snprintf(msg, sizeof (msg), dgettext(TEXT_DOMAIN,
@@ -1746,6 +1806,16 @@ zpool_vdev_attach(zpool_handle_t *zhp,
(void) snprintf(msg, sizeof (msg), dgettext(TEXT_DOMAIN,
"cannot attach %s to %s"), new_disk, old_disk);
/*
* If this is a root pool, make sure that we're not attaching an
* EFI labeled device.
*/
if (rootpool && pool_uses_efi(nvroot)) {
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"EFI labeled devices are not supported on root pools."));
return (zfs_error(hdl, EZFS_POOL_NOTSUP, msg));
}
(void) strlcpy(zc.zc_name, zhp->zpool_name, sizeof (zc.zc_name));
if ((tgt = zpool_find_vdev(zhp, old_disk, &avail_spare, &l2cache,
&islog)) == 0)
@@ -1812,8 +1882,19 @@ zpool_vdev_attach(zpool_handle_t *zhp,
zcmd_free_nvlists(&zc);
if (ret == 0)
if (ret == 0) {
if (rootpool) {
/*
* XXX - This should be removed once we can
* automatically install the bootblocks on the
* newly attached disk.
*/
(void) fprintf(stderr, dgettext(TEXT_DOMAIN, "Please "
"be sure to invoke %s to make '%s' bootable.\n"),
BOOTCMD, new_disk);
}
return (0);
}
switch (errno) {
case ENOTSUP:
@@ -2824,6 +2905,13 @@ zpool_label_disk(libzfs_handle_t *hdl, zpool_handle_t *zhp, char *name)
if (zhp) {
nvlist_t *nvroot;
if (pool_is_bootable(zhp)) {
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"EFI labeled devices are not supported on root "
"pools."));
return (zfs_error(hdl, EZFS_POOL_NOTSUP, errbuf));
}
verify(nvlist_lookup_nvlist(zhp->zpool_config,
ZPOOL_CONFIG_VDEV_TREE, &nvroot) == 0);
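The new pool_is_bootable() helper threaded through the hunks above reduces to one check: reading the pool's bootfs property succeeds and the value is not the "-" placeholder. A hedged Python rendering of that predicate — get_prop here stands in for zpool_get_prop() and is an assumption of this sketch:

```python
def pool_is_bootable(get_prop):
    """Sketch of pool_is_bootable(): a pool is treated as a root
    (bootable) pool iff its 'bootfs' property can be read and is not
    the unset placeholder "-".  'get_prop' models zpool_get_prop()
    and raises OSError on failure in this toy version."""
    try:
        bootfs = get_prop("bootfs")
    except OSError:          # zpool_get_prop() returned an error
        return False
    return bootfs != "-"
```

This is why zpool_get_physpath() in the same file can drop its own bootfs lookup and simply call the helper.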


@@ -240,6 +240,8 @@ send_iterate_prop(zfs_handle_t *zhp, nvlist_t *nv)
zfs_prop_t prop = zfs_name_to_prop(propname);
nvlist_t *propnv;
assert(zfs_prop_user(propname) || prop != ZPROP_INVAL);
if (!zfs_prop_user(propname) && zfs_prop_readonly(prop))
continue;
@@ -596,12 +598,18 @@ dump_filesystem(zfs_handle_t *zhp, void *arg)
zhp->zfs_name, sdd->fromsnap);
sdd->err = B_TRUE;
} else if (!sdd->seento) {
(void) fprintf(stderr,
"WARNING: could not send %s@%s:\n"
"incremental source (%s@%s) "
"is not earlier than it\n",
zhp->zfs_name, sdd->tosnap,
zhp->zfs_name, sdd->fromsnap);
if (sdd->fromsnap) {
(void) fprintf(stderr,
"WARNING: could not send %s@%s:\n"
"incremental source (%s@%s) "
"is not earlier than it\n",
zhp->zfs_name, sdd->tosnap,
zhp->zfs_name, sdd->fromsnap);
} else {
(void) fprintf(stderr, "WARNING: "
"could not send %s@%s: does not exist\n",
zhp->zfs_name, sdd->tosnap);
}
sdd->err = B_TRUE;
}
} else {
@@ -1100,6 +1108,7 @@ recv_incremental_replication(libzfs_handle_t *hdl, const char *tofs,
char newname[ZFS_MAXNAMELEN];
int error;
boolean_t needagain, progress;
char *s1, *s2;
VERIFY(0 == nvlist_lookup_string(stream_nv, "fromsnap", &fromsnap));
VERIFY(0 == nvlist_lookup_string(stream_nv, "tosnap", &tosnap));
@@ -1294,12 +1303,13 @@ recv_incremental_replication(libzfs_handle_t *hdl, const char *tofs,
VERIFY(0 == nvlist_lookup_uint64(stream_nvfs,
"parentfromsnap", &stream_parent_fromsnap_guid));
s1 = strrchr(fsname, '/');
s2 = strrchr(stream_fsname, '/');
/* check for rename */
p1 = strrchr(fsname, '/');
p2 = strrchr(stream_fsname, '/');
if ((stream_parent_fromsnap_guid != 0 &&
stream_parent_fromsnap_guid != parent_fromsnap_guid) ||
(p1 != NULL && p2 != NULL && strcmp (p1, p2) != 0)) {
((s1 != NULL) && (s2 != NULL) && strcmp(s1, s2) != 0)) {
nvlist_t *parent;
char tryname[ZFS_MAXNAMELEN];


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -364,6 +364,11 @@ zfs_standard_error_fmt(libzfs_handle_t *hdl, int error, const char *fmt, ...)
case ENOTSUP:
zfs_verror(hdl, EZFS_BADVERSION, fmt, ap);
break;
case EAGAIN:
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"pool I/O is currently suspended"));
zfs_verror(hdl, EZFS_POOLUNAVAIL, fmt, ap);
break;
default:
zfs_error_aux(hdl, strerror(errno));
zfs_verror(hdl, EZFS_UNKNOWN, fmt, ap);
@@ -437,6 +442,11 @@ zpool_standard_error_fmt(libzfs_handle_t *hdl, int error, const char *fmt, ...)
case EDQUOT:
zfs_verror(hdl, EZFS_NOSPC, fmt, ap);
return (-1);
case EAGAIN:
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"pool I/O is currently suspended"));
zfs_verror(hdl, EZFS_POOLUNAVAIL, fmt, ap);
break;
default:
zfs_error_aux(hdl, strerror(error));
@@ -480,7 +490,6 @@ zfs_realloc(libzfs_handle_t *hdl, void *ptr, size_t oldsize, size_t newsize)
if ((ret = realloc(ptr, newsize)) == NULL) {
(void) no_memory(hdl);
free(ptr);
return (NULL);
}
@@ -595,6 +604,7 @@ libzfs_init(void)
zfs_prop_init();
zpool_prop_init();
libzfs_mnttab_init(hdl);
return (hdl);
}
@@ -612,6 +622,7 @@ libzfs_fini(libzfs_handle_t *hdl)
(void) free(hdl->libzfs_log_str);
zpool_free_handles(hdl);
namespace_clear(hdl);
libzfs_mnttab_fini(hdl);
free(hdl);
}
@@ -802,6 +813,10 @@ zprop_print_headers(zprop_get_cbdata_t *cbp, zfs_type_t type)
cbp->cb_colwidths[GET_COL_SOURCE] = strlen(dgettext(TEXT_DOMAIN,
"SOURCE"));
/* first property is always NAME */
assert(cbp->cb_proplist->pl_prop ==
((type == ZFS_TYPE_POOL) ? ZPOOL_PROP_NAME : ZFS_PROP_NAME));
/*
* Go through and calculate the widths for each column. For the
* 'source' column, we kludge it up by taking the worst-case scenario of
@@ -829,9 +844,13 @@ zprop_print_headers(zprop_get_cbdata_t *cbp, zfs_type_t type)
}
/*
* 'VALUE' column
* 'VALUE' column. The first property is always the 'name'
* property that was tacked on either by /sbin/zfs's
* zfs_do_get() or when calling zprop_expand_list(), so we
* ignore its width. If the user specified the name property
* to display, then it will be later in the list in any case.
*/
if ((pl->pl_prop != ZFS_PROP_NAME || !pl->pl_all) &&
if (pl != cbp->cb_proplist &&
pl->pl_width > cbp->cb_colwidths[GET_COL_VALUE])
cbp->cb_colwidths[GET_COL_VALUE] = pl->pl_width;
@@ -1016,9 +1035,9 @@ zfs_nicestrtonum(libzfs_handle_t *hdl, const char *value, uint64_t *num)
return (-1);
}
/* Rely on stroll() to process the numeric portion. */
/* Rely on stroull() to process the numeric portion. */
errno = 0;
*num = strtoll(value, &end, 10);
*num = strtoull(value, &end, 10);
/*
* Check for ERANGE, which indicates that the value is too large to fit
@@ -1208,7 +1227,7 @@ addlist(libzfs_handle_t *hdl, char *propname, zprop_list_t **listp,
* dataset property,
*/
if (prop == ZPROP_INVAL && (type == ZFS_TYPE_POOL ||
!zfs_prop_user(propname))) {
(!zfs_prop_user(propname) && !zfs_prop_userquota(propname)))) {
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"invalid property '%s'"), propname);
return (zfs_error(hdl, EZFS_BADPROP,


@@ -329,6 +329,7 @@ typedef void (task_func_t)(void *);
#define TASKQ_PREPOPULATE 0x0001
#define TASKQ_CPR_SAFE 0x0002 /* Use CPR safe protocol */
#define TASKQ_DYNAMIC 0x0004 /* Use dynamic thread scheduling */
#define TASKQ_THREADS_CPU_PCT 0x0008 /* Use dynamic thread scheduling */
#define TQ_SLEEP KM_SLEEP /* Can block for memory */
#define TQ_NOSLEEP KM_NOSLEEP /* cannot block for memory; may fail */
@@ -590,6 +591,8 @@ typedef struct ksiddomain {
ksiddomain_t *ksid_lookupdomain(const char *);
void ksiddomain_rele(ksiddomain_t *);
typedef uint32_t idmap_rid_t;
#define SX_SYSINIT(name, lock, desc)
#define SYSCTL_DECL(...)


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -174,6 +174,19 @@ taskq_create(const char *name, int nthreads, pri_t pri,
taskq_t *tq = kmem_zalloc(sizeof (taskq_t), KM_SLEEP);
int t;
if (flags & TASKQ_THREADS_CPU_PCT) {
int pct;
ASSERT3S(nthreads, >=, 0);
ASSERT3S(nthreads, <=, 100);
pct = MIN(nthreads, 100);
pct = MAX(pct, 0);
nthreads = (sysconf(_SC_NPROCESSORS_ONLN) * pct) / 100;
nthreads = MAX(nthreads, 1); /* need at least 1 thread */
} else {
ASSERT3S(nthreads, >=, 1);
}
rw_init(&tq->tq_threadlock, NULL, RW_DEFAULT, NULL);
mutex_init(&tq->tq_lock, NULL, MUTEX_DEFAULT, NULL);
cv_init(&tq->tq_dispatch_cv, NULL, CV_DEFAULT, NULL);
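The new TASKQ_THREADS_CPU_PCT branch above interprets nthreads as a percentage of online CPUs rather than an absolute count: clamp to [0, 100], scale by the CPU count, and guarantee at least one thread. That arithmetic, sketched in Python (the function name and the ncpus test hook are illustrative):

```python
import os

def taskq_nthreads(nthreads, threads_cpu_pct=False, ncpus=None):
    """Sketch of the thread-count computation in taskq_create().
    With the percentage flag set, 'nthreads' is a percentage of
    online CPUs; 'ncpus' is a test hook standing in for
    sysconf(_SC_NPROCESSORS_ONLN)."""
    if not threads_cpu_pct:
        return nthreads
    if ncpus is None:
        ncpus = os.sysconf("SC_NPROCESSORS_ONLN")
    pct = max(0, min(nthreads, 100))      # clamp to [0, 100]
    return max((ncpus * pct) // 100, 1)   # need at least 1 thread
```

Note the rounding: on an 8-CPU machine a 50% request yields 4 threads, while any request that rounds down to zero still gets one thread.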


@@ -0,0 +1,28 @@
#! /usr/bin/python2.4
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
"""
package which provides an administrative interface to ZFS
"""


@@ -0,0 +1,394 @@
#! /usr/bin/python2.4
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
"""This module implements the "zfs allow" and "zfs unallow" subcommands.
The only public interface is the zfs.allow.do_allow() function."""
import zfs.util
import zfs.dataset
import optparse
import sys
import pwd
import grp
import errno
_ = zfs.util._
class FSPerms(object):
"""This class represents all the permissions that are set on a
particular filesystem (not including those inherited)."""
__slots__ = "create", "sets", "local", "descend", "ld"
__repr__ = zfs.util.default_repr
def __init__(self, raw):
"""Create a FSPerms based on the dict of raw permissions
from zfs.ioctl.get_fsacl()."""
# set of perms
self.create = set()
# below are { "Ntype name": set(perms) }
# where N is a number that we just use for sorting,
# type is "user", "group", "everyone", or "" (for sets)
# name is a user, group, or set name, or "" (for everyone)
self.sets = dict()
self.local = dict()
self.descend = dict()
self.ld = dict()
# see the comment in dsl_deleg.c for the definition of whokey
for whokey in raw.keys():
perms = raw[whokey].keys()
whotypechr = whokey[0].lower()
ws = whokey[3:]
if whotypechr == "c":
self.create.update(perms)
elif whotypechr == "s":
nwho = "1" + ws
self.sets.setdefault(nwho, set()).update(perms)
else:
if whotypechr == "u":
try:
name = pwd.getpwuid(int(ws)).pw_name
except KeyError:
name = ws
nwho = "1user " + name
elif whotypechr == "g":
try:
name = grp.getgrgid(int(ws)).gr_name
except KeyError:
name = ws
nwho = "2group " + name
elif whotypechr == "e":
nwho = "3everyone"
else:
raise ValueError(whotypechr)
if whokey[1] == "l":
d = self.local
elif whokey[1] == "d":
d = self.descend
else:
raise ValueError(whokey[1])
d.setdefault(nwho, set()).update(perms)
# Find perms that are in both local and descend, and
# move them to ld.
for nwho in self.local:
if nwho not in self.descend:
continue
# note: these are set operations
self.ld[nwho] = self.local[nwho] & self.descend[nwho]
self.local[nwho] -= self.ld[nwho]
self.descend[nwho] -= self.ld[nwho]
@staticmethod
def __ldstr(d, header):
s = ""
for (nwho, perms) in sorted(d.items()):
# local and descend may have entries where perms
# is an empty set, due to consolidating all
# permissions into ld
if perms:
s += "\t%s %s\n" % \
(nwho[1:], ",".join(sorted(perms)))
if s:
s = header + s
return s
def __str__(self):
s = self.__ldstr(self.sets, _("Permission sets:\n"))
if self.create:
s += _("Create time permissions:\n")
s += "\t%s\n" % ",".join(sorted(self.create))
s += self.__ldstr(self.local, _("Local permissions:\n"))
s += self.__ldstr(self.descend, _("Descendent permissions:\n"))
s += self.__ldstr(self.ld, _("Local+Descendent permissions:\n"))
return s.rstrip()
def args_to_perms(parser, options, who, perms):
"""Return a dict of raw perms {"whostr" -> {"perm" -> None}}
based on the command-line input."""
# perms is not set if we are doing a "zfs unallow <who> <fs>" to
# remove all of someone's permissions
if perms:
setperms = dict(((p, None) for p in perms if p[0] == "@"))
baseperms = dict(((canonicalized_perm(p), None)
for p in perms if p[0] != "@"))
else:
setperms = None
baseperms = None
d = dict()
def storeperm(typechr, inheritchr, arg):
assert typechr in "ugecs"
assert inheritchr in "ld-"
def mkwhokey(t):
return "%c%c$%s" % (t, inheritchr, arg)
if baseperms or not perms:
d[mkwhokey(typechr)] = baseperms
if setperms or not perms:
d[mkwhokey(typechr.upper())] = setperms
def decodeid(w, toidfunc, fmt):
try:
return int(w)
except ValueError:
try:
return toidfunc(w)[2]
except KeyError:
parser.error(fmt % w)
if options.set:
storeperm("s", "-", who)
elif options.create:
storeperm("c", "-", "")
else:
for w in who:
if options.user:
id = decodeid(w, pwd.getpwnam,
_("invalid user %s"))
typechr = "u"
elif options.group:
id = decodeid(w, grp.getgrnam,
_("invalid group %s"))
typechr = "g"
elif w == "everyone":
id = ""
typechr = "e"
else:
try:
id = pwd.getpwnam(w)[2]
typechr = "u"
except KeyError:
try:
id = grp.getgrnam(w)[2]
typechr = "g"
except KeyError:
parser.error(_("invalid user/group %s") % w)
if options.local:
storeperm(typechr, "l", id)
if options.descend:
storeperm(typechr, "d", id)
return d
perms_subcmd = dict(
create=_("Must also have the 'mount' ability"),
destroy=_("Must also have the 'mount' ability"),
snapshot=_("Must also have the 'mount' ability"),
rollback=_("Must also have the 'mount' ability"),
clone=_("""Must also have the 'create' ability and 'mount'
\t\t\t\tability in the origin file system"""),
promote=_("""Must also have the 'mount'
\t\t\t\tand 'promote' ability in the origin file system"""),
rename=_("""Must also have the 'mount' and 'create'
\t\t\t\tability in the new parent"""),
receive=_("Must also have the 'mount' and 'create' ability"),
allow=_("Must also have the permission that is being\n\t\t\t\tallowed"),
mount=_("Allows mount/umount of ZFS datasets"),
share=_("Allows sharing file systems over NFS or SMB\n\t\t\t\tprotocols"),
send="",
)
perms_other = dict(
userprop=_("Allows changing any user property"),
userquota=_("Allows accessing any userquota@... property"),
groupquota=_("Allows accessing any groupquota@... property"),
userused=_("Allows reading any userused@... property"),
groupused=_("Allows reading any groupused@... property"),
)
def hasset(ds, setname):
"""Return True if the given setname (string) is defined for this
ds (Dataset)."""
# It would be nice to cache the result of get_fsacl().
for raw in ds.get_fsacl().values():
for whokey in raw.keys():
if whokey[0].lower() == "s" and whokey[3:] == setname:
return True
return False
def canonicalized_perm(permname):
"""Return the canonical name (string) for this permission (string).
Raises ZFSError if it is not a valid permission."""
if permname in perms_subcmd.keys() or permname in perms_other.keys():
return permname
try:
return zfs.dataset.getpropobj(permname).name
except KeyError:
raise zfs.util.ZFSError(errno.EINVAL, permname,
_("invalid permission"))
def print_perms():
"""Print the set of supported permissions."""
print(_("\nThe following permissions are supported:\n"))
fmt = "%-16s %-14s\t%s"
print(fmt % (_("NAME"), _("TYPE"), _("NOTES")))
for (name, note) in sorted(perms_subcmd.iteritems()):
print(fmt % (name, _("subcommand"), note))
for (name, note) in sorted(perms_other.iteritems()):
print(fmt % (name, _("other"), note))
for (name, prop) in sorted(zfs.dataset.proptable.iteritems()):
if prop.visible and prop.delegatable():
print(fmt % (name, _("property"), ""))
def do_allow():
"""Implementes the "zfs allow" and "zfs unallow" subcommands."""
un = (sys.argv[1] == "unallow")
def usage(msg=None):
parser.print_help()
print_perms()
if msg:
print
parser.exit("zfs: error: " + msg)
else:
parser.exit()
if un:
u = _("""unallow [-rldug] <"everyone"|user|group>[,...]
[<perm|@setname>[,...]] <filesystem|volume>
unallow [-rld] -e [<perm|@setname>[,...]] <filesystem|volume>
unallow [-r] -c [<perm|@setname>[,...]] <filesystem|volume>
unallow [-r] -s @setname [<perm|@setname>[,...]] <filesystem|volume>""")
verb = _("remove")
sstr = _("undefine permission set")
else:
u = _("""allow <filesystem|volume>
allow [-ldug] <"everyone"|user|group>[,...] <perm|@setname>[,...]
<filesystem|volume>
allow [-ld] -e <perm|@setname>[,...] <filesystem|volume>
allow -c <perm|@setname>[,...] <filesystem|volume>
allow -s @setname <perm|@setname>[,...] <filesystem|volume>""")
verb = _("set")
sstr = _("define permission set")
parser = optparse.OptionParser(usage=u, prog="zfs")
parser.add_option("-l", action="store_true", dest="local",
help=_("%s permission locally") % verb)
parser.add_option("-d", action="store_true", dest="descend",
help=_("%s permission for descendents") % verb)
parser.add_option("-u", action="store_true", dest="user",
help=_("%s permission for user") % verb)
parser.add_option("-g", action="store_true", dest="group",
help=_("%s permission for group") % verb)
parser.add_option("-e", action="store_true", dest="everyone",
help=_("%s permission for everyone") % verb)
parser.add_option("-c", action="store_true", dest="create",
help=_("%s create time permissions") % verb)
parser.add_option("-s", action="store_true", dest="set", help=sstr)
if un:
parser.add_option("-r", action="store_true", dest="recursive",
help=_("remove permissions recursively"))
if len(sys.argv) == 3 and not un:
# just print the permissions on this fs
if sys.argv[2] == "-h":
# hack to make "zfs allow -h" work
usage()
ds = zfs.dataset.Dataset(sys.argv[2])
p = dict()
for (fs, raw) in ds.get_fsacl().items():
p[fs] = FSPerms(raw)
for fs in sorted(p.keys(), reverse=True):
s = _("---- Permissions on %s ") % fs
print(s + "-" * (70-len(s)))
print(p[fs])
return
(options, args) = parser.parse_args(sys.argv[2:])
if sum((bool(options.everyone), bool(options.user),
bool(options.group))) > 1:
parser.error(_("-u, -g, and -e are mutually exclusive"))
def mungeargs(expected_len):
if un and len(args) == expected_len-1:
return (None, args[expected_len-2])
elif len(args) == expected_len:
return (args[expected_len-2].split(","),
args[expected_len-1])
else:
usage(_("wrong number of parameters"))
if options.set:
if options.local or options.descend or options.user or \
options.group or options.everyone or options.create:
parser.error(_("invalid option combined with -s"))
if args[0][0] != "@":
parser.error(_("invalid set name: missing '@' prefix"))
(perms, fsname) = mungeargs(3)
who = args[0]
elif options.create:
if options.local or options.descend or options.user or \
options.group or options.everyone or options.set:
parser.error(_("invalid option combined with -c"))
(perms, fsname) = mungeargs(2)
who = None
elif options.everyone:
if options.user or options.group or \
options.create or options.set:
parser.error(_("invalid option combined with -e"))
(perms, fsname) = mungeargs(2)
who = ["everyone"]
else:
(perms, fsname) = mungeargs(3)
who = args[0].split(",")
if not options.local and not options.descend:
options.local = True
options.descend = True
d = args_to_perms(parser, options, who, perms)
ds = zfs.dataset.Dataset(fsname, snaps=False)
if not un and perms:
for p in perms:
if p[0] == "@" and not hasset(ds, p):
parser.error(_("set %s is not defined") % p)
ds.set_fsacl(un, d)
if un and options.recursive:
for child in ds.descendents():
child.set_fsacl(un, d)
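# Illustrative command lines that do_allow() parses (dataset, user, and set
# names are examples only):
#   zfs allow -u ann create,mount tank/home
#   zfs allow -s @basic mount,snapshot tank
#   zfs unallow -r ann tank/home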


@@ -0,0 +1,205 @@
#! /usr/bin/python2.4
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
"""Implements the Dataset class, providing methods for manipulating ZFS
datasets. Also implements the Property class, which describes ZFS
properties."""
import zfs.ioctl
import zfs.util
import errno
_ = zfs.util._
class Property(object):
"""This class represents a ZFS property. It contains
information about the property -- if it's readonly, a number vs
string vs index, etc. Only native properties are represented by
this class -- not user properties (eg "user:prop") or userspace
properties (eg "userquota@joe")."""
__slots__ = "name", "number", "type", "default", "attr", "validtypes", \
"values", "colname", "rightalign", "visible", "indextable"
__repr__ = zfs.util.default_repr
def __init__(self, t):
"""t is the tuple of information about this property
from zfs.ioctl.get_proptable, which should match the
members of zprop_desc_t (see zfs_prop.h)."""
self.name = t[0]
self.number = t[1]
self.type = t[2]
if self.type == "string":
self.default = t[3]
else:
self.default = t[4]
self.attr = t[5]
self.validtypes = t[6]
self.values = t[7]
self.colname = t[8]
self.rightalign = t[9]
self.visible = t[10]
self.indextable = t[11]
def delegatable(self):
"""Return True if this property can be delegated with
"zfs allow"."""
return self.attr != "readonly"
proptable = dict()
for name, t in zfs.ioctl.get_proptable().iteritems():
proptable[name] = Property(t)
del name, t
def getpropobj(name):
"""Return the Property object that is identified by the given
name string. It can be the full name, or the column name."""
try:
return proptable[name]
except KeyError:
for p in proptable.itervalues():
if p.colname and p.colname.lower() == name:
return p
raise
class Dataset(object):
"""Represents a ZFS dataset (filesystem, snapshot, zvol, clone, etc).
Generally, this class provides interfaces to the C functions in
zfs.ioctl which actually interface with the kernel to manipulate
datasets.
Unless otherwise noted, any method can raise a ZFSError to
indicate failure."""
__slots__ = "name", "__props"
__repr__ = zfs.util.default_repr
def __init__(self, name, props=None,
types=("filesystem", "volume"), snaps=True):
"""Open the named dataset, checking that it exists and
is of the specified type.
name is the string name of this dataset.
props is the property settings dict from zfs.ioctl.next_dataset.
types is an iterable of strings specifying which types
of datasets are permitted. Accepted strings are
"filesystem" and "volume". Defaults to acceptying all
types.
snaps is a boolean specifying if snapshots are acceptable.
Raises a ZFSError if the dataset can't be accessed (eg
doesn't exist) or is not of the specified type.
"""
self.name = name
e = zfs.util.ZFSError(errno.EINVAL,
_("cannot open %s") % name,
_("operation not applicable to datasets of this type"))
if "@" in name and not snaps:
raise e
if not props:
props = zfs.ioctl.dataset_props(name)
self.__props = props
if "volume" not in types and self.getprop("type") == 3:
raise e
if "filesystem" not in types and self.getprop("type") == 2:
raise e
def getprop(self, propname):
"""Return the value of the given property for this dataset.
Currently only works for native properties (those with a
Property object.)
Raises KeyError if propname does not specify a native property.
Does not raise ZFSError.
"""
p = getpropobj(propname)
try:
return self.__props[p.name]["value"]
except KeyError:
return p.default
def parent(self):
"""Return a Dataset representing the parent of this one."""
return Dataset(self.name[:self.name.rindex("/")])
def descendents(self):
"""A generator function which iterates over all
descendent Datasets (not including snapshots)."""
cookie = 0
while True:
# next_dataset raises StopIteration when done
(name, cookie, props) = \
zfs.ioctl.next_dataset(self.name, False, cookie)
ds = Dataset(name, props)
yield ds
for child in ds.descendents():
yield child
def userspace(self, prop):
"""A generator function which iterates over a
userspace-type property.
prop specifies which property ("userused@",
"userquota@", "groupused@", or "groupquota@").
returns 3-tuple of domain (string), rid (int), and space (int).
"""
d = zfs.ioctl.userspace_many(self.name, prop)
for ((domain, rid), space) in d.iteritems():
yield (domain, rid, space)
def userspace_upgrade(self):
"""Initialize the accounting information for
userused@... and groupused@... properties."""
return zfs.ioctl.userspace_upgrade(self.name)
def set_fsacl(self, un, d):
"""Add to the "zfs allow"-ed permissions on this Dataset.
un is True if the specified permissions should be removed.
d is a dict specifying which permissions to add/remove:
{ "whostr" -> None # remove all perms for this entity
"whostr" -> { "perm" -> None} # add/remove these perms
} """
return zfs.ioctl.set_fsacl(self.name, un, d)
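# For example (shape only; the actual "whostr" encoding is produced by
# args_to_perms in the zfs.allow module):
#   d = {"whostr": {"mount": None, "create": None}}  # add/remove these perms
#   d = {"whostr": None}                             # remove all perms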
def get_fsacl(self):
"""Get the "zfs allow"-ed permissions on the Dataset.
Return a dict("whostr": { "perm" -> None })."""
return zfs.ioctl.get_fsacl(self.name)
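# Illustrative usage of the Dataset class (dataset names are examples only):
#   ds = Dataset("tank/home")
#   print ds.getprop("type")
#   for child in ds.descendents():
#       print child.name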


@@ -0,0 +1,29 @@
#! /usr/bin/python2.4
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
import zfs.userspace
do_groupspace = zfs.userspace.do_userspace


@@ -0,0 +1,610 @@
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#include <Python.h>
#include <sys/zfs_ioctl.h>
#include <sys/fs/zfs.h>
#include <strings.h>
#include <unistd.h>
#include <libnvpair.h>
#include <idmap.h>
#include <zone.h>
#include <libintl.h>
#include <libzfs.h>
#include "zfs_prop.h"
static PyObject *ZFSError;
static int zfsdevfd;
#ifdef __lint
#define dgettext(x, y) y
#endif
#define _(s) dgettext(TEXT_DOMAIN, s)
#ifdef sun
extern int sid_to_id(char *sid, boolean_t user, uid_t *id);
#endif /* sun */
/*PRINTFLIKE1*/
static void
seterr(char *fmt, ...)
{
char errstr[1024];
va_list v;
va_start(v, fmt);
(void) vsnprintf(errstr, sizeof (errstr), fmt, v);
va_end(v);
PyErr_SetObject(ZFSError, Py_BuildValue("is", errno, errstr));
}
static char cmdstr[HIS_MAX_RECORD_LEN];
static int
ioctl_with_cmdstr(unsigned long ioc, zfs_cmd_t *zc)
{
int err;
if (cmdstr[0])
zc->zc_history = (uint64_t)(uintptr_t)cmdstr;
err = ioctl(zfsdevfd, ioc, zc);
cmdstr[0] = '\0';
return (err);
}
static PyObject *
nvl2py(nvlist_t *nvl)
{
PyObject *pyo;
nvpair_t *nvp;
pyo = PyDict_New();
for (nvp = nvlist_next_nvpair(nvl, NULL); nvp;
nvp = nvlist_next_nvpair(nvl, nvp)) {
PyObject *pyval;
char *sval;
uint64_t ival;
boolean_t bval;
nvlist_t *nval;
switch (nvpair_type(nvp)) {
case DATA_TYPE_STRING:
(void) nvpair_value_string(nvp, &sval);
pyval = Py_BuildValue("s", sval);
break;
case DATA_TYPE_UINT64:
(void) nvpair_value_uint64(nvp, &ival);
pyval = Py_BuildValue("K", ival);
break;
case DATA_TYPE_NVLIST:
(void) nvpair_value_nvlist(nvp, &nval);
pyval = nvl2py(nval);
break;
case DATA_TYPE_BOOLEAN:
Py_INCREF(Py_None);
pyval = Py_None;
break;
case DATA_TYPE_BOOLEAN_VALUE:
(void) nvpair_value_boolean_value(nvp, &bval);
pyval = Py_BuildValue("i", bval);
break;
default:
PyErr_SetNone(PyExc_ValueError);
Py_DECREF(pyo);
return (NULL);
}
PyDict_SetItemString(pyo, nvpair_name(nvp), pyval);
Py_DECREF(pyval);
}
return (pyo);
}
static nvlist_t *
dict2nvl(PyObject *d)
{
nvlist_t *nvl;
int err;
PyObject *key, *value;
Py_ssize_t pos = 0;
if (!PyDict_Check(d)) {
PyErr_SetObject(PyExc_ValueError, d);
return (NULL);
}
err = nvlist_alloc(&nvl, NV_UNIQUE_NAME, 0);
assert(err == 0);
while (PyDict_Next(d, &pos, &key, &value)) {
char *keystr = PyString_AsString(key);
if (keystr == NULL) {
PyErr_SetObject(PyExc_KeyError, key);
nvlist_free(nvl);
return (NULL);
}
if (PyDict_Check(value)) {
nvlist_t *valnvl = dict2nvl(value);
err = nvlist_add_nvlist(nvl, keystr, valnvl);
nvlist_free(valnvl);
} else if (value == Py_None) {
err = nvlist_add_boolean(nvl, keystr);
} else if (PyString_Check(value)) {
char *valstr = PyString_AsString(value);
err = nvlist_add_string(nvl, keystr, valstr);
} else if (PyInt_Check(value)) {
uint64_t valint = PyInt_AsUnsignedLongLongMask(value);
err = nvlist_add_uint64(nvl, keystr, valint);
} else if (PyBool_Check(value)) {
boolean_t valbool = value == Py_True ? B_TRUE : B_FALSE;
err = nvlist_add_boolean_value(nvl, keystr, valbool);
} else {
PyErr_SetObject(PyExc_ValueError, value);
nvlist_free(nvl);
return (NULL);
}
assert(err == 0);
}
return (nvl);
}
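/*
 * Example (illustrative): the Python dict {"a": 1, "b": {"c": None}}
 * maps to an nvlist containing a uint64 "a" = 1 and a nested nvlist
 * "b" that holds a valueless boolean "c".
 */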
static PyObject *
fakepropval(uint64_t value)
{
PyObject *d = PyDict_New();
PyDict_SetItemString(d, "value", Py_BuildValue("K", value));
return (d);
}
static void
add_ds_props(zfs_cmd_t *zc, PyObject *nvl)
{
dmu_objset_stats_t *s = &zc->zc_objset_stats;
PyDict_SetItemString(nvl, "numclones",
fakepropval(s->dds_num_clones));
PyDict_SetItemString(nvl, "issnap",
fakepropval(s->dds_is_snapshot));
PyDict_SetItemString(nvl, "inconsistent",
fakepropval(s->dds_inconsistent));
}
/* On error, returns NULL but does not set python exception. */
static PyObject *
ioctl_with_dstnv(unsigned long ioc, zfs_cmd_t *zc)
{
int nvsz = 2048;
void *nvbuf;
PyObject *pynv = NULL;
again:
nvbuf = malloc(nvsz);
zc->zc_nvlist_dst_size = nvsz;
zc->zc_nvlist_dst = (uintptr_t)nvbuf;
if (ioctl(zfsdevfd, ioc, zc) == 0) {
nvlist_t *nvl;
errno = nvlist_unpack(nvbuf, zc->zc_nvlist_dst_size, &nvl, 0);
if (errno == 0) {
pynv = nvl2py(nvl);
nvlist_free(nvl);
}
} else if (errno == ENOMEM) {
free(nvbuf);
nvsz = zc->zc_nvlist_dst_size;
goto again;
}
free(nvbuf);
return (pynv);
}
static PyObject *
py_next_dataset(PyObject *self, PyObject *args)
{
unsigned long ioc;
uint64_t cookie;
zfs_cmd_t zc = { 0 };
int snaps;
char *name;
PyObject *nvl;
PyObject *ret = NULL;
if (!PyArg_ParseTuple(args, "siK", &name, &snaps, &cookie))
return (NULL);
(void) strlcpy(zc.zc_name, name, sizeof (zc.zc_name));
zc.zc_cookie = cookie;
if (snaps)
ioc = ZFS_IOC_SNAPSHOT_LIST_NEXT;
else
ioc = ZFS_IOC_DATASET_LIST_NEXT;
nvl = ioctl_with_dstnv(ioc, &zc);
if (nvl) {
add_ds_props(&zc, nvl);
ret = Py_BuildValue("sKO", zc.zc_name, zc.zc_cookie, nvl);
Py_DECREF(nvl);
} else if (errno == ESRCH) {
PyErr_SetNone(PyExc_StopIteration);
} else {
if (snaps)
seterr(_("cannot get snapshots of %s"), name);
else
seterr(_("cannot get child datasets of %s"), name);
}
return (ret);
}
static PyObject *
py_dataset_props(PyObject *self, PyObject *args)
{
zfs_cmd_t zc = { 0 };
int snaps;
char *name;
PyObject *nvl;
if (!PyArg_ParseTuple(args, "s", &name))
return (NULL);
(void) strlcpy(zc.zc_name, name, sizeof (zc.zc_name));
nvl = ioctl_with_dstnv(ZFS_IOC_OBJSET_STATS, &zc);
if (nvl) {
add_ds_props(&zc, nvl);
} else {
seterr(_("cannot access dataset %s"), name);
}
return (nvl);
}
static PyObject *
py_get_fsacl(PyObject *self, PyObject *args)
{
zfs_cmd_t zc = { 0 };
char *name;
PyObject *nvl;
if (!PyArg_ParseTuple(args, "s", &name))
return (NULL);
(void) strlcpy(zc.zc_name, name, sizeof (zc.zc_name));
nvl = ioctl_with_dstnv(ZFS_IOC_GET_FSACL, &zc);
if (nvl == NULL)
seterr(_("cannot get permissions on %s"), name);
return (nvl);
}
static PyObject *
py_set_fsacl(PyObject *self, PyObject *args)
{
int un;
size_t nvsz;
zfs_cmd_t zc = { 0 };
char *name, *nvbuf;
PyObject *dict;
nvlist_t *nvl;
int err;
if (!PyArg_ParseTuple(args, "siO!", &name, &un,
&PyDict_Type, &dict))
return (NULL);
nvl = dict2nvl(dict);
if (nvl == NULL)
return (NULL);
err = nvlist_size(nvl, &nvsz, NV_ENCODE_NATIVE);
assert(err == 0);
nvbuf = malloc(nvsz);
err = nvlist_pack(nvl, &nvbuf, &nvsz, NV_ENCODE_NATIVE, 0);
assert(err == 0);
(void) strlcpy(zc.zc_name, name, sizeof (zc.zc_name));
zc.zc_nvlist_src_size = nvsz;
zc.zc_nvlist_src = (uintptr_t)nvbuf;
zc.zc_perm_action = un;
err = ioctl_with_cmdstr(ZFS_IOC_SET_FSACL, &zc);
free(nvbuf);
if (err) {
seterr(_("cannot set permissions on %s"), name);
return (NULL);
}
Py_RETURN_NONE;
}
static PyObject *
py_userspace_many(PyObject *self, PyObject *args)
{
zfs_cmd_t zc = { 0 };
zfs_userquota_prop_t type;
char *name, *propname;
int bufsz = 1<<20;
void *buf;
PyObject *dict;
int error;
if (!PyArg_ParseTuple(args, "ss", &name, &propname))
return (NULL);
for (type = 0; type < ZFS_NUM_USERQUOTA_PROPS; type++)
if (strcmp(propname, zfs_userquota_prop_prefixes[type]) == 0)
break;
if (type == ZFS_NUM_USERQUOTA_PROPS) {
PyErr_SetString(PyExc_KeyError, propname);
return (NULL);
}
dict = PyDict_New();
buf = malloc(bufsz);
(void) strlcpy(zc.zc_name, name, sizeof (zc.zc_name));
zc.zc_objset_type = type;
zc.zc_cookie = 0;
while (1) {
zfs_useracct_t *zua = buf;
zc.zc_nvlist_dst = (uintptr_t)buf;
zc.zc_nvlist_dst_size = bufsz;
error = ioctl(zfsdevfd, ZFS_IOC_USERSPACE_MANY, &zc);
if (error || zc.zc_nvlist_dst_size == 0)
break;
while (zc.zc_nvlist_dst_size > 0) {
PyObject *pykey, *pyval;
pykey = Py_BuildValue("sI",
zua->zu_domain, zua->zu_rid);
pyval = Py_BuildValue("K", zua->zu_space);
PyDict_SetItem(dict, pykey, pyval);
Py_DECREF(pykey);
Py_DECREF(pyval);
zua++;
zc.zc_nvlist_dst_size -= sizeof (zfs_useracct_t);
}
}
free(buf);
if (error != 0) {
Py_DECREF(dict);
seterr(_("cannot get %s property on %s"), propname, name);
return (NULL);
}
return (dict);
}
static PyObject *
py_userspace_upgrade(PyObject *self, PyObject *args)
{
zfs_cmd_t zc = { 0 };
char *name;
int error;
if (!PyArg_ParseTuple(args, "s", &name))
return (NULL);
(void) strlcpy(zc.zc_name, name, sizeof (zc.zc_name));
error = ioctl(zfsdevfd, ZFS_IOC_USERSPACE_UPGRADE, &zc);
if (error != 0) {
seterr(_("cannot initialize user accounting information on %s"),
name);
return (NULL);
}
Py_RETURN_NONE;
}
static PyObject *
py_sid_to_id(PyObject *self, PyObject *args)
{
#ifdef sun
char *sid;
int err, isuser;
uid_t id;
if (!PyArg_ParseTuple(args, "si", &sid, &isuser))
return (NULL);
err = sid_to_id(sid, isuser, &id);
if (err) {
PyErr_SetString(PyExc_KeyError, sid);
return (NULL);
}
return (Py_BuildValue("I", id));
#else /* sun */
PyErr_SetString(PyExc_NotImplementedError,
"sid_to_id is not supported on this platform");
return (NULL);
#endif /* sun */
}
/*
* Translate the sid string ("S-1-...") to the user@domain name, if
* possible. There should be a better way to do this, but for now we
* just translate to the (possibly ephemeral) uid and then back again.
*/
static PyObject *
py_sid_to_name(PyObject *self, PyObject *args)
{
#ifdef sun
char *sid;
int err, isuser;
uid_t id;
char *name, *domain;
char buf[256];
if (!PyArg_ParseTuple(args, "si", &sid, &isuser))
return (NULL);
err = sid_to_id(sid, isuser, &id);
if (err) {
PyErr_SetString(PyExc_KeyError, sid);
return (NULL);
}
if (isuser) {
err = idmap_getwinnamebyuid(id,
IDMAP_REQ_FLG_USE_CACHE, &name, &domain);
} else {
err = idmap_getwinnamebygid(id,
IDMAP_REQ_FLG_USE_CACHE, &name, &domain);
}
if (err != IDMAP_SUCCESS) {
PyErr_SetString(PyExc_KeyError, sid);
return (NULL);
}
(void) snprintf(buf, sizeof (buf), "%s@%s", name, domain);
free(name);
free(domain);
return (Py_BuildValue("s", buf));
#else /* sun */
PyErr_SetString(PyExc_NotImplementedError,
"sid_to_name is not supported on this platform");
return (NULL);
#endif /* sun */
}
static PyObject *
py_isglobalzone(PyObject *self, PyObject *args)
{
return (Py_BuildValue("i", getzoneid() == GLOBAL_ZONEID));
}
static PyObject *
py_set_cmdstr(PyObject *self, PyObject *args)
{
char *str;
if (!PyArg_ParseTuple(args, "s", &str))
return (NULL);
(void) strlcpy(cmdstr, str, sizeof (cmdstr));
Py_RETURN_NONE;
}
static PyObject *
py_get_proptable(PyObject *self, PyObject *args)
{
zprop_desc_t *t = zfs_prop_get_table();
PyObject *d = PyDict_New();
zfs_prop_t i;
for (i = 0; i < ZFS_NUM_PROPS; i++) {
zprop_desc_t *p = &t[i];
PyObject *tuple;
static const char *typetable[] =
{"number", "string", "index"};
static const char *attrtable[] =
{"default", "readonly", "inherit", "onetime"};
PyObject *indextable;
if (p->pd_proptype == PROP_TYPE_INDEX) {
const zprop_index_t *it = p->pd_table;
indextable = PyDict_New();
int j;
for (j = 0; it[j].pi_name; j++) {
PyDict_SetItemString(indextable,
it[j].pi_name,
Py_BuildValue("K", it[j].pi_value));
}
} else {
Py_INCREF(Py_None);
indextable = Py_None;
}
tuple = Py_BuildValue("sissKsissiiO",
p->pd_name, p->pd_propnum, typetable[p->pd_proptype],
p->pd_strdefault, p->pd_numdefault,
attrtable[p->pd_attr], p->pd_types,
p->pd_values, p->pd_colname,
p->pd_rightalign, p->pd_visible, indextable);
PyDict_SetItemString(d, p->pd_name, tuple);
Py_DECREF(tuple);
}
return (d);
}
static PyMethodDef zfsmethods[] = {
{"next_dataset", py_next_dataset, METH_VARARGS,
"Get next child dataset or snapshot."},
{"get_fsacl", py_get_fsacl, METH_VARARGS, "Get allowed permissions."},
{"set_fsacl", py_set_fsacl, METH_VARARGS, "Set allowed permissions."},
{"userspace_many", py_userspace_many, METH_VARARGS,
"Get user space accounting."},
{"userspace_upgrade", py_userspace_upgrade, METH_VARARGS,
"Upgrade fs to enable user space accounting."},
{"set_cmdstr", py_set_cmdstr, METH_VARARGS,
"Set command string for history logging."},
{"dataset_props", py_dataset_props, METH_VARARGS,
"Get dataset properties."},
{"get_proptable", py_get_proptable, METH_NOARGS,
"Get property table."},
/* Below are not really zfs-specific: */
{"sid_to_id", py_sid_to_id, METH_VARARGS, "Map SID to UID/GID."},
{"sid_to_name", py_sid_to_name, METH_VARARGS,
"Map SID to name@domain."},
{"isglobalzone", py_isglobalzone, METH_NOARGS,
"Determine if this is the global zone."},
{NULL, NULL, 0, NULL}
};
void
initioctl(void)
{
PyObject *zfs_ioctl = Py_InitModule("zfs.ioctl", zfsmethods);
PyObject *zfs_util = PyImport_ImportModule("zfs.util");
PyObject *devfile;
if (zfs_util == NULL)
return;
ZFSError = PyObject_GetAttrString(zfs_util, "ZFSError");
devfile = PyObject_GetAttrString(zfs_util, "dev");
zfsdevfd = PyObject_AsFileDescriptor(devfile);
zfs_prop_init();
}


@@ -0,0 +1,28 @@
#! /usr/bin/python2.4
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
import zfs.allow
do_unallow = zfs.allow.do_allow


@@ -0,0 +1,277 @@
#! /usr/bin/python2.4
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
"""This module implements the "zfs userspace" and "zfs groupspace" subcommands.
The only public interface is the zfs.userspace.do_userspace() function."""
import zfs.util
import zfs.ioctl
import zfs.dataset
import optparse
import sys
import pwd
import grp
import errno
_ = zfs.util._
# map from property name prefix -> (field name, isgroup)
props = {
"userused@": ("used", False),
"userquota@": ("quota", False),
"groupused@": ("used", True),
"groupquota@": ("quota", True),
}
def skiptype(options, prop):
"""Return True if this property (eg "userquota@") should be skipped."""
(field, isgroup) = props[prop]
if field not in options.fields:
return True
if isgroup and "posixgroup" not in options.types and \
"smbgroup" not in options.types:
return True
if not isgroup and "posixuser" not in options.types and \
"smbuser" not in options.types:
return True
return False
def updatemax(d, k, v):
# max(None, v) == v under Python 2's ordering (None sorts below everything),
# so the first update simply stores v. This idiom fails on Python 3.
d[k] = max(d.get(k, None), v)
def new_entry(options, isgroup, domain, rid):
"""Return a dict("field": value) for this domain (string) + rid (int)"""
if domain:
idstr = "%s-%u" % (domain, rid)
else:
idstr = "%u" % rid
(typename, mapfunc) = {
(1, 1): ("SMB Group", lambda id: zfs.ioctl.sid_to_name(id, 0)),
(1, 0): ("POSIX Group", lambda id: grp.getgrgid(int(id)).gr_name),
(0, 1): ("SMB User", lambda id: zfs.ioctl.sid_to_name(id, 1)),
(0, 0): ("POSIX User", lambda id: pwd.getpwuid(int(id)).pw_name)
}[isgroup, bool(domain)]
if typename.lower().replace(" ", "") not in options.types:
return None
v = dict()
v["type"] = typename
# python's getpwuid/getgrgid is confused by ephemeral uids
if not options.noname and rid < 1<<31:
try:
v["name"] = mapfunc(idstr)
except KeyError:
pass
if "name" not in v:
v["name"] = idstr
if not domain:
# it's just a number, so pad it with spaces so
# that it will sort numerically
v["name.sort"] = "%20d" % rid
# fill in default values
v["used"] = "0"
v["used.sort"] = 0
v["quota"] = "none"
v["quota.sort"] = 0
return v
def process_one_raw(acct, maxfieldlen, options, prop, elem):
"""Update the acct and maxfieldlen dicts to incorporate the
information from this elem from Dataset.userspace(prop)."""
(domain, rid, value) = elem
(field, isgroup) = props[prop]
if options.translate and domain:
try:
rid = zfs.ioctl.sid_to_id("%s-%u" % (domain, rid),
not isgroup)
domain = None
except KeyError:
pass
key = (isgroup, domain, rid)
try:
v = acct[key]
except KeyError:
v = new_entry(options, isgroup, domain, rid)
if not v:
return
acct[key] = v
# Add our value to an existing value, which may be present if
# options.translate is set.
value = v[field + ".sort"] = value + v[field + ".sort"]
if options.parsable:
v[field] = str(value)
else:
v[field] = zfs.util.nicenum(value)
for k in v.keys():
# some of the .sort fields are integers, so have no len()
if isinstance(v[k], str):
updatemax(maxfieldlen, k, len(v[k]))
def do_userspace():
"""Implements the "zfs userspace" and "zfs groupspace" subcommands."""
def usage(msg=None):
parser.print_help()
if msg:
print
parser.exit("zfs: error: " + msg)
else:
parser.exit()
if sys.argv[1] == "userspace":
defaulttypes = "posixuser,smbuser"
else:
defaulttypes = "posixgroup,smbgroup"
fields = ("type", "name", "used", "quota")
ljustfields = ("type", "name")
types = ("all", "posixuser", "smbuser", "posixgroup", "smbgroup")
u = _("%s [-niHp] [-o field[,...]] [-sS field] ... \n") % sys.argv[1]
u += _(" [-t type[,...]] <filesystem|snapshot>")
parser = optparse.OptionParser(usage=u, prog="zfs")
parser.add_option("-n", action="store_true", dest="noname",
help=_("Print numeric ID instead of user/group name"))
parser.add_option("-i", action="store_true", dest="translate",
help=_("translate SID to posix (possibly ephemeral) ID"))
parser.add_option("-H", action="store_true", dest="noheaders",
help=_("no headers, tab delimited output"))
parser.add_option("-p", action="store_true", dest="parsable",
help=_("exact (parsable) numeric output"))
parser.add_option("-o", dest="fields", metavar="field[,...]",
default="type,name,used,quota",
help=_("print only these fields (eg type,name,used,quota)"))
parser.add_option("-s", dest="sortfields", metavar="field",
type="choice", choices=fields, default=list(),
action="callback", callback=zfs.util.append_with_opt,
help=_("sort field"))
parser.add_option("-S", dest="sortfields", metavar="field",
type="choice", choices=fields, #-s sets the default
action="callback", callback=zfs.util.append_with_opt,
help=_("reverse sort field"))
parser.add_option("-t", dest="types", metavar="type[,...]",
default=defaulttypes,
help=_("print only these types (eg posixuser,smbuser,posixgroup,smbgroup,all)"))
(options, args) = parser.parse_args(sys.argv[2:])
if len(args) != 1:
usage(_("wrong number of arguments"))
dsname = args[0]
options.fields = options.fields.split(",")
for f in options.fields:
if f not in fields:
usage(_("invalid field %s") % f)
options.types = options.types.split(",")
for t in options.types:
if t not in types:
usage(_("invalid type %s") % t)
if not options.sortfields:
options.sortfields = [("-s", "type"), ("-s", "name")]
if "all" in options.types:
options.types = types[1:]
ds = zfs.dataset.Dataset(dsname, types=("filesystem",))
if ds.getprop("jailed") and zfs.ioctl.isglobalzone():
options.noname = True
if not ds.getprop("useraccounting"):
print(_("Initializing accounting information on old filesystem, please wait..."))
ds.userspace_upgrade()
acct = dict()
maxfieldlen = dict()
# gather and process accounting information
for prop in props.keys():
if skiptype(options, prop):
continue
for elem in ds.userspace(prop):
process_one_raw(acct, maxfieldlen, options, prop, elem)
# print out headers
if not options.noheaders:
line = str()
for field in options.fields:
# make sure the field header will fit
updatemax(maxfieldlen, field, len(field))
if field in ljustfields:
fmt = "%-*s "
else:
fmt = "%*s "
line += fmt % (maxfieldlen[field], field.upper())
print(line)
# custom sorting func
def cmpkey(val):
l = list()
for (opt, field) in options.sortfields:
try:
n = val[field + ".sort"]
except KeyError:
n = val[field]
if opt == "-S":
# reverse sorting
try:
n = -n
except TypeError:
# it's a string; decompose it
# into an array of integers,
# each one the negative of that
# character
n = [-ord(c) for c in n]
l.append(n)
return l
# print out data lines
for val in sorted(acct.itervalues(), key=cmpkey):
line = str()
for field in options.fields:
if options.noheaders:
line += val[field]
line += "\t"
else:
if field in ljustfields:
fmt = "%-*s "
else:
fmt = "%*s "
line += fmt % (maxfieldlen[field], val[field])
print(line)
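The reverse-sort trick in cmpkey() above (negating each character's ordinal so that an ascending sort yields descending order) can be sketched standalone. This is a minimal illustration in Python 3; the names are illustrative, not from the original module:

```python
def reverse_key(val):
    """Invert the natural sort order of val (cf. cmpkey above)."""
    try:
        return -val                      # numeric field: negate directly
    except TypeError:
        # string field: decompose into negated character ordinals
        return [-ord(c) for c in val]

print(sorted(["bob", "alice", "carol"], key=reverse_key))  # ['carol', 'bob', 'alice']
print(sorted([10, 300, 42], key=reverse_key))              # [300, 42, 10]
```

Each sort field produces keys of one homogeneous type, so the negated-ordinal lists compare element-wise just like the strings would, only in reverse.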


@@ -0,0 +1,138 @@
#! /usr/bin/python2.4
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
"""This module provides utility functions for ZFS.
zfs.util.dev -- a file object of /dev/zfs """
import gettext
import errno
import os
# Note: this module (zfs.util) should not import zfs.ioctl, because that
# would introduce a circular dependency
errno.ECANCELED = 47
errno.ENOTSUP = 48
dev = open("/dev/zfs", "w")
_ = gettext.translation("SUNW_OST_OSLIB", "/usr/lib/locale",
fallback=True).gettext
def default_repr(self):
"""A simple __repr__ function."""
if self.__slots__:
str = "<" + self.__class__.__name__
for v in self.__slots__:
str += " %s: %r" % (v, getattr(self, v))
return str + ">"
else:
return "<%s %s>" % \
(self.__class__.__name__, repr(self.__dict__))
class ZFSError(StandardError):
"""This exception class represents a potentially user-visible
ZFS error. If uncaught, it will be printed and the process will
exit with exit code 1.
errno -- the error number (eg, from ioctl(2))."""
__slots__ = "why", "task", "errno"
__repr__ = default_repr
def __init__(self, eno, task=None, why=None):
"""Create a ZFS exception.
eno -- the error number (errno)
task -- a string describing the task that failed
why -- a string describing why it failed (defaults to
strerror(eno))"""
self.errno = eno
self.task = task
self.why = why
def __str__(self):
s = ""
if self.task:
s += self.task + ": "
if self.why:
s += self.why
else:
s += self.strerror
return s
__strs = {
errno.EPERM: _("permission denied"),
errno.ECANCELED:
_("delegated administration is disabled on pool"),
errno.EINTR: _("signal received"),
errno.EIO: _("I/O error"),
errno.ENOENT: _("dataset does not exist"),
errno.ENOSPC: _("out of space"),
errno.EEXIST: _("dataset already exists"),
errno.EBUSY: _("dataset is busy"),
errno.EROFS:
_("snapshot permissions cannot be modified"),
errno.ENAMETOOLONG: _("dataset name is too long"),
errno.ENOTSUP: _("unsupported version"),
errno.EAGAIN: _("pool I/O is currently suspended"),
}
__strs[errno.EACCES] = __strs[errno.EPERM]
__strs[errno.ENXIO] = __strs[errno.EIO]
__strs[errno.ENODEV] = __strs[errno.EIO]
__strs[errno.EDQUOT] = __strs[errno.ENOSPC]
@property
def strerror(self):
return ZFSError.__strs.get(self.errno, os.strerror(self.errno))
def nicenum(num):
"""Return a nice string (eg "1.23M") for this integer."""
index = 0
n = num
while n >= 1024:
n /= 1024
index += 1
u = " KMGTPE"[index]
if index == 0:
return "%u" % n
elif n >= 100 or num & ((1024*index)-1) == 0:
# it's an exact multiple of its index, or it wouldn't
# fit as floating point, so print as an integer
return "%u%c" % (n, u)
else:
# due to rounding, it's tricky to tell what precision to
# use; try each precision and see which one fits
for i in (2, 1, 0):
s = "%.*f%c" % (i, float(num) / (1<<(10*index)), u)
if len(s) <= 5:
return s
def append_with_opt(option, opt, value, parser):
"""A function for OptionParser which appends a tuple (opt, value)."""
getattr(parser.values, option.dest).append((opt, value))
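The nicenum() routine above is Python 2.4; a Python 3 port (the only semantic change is integer `//` division, since `/=` would produce a float) behaves as follows. This is a sketch for illustration, not part of the changeset:

```python
def nicenum(num):
    """Python 3 port of nicenum() above: render num as e.g. '1.50K'."""
    index = 0
    n = num
    while n >= 1024:
        n //= 1024          # integer division, as py2's /= was for ints
        index += 1
    u = " KMGTPE"[index]
    if index == 0:
        return "%u" % n
    elif n >= 100 or num & ((1024 * index) - 1) == 0:
        # exact multiple, or too big to print with a fraction
        return "%u%c" % (n, u)
    else:
        # try each precision and pick the widest that fits in 5 chars
        for i in (2, 1, 0):
            s = "%.*f%c" % (i, float(num) / (1 << (10 * index)), u)
            if len(s) <= 5:
                return s

print(nicenum(1023))   # "1023"
print(nicenum(1024))   # "1K"
print(nicenum(1536))   # "1.50K"
```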


@@ -49,7 +49,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -317,8 +317,9 @@ typedef struct zio_block_tail {
zio_cksum_t zbt_cksum; /* 256-bit checksum */
} zio_block_tail_t;
#define VDEV_SKIP_SIZE (8 << 10)
#define VDEV_BOOT_HEADER_SIZE (8 << 10)
#define VDEV_PAD_SIZE (8 << 10)
/* 2 padding areas (vl_pad1 and vl_pad2) to skip */
#define VDEV_SKIP_SIZE VDEV_PAD_SIZE * 2
#define VDEV_PHYS_SIZE (112 << 10)
#define VDEV_UBERBLOCK_RING (128 << 10)
@@ -330,26 +331,14 @@ typedef struct zio_block_tail {
offsetof(vdev_label_t, vl_uberblock[(n) << VDEV_UBERBLOCK_SHIFT(vd)])
#define VDEV_UBERBLOCK_SIZE(vd) (1ULL << VDEV_UBERBLOCK_SHIFT(vd))
/* ZFS boot block */
#define VDEV_BOOT_MAGIC 0x2f5b007b10cULL
#define VDEV_BOOT_VERSION 1 /* version number */
typedef struct vdev_boot_header {
uint64_t vb_magic; /* VDEV_BOOT_MAGIC */
uint64_t vb_version; /* VDEV_BOOT_VERSION */
uint64_t vb_offset; /* start offset (bytes) */
uint64_t vb_size; /* size (bytes) */
char vb_pad[VDEV_BOOT_HEADER_SIZE - 4 * sizeof (uint64_t)];
} vdev_boot_header_t;
typedef struct vdev_phys {
char vp_nvlist[VDEV_PHYS_SIZE - sizeof (zio_block_tail_t)];
zio_block_tail_t vp_zbt;
} vdev_phys_t;
typedef struct vdev_label {
char vl_pad[VDEV_SKIP_SIZE]; /* 8K */
vdev_boot_header_t vl_boot_header; /* 8K */
char vl_pad1[VDEV_PAD_SIZE]; /* 8K */
char vl_pad2[VDEV_PAD_SIZE]; /* 8K */
vdev_phys_t vl_vdev_phys; /* 112K */
char vl_uberblock[VDEV_UBERBLOCK_RING]; /* 128K */
} vdev_label_t; /* 256K total */
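The reworked vdev_label_t layout (vl_boot_header dropped in favor of a second pad area) can be sanity-checked with simple arithmetic, using the sizes from the #defines above:

```python
KB = 1 << 10
VDEV_PAD_SIZE = 8 * KB
VDEV_SKIP_SIZE = VDEV_PAD_SIZE * 2      # vl_pad1 + vl_pad2
VDEV_PHYS_SIZE = 112 * KB
VDEV_UBERBLOCK_RING = 128 * KB

# vl_pad1 | vl_pad2 | vl_vdev_phys | vl_uberblock
label = [VDEV_PAD_SIZE, VDEV_PAD_SIZE, VDEV_PHYS_SIZE, VDEV_UBERBLOCK_RING]
assert sum(label) == 256 * KB           # matches the "256K total" comment

# byte offset of each member within the label
offsets = [sum(label[:i]) for i in range(len(label))]
print(offsets)   # [0, 8192, 16384, 131072]
```

Because the two pads together occupy exactly the old vl_pad + vl_boot_header space, the on-disk offsets of vl_vdev_phys and the uberblock ring are unchanged.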
@@ -480,13 +469,14 @@ typedef enum {
#define SPA_VERSION_12 12ULL
#define SPA_VERSION_13 13ULL
#define SPA_VERSION_14 14ULL
#define SPA_VERSION_15 15ULL
/*
* When bumping up SPA_VERSION, make sure GRUB ZFS understands the on-disk
* format change. Go to usr/src/grub/grub-0.95/stage2/{zfs-include/, fsys_zfs*},
* and do the appropriate changes.
*/
#define SPA_VERSION SPA_VERSION_14
#define SPA_VERSION_STRING "14"
#define SPA_VERSION SPA_VERSION_15
#define SPA_VERSION_STRING "15"
/*
* Symbolic names for the changes that caused a SPA_VERSION switch.
@@ -522,6 +512,7 @@ typedef enum {
#define SPA_VERSION_SNAP_PROPS SPA_VERSION_12
#define SPA_VERSION_USED_BREAKDOWN SPA_VERSION_13
#define SPA_VERSION_PASSTHROUGH_X SPA_VERSION_14
#define SPA_VERSION_USERSPACE SPA_VERSION_15
/*
* The following are configuration names used in the nvlist describing a pool's
@@ -799,8 +790,11 @@ typedef struct objset_phys {
dnode_phys_t os_meta_dnode;
zil_header_t os_zil_header;
uint64_t os_type;
char os_pad[1024 - sizeof (dnode_phys_t) - sizeof (zil_header_t) -
sizeof (uint64_t)];
uint64_t os_flags;
char os_pad[2048 - sizeof (dnode_phys_t)*3 -
sizeof (zil_header_t) - sizeof (uint64_t)*2];
dnode_phys_t os_userused_dnode;
dnode_phys_t os_groupused_dnode;
} objset_phys_t;
typedef struct dsl_dir_phys {
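The new os_pad expression above must still pack objset_phys_t into its fixed on-disk size now that os_flags and the user/group accounting dnodes were added. Assuming the usual on-disk sizes (dnode_phys_t = 512 bytes, zil_header_t = 192 bytes — assumptions, not stated in this diff), the arithmetic checks out:

```python
DNODE_PHYS = 512     # sizeof (dnode_phys_t), assumed
ZIL_HEADER = 192     # sizeof (zil_header_t), assumed
U64 = 8

os_pad = 2048 - DNODE_PHYS * 3 - ZIL_HEADER - U64 * 2
total = (DNODE_PHYS           # os_meta_dnode
         + ZIL_HEADER         # os_zil_header
         + U64                # os_type
         + U64                # os_flags
         + os_pad
         + DNODE_PHYS * 2)    # os_userused_dnode + os_groupused_dnode
print(os_pad, total)   # 304 2048
```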


@@ -239,9 +239,8 @@ secpolicy_vnode_create_gid(struct ucred *cred)
}
int
secpolicy_vnode_setids_setgids(struct vnode *vp, struct ucred *cred, gid_t gid)
secpolicy_vnode_setids_setgids(vnode_t *vp, struct ucred *cred, gid_t gid)
{
if (groupmember(gid, cred))
return (0);
if (secpolicy_fs_owner(vp->v_mount, cred) == 0)
@@ -366,3 +365,10 @@ secpolicy_xvattr(struct vnode *vp, xvattr_t *xvap, uid_t owner, cred_t *cr,
return (0);
return (priv_check_cred(cr, PRIV_VFS_SYSFLAGS, 0));
}
int
secpolicy_smb(cred_t *cr)
{
return (priv_check_cred(cr, PRIV_NETSMB, 0));
}


@@ -0,0 +1,112 @@
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
/* Copyright (c) 1984, 1986, 1987, 1988, 1989 AT&T */
/* All Rights Reserved */
/*
* University Copyright- Copyright (c) 1982, 1986, 1988
* The Regents of the University of California
* All Rights Reserved
*
* University Acknowledgment- Portions of this document are derived from
* software developed by the University of California, Berkeley, and its
* contributors.
*/
/*
* $FreeBSD$
*/
#include <sys/types.h>
#include <sys/uio.h>
/*
* same as uiomove() but doesn't modify uio structure.
* return in cbytes how many bytes were copied.
*/
int
uiocopy(void *p, size_t n, enum uio_rw rw, struct uio *uio, size_t *cbytes)
{
struct iovec *iov;
ulong_t cnt;
int error, iovcnt;
iovcnt = uio->uio_iovcnt;
*cbytes = 0;
for (iov = uio->uio_iov; n > 0 && iovcnt > 0; iov++, iovcnt--) {
cnt = MIN(iov->iov_len, n);
if (cnt == 0)
continue;
switch (uio->uio_segflg) {
case UIO_USERSPACE:
if (rw == UIO_READ)
error = copyout(p, iov->iov_base, cnt);
else
error = copyin(iov->iov_base, p, cnt);
if (error)
return (error);
break;
case UIO_SYSSPACE:
if (uio->uio_rw == UIO_READ)
bcopy(p, iov->iov_base, cnt);
else
bcopy(iov->iov_base, p, cnt);
break;
}
p = (caddr_t)p + cnt;
n -= cnt;
*cbytes += cnt;
}
return (0);
}
/*
* Drop the next n chars out of *uiop.
*/
void
uioskip(uio_t *uiop, size_t n)
{
if (n > uiop->uio_resid)
return;
while (n != 0) {
register iovec_t *iovp = uiop->uio_iov;
register size_t niovb = MIN(iovp->iov_len, n);
if (niovb == 0) {
uiop->uio_iov++;
uiop->uio_iovcnt--;
continue;
}
iovp->iov_base += niovb;
uiop->uio_loffset += niovb;
iovp->iov_len -= niovb;
uiop->uio_resid -= niovb;
n -= niovb;
}
}
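uioskip()'s walk over the iovec array can be modeled in user space. This sketch represents each iovec as a mutable [base, length] pair; the names and return convention are illustrative, not part of the kernel API:

```python
def uioskip(iovs, resid, n):
    """Drop the next n bytes from a list of [base, length] iovecs.

    Mirrors the kernel routine above: no-op if n exceeds resid,
    otherwise consume from the front, advancing exhausted iovecs."""
    if n > resid:
        return iovs, resid
    i = 0
    while n:
        base, length = iovs[i]
        chunk = min(length, n)
        if chunk == 0:
            i += 1            # this iovec is spent; move to the next
            continue
        iovs[i] = [base + chunk, length - chunk]
        resid -= chunk
        n -= chunk
    return iovs[i:], resid

print(uioskip([[0, 4], [100, 6]], 10, 5))   # ([[101, 5]], 5)
```

Skipping 5 bytes exhausts the first 4-byte iovec and advances the second by one, just as the kernel loop advances iov_base and decrements iov_len and uio_resid.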


@@ -43,10 +43,13 @@
#define _FIO_SEEK_DATA FIOSEEKDATA
#define _FIO_SEEK_HOLE FIOSEEKHOLE
#ifdef _KERNEL
struct opensolaris_utsname {
char *nodename;
};
extern char hw_serial[11];
extern struct opensolaris_utsname utsname;
#endif
#endif /* _OPENSOLARIS_SYS_MISC_H_ */


@@ -72,6 +72,7 @@ int secpolicy_fs_mount(cred_t *cr, vnode_t *mvp, struct mount *vfsp);
void secpolicy_fs_mount_clearopts(cred_t *cr, struct mount *vfsp);
int secpolicy_xvattr(struct vnode *vp, xvattr_t *xvap, uid_t owner,
cred_t *cr, vtype_t vtype);
int secpolicy_smb(cred_t *cr);
#endif /* _KERNEL */


@@ -51,4 +51,11 @@ ksiddomain_rele(ksiddomain_t *kd)
kmem_free(kd, sizeof(*kd));
}
static __inline int
ksid_getid(void *ksid)
{
panic("%s has been unexpectedly called", __func__);
/* NOTREACHED */
}
#endif /* _OPENSOLARIS_SYS_SID_H_ */


@@ -1,5 +1,5 @@
/*-
* Copyright (c) 2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>
* Copyright (c) 2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -60,6 +60,9 @@ zfs_uiomove(void *cp, size_t n, enum uio_rw dir, uio_t *uio)
return (uiomove(cp, (int)n, uio));
}
#define uiomove(cp, n, dir, uio) zfs_uiomove((cp), (n), (dir), (uio))
int uiocopy(void *p, size_t n, enum uio_rw rw, struct uio *uio, size_t *cbytes);
void uioskip(uio_t *uiop, size_t n);
#endif /* BUILDING_ZFS */
#endif /* !_OPENSOLARIS_SYS_UIO_H_ */


@@ -49,6 +49,7 @@ enum symfollow { NO_FOLLOW = NOFOLLOW };
#include <sys/syscallsubr.h>
typedef struct vop_vector vnodeops_t;
#define VOP_FID VOP_VPTOFH
#define vop_fid vop_vptofh
#define vop_fid_args vop_vptofh_args
#define a_fid a_fhp


@@ -19,13 +19,10 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#pragma ident "%Z%%M% %I% %E% SMI"
#if defined(_KERNEL)
#include <sys/systm.h>
#include <sys/sunddi.h>
@@ -66,6 +63,10 @@ zfs_deleg_perm_tab_t zfs_deleg_perm_tab[] = {
{ZFS_DELEG_PERM_SHARE, ZFS_DELEG_NOTE_SHARE },
{ZFS_DELEG_PERM_SEND, ZFS_DELEG_NOTE_NONE },
{ZFS_DELEG_PERM_USERPROP, ZFS_DELEG_NOTE_USERPROP },
{ZFS_DELEG_PERM_USERQUOTA, ZFS_DELEG_NOTE_USERQUOTA },
{ZFS_DELEG_PERM_GROUPQUOTA, ZFS_DELEG_NOTE_GROUPQUOTA },
{ZFS_DELEG_PERM_USERUSED, ZFS_DELEG_NOTE_USERUSED },
{ZFS_DELEG_PERM_GROUPUSED, ZFS_DELEG_NOTE_GROUPUSED },
{NULL, ZFS_DELEG_NOTE_NONE }
};


@@ -19,15 +19,13 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _ZFS_DELEG_H
#define _ZFS_DELEG_H
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/fs/zfs.h>
#ifdef __cplusplus
@@ -59,6 +57,10 @@ typedef enum {
ZFS_DELEG_NOTE_USERPROP,
ZFS_DELEG_NOTE_MOUNT,
ZFS_DELEG_NOTE_SHARE,
ZFS_DELEG_NOTE_USERQUOTA,
ZFS_DELEG_NOTE_GROUPQUOTA,
ZFS_DELEG_NOTE_USERUSED,
ZFS_DELEG_NOTE_GROUPUSED,
ZFS_DELEG_NOTE_NONE
} zfs_deleg_note_t;


@@ -19,12 +19,10 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#pragma ident "%Z%%M% %I% %E% SMI"
/*
* Common name validation routines for ZFS. These routines are shared by the
* userland code as well as the ioctl() layer to ensure that we don't
@@ -345,19 +343,3 @@ pool_namecheck(const char *pool, namecheck_err_t *why, char *what)
return (0);
}
/*
* Check if the dataset name is private for internal usage.
* '$' is reserved for internal dataset names. e.g. "$MOS"
*
* Return 1 if the given name is used internally.
* Return 0 if it is not.
*/
int
dataset_name_hidden(const char *name)
{
if (strchr(name, '$') != NULL)
return (1);
return (0);
}


@@ -19,15 +19,13 @@
* CDDL HEADER END
*/
/*
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _ZFS_NAMECHECK_H
#define _ZFS_NAMECHECK_H
#pragma ident "%Z%%M% %I% %E% SMI"
#ifdef __cplusplus
extern "C" {
#endif
@@ -50,7 +48,6 @@ typedef enum {
int pool_namecheck(const char *, namecheck_err_t *, char *);
int dataset_namecheck(const char *, namecheck_err_t *, char *);
int mountpoint_namecheck(const char *, namecheck_err_t *);
int dataset_name_hidden(const char *);
int snapshot_namecheck(const char *, namecheck_err_t *, char *);
int permset_namecheck(const char *, namecheck_err_t *, char *);


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -43,6 +43,14 @@
static zprop_desc_t zfs_prop_table[ZFS_NUM_PROPS];
/* Note this is indexed by zfs_userquota_prop_t, keep the order the same */
const char *zfs_userquota_prop_prefixes[] = {
"userused@",
"userquota@",
"groupused@",
"groupquota@"
};
zprop_desc_t *
zfs_prop_get_table(void)
{
@@ -133,6 +141,7 @@ zfs_prop_init(void)
{ "1", 1 },
{ "2", 2 },
{ "3", 3 },
{ "4", 4 },
{ "current", ZPL_VERSION },
{ NULL }
};
@@ -218,7 +227,7 @@ zfs_prop_init(void)
/* default index properties */
register_index(ZFS_PROP_VERSION, "version", 0, PROP_DEFAULT,
ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT,
"1 | 2 | 3 | current", "VERSION", version_table);
"1 | 2 | 3 | 4 | current", "VERSION", version_table);
register_index(ZFS_PROP_CANMOUNT, "canmount", ZFS_CANMOUNT_ON,
PROP_DEFAULT, ZFS_TYPE_FILESYSTEM, "on | off | noauto",
"CANMOUNT", canmount_table);
@@ -307,6 +316,8 @@ zfs_prop_init(void)
PROP_INHERIT, ZFS_TYPE_VOLUME, "ISCSIOPTIONS");
register_hidden(ZFS_PROP_GUID, "guid", PROP_TYPE_NUMBER, PROP_READONLY,
ZFS_TYPE_DATASET, "GUID");
register_hidden(ZFS_PROP_USERACCOUNTING, "useraccounting",
PROP_TYPE_NUMBER, PROP_READONLY, ZFS_TYPE_DATASET, NULL);
/* oddball properties */
register_impl(ZFS_PROP_CREATION, "creation", PROP_TYPE_NUMBER, 0, NULL,
@@ -330,7 +341,6 @@ zfs_name_to_prop(const char *propname)
return (zprop_name_to_prop(propname, ZFS_TYPE_DATASET));
}
/*
* For user property names, we allow all lowercase alphanumeric characters, plus
* a few useful punctuation characters.
@@ -367,6 +377,26 @@ zfs_prop_user(const char *name)
return (B_TRUE);
}
/*
* Returns true if this is a valid userspace-type property (one with a '@').
* Note that after the @, any character is valid (eg, another @, for SID
* user@domain).
*/
boolean_t
zfs_prop_userquota(const char *name)
{
zfs_userquota_prop_t prop;
for (prop = 0; prop < ZFS_NUM_USERQUOTA_PROPS; prop++) {
if (strncmp(name, zfs_userquota_prop_prefixes[prop],
strlen(zfs_userquota_prop_prefixes[prop])) == 0) {
return (B_TRUE);
}
}
return (B_FALSE);
}
/*
* Tables of index types, plus functions to convert between the user view
* (strings) and internal representation (uint64_t).
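The prefix test in zfs_prop_userquota() above is a plain strncmp against the zfs_userquota_prop_prefixes table; the same check can be sketched in Python (names here are illustrative):

```python
USERQUOTA_PREFIXES = ["userused@", "userquota@", "groupused@", "groupquota@"]

def is_userquota_prop(name):
    """True if name starts with one of the userspace-property prefixes.

    Anything after the '@' is accepted, including another '@'
    (e.g. an SID-style user@domain), matching the comment above."""
    return any(name.startswith(p) for p in USERQUOTA_PREFIXES)

print(is_userquota_prop("userquota@alice"))        # True
print(is_userquota_prop("userused@S-1-5-21@dom"))  # True
print(is_userquota_prop("compression"))            # False
```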


@@ -19,12 +19,10 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#pragma ident "%Z%%M% %I% %E% SMI"
/*
* Common routines used by zfs and zpool property management.
*/
@@ -205,9 +203,6 @@ propname_match(const char *p, size_t len, zprop_desc_t *prop_entry)
#ifndef _KERNEL
const char *colname = prop_entry->pd_colname;
int c;
if (colname == NULL)
return (B_FALSE);
#endif
if (len == strlen(propname) &&
@@ -215,7 +210,7 @@ propname_match(const char *p, size_t len, zprop_desc_t *prop_entry)
return (B_TRUE);
#ifndef _KERNEL
if (len != strlen(colname))
if (colname == NULL || len != strlen(colname))
return (B_FALSE);
for (c = 0; c < len; c++)


@@ -462,6 +462,7 @@ static arc_state_t *arc_l2c_only;
static int arc_no_grow; /* Don't try to grow cache size */
static uint64_t arc_tempreserve;
static uint64_t arc_loaned_bytes;
static uint64_t arc_meta_used;
static uint64_t arc_meta_limit;
static uint64_t arc_meta_max = 0;
@@ -511,7 +512,7 @@ struct arc_buf_hdr {
/* immutable */
arc_buf_contents_t b_type;
uint64_t b_size;
spa_t *b_spa;
uint64_t b_spa;
/* protected by arc state mutex */
arc_state_t *b_state;
@@ -533,9 +534,9 @@ static arc_buf_hdr_t arc_eviction_hdr;
static void arc_get_data_buf(arc_buf_t *buf);
static void arc_access(arc_buf_hdr_t *buf, kmutex_t *hash_lock);
static int arc_evict_needed(arc_buf_contents_t type);
static void arc_evict_ghost(arc_state_t *state, spa_t *spa, int64_t bytes);
static void arc_evict_ghost(arc_state_t *state, uint64_t spa, int64_t bytes);
static boolean_t l2arc_write_eligible(spa_t *spa, arc_buf_hdr_t *ab);
static boolean_t l2arc_write_eligible(uint64_t spa_guid, arc_buf_hdr_t *ab);
#define GHOST_STATE(state) \
((state) == arc_mru_ghost || (state) == arc_mfu_ghost || \
@@ -761,9 +762,8 @@ static void l2arc_hdr_stat_add(void);
static void l2arc_hdr_stat_remove(void);
static uint64_t
buf_hash(spa_t *spa, const dva_t *dva, uint64_t birth)
buf_hash(uint64_t spa, const dva_t *dva, uint64_t birth)
{
uintptr_t spav = (uintptr_t)spa;
uint8_t *vdva = (uint8_t *)dva;
uint64_t crc = -1ULL;
int i;
@@ -773,7 +773,7 @@ buf_hash(spa_t *spa, const dva_t *dva, uint64_t birth)
for (i = 0; i < sizeof (dva_t); i++)
crc = (crc >> 8) ^ zfs_crc64_table[(crc ^ vdva[i]) & 0xFF];
crc ^= (spav>>8) ^ birth;
crc ^= (spa>>8) ^ birth;
return (crc);
}
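The reworked buf_hash() now folds the pool guid directly instead of a spa_t pointer. A Python sketch of the same fold follows; the CRC table generation assumes ZFS's reflected ECMA-182 polynomial (0xC96C5795D7870F42) for zfs_crc64_table, which is not shown in this diff:

```python
ZFS_CRC64_POLY = 0xC96C5795D7870F42     # assumed reflected ECMA-182 poly

crc64_table = []
for b in range(256):
    c = b
    for _ in range(8):
        c = (c >> 1) ^ (ZFS_CRC64_POLY if c & 1 else 0)
    crc64_table.append(c)

def buf_hash(spa_guid, dva_bytes, birth):
    """CRC-fold the DVA bytes, then mix in the pool guid and birth txg."""
    crc = (1 << 64) - 1                 # -1ULL
    for byte in dva_bytes:              # sizeof (dva_t) == 16 bytes
        crc = (crc >> 8) ^ crc64_table[(crc ^ byte) & 0xFF]
    return (crc ^ (spa_guid >> 8) ^ birth) & ((1 << 64) - 1)

print(hex(buf_hash(0xDEADBEEF, bytes(16), 42)))
```

Hashing by guid rather than pointer keeps headers addressable after export/import, when the spa_t itself is reallocated.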
@@ -789,7 +789,7 @@ buf_hash(spa_t *spa, const dva_t *dva, uint64_t birth)
((buf)->b_birth == birth) && ((buf)->b_spa == spa)
static arc_buf_hdr_t *
buf_hash_find(spa_t *spa, const dva_t *dva, uint64_t birth, kmutex_t **lockp)
buf_hash_find(uint64_t spa, const dva_t *dva, uint64_t birth, kmutex_t **lockp)
{
uint64_t idx = BUF_HASH_INDEX(spa, dva, birth);
kmutex_t *hash_lock = BUF_HASH_LOCK(idx);
@@ -1345,7 +1345,7 @@ arc_buf_alloc(spa_t *spa, int size, void *tag, arc_buf_contents_t type)
ASSERT(BUF_EMPTY(hdr));
hdr->b_size = size;
hdr->b_type = type;
hdr->b_spa = spa;
hdr->b_spa = spa_guid(spa);
hdr->b_state = arc_anon;
hdr->b_arc_access = 0;
buf = kmem_cache_alloc(buf_cache, KM_PUSHPAGE);
@@ -1364,6 +1364,41 @@ arc_buf_alloc(spa_t *spa, int size, void *tag, arc_buf_contents_t type)
return (buf);
}
static char *arc_onloan_tag = "onloan";
/*
* Loan out an anonymous arc buffer. Loaned buffers are not counted as in
* flight data by arc_tempreserve_space() until they are "returned". Loaned
* buffers must be returned to the arc before they can be used by the DMU or
* freed.
*/
arc_buf_t *
arc_loan_buf(spa_t *spa, int size)
{
arc_buf_t *buf;
buf = arc_buf_alloc(spa, size, arc_onloan_tag, ARC_BUFC_DATA);
atomic_add_64(&arc_loaned_bytes, size);
return (buf);
}
/*
* Return a loaned arc buffer to the arc.
*/
void
arc_return_buf(arc_buf_t *buf, void *tag)
{
arc_buf_hdr_t *hdr = buf->b_hdr;
ASSERT(hdr->b_state == arc_anon);
ASSERT(buf->b_data != NULL);
VERIFY(refcount_remove(&hdr->b_refcnt, arc_onloan_tag) == 0);
VERIFY(refcount_add(&hdr->b_refcnt, tag) == 1);
atomic_add_64(&arc_loaned_bytes, -hdr->b_size);
}
static arc_buf_t *
arc_buf_clone(arc_buf_t *from)
{
@@ -1661,7 +1696,7 @@ arc_buf_size(arc_buf_t *buf)
* It may also return without evicting as much space as requested.
*/
static void *
arc_evict(arc_state_t *state, spa_t *spa, int64_t bytes, boolean_t recycle,
arc_evict(arc_state_t *state, uint64_t spa, int64_t bytes, boolean_t recycle,
arc_buf_contents_t type)
{
arc_state_t *evicted_state;
@@ -1830,12 +1865,12 @@ arc_evict(arc_state_t *state, spa_t *spa, int64_t bytes, boolean_t recycle,
if (mru_over > 0 && arc_mru_ghost->arcs_lsize[type] > 0) {
int64_t todelete =
MIN(arc_mru_ghost->arcs_lsize[type], mru_over);
arc_evict_ghost(arc_mru_ghost, NULL, todelete);
arc_evict_ghost(arc_mru_ghost, 0, todelete);
} else if (arc_mfu_ghost->arcs_lsize[type] > 0) {
int64_t todelete = MIN(arc_mfu_ghost->arcs_lsize[type],
arc_mru_ghost->arcs_size +
arc_mfu_ghost->arcs_size - arc_c);
arc_evict_ghost(arc_mfu_ghost, NULL, todelete);
arc_evict_ghost(arc_mfu_ghost, 0, todelete);
}
}
if (stolen)
@@ -1849,7 +1884,7 @@ arc_evict(arc_state_t *state, spa_t *spa, int64_t bytes, boolean_t recycle,
* bytes. Destroy the buffers that are removed.
*/
static void
arc_evict_ghost(arc_state_t *state, spa_t *spa, int64_t bytes)
arc_evict_ghost(arc_state_t *state, uint64_t spa, int64_t bytes)
{
arc_buf_hdr_t *ab, *ab_prev;
list_t *list, *list_start;
@@ -1955,13 +1990,13 @@ arc_adjust(void)
if (adjustment > 0 && arc_mru->arcs_lsize[ARC_BUFC_DATA] > 0) {
delta = MIN(arc_mru->arcs_lsize[ARC_BUFC_DATA], adjustment);
(void) arc_evict(arc_mru, NULL, delta, FALSE, ARC_BUFC_DATA);
(void) arc_evict(arc_mru, 0, delta, FALSE, ARC_BUFC_DATA);
adjustment -= delta;
}
if (adjustment > 0 && arc_mru->arcs_lsize[ARC_BUFC_METADATA] > 0) {
delta = MIN(arc_mru->arcs_lsize[ARC_BUFC_METADATA], adjustment);
(void) arc_evict(arc_mru, NULL, delta, FALSE,
(void) arc_evict(arc_mru, 0, delta, FALSE,
ARC_BUFC_METADATA);
}
@@ -1973,14 +2008,14 @@ arc_adjust(void)
if (adjustment > 0 && arc_mfu->arcs_lsize[ARC_BUFC_DATA] > 0) {
delta = MIN(adjustment, arc_mfu->arcs_lsize[ARC_BUFC_DATA]);
(void) arc_evict(arc_mfu, NULL, delta, FALSE, ARC_BUFC_DATA);
(void) arc_evict(arc_mfu, 0, delta, FALSE, ARC_BUFC_DATA);
adjustment -= delta;
}
if (adjustment > 0 && arc_mfu->arcs_lsize[ARC_BUFC_METADATA] > 0) {
int64_t delta = MIN(adjustment,
arc_mfu->arcs_lsize[ARC_BUFC_METADATA]);
(void) arc_evict(arc_mfu, NULL, delta, FALSE,
(void) arc_evict(arc_mfu, 0, delta, FALSE,
ARC_BUFC_METADATA);
}
@@ -1992,7 +2027,7 @@ arc_adjust(void)
if (adjustment > 0 && arc_mru_ghost->arcs_size > 0) {
delta = MIN(arc_mru_ghost->arcs_size, adjustment);
arc_evict_ghost(arc_mru_ghost, NULL, delta);
arc_evict_ghost(arc_mru_ghost, 0, delta);
}
adjustment =
@@ -2000,7 +2035,7 @@ arc_adjust(void)
if (adjustment > 0 && arc_mfu_ghost->arcs_size > 0) {
delta = MIN(arc_mfu_ghost->arcs_size, adjustment);
arc_evict_ghost(arc_mfu_ghost, NULL, delta);
arc_evict_ghost(arc_mfu_ghost, 0, delta);
}
}
@@ -2044,29 +2079,34 @@ arc_do_user_evicts(void)
void
arc_flush(spa_t *spa)
{
uint64_t guid = 0;
if (spa)
guid = spa_guid(spa);
while (arc_mru->arcs_lsize[ARC_BUFC_DATA]) {
(void) arc_evict(arc_mru, spa, -1, FALSE, ARC_BUFC_DATA);
(void) arc_evict(arc_mru, guid, -1, FALSE, ARC_BUFC_DATA);
if (spa)
break;
}
while (arc_mru->arcs_lsize[ARC_BUFC_METADATA]) {
(void) arc_evict(arc_mru, spa, -1, FALSE, ARC_BUFC_METADATA);
(void) arc_evict(arc_mru, guid, -1, FALSE, ARC_BUFC_METADATA);
if (spa)
break;
}
while (arc_mfu->arcs_lsize[ARC_BUFC_DATA]) {
(void) arc_evict(arc_mfu, spa, -1, FALSE, ARC_BUFC_DATA);
(void) arc_evict(arc_mfu, guid, -1, FALSE, ARC_BUFC_DATA);
if (spa)
break;
}
while (arc_mfu->arcs_lsize[ARC_BUFC_METADATA]) {
(void) arc_evict(arc_mfu, spa, -1, FALSE, ARC_BUFC_METADATA);
(void) arc_evict(arc_mfu, guid, -1, FALSE, ARC_BUFC_METADATA);
if (spa)
break;
}
arc_evict_ghost(arc_mru_ghost, spa, -1);
arc_evict_ghost(arc_mfu_ghost, spa, -1);
arc_evict_ghost(arc_mru_ghost, guid, -1);
arc_evict_ghost(arc_mfu_ghost, guid, -1);
mutex_enter(&arc_reclaim_thr_lock);
arc_do_user_evicts();
@@ -2463,7 +2503,7 @@ arc_get_data_buf(arc_buf_t *buf)
state = (arc_mru->arcs_lsize[type] >= size &&
mfu_space > arc_mfu->arcs_size) ? arc_mru : arc_mfu;
}
if ((buf->b_data = arc_evict(state, NULL, size, TRUE, type)) == NULL) {
if ((buf->b_data = arc_evict(state, 0, size, TRUE, type)) == NULL) {
if (type == ARC_BUFC_METADATA) {
buf->b_data = zio_buf_alloc(size);
arc_space_consume(size, ARC_SPACE_DATA);
@@ -2673,7 +2713,7 @@ arc_read_done(zio_t *zio)
* reason for it not to be found is if we were freed during the
* read.
*/
found = buf_hash_find(zio->io_spa, &hdr->b_dva, hdr->b_birth,
found = buf_hash_find(hdr->b_spa, &hdr->b_dva, hdr->b_birth,
&hash_lock);
ASSERT((found == NULL && HDR_FREED_IN_READ(hdr) && hash_lock == NULL) ||
@@ -2817,9 +2857,10 @@ arc_read_nolock(zio_t *pio, spa_t *spa, blkptr_t *bp,
arc_buf_t *buf;
kmutex_t *hash_lock;
zio_t *rzio;
uint64_t guid = spa_guid(spa);
top:
hdr = buf_hash_find(spa, BP_IDENTITY(bp), bp->blk_birth, &hash_lock);
hdr = buf_hash_find(guid, BP_IDENTITY(bp), bp->blk_birth, &hash_lock);
if (hdr && hdr->b_datacnt > 0) {
*arc_flags |= ARC_CACHED;
@@ -2842,7 +2883,7 @@ arc_read_nolock(zio_t *pio, spa_t *spa, blkptr_t *bp,
acb->acb_private = private;
if (pio != NULL)
acb->acb_zio_dummy = zio_null(pio,
spa, NULL, NULL, zio_flags);
spa, NULL, NULL, NULL, zio_flags);
ASSERT(acb->acb_done != NULL);
acb->acb_next = hdr->b_acb;
@@ -3084,9 +3125,10 @@ arc_tryread(spa_t *spa, blkptr_t *bp, void *data)
{
arc_buf_hdr_t *hdr;
kmutex_t *hash_mtx;
uint64_t guid = spa_guid(spa);
int rc = 0;
hdr = buf_hash_find(spa, BP_IDENTITY(bp), bp->blk_birth, &hash_mtx);
hdr = buf_hash_find(guid, BP_IDENTITY(bp), bp->blk_birth, &hash_mtx);
if (hdr && hdr->b_datacnt > 0 && !HDR_IO_IN_PROGRESS(hdr)) {
arc_buf_t *buf = hdr->b_buf;
@@ -3254,7 +3296,7 @@ arc_release(arc_buf_t *buf, void *tag)
arc_buf_hdr_t *nhdr;
arc_buf_t **bufp;
uint64_t blksz = hdr->b_size;
spa_t *spa = hdr->b_spa;
uint64_t spa = hdr->b_spa;
arc_buf_contents_t type = hdr->b_type;
uint32_t flags = hdr->b_flags;
@@ -3539,12 +3581,13 @@ arc_free(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp,
arc_buf_hdr_t *ab;
kmutex_t *hash_lock;
zio_t *zio;
uint64_t guid = spa_guid(spa);
/*
* If this buffer is in the cache, release it, so it
* can be re-used.
*/
ab = buf_hash_find(spa, BP_IDENTITY(bp), bp->blk_birth, &hash_lock);
ab = buf_hash_find(guid, BP_IDENTITY(bp), bp->blk_birth, &hash_lock);
if (ab != NULL) {
/*
* The checksum of blocks to free is not always
@@ -3607,10 +3650,9 @@ arc_free(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp,
}
static int
arc_memory_throttle(uint64_t reserve, uint64_t txg)
arc_memory_throttle(uint64_t reserve, uint64_t inflight_data, uint64_t txg)
{
#ifdef _KERNEL
uint64_t inflight_data = arc_anon->arcs_size;
uint64_t available_memory = ptoa((uintmax_t)cnt.v_free_count);
static uint64_t page_load = 0;
static uint64_t last_txg = 0;
@@ -3674,6 +3716,7 @@ int
arc_tempreserve_space(uint64_t reserve, uint64_t txg)
{
int error;
uint64_t anon_size;
#ifdef ZFS_DEBUG
/*
@@ -3689,12 +3732,19 @@ arc_tempreserve_space(uint64_t reserve, uint64_t txg)
if (reserve > arc_c)
return (ENOMEM);
/*
* Don't count loaned bufs as in flight dirty data to prevent long
* network delays from blocking transactions that are ready to be
* assigned to a txg.
*/
anon_size = MAX((int64_t)(arc_anon->arcs_size - arc_loaned_bytes), 0);
/*
* Writes will, almost always, require additional memory allocations
* in order to compress/encrypt/etc the data. We therefore need to
* make sure that there is sufficient available memory for this.
*/
if (error = arc_memory_throttle(reserve, txg))
if (error = arc_memory_throttle(reserve, anon_size, txg))
return (error);
/*
@@ -3704,8 +3754,9 @@ arc_tempreserve_space(uint64_t reserve, uint64_t txg)
* Note: if two requests come in concurrently, we might let them
* both succeed, when one of them should fail. Not a huge deal.
*/
if (reserve + arc_tempreserve + arc_anon->arcs_size > arc_c / 2 &&
arc_anon->arcs_size > arc_c / 4) {
if (reserve + arc_tempreserve + anon_size > arc_c / 2 &&
anon_size > arc_c / 4) {
dprintf("failing, arc_tempreserve=%lluK anon_meta=%lluK "
"anon_data=%lluK tempreserve=%lluK arc_c=%lluK\n",
arc_tempreserve>>10,
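The new dirty-data gate in arc_tempreserve_space() excludes loaned bytes from the anonymous-buffer total before applying the arc_c/2 and arc_c/4 thresholds. The condition can be modeled as follows; the function name and boolean return are illustrative (the kernel returns ERESTART on failure):

```python
def tempreserve_ok(reserve, tempreserve, anon_total, loaned, arc_c):
    """Model of the arc_tempreserve_space() throttle after this change.

    Loaned buffers are excluded from in-flight dirty data so that slow
    consumers (e.g. long network delays) can't stall txg assignment."""
    anon_size = max(anon_total - loaned, 0)
    if (reserve + tempreserve + anon_size > arc_c // 2 and
            anon_size > arc_c // 4):
        return False        # the kernel would return ERESTART here
    return True

print(tempreserve_ok(10, 0, 900, 0, 1000))    # False: too much anon data
print(tempreserve_ok(10, 0, 900, 700, 1000))  # True: most of it is loaned
```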
@@ -3959,6 +4010,8 @@ arc_fini(void)
buf_fini();
ASSERT(arc_loaned_bytes == 0);
mutex_destroy(&arc_lowmem_lock);
#ifdef _KERNEL
if (arc_event_lowmem != NULL)
@@ -4103,7 +4156,7 @@ arc_fini(void)
*/
static boolean_t
l2arc_write_eligible(spa_t *spa, arc_buf_hdr_t *ab)
l2arc_write_eligible(uint64_t spa_guid, arc_buf_hdr_t *ab)
{
/*
* A buffer is *not* eligible for the L2ARC if it:
@@ -4112,7 +4165,7 @@ l2arc_write_eligible(spa_t *spa, arc_buf_hdr_t *ab)
* 3. has an I/O in progress (it may be an incomplete read).
* 4. is flagged not eligible (zfs property).
*/
if (ab->b_spa != spa) {
if (ab->b_spa != spa_guid) {
ARCSTAT_BUMP(arcstat_l2_write_spa_mismatch);
return (B_FALSE);
}
@@ -4399,11 +4452,15 @@ l2arc_read_done(zio_t *zio)
* storage now. If there *is* a waiter, the caller must
* issue the i/o in a context where it's OK to block.
*/
if (zio->io_waiter == NULL)
zio_nowait(zio_read(zio->io_parent,
cb->l2rcb_spa, &cb->l2rcb_bp,
if (zio->io_waiter == NULL) {
zio_t *pio = zio_unique_parent(zio);
ASSERT(!pio || pio->io_child_type == ZIO_CHILD_LOGICAL);
zio_nowait(zio_read(pio, cb->l2rcb_spa, &cb->l2rcb_bp,
buf->b_data, zio->io_size, arc_read_done, buf,
zio->io_priority, cb->l2rcb_flags, &cb->l2rcb_zb));
}
}
kmem_free(cb, sizeof (l2arc_read_callback_t));
@@ -4600,6 +4657,7 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
boolean_t have_lock, full;
l2arc_write_callback_t *cb;
zio_t *pio, *wzio;
uint64_t guid = spa_guid(spa);
int try;
ASSERT(dev->l2ad_vdev != NULL);
@@ -4661,7 +4719,7 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
break;
}
if (!l2arc_write_eligible(spa, ab)) {
if (!l2arc_write_eligible(guid, ab)) {
mutex_exit(hash_lock);
continue;
}
@@ -5001,7 +5059,7 @@ l2arc_fini(void)
void
l2arc_start(void)
{
if (!(spa_mode & FWRITE))
if (!(spa_mode_global & FWRITE))
return;
(void) thread_create(NULL, 0, l2arc_feed_thread, NULL, 0, &p0,
@@ -5011,7 +5069,7 @@ l2arc_start(void)
void
l2arc_stop(void)
{
if (!(spa_mode & FWRITE))
if (!(spa_mode_global & FWRITE))
return;
mutex_enter(&l2arc_feed_thr_lock);


@@ -327,7 +327,7 @@ dbuf_verify(dmu_buf_impl_t *db)
if (db->db_parent == dn->dn_dbuf) {
/* db is pointed to by the dnode */
/* ASSERT3U(db->db_blkid, <, dn->dn_nblkptr); */
if (db->db.db_object == DMU_META_DNODE_OBJECT)
if (DMU_OBJECT_IS_SPECIAL(db->db.db_object))
ASSERT(db->db_parent == NULL);
else
ASSERT(db->db_parent != NULL);
@@ -899,15 +899,11 @@ dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx)
* Shouldn't dirty a regular buffer in syncing context. Private
* objects may be dirtied in syncing context, but only if they
* were already pre-dirtied in open context.
* XXX We may want to prohibit dirtying in syncing context even
* if they did pre-dirty.
*/
ASSERT(!dmu_tx_is_syncing(tx) ||
BP_IS_HOLE(dn->dn_objset->os_rootbp) ||
dn->dn_object == DMU_META_DNODE_OBJECT ||
dn->dn_objset->os_dsl_dataset == NULL ||
dsl_dir_is_private(dn->dn_objset->os_dsl_dataset->ds_dir));
DMU_OBJECT_IS_SPECIAL(dn->dn_object) ||
dn->dn_objset->os_dsl_dataset == NULL);
/*
* We make this assert for private objects as well, but after we
* check if we're already dirty. They are allowed to re-dirty
@@ -965,7 +961,8 @@ dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx)
/*
* Only valid if not already dirty.
*/
ASSERT(dn->dn_dirtyctx == DN_UNDIRTIED || dn->dn_dirtyctx ==
ASSERT(dn->dn_object == 0 ||
dn->dn_dirtyctx == DN_UNDIRTIED || dn->dn_dirtyctx ==
(dmu_tx_is_syncing(tx) ? DN_DIRTY_SYNC : DN_DIRTY_OPEN));
ASSERT3U(dn->dn_nlevels, >, db->db_level);
@@ -977,15 +974,13 @@ dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx)
/*
* We should only be dirtying in syncing context if it's the
* mos, a spa os, or we're initializing the os. However, we are
* allowed to dirty in syncing context provided we already
* dirtied it in open context. Hence we must make this
* assertion only if we're not already dirty.
* mos or we're initializing the os or it's a special object.
* However, we are allowed to dirty in syncing context provided
* we already dirtied it in open context. Hence we must make
* this assertion only if we're not already dirty.
*/
ASSERT(!dmu_tx_is_syncing(tx) ||
os->os_dsl_dataset == NULL ||
!dsl_dir_is_private(os->os_dsl_dataset->ds_dir) ||
!BP_IS_HOLE(os->os_rootbp));
ASSERT(!dmu_tx_is_syncing(tx) || DMU_OBJECT_IS_SPECIAL(dn->dn_object) ||
os->os_dsl_dataset == NULL || BP_IS_HOLE(os->os_rootbp));
ASSERT(db->db.db_size != 0);
dprintf_dbuf(db, "size=%llx\n", (u_longlong_t)db->db.db_size);
@@ -1284,6 +1279,68 @@ dbuf_fill_done(dmu_buf_impl_t *db, dmu_tx_t *tx)
mutex_exit(&db->db_mtx);
}
/*
* Directly assign a provided arc buf to a given dbuf if it's not referenced
* by anybody except our caller. Otherwise copy arcbuf's contents to dbuf.
*/
void
dbuf_assign_arcbuf(dmu_buf_impl_t *db, arc_buf_t *buf, dmu_tx_t *tx)
{
ASSERT(!refcount_is_zero(&db->db_holds));
ASSERT(db->db_dnode->dn_object != DMU_META_DNODE_OBJECT);
ASSERT(db->db_blkid != DB_BONUS_BLKID);
ASSERT(db->db_level == 0);
ASSERT(DBUF_GET_BUFC_TYPE(db) == ARC_BUFC_DATA);
ASSERT(buf != NULL);
ASSERT(arc_buf_size(buf) == db->db.db_size);
ASSERT(tx->tx_txg != 0);
arc_return_buf(buf, db);
ASSERT(arc_released(buf));
mutex_enter(&db->db_mtx);
while (db->db_state == DB_READ || db->db_state == DB_FILL)
cv_wait(&db->db_changed, &db->db_mtx);
ASSERT(db->db_state == DB_CACHED || db->db_state == DB_UNCACHED);
if (db->db_state == DB_CACHED &&
refcount_count(&db->db_holds) - 1 > db->db_dirtycnt) {
mutex_exit(&db->db_mtx);
(void) dbuf_dirty(db, tx);
bcopy(buf->b_data, db->db.db_data, db->db.db_size);
VERIFY(arc_buf_remove_ref(buf, db) == 1);
return;
}
if (db->db_state == DB_CACHED) {
dbuf_dirty_record_t *dr = db->db_last_dirty;
ASSERT(db->db_buf != NULL);
if (dr != NULL && dr->dr_txg == tx->tx_txg) {
ASSERT(dr->dt.dl.dr_data == db->db_buf);
if (!arc_released(db->db_buf)) {
ASSERT(dr->dt.dl.dr_override_state ==
DR_OVERRIDDEN);
arc_release(db->db_buf, db);
}
dr->dt.dl.dr_data = buf;
VERIFY(arc_buf_remove_ref(db->db_buf, db) == 1);
} else if (dr == NULL || dr->dt.dl.dr_data != db->db_buf) {
arc_release(db->db_buf, db);
VERIFY(arc_buf_remove_ref(db->db_buf, db) == 1);
}
db->db_buf = NULL;
}
ASSERT(db->db_buf == NULL);
dbuf_set_data(db, buf);
db->db_state = DB_FILL;
mutex_exit(&db->db_mtx);
(void) dbuf_dirty(db, tx);
dbuf_fill_done(db, tx);
}
/*
* "Clear" the contents of this dbuf. This will mark the dbuf
* EVICTING and clear *most* of its references. Unfortunately,
@@ -1827,6 +1884,19 @@ dmu_buf_get_user(dmu_buf_t *db_fake)
return (db->db_user_ptr);
}
boolean_t
dmu_buf_freeable(dmu_buf_t *dbuf)
{
boolean_t res = B_FALSE;
dmu_buf_impl_t *db = (dmu_buf_impl_t *)dbuf;
if (db->db_blkptr)
res = dsl_dataset_block_freeable(db->db_objset->os_dsl_dataset,
db->db_blkptr->blk_birth);
return (res);
}
static void
dbuf_check_blkptr(dnode_t *dn, dmu_buf_impl_t *db)
{

View File

@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -82,6 +82,8 @@ const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES] = {
{ byteswap_uint64_array, TRUE, "FUID table size" },
{ zap_byteswap, TRUE, "DSL dataset next clones"},
{ zap_byteswap, TRUE, "scrub work queue" },
{ zap_byteswap, TRUE, "ZFS user/group used" },
{ zap_byteswap, TRUE, "ZFS user/group quota" },
};
int
@@ -177,22 +179,22 @@ dmu_bonus_hold(objset_t *os, uint64_t object, void *tag, dmu_buf_t **dbp)
* whose dnodes are in the same block.
*/
static int
dmu_buf_hold_array_by_dnode(dnode_t *dn, uint64_t offset,
uint64_t length, int read, void *tag, int *numbufsp, dmu_buf_t ***dbpp)
dmu_buf_hold_array_by_dnode(dnode_t *dn, uint64_t offset, uint64_t length,
int read, void *tag, int *numbufsp, dmu_buf_t ***dbpp, uint32_t flags)
{
dsl_pool_t *dp = NULL;
dmu_buf_t **dbp;
uint64_t blkid, nblks, i;
uint32_t flags;
uint32_t dbuf_flags;
int err;
zio_t *zio;
hrtime_t start;
ASSERT(length <= DMU_MAX_ACCESS);
flags = DB_RF_CANFAIL | DB_RF_NEVERWAIT;
if (length > zfetch_array_rd_sz)
flags |= DB_RF_NOPREFETCH;
dbuf_flags = DB_RF_CANFAIL | DB_RF_NEVERWAIT;
if (flags & DMU_READ_NO_PREFETCH || length > zfetch_array_rd_sz)
dbuf_flags |= DB_RF_NOPREFETCH;
rw_enter(&dn->dn_struct_rwlock, RW_READER);
if (dn->dn_datablkshift) {
@@ -230,7 +232,7 @@ dmu_buf_hold_array_by_dnode(dnode_t *dn, uint64_t offset,
/* initiate async i/o */
if (read) {
rw_exit(&dn->dn_struct_rwlock);
(void) dbuf_read(db, zio, flags);
(void) dbuf_read(db, zio, dbuf_flags);
rw_enter(&dn->dn_struct_rwlock, RW_READER);
}
dbp[i] = &db->db;
@@ -282,7 +284,7 @@ dmu_buf_hold_array(objset_t *os, uint64_t object, uint64_t offset,
return (err);
err = dmu_buf_hold_array_by_dnode(dn, offset, length, read, tag,
numbufsp, dbpp);
numbufsp, dbpp, DMU_READ_PREFETCH);
dnode_rele(dn, FTAG);
@@ -297,7 +299,7 @@ dmu_buf_hold_array_by_bonus(dmu_buf_t *db, uint64_t offset,
int err;
err = dmu_buf_hold_array_by_dnode(dn, offset, length, read, tag,
numbufsp, dbpp);
numbufsp, dbpp, DMU_READ_PREFETCH);
return (err);
}
@@ -434,7 +436,8 @@ dmu_free_long_range_impl(objset_t *os, dnode_t *dn, uint64_t offset,
object_size = align == 1 ? dn->dn_datablksz :
(dn->dn_maxblkid + 1) << dn->dn_datablkshift;
if (trunc || (end = offset + length) > object_size)
end = offset + length;
if (trunc || end > object_size)
end = object_size;
if (end <= offset)
return (0);
@@ -442,6 +445,7 @@ dmu_free_long_range_impl(objset_t *os, dnode_t *dn, uint64_t offset,
while (length) {
start = end;
/* assert(offset <= start) */
err = get_next_chunk(dn, &start, offset);
if (err)
return (err);
@@ -532,7 +536,7 @@ dmu_free_range(objset_t *os, uint64_t object, uint64_t offset,
int
dmu_read(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
void *buf)
void *buf, uint32_t flags)
{
dnode_t *dn;
dmu_buf_t **dbp;
@@ -562,7 +566,7 @@ dmu_read(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
* to be reading in parallel.
*/
err = dmu_buf_hold_array_by_dnode(dn, offset, mylen,
TRUE, FTAG, &numbufs, &dbp);
TRUE, FTAG, &numbufs, &dbp, flags);
if (err)
break;
@@ -771,9 +775,6 @@ dmu_write_pages(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
if (tocpy == db->db_size)
dmu_buf_fill_done(db, tx);
if (err)
break;
offset += tocpy;
size -= tocpy;
}
@@ -783,6 +784,58 @@ dmu_write_pages(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
#endif /* !__FreeBSD__ */
#endif /* _KERNEL */
/*
* Allocate a loaned anonymous arc buffer.
*/
arc_buf_t *
dmu_request_arcbuf(dmu_buf_t *handle, int size)
{
dnode_t *dn = ((dmu_buf_impl_t *)handle)->db_dnode;
return (arc_loan_buf(dn->dn_objset->os_spa, size));
}
/*
* Free a loaned arc buffer.
*/
void
dmu_return_arcbuf(arc_buf_t *buf)
{
arc_return_buf(buf, FTAG);
VERIFY(arc_buf_remove_ref(buf, FTAG) == 1);
}
/*
* When possible directly assign passed loaned arc buffer to a dbuf.
* If this is not possible copy the contents of passed arc buf via
* dmu_write().
*/
void
dmu_assign_arcbuf(dmu_buf_t *handle, uint64_t offset, arc_buf_t *buf,
dmu_tx_t *tx)
{
dnode_t *dn = ((dmu_buf_impl_t *)handle)->db_dnode;
dmu_buf_impl_t *db;
uint32_t blksz = (uint32_t)arc_buf_size(buf);
uint64_t blkid;
rw_enter(&dn->dn_struct_rwlock, RW_READER);
blkid = dbuf_whichblock(dn, offset);
VERIFY((db = dbuf_hold(dn, blkid, FTAG)) != NULL);
rw_exit(&dn->dn_struct_rwlock);
if (offset == db->db.db_offset && blksz == db->db.db_size) {
dbuf_assign_arcbuf(db, buf, tx);
dbuf_rele(db, FTAG);
} else {
dbuf_rele(db, FTAG);
ASSERT(dn->dn_objset->os.os == dn->dn_objset);
dmu_write(&dn->dn_objset->os, dn->dn_object, offset, blksz,
buf->b_data, tx);
dmu_return_arcbuf(buf);
}
}
typedef struct {
dbuf_dirty_record_t *dr;
dmu_sync_cb_t *done;
@@ -794,14 +847,20 @@ static void
dmu_sync_ready(zio_t *zio, arc_buf_t *buf, void *varg)
{
blkptr_t *bp = zio->io_bp;
dmu_sync_arg_t *in = varg;
dbuf_dirty_record_t *dr = in->dr;
dmu_buf_impl_t *db = dr->dr_dbuf;
if (!BP_IS_HOLE(bp)) {
dmu_sync_arg_t *in = varg;
dbuf_dirty_record_t *dr = in->dr;
dmu_buf_impl_t *db = dr->dr_dbuf;
ASSERT(BP_GET_TYPE(bp) == db->db_dnode->dn_type);
ASSERT(BP_GET_LEVEL(bp) == 0);
bp->blk_fill = 1;
} else {
/*
* dmu_sync() can compress a block of zeros to a null blkptr
* but the block size still needs to be passed through to replay
*/
BP_SET_LSIZE(bp, db->db.db_size);
}
}
@@ -817,6 +876,8 @@ dmu_sync_done(zio_t *zio, arc_buf_t *buf, void *varg)
mutex_enter(&db->db_mtx);
ASSERT(dr->dt.dl.dr_override_state == DR_IN_DMU_SYNC);
dr->dt.dl.dr_overridden_by = *zio->io_bp; /* structure assignment */
if (BP_IS_HOLE(&dr->dt.dl.dr_overridden_by))
BP_ZERO(&dr->dt.dl.dr_overridden_by);
dr->dt.dl.dr_override_state = DR_OVERRIDDEN;
cv_broadcast(&db->db_changed);
mutex_exit(&db->db_mtx);

View File

@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -164,10 +164,15 @@ dmu_objset_byteswap(void *buf, size_t size)
{
objset_phys_t *osp = buf;
ASSERT(size == sizeof (objset_phys_t));
ASSERT(size == OBJSET_OLD_PHYS_SIZE || size == sizeof (objset_phys_t));
dnode_byteswap(&osp->os_meta_dnode);
byteswap_uint64_array(&osp->os_zil_header, sizeof (zil_header_t));
osp->os_type = BSWAP_64(osp->os_type);
osp->os_flags = BSWAP_64(osp->os_flags);
if (size == sizeof (objset_phys_t)) {
dnode_byteswap(&osp->os_userused_dnode);
dnode_byteswap(&osp->os_groupused_dnode);
}
}
int
@@ -210,12 +215,30 @@ dmu_objset_open_impl(spa_t *spa, dsl_dataset_t *ds, blkptr_t *bp,
err = EIO;
return (err);
}
/* Increase the blocksize if we are permitted. */
if (spa_version(spa) >= SPA_VERSION_USERSPACE &&
arc_buf_size(osi->os_phys_buf) < sizeof (objset_phys_t)) {
arc_buf_t *buf = arc_buf_alloc(spa,
sizeof (objset_phys_t), &osi->os_phys_buf,
ARC_BUFC_METADATA);
bzero(buf->b_data, sizeof (objset_phys_t));
bcopy(osi->os_phys_buf->b_data, buf->b_data,
arc_buf_size(osi->os_phys_buf));
(void) arc_buf_remove_ref(osi->os_phys_buf,
&osi->os_phys_buf);
osi->os_phys_buf = buf;
}
osi->os_phys = osi->os_phys_buf->b_data;
osi->os_flags = osi->os_phys->os_flags;
} else {
osi->os_phys_buf = arc_buf_alloc(spa, sizeof (objset_phys_t),
int size = spa_version(spa) >= SPA_VERSION_USERSPACE ?
sizeof (objset_phys_t) : OBJSET_OLD_PHYS_SIZE;
osi->os_phys_buf = arc_buf_alloc(spa, size,
&osi->os_phys_buf, ARC_BUFC_METADATA);
osi->os_phys = osi->os_phys_buf->b_data;
bzero(osi->os_phys, sizeof (objset_phys_t));
bzero(osi->os_phys, size);
}
/*
@@ -276,6 +299,12 @@ dmu_objset_open_impl(spa_t *spa, dsl_dataset_t *ds, blkptr_t *bp,
osi->os_meta_dnode = dnode_special_open(osi,
&osi->os_phys->os_meta_dnode, DMU_META_DNODE_OBJECT);
if (arc_buf_size(osi->os_phys_buf) >= sizeof (objset_phys_t)) {
osi->os_userused_dnode = dnode_special_open(osi,
&osi->os_phys->os_userused_dnode, DMU_USERUSED_OBJECT);
osi->os_groupused_dnode = dnode_special_open(osi,
&osi->os_phys->os_groupused_dnode, DMU_GROUPUSED_OBJECT);
}
/*
* We should be the only thread trying to do this because we
@@ -456,13 +485,15 @@ dmu_objset_evict(dsl_dataset_t *ds, void *arg)
os.os = osi;
(void) dmu_objset_evict_dbufs(&os);
ASSERT3P(list_head(&osi->os_dnodes), ==, osi->os_meta_dnode);
ASSERT3P(list_tail(&osi->os_dnodes), ==, osi->os_meta_dnode);
ASSERT3P(list_head(&osi->os_meta_dnode->dn_dbufs), ==, NULL);
dnode_special_close(osi->os_meta_dnode);
if (osi->os_userused_dnode) {
dnode_special_close(osi->os_userused_dnode);
dnode_special_close(osi->os_groupused_dnode);
}
zil_free(osi->os_zil);
ASSERT3P(list_head(&osi->os_dnodes), ==, NULL);
VERIFY(arc_buf_remove_ref(osi->os_phys_buf, &osi->os_phys_buf) == 1);
mutex_destroy(&osi->os_lock);
mutex_destroy(&osi->os_obj_lock);
@@ -520,6 +551,10 @@ dmu_objset_create_impl(spa_t *spa, dsl_dataset_t *ds, blkptr_t *bp,
ASSERT(type != DMU_OST_ANY);
ASSERT(type < DMU_OST_NUMTYPES);
osi->os_phys->os_type = type;
if (dmu_objset_userused_enabled(osi)) {
osi->os_phys->os_flags |= OBJSET_FLAG_USERACCOUNTING_COMPLETE;
osi->os_flags = osi->os_phys->os_flags;
}
dsl_dataset_dirty(ds, tx);
@@ -704,13 +739,33 @@ struct snaparg {
char *snapname;
char failed[MAXPATHLEN];
boolean_t checkperms;
list_t objsets;
nvlist_t *props;
};
struct osnode {
list_node_t node;
objset_t *os;
};
static int
snapshot_check(void *arg1, void *arg2, dmu_tx_t *tx)
{
objset_t *os = arg1;
struct snaparg *sn = arg2;
/* The props have already been checked by zfs_check_userprops(). */
return (dsl_dataset_snapshot_check(os->os->os_dsl_dataset,
sn->snapname, tx));
}
static void
snapshot_sync(void *arg1, void *arg2, cred_t *cr, dmu_tx_t *tx)
{
objset_t *os = arg1;
dsl_dataset_t *ds = os->os->os_dsl_dataset;
struct snaparg *sn = arg2;
dsl_dataset_snapshot_sync(ds, sn->snapname, cr, tx);
if (sn->props)
dsl_props_set_sync(ds->ds_prev, sn->props, cr, tx);
}
static int
dmu_objset_snapshot_one(char *name, void *arg)
@@ -747,13 +802,8 @@ dmu_objset_snapshot_one(char *name, void *arg)
*/
err = zil_suspend(dmu_objset_zil(os));
if (err == 0) {
struct osnode *osn;
dsl_sync_task_create(sn->dstg, dsl_dataset_snapshot_check,
dsl_dataset_snapshot_sync, os->os->os_dsl_dataset,
sn->snapname, 3);
osn = kmem_alloc(sizeof (struct osnode), KM_SLEEP);
osn->os = os;
list_insert_tail(&sn->objsets, osn);
dsl_sync_task_create(sn->dstg, snapshot_check,
snapshot_sync, os, sn, 3);
} else {
dmu_objset_close(os);
}
@@ -762,11 +812,11 @@
}
int
dmu_objset_snapshot(char *fsname, char *snapname, boolean_t recursive)
dmu_objset_snapshot(char *fsname, char *snapname,
nvlist_t *props, boolean_t recursive)
{
dsl_sync_task_t *dst;
struct osnode *osn;
struct snaparg sn = { 0 };
struct snaparg sn;
spa_t *spa;
int err;
@@ -778,8 +828,7 @@ dmu_objset_snapshot(char *fsname, char *snapname, boolean_t recursive)
sn.dstg = dsl_sync_task_group_create(spa_get_dsl(spa));
sn.snapname = snapname;
list_create(&sn.objsets, sizeof (struct osnode),
offsetof(struct osnode, node));
sn.props = props;
if (recursive) {
sn.checkperms = B_TRUE;
@@ -790,27 +839,19 @@
err = dmu_objset_snapshot_one(fsname, &sn);
}
if (err)
goto out;
err = dsl_sync_task_group_wait(sn.dstg);
if (err == 0)
err = dsl_sync_task_group_wait(sn.dstg);
for (dst = list_head(&sn.dstg->dstg_tasks); dst;
dst = list_next(&sn.dstg->dstg_tasks, dst)) {
dsl_dataset_t *ds = dst->dst_arg1;
objset_t *os = dst->dst_arg1;
dsl_dataset_t *ds = os->os->os_dsl_dataset;
if (dst->dst_err)
dsl_dataset_name(ds, sn.failed);
zil_resume(dmu_objset_zil(os));
dmu_objset_close(os);
}
out:
while (osn = list_head(&sn.objsets)) {
list_remove(&sn.objsets, osn);
zil_resume(dmu_objset_zil(osn->os));
dmu_objset_close(osn->os);
kmem_free(osn, sizeof (struct osnode));
}
list_destroy(&sn.objsets);
if (err)
(void) strcpy(fsname, sn.failed);
dsl_sync_task_group_destroy(sn.dstg);
@@ -819,7 +860,7 @@ dmu_objset_snapshot(char *fsname, char *snapname, boolean_t recursive)
}
static void
dmu_objset_sync_dnodes(list_t *list, dmu_tx_t *tx)
dmu_objset_sync_dnodes(list_t *list, list_t *newlist, dmu_tx_t *tx)
{
dnode_t *dn;
@@ -827,14 +868,20 @@ dmu_objset_sync_dnodes(list_t *list, dmu_tx_t *tx)
ASSERT(dn->dn_object != DMU_META_DNODE_OBJECT);
ASSERT(dn->dn_dbuf->db_data_pending);
/*
* Initialize dn_zio outside dnode_sync()
* to accomodate meta-dnode
* Initialize dn_zio outside dnode_sync() because the
* meta-dnode needs to set it outside dnode_sync().
*/
dn->dn_zio = dn->dn_dbuf->db_data_pending->dr_zio;
ASSERT(dn->dn_zio);
ASSERT3U(dn->dn_nlevels, <=, DN_MAX_LEVELS);
list_remove(list, dn);
if (newlist) {
(void) dnode_add_ref(dn, newlist);
list_insert_tail(newlist, dn);
}
dnode_sync(dn, tx);
}
}
@@ -853,9 +900,12 @@ ready(zio_t *zio, arc_buf_t *abuf, void *arg)
ASSERT(BP_GET_LEVEL(bp) == 0);
/*
* Update rootbp fill count.
* Update rootbp fill count: it should be the number of objects
* allocated in the object set (not counting the "special"
* objects that are stored in the objset_phys_t -- the meta
* dnode and user/group accounting objects).
*/
bp->blk_fill = 1; /* count the meta-dnode */
bp->blk_fill = 0;
for (int i = 0; i < dnp->dn_nblkptr; i++)
bp->blk_fill += dnp->dn_blkptr[i].blk_fill;
@@ -878,6 +928,7 @@ dmu_objset_sync(objset_impl_t *os, zio_t *pio, dmu_tx_t *tx)
writeprops_t wp = { 0 };
zio_t *zio;
list_t *list;
list_t *newlist = NULL;
dbuf_dirty_record_t *dr;
dprintf_ds(os->os_dsl_dataset, "txg=%llu\n", tx->tx_txg);
@@ -915,20 +966,41 @@ dmu_objset_sync(objset_impl_t *os, zio_t *pio, dmu_tx_t *tx)
}
arc_release(os->os_phys_buf, &os->os_phys_buf);
zio = arc_write(pio, os->os_spa, &wp, DMU_OS_IS_L2CACHEABLE(os),
tx->tx_txg, os->os_rootbp, os->os_phys_buf, ready, NULL, os,
ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb);
/*
* Sync meta-dnode - the parent IO for the sync is the root block
* Sync special dnodes - the parent IO for the sync is the root block
*/
os->os_meta_dnode->dn_zio = zio;
dnode_sync(os->os_meta_dnode, tx);
os->os_phys->os_flags = os->os_flags;
if (os->os_userused_dnode &&
os->os_userused_dnode->dn_type != DMU_OT_NONE) {
os->os_userused_dnode->dn_zio = zio;
dnode_sync(os->os_userused_dnode, tx);
os->os_groupused_dnode->dn_zio = zio;
dnode_sync(os->os_groupused_dnode, tx);
}
txgoff = tx->tx_txg & TXG_MASK;
dmu_objset_sync_dnodes(&os->os_free_dnodes[txgoff], tx);
dmu_objset_sync_dnodes(&os->os_dirty_dnodes[txgoff], tx);
if (dmu_objset_userused_enabled(os)) {
newlist = &os->os_synced_dnodes;
/*
* We must create the list here because it uses the
* dn_dirty_link[] of this txg.
*/
list_create(newlist, sizeof (dnode_t),
offsetof(dnode_t, dn_dirty_link[txgoff]));
}
dmu_objset_sync_dnodes(&os->os_free_dnodes[txgoff], newlist, tx);
dmu_objset_sync_dnodes(&os->os_dirty_dnodes[txgoff], newlist, tx);
list = &os->os_meta_dnode->dn_dirty_records[txgoff];
while (dr = list_head(list)) {
@@ -945,6 +1017,146 @@ dmu_objset_sync(objset_impl_t *os, zio_t *pio, dmu_tx_t *tx)
zio_nowait(zio);
}
static objset_used_cb_t *used_cbs[DMU_OST_NUMTYPES];
void
dmu_objset_register_type(dmu_objset_type_t ost, objset_used_cb_t *cb)
{
used_cbs[ost] = cb;
}
boolean_t
dmu_objset_userused_enabled(objset_impl_t *os)
{
return (spa_version(os->os_spa) >= SPA_VERSION_USERSPACE &&
used_cbs[os->os_phys->os_type] &&
os->os_userused_dnode);
}
void
dmu_objset_do_userquota_callbacks(objset_impl_t *os, dmu_tx_t *tx)
{
dnode_t *dn;
list_t *list = &os->os_synced_dnodes;
static const char zerobuf[DN_MAX_BONUSLEN] = {0};
ASSERT(list_head(list) == NULL || dmu_objset_userused_enabled(os));
while (dn = list_head(list)) {
dmu_object_type_t bonustype;
ASSERT(!DMU_OBJECT_IS_SPECIAL(dn->dn_object));
ASSERT(dn->dn_oldphys);
ASSERT(dn->dn_phys->dn_type == DMU_OT_NONE ||
dn->dn_phys->dn_flags &
DNODE_FLAG_USERUSED_ACCOUNTED);
/* Allocate the user/groupused objects if necessary. */
if (os->os_userused_dnode->dn_type == DMU_OT_NONE) {
VERIFY(0 == zap_create_claim(&os->os,
DMU_USERUSED_OBJECT,
DMU_OT_USERGROUP_USED, DMU_OT_NONE, 0, tx));
VERIFY(0 == zap_create_claim(&os->os,
DMU_GROUPUSED_OBJECT,
DMU_OT_USERGROUP_USED, DMU_OT_NONE, 0, tx));
}
/*
* If the object was not previously
* accounted, pretend that it was free.
*/
if (!(dn->dn_oldphys->dn_flags &
DNODE_FLAG_USERUSED_ACCOUNTED)) {
bzero(dn->dn_oldphys, sizeof (dnode_phys_t));
}
/*
* If the object was freed, use the previous bonustype.
*/
bonustype = dn->dn_phys->dn_bonustype ?
dn->dn_phys->dn_bonustype : dn->dn_oldphys->dn_bonustype;
ASSERT(dn->dn_phys->dn_type != 0 ||
(bcmp(DN_BONUS(dn->dn_phys), zerobuf,
DN_MAX_BONUSLEN) == 0 &&
DN_USED_BYTES(dn->dn_phys) == 0));
ASSERT(dn->dn_oldphys->dn_type != 0 ||
(bcmp(DN_BONUS(dn->dn_oldphys), zerobuf,
DN_MAX_BONUSLEN) == 0 &&
DN_USED_BYTES(dn->dn_oldphys) == 0));
used_cbs[os->os_phys->os_type](&os->os, bonustype,
DN_BONUS(dn->dn_oldphys), DN_BONUS(dn->dn_phys),
DN_USED_BYTES(dn->dn_oldphys),
DN_USED_BYTES(dn->dn_phys), tx);
/*
* The mutex is needed here for interlock with dnode_allocate.
*/
mutex_enter(&dn->dn_mtx);
zio_buf_free(dn->dn_oldphys, sizeof (dnode_phys_t));
dn->dn_oldphys = NULL;
mutex_exit(&dn->dn_mtx);
list_remove(list, dn);
dnode_rele(dn, list);
}
}
boolean_t
dmu_objset_userspace_present(objset_t *os)
{
return (os->os->os_phys->os_flags &
OBJSET_FLAG_USERACCOUNTING_COMPLETE);
}
int
dmu_objset_userspace_upgrade(objset_t *os)
{
uint64_t obj;
int err = 0;
if (dmu_objset_userspace_present(os))
return (0);
if (!dmu_objset_userused_enabled(os->os))
return (ENOTSUP);
if (dmu_objset_is_snapshot(os))
return (EINVAL);
/*
* We simply need to mark every object dirty, so that it will be
* synced out and now accounted. If this is called
* concurrently, or if we already did some work before crashing,
* that's fine, since we track each object's accounted state
* independently.
*/
for (obj = 0; err == 0; err = dmu_object_next(os, &obj, FALSE, 0)) {
dmu_tx_t *tx;
dmu_buf_t *db;
int objerr;
if (issig(JUSTLOOKING) && issig(FORREAL))
return (EINTR);
objerr = dmu_bonus_hold(os, obj, FTAG, &db);
if (objerr)
continue;
tx = dmu_tx_create(os);
dmu_tx_hold_bonus(tx, obj);
objerr = dmu_tx_assign(tx, TXG_WAIT);
if (objerr) {
dmu_tx_abort(tx);
continue;
}
dmu_buf_will_dirty(db, tx);
dmu_buf_rele(db, FTAG);
dmu_tx_commit(tx);
}
os->os->os_flags |= OBJSET_FLAG_USERACCOUNTING_COMPLETE;
txg_wait_synced(dmu_objset_pool(os), 0);
return (0);
}
void
dmu_objset_space(objset_t *os, uint64_t *refdbytesp, uint64_t *availbytesp,
uint64_t *usedobjsp, uint64_t *availobjsp)
@@ -978,6 +1190,8 @@ dmu_objset_stats(objset_t *os, nvlist_t *nv)
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_TYPE,
os->os->os_phys->os_type);
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_USERACCOUNTING,
dmu_objset_userspace_present(os));
}
int

View File

@@ -180,7 +180,9 @@ backup_cb(spa_t *spa, blkptr_t *bp, const zbookmark_t *zb,
if (issig(JUSTLOOKING) && issig(FORREAL))
return (EINTR);
if (bp == NULL && zb->zb_object == 0) {
if (zb->zb_object != 0 && DMU_OBJECT_IS_SPECIAL(zb->zb_object)) {
return (0);
} else if (bp == NULL && zb->zb_object == 0) {
uint64_t span = BP_SPAN(dnp, zb->zb_level);
uint64_t dnobj = (zb->zb_blkid * span) >> DNODE_SHIFT;
err = dump_freeobjects(ba, dnobj, span >> DNODE_SHIFT);

View File

@@ -64,6 +64,9 @@ struct traverse_data {
void *td_arg;
};
static int traverse_dnode(struct traverse_data *td, const dnode_phys_t *dnp,
arc_buf_t *buf, uint64_t objset, uint64_t object);
/* ARGSUSED */
static void
traverse_zil_block(zilog_t *zilog, blkptr_t *bp, void *arg, uint64_t claim_txg)
@@ -119,7 +122,7 @@ traverse_zil(struct traverse_data *td, zil_header_t *zh)
* We only want to visit blocks that have been claimed but not yet
* replayed (or, in read-only mode, blocks that *would* be claimed).
*/
if (claim_txg == 0 && (spa_mode & FWRITE))
if (claim_txg == 0 && spa_writeable(td->td_spa))
return;
zilog = zil_alloc(spa_get_dsl(td->td_spa)->dp_meta_objset, zh);
@@ -189,7 +192,7 @@ traverse_visitbp(struct traverse_data *td, const dnode_phys_t *dnp,
}
} else if (BP_GET_TYPE(bp) == DMU_OT_DNODE) {
uint32_t flags = ARC_WAIT;
int i, j;
int i;
int epb = BP_GET_LSIZE(bp) >> DNODE_SHIFT;
err = arc_read(NULL, td->td_spa, bp, pbuf,
@@ -201,20 +204,15 @@ traverse_visitbp(struct traverse_data *td, const dnode_phys_t *dnp,
/* recursively visitbp() blocks below this */
dnp = buf->b_data;
for (i = 0; i < epb && err == 0; i++, dnp++) {
for (j = 0; j < dnp->dn_nblkptr; j++) {
SET_BOOKMARK(&czb, zb->zb_objset,
zb->zb_blkid * epb + i,
dnp->dn_nlevels - 1, j);
err = traverse_visitbp(td, dnp, buf,
(blkptr_t *)&dnp->dn_blkptr[j], &czb);
if (err)
break;
}
err = traverse_dnode(td, dnp, buf, zb->zb_objset,
zb->zb_blkid * epb + i);
if (err)
break;
}
} else if (BP_GET_TYPE(bp) == DMU_OT_OBJSET) {
uint32_t flags = ARC_WAIT;
objset_phys_t *osp;
int j;
dnode_phys_t *dnp;
err = arc_read_nolock(NULL, td->td_spa, bp,
arc_getbuf_func, &buf,
@@ -225,14 +223,17 @@ traverse_visitbp(struct traverse_data *td, const dnode_phys_t *dnp,
osp = buf->b_data;
traverse_zil(td, &osp->os_zil_header);
for (j = 0; j < osp->os_meta_dnode.dn_nblkptr; j++) {
SET_BOOKMARK(&czb, zb->zb_objset, 0,
osp->os_meta_dnode.dn_nlevels - 1, j);
err = traverse_visitbp(td, &osp->os_meta_dnode, buf,
(blkptr_t *)&osp->os_meta_dnode.dn_blkptr[j],
&czb);
if (err)
break;
dnp = &osp->os_meta_dnode;
err = traverse_dnode(td, dnp, buf, zb->zb_objset, 0);
if (err == 0 && arc_buf_size(buf) >= sizeof (objset_phys_t)) {
dnp = &osp->os_userused_dnode;
err = traverse_dnode(td, dnp, buf, zb->zb_objset,
DMU_USERUSED_OBJECT);
}
if (err == 0 && arc_buf_size(buf) >= sizeof (objset_phys_t)) {
dnp = &osp->os_groupused_dnode;
err = traverse_dnode(td, dnp, buf, zb->zb_objset,
DMU_GROUPUSED_OBJECT);
}
}
@@ -245,6 +246,23 @@ traverse_visitbp(struct traverse_data *td, const dnode_phys_t *dnp,
return (err);
}
static int
traverse_dnode(struct traverse_data *td, const dnode_phys_t *dnp,
arc_buf_t *buf, uint64_t objset, uint64_t object)
{
int j, err = 0;
zbookmark_t czb;
for (j = 0; j < dnp->dn_nblkptr; j++) {
SET_BOOKMARK(&czb, objset, object, dnp->dn_nlevels - 1, j);
err = traverse_visitbp(td, dnp, buf,
(blkptr_t *)&dnp->dn_blkptr[j], &czb);
if (err)
break;
}
return (err);
}
/* ARGSUSED */
static int
traverse_prefetcher(spa_t *spa, blkptr_t *bp, const zbookmark_t *zb,

View File

@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -160,6 +160,41 @@ dmu_tx_check_ioerr(zio_t *zio, dnode_t *dn, int level, uint64_t blkid)
return (err);
}
static void
dmu_tx_count_indirects(dmu_tx_hold_t *txh, dmu_buf_impl_t *db,
boolean_t freeable, dmu_buf_impl_t **history)
{
int i = db->db_level + 1;
dnode_t *dn = db->db_dnode;
if (i >= dn->dn_nlevels)
return;
db = db->db_parent;
if (db == NULL) {
uint64_t lvls = dn->dn_nlevels - i;
txh->txh_space_towrite += lvls << dn->dn_indblkshift;
return;
}
if (db != history[i]) {
dsl_dataset_t *ds = dn->dn_objset->os_dsl_dataset;
uint64_t space = 1ULL << dn->dn_indblkshift;
freeable = (db->db_blkptr && (freeable ||
dsl_dataset_block_freeable(ds, db->db_blkptr->blk_birth)));
if (freeable)
txh->txh_space_tooverwrite += space;
else
txh->txh_space_towrite += space;
if (db->db_blkptr)
txh->txh_space_tounref += space;
history[i] = db;
dmu_tx_count_indirects(txh, db, freeable, history);
}
}
/* ARGSUSED */
static void
dmu_tx_count_write(dmu_tx_hold_t *txh, uint64_t off, uint64_t len)
@@ -177,17 +212,26 @@ dmu_tx_count_write(dmu_tx_hold_t *txh, uint64_t off, uint64_t len)
min_ibs = DN_MIN_INDBLKSHIFT;
max_ibs = DN_MAX_INDBLKSHIFT;
/*
* For i/o error checking, read the first and last level-0
* blocks (if they are not aligned), and all the level-1 blocks.
*/
if (dn) {
dmu_buf_impl_t *last[DN_MAX_LEVELS];
int nlvls = dn->dn_nlevels;
int delta;
/*
* For i/o error checking, read the first and last level-0
* blocks (if they are not aligned), and all the level-1 blocks.
*/
if (dn->dn_maxblkid == 0) {
err = dmu_tx_check_ioerr(NULL, dn, 0, 0);
if (err)
goto out;
delta = dn->dn_datablksz;
start = (off < dn->dn_datablksz) ? 0 : 1;
end = (off+len <= dn->dn_datablksz) ? 0 : 1;
if (start == 0 && (off > 0 || len < dn->dn_datablksz)) {
err = dmu_tx_check_ioerr(NULL, dn, 0, 0);
if (err)
goto out;
delta -= off;
}
} else {
zio_t *zio = zio_root(dn->dn_objset->os_spa,
NULL, NULL, ZIO_FLAG_CANFAIL);
@@ -211,10 +255,9 @@ dmu_tx_count_write(dmu_tx_hold_t *txh, uint64_t off, uint64_t len)
}
/* level-1 blocks */
if (dn->dn_nlevels > 1) {
start >>= dn->dn_indblkshift - SPA_BLKPTRSHIFT;
end >>= dn->dn_indblkshift - SPA_BLKPTRSHIFT;
for (i = start+1; i < end; i++) {
if (nlvls > 1) {
int shft = dn->dn_indblkshift - SPA_BLKPTRSHIFT;
for (i = (start>>shft)+1; i < end>>shft; i++) {
err = dmu_tx_check_ioerr(zio, dn, 1, i);
if (err)
goto out;
@@ -224,20 +267,70 @@ dmu_tx_count_write(dmu_tx_hold_t *txh, uint64_t off, uint64_t len)
err = zio_wait(zio);
if (err)
goto out;
delta = P2NPHASE(off, dn->dn_datablksz);
}
}
/*
* If there's more than one block, the blocksize can't change,
* so we can make a more precise estimate. Alternatively,
* if the dnode's ibs is larger than max_ibs, always use that.
* This ensures that if we reduce DN_MAX_INDBLKSHIFT,
* the code will still work correctly on existing pools.
*/
if (dn && (dn->dn_maxblkid != 0 || dn->dn_indblkshift > max_ibs)) {
min_ibs = max_ibs = dn->dn_indblkshift;
if (dn->dn_datablkshift != 0)
if (dn->dn_maxblkid > 0) {
/*
* The blocksize can't change,
* so we can make a more precise estimate.
*/
ASSERT(dn->dn_datablkshift != 0);
min_bs = max_bs = dn->dn_datablkshift;
min_ibs = max_ibs = dn->dn_indblkshift;
} else if (dn->dn_indblkshift > max_ibs) {
/*
* This ensures that if we reduce DN_MAX_INDBLKSHIFT,
* the code will still work correctly on older pools.
*/
min_ibs = max_ibs = dn->dn_indblkshift;
}
/*
* If this write is not off the end of the file
* we need to account for overwrites/unref.
*/
if (start <= dn->dn_maxblkid)
bzero(last, sizeof (dmu_buf_impl_t *) * DN_MAX_LEVELS);
while (start <= dn->dn_maxblkid) {
spa_t *spa = txh->txh_tx->tx_pool->dp_spa;
dsl_dataset_t *ds = dn->dn_objset->os_dsl_dataset;
dmu_buf_impl_t *db;
rw_enter(&dn->dn_struct_rwlock, RW_READER);
db = dbuf_hold_level(dn, 0, start, FTAG);
rw_exit(&dn->dn_struct_rwlock);
if (db->db_blkptr && dsl_dataset_block_freeable(ds,
db->db_blkptr->blk_birth)) {
dprintf_bp(db->db_blkptr, "can free old%s", "");
txh->txh_space_tooverwrite += dn->dn_datablksz;
txh->txh_space_tounref += dn->dn_datablksz;
dmu_tx_count_indirects(txh, db, TRUE, last);
} else {
txh->txh_space_towrite += dn->dn_datablksz;
if (db->db_blkptr)
txh->txh_space_tounref +=
bp_get_dasize(spa, db->db_blkptr);
dmu_tx_count_indirects(txh, db, FALSE, last);
}
dbuf_rele(db, FTAG);
if (++start > end) {
/*
* Account for new indirects appearing
* before this IO gets assigned into a txg.
*/
bits = 64 - min_bs;
epbs = min_ibs - SPA_BLKPTRSHIFT;
for (bits -= epbs * (nlvls - 1);
bits >= 0; bits -= epbs)
txh->txh_fudge += 1ULL << max_ibs;
goto out;
}
off += delta;
if (len >= delta)
len -= delta;
delta = dn->dn_datablksz;
}
}
/*
@@ -260,20 +353,22 @@ dmu_tx_count_write(dmu_tx_hold_t *txh, uint64_t off, uint64_t len)
for (bits = 64 - min_bs; bits >= 0; bits -= epbs) {
start >>= epbs;
end >>= epbs;
/*
* If we increase the number of levels of indirection,
* we'll need new blkid=0 indirect blocks. If start == 0,
* we're already accounting for that blocks; and if end == 0,
* we can't increase the number of levels beyond that.
*/
if (start != 0 && end != 0)
txh->txh_space_towrite += 1ULL << max_ibs;
ASSERT3U(end, >=, start);
txh->txh_space_towrite += (end - start + 1) << max_ibs;
if (start != 0) {
/*
* We also need a new blkid=0 indirect block
* to reference any existing file data.
*/
txh->txh_space_towrite += 1ULL << max_ibs;
}
}
ASSERT(txh->txh_space_towrite < 2 * DMU_MAX_ACCESS);
out:
if (txh->txh_space_towrite + txh->txh_space_tooverwrite >
2 * DMU_MAX_ACCESS)
err = EFBIG;
if (err)
txh->txh_tx->tx_err = err;
}
@@ -290,6 +385,7 @@ dmu_tx_count_dnode(dmu_tx_hold_t *txh)
dsl_dataset_block_freeable(dn->dn_objset->os_dsl_dataset,
dn->dn_dbuf->db_blkptr->blk_birth)) {
txh->txh_space_tooverwrite += space;
txh->txh_space_tounref += space;
} else {
txh->txh_space_towrite += space;
if (dn && dn->dn_dbuf->db_blkptr)
@ -533,7 +629,7 @@ dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off, uint64_t len)
}
void
dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, char *name)
dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name)
{
dmu_tx_hold_t *txh;
dnode_t *dn;
@ -601,12 +697,8 @@ dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, char *name)
}
}
/*
* 3 blocks overwritten: target leaf, ptrtbl block, header block
* 3 new blocks written if adding: new split leaf, 2 grown ptrtbl blocks
*/
dmu_tx_count_write(txh, dn->dn_maxblkid * dn->dn_datablksz,
(3 + (add ? 3 : 0)) << dn->dn_datablkshift);
err = zap_count_write(&dn->dn_objset->os, dn->dn_object, name, add,
&txh->txh_space_towrite, &txh->txh_space_tooverwrite);
/*
* If the modified blocks are scattered to the four winds,
@ -614,7 +706,10 @@ dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, char *name)
*/
epbs = dn->dn_indblkshift - SPA_BLKPTRSHIFT;
for (nblocks = dn->dn_maxblkid >> epbs; nblocks != 0; nblocks >>= epbs)
txh->txh_space_towrite += 3 << dn->dn_indblkshift;
if (dn->dn_objset->os_dsl_dataset->ds_phys->ds_prev_snap_obj)
txh->txh_space_towrite += 3 << dn->dn_indblkshift;
else
txh->txh_space_tooverwrite += 3 << dn->dn_indblkshift;
}
void


@ -156,7 +156,7 @@ dnode_verify(dnode_t *dn)
}
if (dn->dn_phys->dn_type != DMU_OT_NONE)
ASSERT3U(dn->dn_phys->dn_nlevels, <=, dn->dn_nlevels);
ASSERT(dn->dn_object == DMU_META_DNODE_OBJECT || dn->dn_dbuf != NULL);
ASSERT(DMU_OBJECT_IS_SPECIAL(dn->dn_object) || dn->dn_dbuf != NULL);
if (dn->dn_dbuf != NULL) {
ASSERT3P(dn->dn_phys, ==,
(dnode_phys_t *)dn->dn_dbuf->db.db_data +
@ -320,6 +320,7 @@ dnode_destroy(dnode_t *dn)
}
ASSERT(NULL == list_head(&dn->dn_dbufs));
#endif
ASSERT(dn->dn_oldphys == NULL);
mutex_enter(&os->os_lock);
list_remove(&os->os_dnodes, dn);
@ -550,6 +551,22 @@ dnode_hold_impl(objset_impl_t *os, uint64_t object, int flag,
*/
ASSERT(spa_config_held(os->os_spa, SCL_ALL, RW_WRITER) == 0);
if (object == DMU_USERUSED_OBJECT || object == DMU_GROUPUSED_OBJECT) {
dn = (object == DMU_USERUSED_OBJECT) ?
os->os_userused_dnode : os->os_groupused_dnode;
if (dn == NULL)
return (ENOENT);
type = dn->dn_type;
if ((flag & DNODE_MUST_BE_ALLOCATED) && type == DMU_OT_NONE)
return (ENOENT);
if ((flag & DNODE_MUST_BE_FREE) && type != DMU_OT_NONE)
return (EEXIST);
DNODE_VERIFY(dn);
(void) refcount_add(&dn->dn_holds, tag);
*dnp = dn;
return (0);
}
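The fast path above applies the same allocation-state rules as the general path below it: a hold that requires an allocated dnode fails with ENOENT on a free one, and a hold that requires a free dnode fails with EEXIST on an allocated one. A sketch of just that check, with illustrative flag values (not the kernel's DNODE_MUST_BE_* constants):

```c
#include <errno.h>

/* Illustrative stand-ins for DNODE_MUST_BE_ALLOCATED / DNODE_MUST_BE_FREE. */
#define	MUST_BE_ALLOCATED	0x1
#define	MUST_BE_FREE		0x2

/*
 * Sketch of the state check in dnode_hold_impl(): reject the hold when
 * the dnode's allocation state contradicts what the caller demanded.
 */
int
hold_state_check(int flag, int allocated)
{
	if ((flag & MUST_BE_ALLOCATED) && !allocated)
		return (ENOENT);
	if ((flag & MUST_BE_FREE) && allocated)
		return (EEXIST);
	return (0);
}
```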
if (object == 0 || object >= DN_MAX_OBJECT)
return (EINVAL);
@ -608,7 +625,8 @@ dnode_hold_impl(objset_impl_t *os, uint64_t object, int flag,
type = dn->dn_type;
if (dn->dn_free_txg ||
((flag & DNODE_MUST_BE_ALLOCATED) && type == DMU_OT_NONE) ||
((flag & DNODE_MUST_BE_FREE) && type != DMU_OT_NONE)) {
((flag & DNODE_MUST_BE_FREE) &&
(type != DMU_OT_NONE || dn->dn_oldphys))) {
mutex_exit(&dn->dn_mtx);
dbuf_rele(db, FTAG);
return (type == DMU_OT_NONE ? ENOENT : EEXIST);
@ -673,8 +691,10 @@ dnode_setdirty(dnode_t *dn, dmu_tx_t *tx)
objset_impl_t *os = dn->dn_objset;
uint64_t txg = tx->tx_txg;
if (dn->dn_object == DMU_META_DNODE_OBJECT)
if (DMU_OBJECT_IS_SPECIAL(dn->dn_object)) {
dsl_dataset_dirty(os->os_dsl_dataset, tx);
return;
}
DNODE_VERIFY(dn);
@ -1270,7 +1290,7 @@ dnode_next_offset_level(dnode_t *dn, int flags, uint64_t *offset,
dprintf("probing object %llu offset %llx level %d of %u\n",
dn->dn_object, *offset, lvl, dn->dn_phys->dn_nlevels);
hole = flags & DNODE_FIND_HOLE;
hole = ((flags & DNODE_FIND_HOLE) != 0);
inc = (flags & DNODE_FIND_BACKWARDS) ? -1 : 1;
ASSERT(txg == 0 || !hole);


@ -506,9 +506,6 @@ dnode_sync_free(dnode_t *dn, dmu_tx_t *tx)
/*
* Write out the dnode's dirty buffers.
*
* NOTE: The dnode is kept in memory by being dirty. Once the
* dirty bit is cleared, it may be evicted. Beware of this!
*/
void
dnode_sync(dnode_t *dn, dmu_tx_t *tx)
@ -517,20 +514,33 @@ dnode_sync(dnode_t *dn, dmu_tx_t *tx)
dnode_phys_t *dnp = dn->dn_phys;
int txgoff = tx->tx_txg & TXG_MASK;
list_t *list = &dn->dn_dirty_records[txgoff];
static const dnode_phys_t zerodn = { 0 };
ASSERT(dmu_tx_is_syncing(tx));
ASSERT(dnp->dn_type != DMU_OT_NONE || dn->dn_allocated_txg);
ASSERT(dnp->dn_type != DMU_OT_NONE ||
bcmp(dnp, &zerodn, DNODE_SIZE) == 0);
DNODE_VERIFY(dn);
ASSERT(dn->dn_dbuf == NULL || arc_released(dn->dn_dbuf->db_buf));
if (dmu_objset_userused_enabled(dn->dn_objset) &&
!DMU_OBJECT_IS_SPECIAL(dn->dn_object)) {
ASSERT(dn->dn_oldphys == NULL);
dn->dn_oldphys = zio_buf_alloc(sizeof (dnode_phys_t));
*dn->dn_oldphys = *dn->dn_phys; /* struct assignment */
dn->dn_phys->dn_flags |= DNODE_FLAG_USERUSED_ACCOUNTED;
} else {
/* Once we account for it, we should always account for it. */
ASSERT(!(dn->dn_phys->dn_flags &
DNODE_FLAG_USERUSED_ACCOUNTED));
}
mutex_enter(&dn->dn_mtx);
if (dn->dn_allocated_txg == tx->tx_txg) {
/* The dnode is newly allocated or reallocated */
if (dnp->dn_type == DMU_OT_NONE) {
/* this is a first alloc, not a realloc */
/* XXX shouldn't the phys already be zeroed? */
bzero(dnp, DNODE_CORE_SIZE);
dnp->dn_nlevels = 1;
dnp->dn_nblkptr = dn->dn_nblkptr;
}
@ -628,7 +638,7 @@ dnode_sync(dnode_t *dn, dmu_tx_t *tx)
dbuf_sync_list(list, tx);
if (dn->dn_object != DMU_META_DNODE_OBJECT) {
if (!DMU_OBJECT_IS_SPECIAL(dn->dn_object)) {
ASSERT3P(list_head(list), ==, NULL);
dnode_rele(dn, (void *)(uintptr_t)tx->tx_txg);
}


@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Copyright 2010 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@ -229,7 +229,7 @@ dsl_dataset_prev_snap_txg(dsl_dataset_t *ds)
return (MAX(ds->ds_phys->ds_prev_snap_txg, trysnap));
}
int
boolean_t
dsl_dataset_block_freeable(dsl_dataset_t *ds, uint64_t blk_birth)
{
return (blk_birth > dsl_dataset_prev_snap_txg(ds));
@ -525,7 +525,15 @@ dsl_dataset_hold_ref(dsl_dataset_t *ds, void *tag)
rw_enter(&dp->dp_config_rwlock, RW_READER);
return (ENOENT);
}
/*
* The dp_config_rwlock lives above the ds_lock. And
* we need to check DSL_DATASET_IS_DESTROYED() while
* holding the ds_lock, so we have to drop and reacquire
* the ds_lock here.
*/
mutex_exit(&ds->ds_lock);
rw_enter(&dp->dp_config_rwlock, RW_READER);
mutex_enter(&ds->ds_lock);
}
mutex_exit(&ds->ds_lock);
return (0);
@ -981,6 +989,27 @@ dsl_dataset_destroy(dsl_dataset_t *ds, void *tag)
(void) dmu_free_object(os, obj);
}
/*
* We need to sync out all in-flight IO before we try to evict
* (the dataset evict func is trying to clear the cached entries
* for this dataset in the ARC).
*/
txg_wait_synced(dd->dd_pool, 0);
/*
* If we managed to free all the objects in open
* context, the user space accounting should be zero.
*/
if (ds->ds_phys->ds_bp.blk_fill == 0 &&
dmu_objset_userused_enabled(os->os)) {
uint64_t count;
ASSERT(zap_count(os, DMU_USERUSED_OBJECT, &count) != 0 ||
count == 0);
ASSERT(zap_count(os, DMU_GROUPUSED_OBJECT, &count) != 0 ||
count == 0);
}
dmu_objset_close(os);
if (err != ESRCH)
goto out;
@ -1065,7 +1094,6 @@ dsl_dataset_get_user_ptr(dsl_dataset_t *ds)
return (ds->ds_user_ptr);
}
blkptr_t *
dsl_dataset_get_blkptr(dsl_dataset_t *ds)
{
@ -1445,6 +1473,33 @@ dsl_dataset_drain_refs(dsl_dataset_t *ds, void *tag)
cv_destroy(&arg.cv);
}
static void
remove_from_next_clones(dsl_dataset_t *ds, uint64_t obj, dmu_tx_t *tx)
{
objset_t *mos = ds->ds_dir->dd_pool->dp_meta_objset;
uint64_t count;
int err;
ASSERT(ds->ds_phys->ds_num_children >= 2);
err = zap_remove_int(mos, ds->ds_phys->ds_next_clones_obj, obj, tx);
/*
* The err should not be ENOENT, but a bug in a previous version
* of the code could cause upgrade_clones_cb() to not set
* ds_next_snap_obj when it should, leading to a missing entry.
* If we knew that the pool was created after
* SPA_VERSION_NEXT_CLONES, we could assert that it isn't
* ENOENT. However, at least we can check that we don't have
* too many entries in the next_clones_obj even after failing to
* remove this one.
*/
if (err != ENOENT) {
VERIFY3U(err, ==, 0);
}
ASSERT3U(0, ==, zap_count(mos, ds->ds_phys->ds_next_clones_obj,
&count));
ASSERT3U(count, <=, ds->ds_phys->ds_num_children - 2);
}
void
dsl_dataset_destroy_sync(void *arg1, void *tag, cred_t *cr, dmu_tx_t *tx)
{
@ -1495,8 +1550,7 @@ dsl_dataset_destroy_sync(void *arg1, void *tag, cred_t *cr, dmu_tx_t *tx)
dmu_buf_will_dirty(ds_prev->ds_dbuf, tx);
if (after_branch_point &&
ds_prev->ds_phys->ds_next_clones_obj != 0) {
VERIFY(0 == zap_remove_int(mos,
ds_prev->ds_phys->ds_next_clones_obj, obj, tx));
remove_from_next_clones(ds_prev, obj, tx);
if (ds->ds_phys->ds_next_snap_obj != 0) {
VERIFY(0 == zap_add_int(mos,
ds_prev->ds_phys->ds_next_clones_obj,
@ -1852,8 +1906,8 @@ dsl_dataset_snapshot_sync(void *arg1, void *arg2, cred_t *cr, dmu_tx_t *tx)
ds->ds_prev->ds_phys->ds_creation_txg);
ds->ds_prev->ds_phys->ds_next_snap_obj = dsobj;
} else if (next_clones_obj != 0) {
VERIFY3U(0, ==, zap_remove_int(mos,
next_clones_obj, dsphys->ds_next_snap_obj, tx));
remove_from_next_clones(ds->ds_prev,
dsphys->ds_next_snap_obj, tx);
VERIFY3U(0, ==, zap_add_int(mos,
next_clones_obj, dsobj, tx));
}
@ -1962,6 +2016,9 @@ dsl_dataset_fast_stat(dsl_dataset_t *ds, dmu_objset_stats_t *stat)
if (ds->ds_phys->ds_next_snap_obj) {
stat->dds_is_snapshot = B_TRUE;
stat->dds_num_clones = ds->ds_phys->ds_num_children - 1;
} else {
stat->dds_is_snapshot = B_FALSE;
stat->dds_num_clones = 0;
}
/* clone origin is really a dsl_dir thing... */
@ -1973,6 +2030,8 @@ dsl_dataset_fast_stat(dsl_dataset_t *ds, dmu_objset_stats_t *stat)
ds->ds_dir->dd_phys->dd_origin_obj, FTAG, &ods));
dsl_dataset_name(ods, stat->dds_origin);
dsl_dataset_drop_ref(ods, FTAG);
} else {
stat->dds_origin[0] = '\0';
}
rw_exit(&ds->ds_dir->dd_pool->dp_config_rwlock);
}
@ -2439,9 +2498,7 @@ dsl_dataset_promote_sync(void *arg1, void *arg2, cred_t *cr, dmu_tx_t *tx)
/* change the origin's next clone */
if (origin_ds->ds_phys->ds_next_clones_obj) {
VERIFY3U(0, ==, zap_remove_int(dp->dp_meta_objset,
origin_ds->ds_phys->ds_next_clones_obj,
origin_ds->ds_phys->ds_next_snap_obj, tx));
remove_from_next_clones(origin_ds, snap->ds->ds_object, tx);
VERIFY3U(0, ==, zap_add_int(dp->dp_meta_objset,
origin_ds->ds_phys->ds_next_clones_obj,
oldnext_obj, tx));
@ -3039,12 +3096,8 @@ dsl_dataset_set_reservation_check(void *arg1, void *arg2, dmu_tx_t *tx)
dsl_dataset_t *ds = arg1;
uint64_t *reservationp = arg2;
uint64_t new_reservation = *reservationp;
int64_t delta;
uint64_t unique;
if (new_reservation > INT64_MAX)
return (EOVERFLOW);
if (spa_version(ds->ds_dir->dd_pool->dp_spa) <
SPA_VERSION_REFRESERVATION)
return (ENOTSUP);
@ -3061,15 +3114,18 @@ dsl_dataset_set_reservation_check(void *arg1, void *arg2, dmu_tx_t *tx)
mutex_enter(&ds->ds_lock);
unique = dsl_dataset_unique(ds);
delta = MAX(unique, new_reservation) - MAX(unique, ds->ds_reserved);
mutex_exit(&ds->ds_lock);
if (delta > 0 &&
delta > dsl_dir_space_available(ds->ds_dir, NULL, 0, TRUE))
return (ENOSPC);
if (delta > 0 && ds->ds_quota > 0 &&
new_reservation > ds->ds_quota)
return (ENOSPC);
if (MAX(unique, new_reservation) > MAX(unique, ds->ds_reserved)) {
uint64_t delta = MAX(unique, new_reservation) -
MAX(unique, ds->ds_reserved);
if (delta > dsl_dir_space_available(ds->ds_dir, NULL, 0, TRUE))
return (ENOSPC);
if (ds->ds_quota > 0 &&
new_reservation > ds->ds_quota)
return (ENOSPC);
}
return (0);
}
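The refactored check above charges space only when the new reservation raises MAX(unique, reservation), and computes the unsigned delta inside that guard instead of using the old signed int64_t. A small stand-alone model of the arithmetic (illustrative names, not the kernel's):

```c
#include <stdint.h>

#define	RESV_MAX(a, b)	((a) > (b) ? (a) : (b))

/*
 * Model of dsl_dataset_set_reservation_check(): extra space is charged
 * only for the part of the new reservation not already covered by the
 * dataset's unique space or by its previous reservation. The unsigned
 * subtraction is safe because it only happens when the result is
 * positive.
 */
uint64_t
reservation_delta(uint64_t unique, uint64_t old_resv, uint64_t new_resv)
{
	uint64_t oldmax = RESV_MAX(unique, old_resv);
	uint64_t newmax = RESV_MAX(unique, new_resv);

	return (newmax > oldmax ? newmax - oldmax : 0);
}
```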


@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@ -226,24 +226,11 @@ dsl_dir_namelen(dsl_dir_t *dd)
return (result);
}
int
dsl_dir_is_private(dsl_dir_t *dd)
{
int rv = FALSE;
if (dd->dd_parent && dsl_dir_is_private(dd->dd_parent))
rv = TRUE;
if (dataset_name_hidden(dd->dd_myname))
rv = TRUE;
return (rv);
}
static int
getcomponent(const char *path, char *component, const char **nextp)
{
char *p;
if (path == NULL)
if ((path == NULL) || (path[0] == '\0'))
return (ENOENT);
/* This would be a good place to reserve some namespace... */
p = strpbrk(path, "/@");
@ -1076,10 +1063,6 @@ dsl_dir_set_reservation_check(void *arg1, void *arg2, dmu_tx_t *tx)
uint64_t *reservationp = arg2;
uint64_t new_reservation = *reservationp;
uint64_t used, avail;
int64_t delta;
if (new_reservation > INT64_MAX)
return (EOVERFLOW);
/*
* If we are doing the preliminary check in open context, the
@ -1090,8 +1073,6 @@ dsl_dir_set_reservation_check(void *arg1, void *arg2, dmu_tx_t *tx)
mutex_enter(&dd->dd_lock);
used = dd->dd_phys->dd_used_bytes;
delta = MAX(used, new_reservation) -
MAX(used, dd->dd_phys->dd_reserved);
mutex_exit(&dd->dd_lock);
if (dd->dd_parent) {
@ -1101,11 +1082,17 @@ dsl_dir_set_reservation_check(void *arg1, void *arg2, dmu_tx_t *tx)
avail = dsl_pool_adjustedsize(dd->dd_pool, B_FALSE) - used;
}
if (delta > 0 && delta > avail)
return (ENOSPC);
if (delta > 0 && dd->dd_phys->dd_quota > 0 &&
new_reservation > dd->dd_phys->dd_quota)
return (ENOSPC);
if (MAX(used, new_reservation) > MAX(used, dd->dd_phys->dd_reserved)) {
uint64_t delta = MAX(used, new_reservation) -
MAX(used, dd->dd_phys->dd_reserved);
if (delta > avail)
return (ENOSPC);
if (dd->dd_phys->dd_quota > 0 &&
new_reservation > dd->dd_phys->dd_quota)
return (ENOSPC);
}
return (0);
}


@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@ -133,14 +133,15 @@ dsl_pool_open(spa_t *spa, uint64_t txg, dsl_pool_t **dpp)
goto out;
err = dsl_dataset_hold_obj(dp, dd->dd_phys->dd_head_dataset_obj,
FTAG, &ds);
if (err)
goto out;
err = dsl_dataset_hold_obj(dp, ds->ds_phys->ds_prev_snap_obj,
dp, &dp->dp_origin_snap);
if (err)
goto out;
dsl_dataset_rele(ds, FTAG);
if (err == 0) {
err = dsl_dataset_hold_obj(dp,
ds->ds_phys->ds_prev_snap_obj, dp,
&dp->dp_origin_snap);
dsl_dataset_rele(ds, FTAG);
}
dsl_dir_close(dd, dp);
if (err)
goto out;
}
/* get scrub status */
@ -303,23 +304,51 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t txg)
dp->dp_read_overhead = 0;
start = gethrtime();
zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
while (ds = txg_list_remove(&dp->dp_dirty_datasets, txg)) {
if (!list_link_active(&ds->ds_synced_link))
list_insert_tail(&dp->dp_synced_datasets, ds);
else
dmu_buf_rele(ds->ds_dbuf, ds);
/*
* We must not sync any non-MOS datasets twice, because
* we may have taken a snapshot of them. However, we
* may sync newly-created datasets on pass 2.
*/
ASSERT(!list_link_active(&ds->ds_synced_link));
list_insert_tail(&dp->dp_synced_datasets, ds);
dsl_dataset_sync(ds, zio, tx);
}
DTRACE_PROBE(pool_sync__1setup);
err = zio_wait(zio);
write_time = gethrtime() - start;
ASSERT(err == 0);
DTRACE_PROBE(pool_sync__2rootzio);
while (dstg = txg_list_remove(&dp->dp_sync_tasks, txg))
for (ds = list_head(&dp->dp_synced_datasets); ds;
ds = list_next(&dp->dp_synced_datasets, ds))
dmu_objset_do_userquota_callbacks(ds->ds_user_ptr, tx);
/*
* Sync the datasets again to push out the changes due to
* userquota updates. This must be done before we process the
* sync tasks, because that could cause a snapshot of a dataset
* whose ds_bp will be rewritten when we do this 2nd sync.
*/
zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
while (ds = txg_list_remove(&dp->dp_dirty_datasets, txg)) {
ASSERT(list_link_active(&ds->ds_synced_link));
dmu_buf_rele(ds->ds_dbuf, ds);
dsl_dataset_sync(ds, zio, tx);
}
err = zio_wait(zio);
while (dstg = txg_list_remove(&dp->dp_sync_tasks, txg)) {
/*
* No more sync tasks should have been added while we
* were syncing.
*/
ASSERT(spa_sync_pass(dp->dp_spa) == 1);
dsl_sync_task_group_sync(dstg, tx);
}
DTRACE_PROBE(pool_sync__3task);
start = gethrtime();
@ -574,6 +603,7 @@ upgrade_clones_cb(spa_t *spa, uint64_t dsobj, const char *dsname, void *arg)
ASSERT(ds->ds_phys->ds_prev_snap_obj == prev->ds_object);
if (prev->ds_phys->ds_next_clones_obj == 0) {
dmu_buf_will_dirty(prev->ds_dbuf, tx);
prev->ds_phys->ds_next_clones_obj =
zap_create(dp->dp_meta_objset,
DMU_OT_NEXT_CLONES, DMU_OT_NONE, 0, tx);
@ -593,8 +623,8 @@ dsl_pool_upgrade_clones(dsl_pool_t *dp, dmu_tx_t *tx)
ASSERT(dmu_tx_is_syncing(tx));
ASSERT(dp->dp_origin_snap != NULL);
(void) dmu_objset_find_spa(dp->dp_spa, NULL, upgrade_clones_cb,
tx, DS_FIND_CHILDREN);
VERIFY3U(0, ==, dmu_objset_find_spa(dp->dp_spa, NULL, upgrade_clones_cb,
tx, DS_FIND_CHILDREN));
}
void


@ -19,12 +19,10 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/dmu.h>
#include <sys/dmu_objset.h>
#include <sys/dmu_tx.h>
@ -415,6 +413,34 @@ dsl_prop_set_sync(void *arg1, void *arg2, cred_t *cr, dmu_tx_t *tx)
"%s=%s dataset = %llu", psa->name, valstr, ds->ds_object);
}
void
dsl_props_set_sync(void *arg1, void *arg2, cred_t *cr, dmu_tx_t *tx)
{
dsl_dataset_t *ds = arg1;
nvlist_t *nvl = arg2;
nvpair_t *elem = NULL;
while ((elem = nvlist_next_nvpair(nvl, elem)) != NULL) {
struct prop_set_arg psa;
psa.name = nvpair_name(elem);
if (nvpair_type(elem) == DATA_TYPE_STRING) {
VERIFY(nvpair_value_string(elem,
(char **)&psa.buf) == 0);
psa.intsz = 1;
psa.numints = strlen(psa.buf) + 1;
} else {
uint64_t intval;
VERIFY(nvpair_value_uint64(elem, &intval) == 0);
psa.intsz = sizeof (intval);
psa.numints = 1;
psa.buf = &intval;
}
dsl_prop_set_sync(ds, &psa, cr, tx);
}
}
void
dsl_prop_set_uint64_sync(dsl_dir_t *dd, const char *name, uint64_t val,
cred_t *cr, dmu_tx_t *tx)
@ -471,6 +497,43 @@ dsl_prop_set(const char *dsname, const char *propname,
return (err);
}
int
dsl_props_set(const char *dsname, nvlist_t *nvl)
{
dsl_dataset_t *ds;
nvpair_t *elem = NULL;
int err;
/*
* Do these checks before the syncfunc, since it can't fail.
*/
while ((elem = nvlist_next_nvpair(nvl, elem)) != NULL) {
if (strlen(nvpair_name(elem)) >= ZAP_MAXNAMELEN)
return (ENAMETOOLONG);
if (nvpair_type(elem) == DATA_TYPE_STRING) {
char *valstr;
VERIFY(nvpair_value_string(elem, &valstr) == 0);
if (strlen(valstr) >= ZAP_MAXVALUELEN)
return (E2BIG);
}
}
if (err = dsl_dataset_hold(dsname, FTAG, &ds))
return (err);
if (dsl_dataset_is_snapshot(ds) &&
spa_version(ds->ds_dir->dd_pool->dp_spa) < SPA_VERSION_SNAP_PROPS) {
dsl_dataset_rele(ds, FTAG);
return (ENOTSUP);
}
err = dsl_sync_task_do(ds->ds_dir->dd_pool,
NULL, dsl_props_set_sync, ds, nvl, 2);
dsl_dataset_rele(ds, FTAG);
return (err);
}
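dsl_props_set() validates every property name and string value up front because the sync function cannot fail. The length checks can be sketched as a pure function; the helper name and caller-supplied limits (standing in for ZAP_MAXNAMELEN and ZAP_MAXVALUELEN) are hypothetical:

```c
#include <string.h>
#include <errno.h>
#include <stddef.h>

/*
 * Sketch of the pre-sync validation in dsl_props_set(): reject a
 * property whose name or string value would not fit in the ZAP.
 * valstr may be NULL for non-string (integer) properties.
 */
int
prop_lengths_ok(const char *name, const char *valstr,
    size_t maxname, size_t maxval)
{
	if (strlen(name) >= maxname)
		return (ENAMETOOLONG);
	if (valstr != NULL && strlen(valstr) >= maxval)
		return (E2BIG);
	return (0);
}
```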
/*
* Iterate over all properties for this dataset and return them in an nvlist.
*/


@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@ -45,6 +45,8 @@ typedef int (scrub_cb_t)(dsl_pool_t *, const blkptr_t *, const zbookmark_t *);
static scrub_cb_t dsl_pool_scrub_clean_cb;
static dsl_syncfunc_t dsl_pool_scrub_cancel_sync;
static void scrub_visitdnode(dsl_pool_t *dp, dnode_phys_t *dnp, arc_buf_t *buf,
uint64_t objset, uint64_t object);
int zfs_scrub_min_time = 1; /* scrub for at least 1 sec each txg */
int zfs_resilver_min_time = 3; /* resilver for at least 3 sec each txg */
@ -95,6 +97,9 @@ dsl_pool_scrub_setup_sync(void *arg1, void *arg2, cred_t *cr, dmu_tx_t *tx)
ESC_ZFS_RESILVER_START);
dp->dp_scrub_max_txg = MIN(dp->dp_scrub_max_txg,
tx->tx_txg);
} else {
spa_event_notify(dp->dp_spa, NULL,
ESC_ZFS_SCRUB_START);
}
/* zero out the scrub stats in all vdev_stat_t's */
@ -212,8 +217,9 @@ dsl_pool_scrub_cancel_sync(void *arg1, void *arg2, cred_t *cr, dmu_tx_t *tx)
*/
vdev_dtl_reassess(dp->dp_spa->spa_root_vdev, tx->tx_txg,
*completep ? dp->dp_scrub_max_txg : 0, B_TRUE);
if (dp->dp_scrub_min_txg && *completep)
spa_event_notify(dp->dp_spa, NULL, ESC_ZFS_RESILVER_FINISH);
if (*completep)
spa_event_notify(dp->dp_spa, NULL, dp->dp_scrub_min_txg ?
ESC_ZFS_RESILVER_FINISH : ESC_ZFS_SCRUB_FINISH);
spa_errlog_rotate(dp->dp_spa);
/*
@ -402,7 +408,7 @@ traverse_zil(dsl_pool_t *dp, zil_header_t *zh)
* We only want to visit blocks that have been claimed but not yet
* replayed (or, in read-only mode, blocks that *would* be claimed).
*/
if (claim_txg == 0 && (spa_mode & FWRITE))
if (claim_txg == 0 && spa_writeable(dp->dp_spa))
return;
zilog = zil_alloc(dp->dp_meta_objset, zh);
@ -420,9 +426,6 @@ scrub_visitbp(dsl_pool_t *dp, dnode_phys_t *dnp,
int err;
arc_buf_t *buf = NULL;
if (bp->blk_birth == 0)
return;
if (bp->blk_birth <= dp->dp_scrub_min_txg)
return;
@ -482,7 +485,7 @@ scrub_visitbp(dsl_pool_t *dp, dnode_phys_t *dnp,
} else if (BP_GET_TYPE(bp) == DMU_OT_DNODE) {
uint32_t flags = ARC_WAIT;
dnode_phys_t *child_dnp;
int i, j;
int i;
int epb = BP_GET_LSIZE(bp) >> DNODE_SHIFT;
err = arc_read(NULL, dp->dp_spa, bp, pbuf,
@ -497,20 +500,12 @@ scrub_visitbp(dsl_pool_t *dp, dnode_phys_t *dnp,
child_dnp = buf->b_data;
for (i = 0; i < epb; i++, child_dnp++) {
for (j = 0; j < child_dnp->dn_nblkptr; j++) {
zbookmark_t czb;
SET_BOOKMARK(&czb, zb->zb_objset,
zb->zb_blkid * epb + i,
child_dnp->dn_nlevels - 1, j);
scrub_visitbp(dp, child_dnp, buf,
&child_dnp->dn_blkptr[j], &czb);
}
scrub_visitdnode(dp, child_dnp, buf, zb->zb_objset,
zb->zb_blkid * epb + i);
}
} else if (BP_GET_TYPE(bp) == DMU_OT_OBJSET) {
uint32_t flags = ARC_WAIT;
objset_phys_t *osp;
int j;
err = arc_read_nolock(NULL, dp->dp_spa, bp,
arc_getbuf_func, &buf,
@ -526,13 +521,13 @@ scrub_visitbp(dsl_pool_t *dp, dnode_phys_t *dnp,
traverse_zil(dp, &osp->os_zil_header);
for (j = 0; j < osp->os_meta_dnode.dn_nblkptr; j++) {
zbookmark_t czb;
SET_BOOKMARK(&czb, zb->zb_objset, 0,
osp->os_meta_dnode.dn_nlevels - 1, j);
scrub_visitbp(dp, &osp->os_meta_dnode, buf,
&osp->os_meta_dnode.dn_blkptr[j], &czb);
scrub_visitdnode(dp, &osp->os_meta_dnode,
buf, zb->zb_objset, 0);
if (arc_buf_size(buf) >= sizeof (objset_phys_t)) {
scrub_visitdnode(dp, &osp->os_userused_dnode,
buf, zb->zb_objset, 0);
scrub_visitdnode(dp, &osp->os_groupused_dnode,
buf, zb->zb_objset, 0);
}
}
@ -541,6 +536,21 @@ scrub_visitbp(dsl_pool_t *dp, dnode_phys_t *dnp,
(void) arc_buf_remove_ref(buf, &buf);
}
static void
scrub_visitdnode(dsl_pool_t *dp, dnode_phys_t *dnp, arc_buf_t *buf,
uint64_t objset, uint64_t object)
{
int j;
for (j = 0; j < dnp->dn_nblkptr; j++) {
zbookmark_t czb;
SET_BOOKMARK(&czb, objset, object, dnp->dn_nlevels - 1, j);
scrub_visitbp(dp, dnp, buf, &dnp->dn_blkptr[j], &czb);
}
}
static void
scrub_visit_rootbp(dsl_pool_t *dp, dsl_dataset_t *ds, blkptr_t *bp)
{
@ -688,17 +698,34 @@ scrub_visitds(dsl_pool_t *dp, uint64_t dsobj, dmu_tx_t *tx)
ds->ds_phys->ds_next_snap_obj, tx) == 0);
}
if (ds->ds_phys->ds_num_children > 1) {
if (spa_version(dp->dp_spa) < SPA_VERSION_DSL_SCRUB) {
boolean_t usenext = B_FALSE;
if (ds->ds_phys->ds_next_clones_obj != 0) {
uint64_t count;
/*
* A bug in a previous version of the code could
* cause upgrade_clones_cb() to not set
* ds_next_snap_obj when it should, leading to a
* missing entry. Therefore we can only use the
* next_clones_obj when its count is correct.
*/
int err = zap_count(dp->dp_meta_objset,
ds->ds_phys->ds_next_clones_obj, &count);
if (err == 0 &&
count == ds->ds_phys->ds_num_children - 1)
usenext = B_TRUE;
}
if (usenext) {
VERIFY(zap_join(dp->dp_meta_objset,
ds->ds_phys->ds_next_clones_obj,
dp->dp_scrub_queue_obj, tx) == 0);
} else {
struct enqueue_clones_arg eca;
eca.tx = tx;
eca.originobj = ds->ds_object;
(void) dmu_objset_find_spa(ds->ds_dir->dd_pool->dp_spa,
NULL, enqueue_clones_cb, &eca, DS_FIND_CHILDREN);
} else {
VERIFY(zap_join(dp->dp_meta_objset,
ds->ds_phys->ds_next_clones_obj,
dp->dp_scrub_queue_obj, tx) == 0);
}
}
@ -751,6 +778,7 @@ enqueue_cb(spa_t *spa, uint64_t dsobj, const char *dsname, void *arg)
void
dsl_pool_scrub_sync(dsl_pool_t *dp, dmu_tx_t *tx)
{
spa_t *spa = dp->dp_spa;
zap_cursor_t zc;
zap_attribute_t za;
boolean_t complete = B_TRUE;
@ -758,8 +786,10 @@ dsl_pool_scrub_sync(dsl_pool_t *dp, dmu_tx_t *tx)
if (dp->dp_scrub_func == SCRUB_FUNC_NONE)
return;
/* If the spa is not fully loaded, don't bother. */
if (dp->dp_spa->spa_load_state != SPA_LOAD_NONE)
/*
* If the pool is not loaded, or is trying to unload, leave it alone.
*/
if (spa->spa_load_state != SPA_LOAD_NONE || spa_shutting_down(spa))
return;
if (dp->dp_scrub_restart) {
@ -768,13 +798,13 @@ dsl_pool_scrub_sync(dsl_pool_t *dp, dmu_tx_t *tx)
dsl_pool_scrub_setup_sync(dp, &func, kcred, tx);
}
if (dp->dp_spa->spa_root_vdev->vdev_stat.vs_scrub_type == 0) {
if (spa->spa_root_vdev->vdev_stat.vs_scrub_type == 0) {
/*
* We must have resumed after rebooting; reset the vdev
* stats to know that we're doing a scrub (although it
* will think we're just starting now).
*/
vdev_scrub_stat_update(dp->dp_spa->spa_root_vdev,
vdev_scrub_stat_update(spa->spa_root_vdev,
dp->dp_scrub_min_txg ? POOL_SCRUB_RESILVER :
POOL_SCRUB_EVERYTHING, B_FALSE);
}
@ -782,7 +812,7 @@ dsl_pool_scrub_sync(dsl_pool_t *dp, dmu_tx_t *tx)
dp->dp_scrub_pausing = B_FALSE;
dp->dp_scrub_start_time = lbolt64;
dp->dp_scrub_isresilver = (dp->dp_scrub_min_txg != 0);
dp->dp_spa->spa_scrub_active = B_TRUE;
spa->spa_scrub_active = B_TRUE;
if (dp->dp_scrub_bookmark.zb_objset == 0) {
/* First do the MOS & ORIGIN */
@ -790,8 +820,8 @@ dsl_pool_scrub_sync(dsl_pool_t *dp, dmu_tx_t *tx)
if (dp->dp_scrub_pausing)
goto out;
if (spa_version(dp->dp_spa) < SPA_VERSION_DSL_SCRUB) {
VERIFY(0 == dmu_objset_find_spa(dp->dp_spa,
if (spa_version(spa) < SPA_VERSION_DSL_SCRUB) {
VERIFY(0 == dmu_objset_find_spa(spa,
NULL, enqueue_cb, tx, DS_FIND_CHILDREN));
} else {
scrub_visitds(dp, dp->dp_origin_snap->ds_object, tx);
@ -841,15 +871,13 @@ dsl_pool_scrub_sync(dsl_pool_t *dp, dmu_tx_t *tx)
VERIFY(0 == zap_update(dp->dp_meta_objset,
DMU_POOL_DIRECTORY_OBJECT,
DMU_POOL_SCRUB_ERRORS, sizeof (uint64_t), 1,
&dp->dp_spa->spa_scrub_errors, tx));
&spa->spa_scrub_errors, tx));
/* XXX this is scrub-clean specific */
mutex_enter(&dp->dp_spa->spa_scrub_lock);
while (dp->dp_spa->spa_scrub_inflight > 0) {
cv_wait(&dp->dp_spa->spa_scrub_io_cv,
&dp->dp_spa->spa_scrub_lock);
}
mutex_exit(&dp->dp_spa->spa_scrub_lock);
mutex_enter(&spa->spa_scrub_lock);
while (spa->spa_scrub_inflight > 0)
cv_wait(&spa->spa_scrub_io_cv, &spa->spa_scrub_lock);
mutex_exit(&spa->spa_scrub_lock);
}
void
@ -931,13 +959,17 @@ static int
dsl_pool_scrub_clean_cb(dsl_pool_t *dp,
const blkptr_t *bp, const zbookmark_t *zb)
{
size_t size = BP_GET_LSIZE(bp);
int d;
size_t size = BP_GET_PSIZE(bp);
spa_t *spa = dp->dp_spa;
boolean_t needs_io;
int zio_flags = ZIO_FLAG_SCRUB_THREAD | ZIO_FLAG_CANFAIL;
int zio_flags = ZIO_FLAG_SCRUB_THREAD | ZIO_FLAG_RAW | ZIO_FLAG_CANFAIL;
int zio_priority;
ASSERT(bp->blk_birth > dp->dp_scrub_min_txg);
if (bp->blk_birth >= dp->dp_scrub_max_txg)
return (0);
count_block(dp->dp_blkstats, bp);
if (dp->dp_scrub_isresilver == 0) {
@ -956,7 +988,7 @@ dsl_pool_scrub_clean_cb(dsl_pool_t *dp,
if (zb->zb_level == -1 && BP_GET_TYPE(bp) != DMU_OT_OBJSET)
zio_flags |= ZIO_FLAG_SPECULATIVE;
for (d = 0; d < BP_GET_NDVAS(bp); d++) {
for (int d = 0; d < BP_GET_NDVAS(bp); d++) {
vdev_t *vd = vdev_lookup_top(spa,
DVA_GET_VDEV(&bp->blk_dva[d]));
@ -974,16 +1006,17 @@ dsl_pool_scrub_clean_cb(dsl_pool_t *dp,
if (DVA_GET_GANG(&bp->blk_dva[d])) {
/*
* Gang members may be spread across multiple
* vdevs, so the best we can do is look at the
* pool-wide DTL.
* vdevs, so the best estimate we have is the
* scrub range, which has already been checked.
* XXX -- it would be better to change our
* allocation policy to ensure that this can't
* happen.
* allocation policy to ensure that all
* gang members reside on the same vdev.
*/
vd = spa->spa_root_vdev;
needs_io = B_TRUE;
} else {
needs_io = vdev_dtl_contains(vd, DTL_PARTIAL,
bp->blk_birth, 1);
}
needs_io = vdev_dtl_contains(&vd->vdev_dtl_map,
bp->blk_birth, 1);
}
}


@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@ -35,19 +35,36 @@
uint64_t metaslab_aliquot = 512ULL << 10;
uint64_t metaslab_gang_bang = SPA_MAXBLOCKSIZE + 1; /* force gang blocks */
/*
* Minimum size which forces the dynamic allocator to change
* its allocation strategy. Once the space map cannot satisfy
* an allocation of this size, it switches to a more
* aggressive strategy (i.e., searching by size rather than offset).
*/
uint64_t metaslab_df_alloc_threshold = SPA_MAXBLOCKSIZE;
/*
* The minimum free space, in percent, which must be available
* in a space map to continue allocations in a first-fit fashion.
* Once the space_map's free space drops below this level we dynamically
* switch to using best-fit allocations.
*/
int metaslab_df_free_pct = 30;
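These two tunables gate when the dynamic (df) allocator abandons first-fit for best-fit. The condition can be sketched as a pure predicate; names and signature here are illustrative, not the kernel's:

```c
#include <stdint.h>

/*
 * Sketch of the switch condition in the dynamic allocator: fall back to
 * the size-sorted (best-fit) tree when either the largest free segment
 * is below the allocation threshold, or the free percentage of the
 * space map has dropped below the floor. space_total must be nonzero.
 */
int
df_should_use_bestfit(uint64_t max_free_seg, uint64_t space_free,
    uint64_t space_total, uint64_t alloc_threshold, int free_pct_floor)
{
	int free_pct = (int)(space_free * 100 / space_total);

	return (max_free_seg < alloc_threshold || free_pct < free_pct_floor);
}
```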
/*
* ==========================================================================
* Metaslab classes
* ==========================================================================
*/
metaslab_class_t *
metaslab_class_create(void)
metaslab_class_create(space_map_ops_t *ops)
{
metaslab_class_t *mc;
mc = kmem_zalloc(sizeof (metaslab_class_t), KM_SLEEP);
mc->mc_rotor = NULL;
mc->mc_ops = ops;
return (mc);
}
@ -202,30 +219,14 @@ metaslab_group_sort(metaslab_group_t *mg, metaslab_t *msp, uint64_t weight)
}
/*
* ==========================================================================
* The first-fit block allocator
* ==========================================================================
* This is a helper function that can be used by the allocator to find
* a suitable block to allocate. This will search the specified AVL
* tree looking for a block that matches the specified criteria.
*/
static void
metaslab_ff_load(space_map_t *sm)
{
ASSERT(sm->sm_ppd == NULL);
sm->sm_ppd = kmem_zalloc(64 * sizeof (uint64_t), KM_SLEEP);
}
static void
metaslab_ff_unload(space_map_t *sm)
{
kmem_free(sm->sm_ppd, 64 * sizeof (uint64_t));
sm->sm_ppd = NULL;
}
static uint64_t
metaslab_ff_alloc(space_map_t *sm, uint64_t size)
metaslab_block_picker(avl_tree_t *t, uint64_t *cursor, uint64_t size,
uint64_t align)
{
avl_tree_t *t = &sm->sm_root;
uint64_t align = size & -size;
uint64_t *cursor = (uint64_t *)sm->sm_ppd + highbit(align) - 1;
space_seg_t *ss, ssearch;
avl_index_t where;
@ -254,7 +255,37 @@ metaslab_ff_alloc(space_map_t *sm, uint64_t size)
return (-1ULL);
*cursor = 0;
return (metaslab_ff_alloc(sm, size));
return (metaslab_block_picker(t, cursor, size, align));
}
/*
* ==========================================================================
* The first-fit block allocator
* ==========================================================================
*/
static void
metaslab_ff_load(space_map_t *sm)
{
ASSERT(sm->sm_ppd == NULL);
sm->sm_ppd = kmem_zalloc(64 * sizeof (uint64_t), KM_SLEEP);
sm->sm_pp_root = NULL;
}
static void
metaslab_ff_unload(space_map_t *sm)
{
kmem_free(sm->sm_ppd, 64 * sizeof (uint64_t));
sm->sm_ppd = NULL;
}
static uint64_t
metaslab_ff_alloc(space_map_t *sm, uint64_t size)
{
avl_tree_t *t = &sm->sm_root;
uint64_t align = size & -size;
uint64_t *cursor = (uint64_t *)sm->sm_ppd + highbit(align) - 1;
return (metaslab_block_picker(t, cursor, size, align));
}
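The shared picker walks the offset-sorted tree starting from a per-alignment cursor and, on failure, resets the cursor and retries once from the beginning. A user-space model over a sorted array of free segments (illustrative names; align must be a power of two, as it is in the kernel where it is derived from the size):

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { uint64_t start, end; } seg_t;	/* [start, end) free */

/*
 * Model of metaslab_block_picker(): first fit over an offset-sorted
 * list of free segments, resuming from *cursor; wraps the cursor to
 * zero and retries once before giving up.
 */
uint64_t
pick_block(const seg_t *segs, size_t nsegs, uint64_t *cursor,
    uint64_t size, uint64_t align)
{
	int retried = 0;
again:
	for (size_t i = 0; i < nsegs; i++) {
		uint64_t off = (segs[i].start > *cursor ?
		    segs[i].start : *cursor);
		off = (off + align - 1) & ~(align - 1);	/* align up */
		if (off + size <= segs[i].end) {
			*cursor = off + size;
			return (off);
		}
	}
	if (!retried) {		/* wrap once, as the allocator does */
		retried = 1;
		*cursor = 0;
		goto again;
	}
	return (-1ULL);
}

/*
 * Fixed demo: free segments [0,100) and [200,300); allocations of
 * sizes 50, 60, 50 with align 1. Returns the offset of allocation n.
 */
uint64_t
pick_demo(int n)
{
	seg_t segs[2] = { { 0, 100 }, { 200, 300 } };
	uint64_t sizes[3] = { 50, 60, 50 };
	uint64_t cursor = 0;
	uint64_t off = 0;

	for (int i = 0; i <= n; i++)
		off = pick_block(segs, 2, &cursor, sizes[i], 1);
	return (off);
}
```

The third allocation only succeeds because of the wrap: the cursor sits past both segments, so the picker resets it and reuses the space freed up at the front of the first segment's remaining range.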
/* ARGSUSED */
@ -276,9 +307,136 @@ static space_map_ops_t metaslab_ff_ops = {
metaslab_ff_unload,
metaslab_ff_alloc,
metaslab_ff_claim,
metaslab_ff_free
metaslab_ff_free,
NULL /* maxsize */
};
/*
* Dynamic block allocator -
* Uses the first-fit allocation scheme until space gets low and then
* adjusts to a best-fit allocation method. Uses metaslab_df_alloc_threshold
* and metaslab_df_free_pct to determine when to switch the allocation scheme.
*/
uint64_t
metaslab_df_maxsize(space_map_t *sm)
{
avl_tree_t *t = sm->sm_pp_root;
space_seg_t *ss;
if (t == NULL || (ss = avl_last(t)) == NULL)
return (0ULL);
return (ss->ss_end - ss->ss_start);
}
static int
metaslab_df_seg_compare(const void *x1, const void *x2)
{
const space_seg_t *s1 = x1;
const space_seg_t *s2 = x2;
uint64_t ss_size1 = s1->ss_end - s1->ss_start;
uint64_t ss_size2 = s2->ss_end - s2->ss_start;
if (ss_size1 < ss_size2)
return (-1);
if (ss_size1 > ss_size2)
return (1);
if (s1->ss_start < s2->ss_start)
return (-1);
if (s1->ss_start > s2->ss_start)
return (1);
return (0);
}
static void
metaslab_df_load(space_map_t *sm)
{
space_seg_t *ss;
ASSERT(sm->sm_ppd == NULL);
sm->sm_ppd = kmem_zalloc(64 * sizeof (uint64_t), KM_SLEEP);
sm->sm_pp_root = kmem_alloc(sizeof (avl_tree_t), KM_SLEEP);
avl_create(sm->sm_pp_root, metaslab_df_seg_compare,
sizeof (space_seg_t), offsetof(struct space_seg, ss_pp_node));
for (ss = avl_first(&sm->sm_root); ss; ss = AVL_NEXT(&sm->sm_root, ss))
avl_add(sm->sm_pp_root, ss);
}
static void
metaslab_df_unload(space_map_t *sm)
{
void *cookie = NULL;
kmem_free(sm->sm_ppd, 64 * sizeof (uint64_t));
sm->sm_ppd = NULL;
while (avl_destroy_nodes(sm->sm_pp_root, &cookie) != NULL) {
/* tear down the tree */
}
avl_destroy(sm->sm_pp_root);
kmem_free(sm->sm_pp_root, sizeof (avl_tree_t));
sm->sm_pp_root = NULL;
}
static uint64_t
metaslab_df_alloc(space_map_t *sm, uint64_t size)
{
avl_tree_t *t = &sm->sm_root;
uint64_t align = size & -size;
uint64_t *cursor = (uint64_t *)sm->sm_ppd + highbit(align) - 1;
uint64_t max_size = metaslab_df_maxsize(sm);
int free_pct = sm->sm_space * 100 / sm->sm_size;
ASSERT(MUTEX_HELD(sm->sm_lock));
ASSERT3U(avl_numnodes(&sm->sm_root), ==, avl_numnodes(sm->sm_pp_root));
if (max_size < size)
return (-1ULL);
/*
* If we're running low on space switch to using the size
* sorted AVL tree (best-fit).
*/
if (max_size < metaslab_df_alloc_threshold ||
free_pct < metaslab_df_free_pct) {
t = sm->sm_pp_root;
*cursor = 0;
}
return (metaslab_block_picker(t, cursor, size, 1ULL));
}
/* ARGSUSED */
static void
metaslab_df_claim(space_map_t *sm, uint64_t start, uint64_t size)
{
/* No need to update cursor */
}
/* ARGSUSED */
static void
metaslab_df_free(space_map_t *sm, uint64_t start, uint64_t size)
{
/* No need to update cursor */
}
static space_map_ops_t metaslab_df_ops = {
metaslab_df_load,
metaslab_df_unload,
metaslab_df_alloc,
metaslab_df_claim,
metaslab_df_free,
metaslab_df_maxsize
};
space_map_ops_t *zfs_metaslab_ops = &metaslab_df_ops;
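As a side note for readers following the allocator hunks above: both `metaslab_ff_alloc()` and `metaslab_df_alloc()` index a per-alignment cursor via `size & -size` and `highbit()`. The following is a minimal user-space sketch of just that indexing trick, with a hypothetical `cursor_for_size()` helper; it is an illustration, not the kernel code.

```c
/* User-space sketch of the per-alignment cursor indexing used by the
 * ZFS block allocators above.  Illustrative only. */
#include <assert.h>
#include <stdint.h>

/* highbit(x): 1-based index of the highest set bit, as in OpenSolaris. */
int
highbit(uint64_t x)
{
	int h = 0;

	while (x != 0) {
		h++;
		x >>= 1;
	}
	return (h);
}

/* The allocators keep 64 cursors, one per power-of-two alignment. */
uint64_t cursors[64];

/*
 * Select the cursor slot for a request of 'size' bytes.  The alignment
 * is the largest power of two dividing size (size & -size), so e.g. any
 * odd multiple of 512 shares the slot for alignment 512.
 */
uint64_t *
cursor_for_size(uint64_t size)
{
	uint64_t align = size & -size;

	return (&cursors[highbit(align) - 1]);
}
```

A 512-byte request and a 1536-byte request (3 × 512) both land on the alignment-512 cursor, which is what lets the first-fit scan resume near previously allocated blocks of similar alignment.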
/*
* ==========================================================================
* Metaslabs
@@ -414,20 +572,28 @@ metaslab_weight(metaslab_t *msp)
}
static int
metaslab_activate(metaslab_t *msp, uint64_t activation_weight)
metaslab_activate(metaslab_t *msp, uint64_t activation_weight, uint64_t size)
{
space_map_t *sm = &msp->ms_map;
space_map_ops_t *sm_ops = msp->ms_group->mg_class->mc_ops;
ASSERT(MUTEX_HELD(&msp->ms_lock));
if ((msp->ms_weight & METASLAB_ACTIVE_MASK) == 0) {
int error = space_map_load(sm, &metaslab_ff_ops,
SM_FREE, &msp->ms_smo,
int error = space_map_load(sm, sm_ops, SM_FREE, &msp->ms_smo,
msp->ms_group->mg_vd->vdev_spa->spa_meta_objset);
if (error) {
metaslab_group_sort(msp->ms_group, msp, 0);
return (error);
}
/*
* If we were able to load the map then make sure
* that this map is still able to satisfy our request.
*/
if (msp->ms_weight < size)
return (ENOSPC);
metaslab_group_sort(msp->ms_group, msp,
msp->ms_weight | activation_weight);
}
@@ -636,11 +802,16 @@ metaslab_group_alloc(metaslab_group_t *mg, uint64_t size, uint64_t txg,
int i;
activation_weight = METASLAB_WEIGHT_PRIMARY;
for (i = 0; i < d; i++)
if (DVA_GET_VDEV(&dva[i]) == mg->mg_vd->vdev_id)
for (i = 0; i < d; i++) {
if (DVA_GET_VDEV(&dva[i]) == mg->mg_vd->vdev_id) {
activation_weight = METASLAB_WEIGHT_SECONDARY;
break;
}
}
for (;;) {
boolean_t was_active;
mutex_enter(&mg->mg_lock);
for (msp = avl_first(t); msp; msp = AVL_NEXT(t, msp)) {
if (msp->ms_weight < size) {
@@ -648,6 +819,7 @@ metaslab_group_alloc(metaslab_group_t *mg, uint64_t size, uint64_t txg,
return (-1ULL);
}
was_active = msp->ms_weight & METASLAB_ACTIVE_MASK;
if (activation_weight == METASLAB_WEIGHT_PRIMARY)
break;
@@ -673,7 +845,9 @@ metaslab_group_alloc(metaslab_group_t *mg, uint64_t size, uint64_t txg,
* another thread may have changed the weight while we
* were blocked on the metaslab lock.
*/
if (msp->ms_weight < size) {
if (msp->ms_weight < size || (was_active &&
!(msp->ms_weight & METASLAB_ACTIVE_MASK) &&
activation_weight == METASLAB_WEIGHT_PRIMARY)) {
mutex_exit(&msp->ms_lock);
continue;
}
@@ -686,7 +860,7 @@ metaslab_group_alloc(metaslab_group_t *mg, uint64_t size, uint64_t txg,
continue;
}
if (metaslab_activate(msp, activation_weight) != 0) {
if (metaslab_activate(msp, activation_weight, size) != 0) {
mutex_exit(&msp->ms_lock);
continue;
}
@@ -720,6 +894,8 @@ metaslab_alloc_dva(spa_t *spa, metaslab_class_t *mc, uint64_t psize,
vdev_t *vd;
int dshift = 3;
int all_zero;
int zio_lock = B_FALSE;
boolean_t allocatable;
uint64_t offset = -1ULL;
uint64_t asize;
uint64_t distance;
@@ -778,11 +954,20 @@ metaslab_alloc_dva(spa_t *spa, metaslab_class_t *mc, uint64_t psize,
all_zero = B_TRUE;
do {
vd = mg->mg_vd;
/*
* Don't allocate from faulted devices.
*/
if (!vdev_allocatable(vd))
if (zio_lock) {
spa_config_enter(spa, SCL_ZIO, FTAG, RW_READER);
allocatable = vdev_allocatable(vd);
spa_config_exit(spa, SCL_ZIO, FTAG);
} else {
allocatable = vdev_allocatable(vd);
}
if (!allocatable)
goto next;
/*
* Avoid writing single-copy data to a failing vdev
*/
@@ -858,6 +1043,12 @@ metaslab_alloc_dva(spa_t *spa, metaslab_class_t *mc, uint64_t psize,
goto top;
}
if (!allocatable && !zio_lock) {
dshift = 3;
zio_lock = B_TRUE;
goto top;
}
bzero(&dva[d], sizeof (dva_t));
return (ENOSPC);
@@ -938,7 +1129,7 @@ metaslab_claim_dva(spa_t *spa, const dva_t *dva, uint64_t txg)
mutex_enter(&msp->ms_lock);
error = metaslab_activate(msp, METASLAB_WEIGHT_SECONDARY);
error = metaslab_activate(msp, METASLAB_WEIGHT_SECONDARY, 0);
if (error || txg == 0) { /* txg == 0 indicates dry run */
mutex_exit(&msp->ms_lock);
return (error);
@@ -946,7 +1137,7 @@ metaslab_claim_dva(spa_t *spa, const dva_t *dva, uint64_t txg)
space_map_claim(&msp->ms_map, offset, size);
if (spa_mode & FWRITE) { /* don't dirty if we're zdb(1M) */
if (spa_writeable(spa)) { /* don't dirty if we're zdb(1M) */
if (msp->ms_allocmap[txg & TXG_MASK].sm_space == 0)
vdev_dirty(vd, VDD_METASLAB, msp, txg);
space_map_add(&msp->ms_allocmap[txg & TXG_MASK], offset, size);

File diff suppressed because it is too large


@@ -212,6 +212,9 @@ spa_config_sync(spa_t *target, boolean_t removing, boolean_t postsysevent)
ASSERT(MUTEX_HELD(&spa_namespace_lock));
if (rootdir == NULL || !(spa_mode_global & FWRITE))
return;
/*
* Iterate over all cachefiles for the pool, past or present. When the
* cachefile is changed, the new one is pushed onto this list, allowing
@@ -385,24 +388,13 @@ spa_config_generate(spa_t *spa, vdev_t *vd, uint64_t txg, int getstats)
return (config);
}
/*
* For a pool that's not currently a booting rootpool, update all disk labels,
* generate a fresh config based on the current in-core state, and sync the
* global config cache.
*/
void
spa_config_update(spa_t *spa, int what)
{
spa_config_update_common(spa, what, FALSE);
}
/*
* Update all disk labels, generate a fresh config based on the current
* in-core state, and sync the global config cache (do not sync the config
* cache if this is a booting rootpool).
*/
void
spa_config_update_common(spa_t *spa, int what, boolean_t isroot)
spa_config_update(spa_t *spa, int what)
{
vdev_t *rvd = spa->spa_root_vdev;
uint64_t txg;
@@ -440,9 +432,9 @@ spa_config_update_common(spa_t *spa, int what, boolean_t isroot)
/*
* Update the global config cache to reflect the new mosconfig.
*/
if (!isroot)
if (!spa->spa_is_root)
spa_config_sync(spa, B_FALSE, what != SPA_CONFIG_UPDATE_POOL);
if (what == SPA_CONFIG_UPDATE_POOL)
spa_config_update_common(spa, SPA_CONFIG_UPDATE_VDEVS, isroot);
spa_config_update(spa, SPA_CONFIG_UPDATE_VDEVS);
}


@@ -19,12 +19,10 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#pragma ident "%Z%%M% %I% %E% SMI"
/*
* Routines to manage the on-disk persistent error log.
*
@@ -61,8 +59,8 @@
* lowercase hexadecimal numbers that don't overflow.
*/
#ifdef _KERNEL
static uint64_t
_strtonum(char *str, char **nptr)
uint64_t
_strtonum(const char *str, char **nptr)
{
uint64_t val = 0;
char c;
@@ -82,7 +80,8 @@ _strtonum(char *str, char **nptr)
str++;
}
*nptr = str;
if (nptr)
*nptr = (char *)str;
return (val);
}


@@ -20,12 +20,10 @@
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/spa.h>
#include <sys/spa_impl.h>
#include <sys/zap.h>
@@ -127,12 +125,12 @@ spa_history_advance_bof(spa_t *spa, spa_history_phys_t *shpp)
firstread = MIN(sizeof (reclen), shpp->sh_phys_max_off - phys_bof);
if ((err = dmu_read(mos, spa->spa_history, phys_bof, firstread,
buf)) != 0)
buf, DMU_READ_PREFETCH)) != 0)
return (err);
if (firstread != sizeof (reclen)) {
if ((err = dmu_read(mos, spa->spa_history,
shpp->sh_pool_create_len, sizeof (reclen) - firstread,
buf + firstread)) != 0)
buf + firstread, DMU_READ_PREFETCH)) != 0)
return (err);
}
@@ -381,10 +379,11 @@ spa_history_get(spa_t *spa, uint64_t *offp, uint64_t *len, char *buf)
return (0);
}
err = dmu_read(mos, spa->spa_history, phys_read_off, read_len, buf);
err = dmu_read(mos, spa->spa_history, phys_read_off, read_len, buf,
DMU_READ_PREFETCH);
if (leftover && err == 0) {
err = dmu_read(mos, spa->spa_history, shpp->sh_pool_create_len,
leftover, buf + read_len);
leftover, buf + read_len, DMU_READ_PREFETCH);
}
mutex_exit(&spa->spa_history_lock);


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -230,7 +230,7 @@ static kmutex_t spa_l2cache_lock;
static avl_tree_t spa_l2cache_avl;
kmem_cache_t *spa_buffer_pool;
int spa_mode;
int spa_mode_global;
#ifdef ZFS_DEBUG
/* Everything except dprintf is on by default in debug builds */
@@ -429,7 +429,6 @@ spa_add(const char *name, const char *altroot)
spa = kmem_zalloc(sizeof (spa_t), KM_SLEEP);
mutex_init(&spa->spa_async_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_async_root_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_scrub_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_errlog_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_errlist_lock, NULL, MUTEX_DEFAULT, NULL);
@@ -438,7 +437,6 @@ spa_add(const char *name, const char *altroot)
mutex_init(&spa->spa_props_lock, NULL, MUTEX_DEFAULT, NULL);
cv_init(&spa->spa_async_cv, NULL, CV_DEFAULT, NULL);
cv_init(&spa->spa_async_root_cv, NULL, CV_DEFAULT, NULL);
cv_init(&spa->spa_scrub_io_cv, NULL, CV_DEFAULT, NULL);
cv_init(&spa->spa_suspend_cv, NULL, CV_DEFAULT, NULL);
@@ -512,12 +510,10 @@ spa_remove(spa_t *spa)
spa_config_lock_destroy(spa);
cv_destroy(&spa->spa_async_cv);
cv_destroy(&spa->spa_async_root_cv);
cv_destroy(&spa->spa_scrub_io_cv);
cv_destroy(&spa->spa_suspend_cv);
mutex_destroy(&spa->spa_async_lock);
mutex_destroy(&spa->spa_async_root_lock);
mutex_destroy(&spa->spa_scrub_lock);
mutex_destroy(&spa->spa_errlog_lock);
mutex_destroy(&spa->spa_errlist_lock);
@@ -884,8 +880,10 @@ spa_vdev_exit(spa_t *spa, vdev_t *vd, uint64_t txg, int error)
txg_wait_synced(spa->spa_dsl_pool, txg);
if (vd != NULL) {
ASSERT(!vd->vdev_detached || vd->vdev_dtl.smo_object == 0);
ASSERT(!vd->vdev_detached || vd->vdev_dtl_smo.smo_object == 0);
spa_config_enter(spa, SCL_ALL, spa, RW_WRITER);
vdev_free(vd);
spa_config_exit(spa, SCL_ALL, spa);
}
/*
@@ -916,6 +914,15 @@ spa_vdev_state_exit(spa_t *spa, vdev_t *vd, int error)
spa_config_exit(spa, SCL_STATE_ALL, spa);
/*
* If anything changed, wait for it to sync. This ensures that,
* from the system administrator's perspective, zpool(1M) commands
* are synchronous. This is important for things like zpool offline:
* when the command completes, you expect no further I/O from ZFS.
*/
if (vd != NULL)
txg_wait_synced(spa->spa_dsl_pool, 0);
return (error);
}
@@ -1117,6 +1124,37 @@ zfs_panic_recover(const char *fmt, ...)
va_end(adx);
}
/*
* This is a stripped-down version of strtoull, suitable only for converting
* lowercase hexadecimal numbers that don't overflow.
*/
uint64_t
zfs_strtonum(const char *str, char **nptr)
{
uint64_t val = 0;
char c;
int digit;
while ((c = *str) != '\0') {
if (c >= '0' && c <= '9')
digit = c - '0';
else if (c >= 'a' && c <= 'f')
digit = 10 + c - 'a';
else
break;
val *= 16;
val += digit;
str++;
}
if (nptr)
*nptr = (char *)str;
return (val);
}
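For illustration, the new `zfs_strtonum()` shown above can be exercised stand-alone. This copy reproduces the routine verbatim: it parses lowercase hexadecimal digits, stops at the first character outside `[0-9a-f]`, and optionally reports the stop position via `nptr`.

```c
/* Stand-alone copy of zfs_strtonum() from the diff above, for
 * illustration outside the kernel. */
#include <assert.h>
#include <stdint.h>

uint64_t
zfs_strtonum(const char *str, char **nptr)
{
	uint64_t val = 0;
	char c;
	int digit;

	while ((c = *str) != '\0') {
		if (c >= '0' && c <= '9')
			digit = c - '0';
		else if (c >= 'a' && c <= 'f')
			digit = 10 + c - 'a';
		else
			break;

		val *= 16;
		val += digit;

		str++;
	}

	if (nptr)
		*nptr = (char *)str;

	return (val);
}
```

For example, parsing `"ffx9"` yields `0xff` with `*nptr` left pointing at the `'x'`, which is how the errlog code splits the `pool:vdev:object:range` bookmark names mentioned in the comment above.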
/*
* ==========================================================================
* Accessor functions
@@ -1355,7 +1393,7 @@ spa_init(int mode)
avl_create(&spa_l2cache_avl, spa_l2cache_compare, sizeof (spa_aux_t),
offsetof(spa_aux_t, aux_avl));
spa_mode = mode;
spa_mode_global = mode;
refcount_sysinit();
unique_init();
@@ -1412,3 +1450,15 @@ spa_is_root(spa_t *spa)
{
return (spa->spa_is_root);
}
boolean_t
spa_writeable(spa_t *spa)
{
return (!!(spa->spa_mode & FWRITE));
}
int
spa_mode(spa_t *spa)
{
return (spa->spa_mode);
}


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -116,12 +116,23 @@ space_map_add(space_map_t *sm, uint64_t start, uint64_t size)
if (merge_before && merge_after) {
avl_remove(&sm->sm_root, ss_before);
if (sm->sm_pp_root) {
avl_remove(sm->sm_pp_root, ss_before);
avl_remove(sm->sm_pp_root, ss_after);
}
ss_after->ss_start = ss_before->ss_start;
kmem_free(ss_before, sizeof (*ss_before));
ss = ss_after;
} else if (merge_before) {
ss_before->ss_end = end;
if (sm->sm_pp_root)
avl_remove(sm->sm_pp_root, ss_before);
ss = ss_before;
} else if (merge_after) {
ss_after->ss_start = start;
if (sm->sm_pp_root)
avl_remove(sm->sm_pp_root, ss_after);
ss = ss_after;
} else {
ss = kmem_alloc(sizeof (*ss), KM_SLEEP);
ss->ss_start = start;
@@ -129,6 +140,9 @@ space_map_add(space_map_t *sm, uint64_t start, uint64_t size)
avl_insert(&sm->sm_root, ss, where);
}
if (sm->sm_pp_root)
avl_add(sm->sm_pp_root, ss);
sm->sm_space += size;
}
@@ -163,12 +177,17 @@ space_map_remove(space_map_t *sm, uint64_t start, uint64_t size)
left_over = (ss->ss_start != start);
right_over = (ss->ss_end != end);
if (sm->sm_pp_root)
avl_remove(sm->sm_pp_root, ss);
if (left_over && right_over) {
newseg = kmem_alloc(sizeof (*newseg), KM_SLEEP);
newseg->ss_start = end;
newseg->ss_end = ss->ss_end;
ss->ss_end = start;
avl_insert_here(&sm->sm_root, newseg, ss, AVL_AFTER);
if (sm->sm_pp_root)
avl_add(sm->sm_pp_root, newseg);
} else if (left_over) {
ss->ss_end = start;
} else if (right_over) {
@@ -176,12 +195,16 @@ space_map_remove(space_map_t *sm, uint64_t start, uint64_t size)
} else {
avl_remove(&sm->sm_root, ss);
kmem_free(ss, sizeof (*ss));
ss = NULL;
}
if (sm->sm_pp_root && ss != NULL)
avl_add(sm->sm_pp_root, ss);
sm->sm_space -= size;
}
int
boolean_t
space_map_contains(space_map_t *sm, uint64_t start, uint64_t size)
{
avl_index_t where;
@@ -221,59 +244,10 @@ space_map_walk(space_map_t *sm, space_map_func_t *func, space_map_t *mdest)
{
space_seg_t *ss;
for (ss = avl_first(&sm->sm_root); ss; ss = AVL_NEXT(&sm->sm_root, ss))
func(mdest, ss->ss_start, ss->ss_end - ss->ss_start);
}
void
space_map_excise(space_map_t *sm, uint64_t start, uint64_t size)
{
avl_tree_t *t = &sm->sm_root;
avl_index_t where;
space_seg_t *ss, search;
uint64_t end = start + size;
uint64_t rm_start, rm_end;
ASSERT(MUTEX_HELD(sm->sm_lock));
search.ss_start = start;
search.ss_end = start;
for (;;) {
ss = avl_find(t, &search, &where);
if (ss == NULL)
ss = avl_nearest(t, where, AVL_AFTER);
if (ss == NULL || ss->ss_start >= end)
break;
rm_start = MAX(ss->ss_start, start);
rm_end = MIN(ss->ss_end, end);
space_map_remove(sm, rm_start, rm_end - rm_start);
}
}
/*
* Replace smd with the union of smd and sms.
*/
void
space_map_union(space_map_t *smd, space_map_t *sms)
{
avl_tree_t *t = &sms->sm_root;
space_seg_t *ss;
ASSERT(MUTEX_HELD(smd->sm_lock));
/*
* For each source segment, remove any intersections with the
* destination, then add the source segment to the destination.
*/
for (ss = avl_first(t); ss != NULL; ss = AVL_NEXT(t, ss)) {
space_map_excise(smd, ss->ss_start, ss->ss_end - ss->ss_start);
space_map_add(smd, ss->ss_start, ss->ss_end - ss->ss_start);
}
for (ss = avl_first(&sm->sm_root); ss; ss = AVL_NEXT(&sm->sm_root, ss))
func(mdest, ss->ss_start, ss->ss_end - ss->ss_start);
}
/*
@@ -337,7 +311,8 @@ space_map_load(space_map_t *sm, space_map_ops_t *ops, uint8_t maptype,
smo->smo_object, offset, size);
mutex_exit(sm->sm_lock);
error = dmu_read(os, smo->smo_object, offset, size, entry_map);
error = dmu_read(os, smo->smo_object, offset, size, entry_map,
DMU_READ_PREFETCH);
mutex_enter(sm->sm_lock);
if (error != 0)
break;
@@ -390,6 +365,15 @@ space_map_unload(space_map_t *sm)
space_map_vacate(sm, NULL, NULL);
}
uint64_t
space_map_maxsize(space_map_t *sm)
{
if (sm->sm_loaded && sm->sm_ops != NULL)
return (sm->sm_ops->smop_max(sm));
else
return (-1ULL);
}
uint64_t
space_map_alloc(space_map_t *sm, uint64_t size)
{
@@ -505,3 +489,131 @@ space_map_truncate(space_map_obj_t *smo, objset_t *os, dmu_tx_t *tx)
smo->smo_objsize = 0;
smo->smo_alloc = 0;
}
/*
* Space map reference trees.
*
* A space map is a collection of integers. Every integer is either
* in the map, or it's not. A space map reference tree generalizes
* the idea: it allows its members to have arbitrary reference counts,
* as opposed to the implicit reference count of 0 or 1 in a space map.
* This representation comes in handy when computing the union or
* intersection of multiple space maps. For example, the union of
* N space maps is the subset of the reference tree with refcnt >= 1.
* The intersection of N space maps is the subset with refcnt >= N.
*
* [It's very much like a Fourier transform. Unions and intersections
* are hard to perform in the 'space map domain', so we convert the maps
* into the 'reference count domain', where it's trivial, then invert.]
*
* vdev_dtl_reassess() uses computations of this form to determine
* DTL_MISSING and DTL_OUTAGE for interior vdevs -- e.g. a RAID-Z vdev
* has an outage wherever refcnt >= vdev_nparity + 1, and a mirror vdev
* has an outage wherever refcnt >= vdev_children.
*/
static int
space_map_ref_compare(const void *x1, const void *x2)
{
const space_ref_t *sr1 = x1;
const space_ref_t *sr2 = x2;
if (sr1->sr_offset < sr2->sr_offset)
return (-1);
if (sr1->sr_offset > sr2->sr_offset)
return (1);
if (sr1 < sr2)
return (-1);
if (sr1 > sr2)
return (1);
return (0);
}
void
space_map_ref_create(avl_tree_t *t)
{
avl_create(t, space_map_ref_compare,
sizeof (space_ref_t), offsetof(space_ref_t, sr_node));
}
void
space_map_ref_destroy(avl_tree_t *t)
{
space_ref_t *sr;
void *cookie = NULL;
while ((sr = avl_destroy_nodes(t, &cookie)) != NULL)
kmem_free(sr, sizeof (*sr));
avl_destroy(t);
}
static void
space_map_ref_add_node(avl_tree_t *t, uint64_t offset, int64_t refcnt)
{
space_ref_t *sr;
sr = kmem_alloc(sizeof (*sr), KM_SLEEP);
sr->sr_offset = offset;
sr->sr_refcnt = refcnt;
avl_add(t, sr);
}
void
space_map_ref_add_seg(avl_tree_t *t, uint64_t start, uint64_t end,
int64_t refcnt)
{
space_map_ref_add_node(t, start, refcnt);
space_map_ref_add_node(t, end, -refcnt);
}
/*
* Convert (or add) a space map into a reference tree.
*/
void
space_map_ref_add_map(avl_tree_t *t, space_map_t *sm, int64_t refcnt)
{
space_seg_t *ss;
ASSERT(MUTEX_HELD(sm->sm_lock));
for (ss = avl_first(&sm->sm_root); ss; ss = AVL_NEXT(&sm->sm_root, ss))
space_map_ref_add_seg(t, ss->ss_start, ss->ss_end, refcnt);
}
/*
* Convert a reference tree into a space map. The space map will contain
* all members of the reference tree for which refcnt >= minref.
*/
void
space_map_ref_generate_map(avl_tree_t *t, space_map_t *sm, int64_t minref)
{
uint64_t start = -1ULL;
int64_t refcnt = 0;
space_ref_t *sr;
ASSERT(MUTEX_HELD(sm->sm_lock));
space_map_vacate(sm, NULL, NULL);
for (sr = avl_first(t); sr != NULL; sr = AVL_NEXT(t, sr)) {
refcnt += sr->sr_refcnt;
if (refcnt >= minref) {
if (start == -1ULL) {
start = sr->sr_offset;
}
} else {
if (start != -1ULL) {
uint64_t end = sr->sr_offset;
ASSERT(start <= end);
if (end > start)
space_map_add(sm, start, end - start);
start = -1ULL;
}
}
}
ASSERT(refcnt == 0);
ASSERT(start == -1ULL);
}
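The block comment above likens reference trees to a domain transform: each segment contributes `+refcnt` at its start and `-refcnt` at its end, and sweeping the sorted offsets recovers the region where the running count reaches `minref`. The following is a self-contained user-space sketch of that sweep; `first_region` and the flat array are illustrative stand-ins for the AVL-based kernel code, not the actual implementation.

```c
/* User-space sketch of the reference-tree sweep described above:
 * union of the inputs for minref == 1, intersection for minref == N. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
	uint64_t off;	/* segment boundary */
	int64_t delta;	/* +refcnt at a start, -refcnt at an end */
} ref_t;

static int
ref_cmp(const void *a, const void *b)
{
	const ref_t *ra = a, *rb = b;

	if (ra->off < rb->off)
		return (-1);
	if (ra->off > rb->off)
		return (1);
	return (0);
}

/*
 * Sweep 'refs' (start/end deltas for every input segment, sorted in
 * place) and report the first region whose running refcount reaches
 * 'minref'.  Returns its length and stores its start in *startp, or
 * returns 0 if no such region exists.
 */
uint64_t
first_region(ref_t *refs, int n, int64_t minref, uint64_t *startp)
{
	int64_t refcnt = 0;
	uint64_t start = UINT64_MAX;
	int i;

	qsort(refs, n, sizeof (ref_t), ref_cmp);
	for (i = 0; i < n; i++) {
		int64_t prev = refcnt;

		refcnt += refs[i].delta;
		if (prev < minref && refcnt >= minref) {
			start = refs[i].off;
		} else if (prev >= minref && refcnt < minref) {
			*startp = start;
			return (refs[i].off - start);
		}
	}
	return (0);
}
```

With segments [0,10) and [5,15), `minref == 2` recovers the intersection [5,10) — the same shape of computation vdev_dtl_reassess() uses to derive DTL_MISSING for RAID-Z and mirror vdevs.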


@@ -85,6 +85,8 @@ void *arc_data_buf_alloc(uint64_t space);
void arc_data_buf_free(void *buf, uint64_t space);
arc_buf_t *arc_buf_alloc(spa_t *spa, int size, void *tag,
arc_buf_contents_t type);
arc_buf_t *arc_loan_buf(spa_t *spa, int size);
void arc_return_buf(arc_buf_t *buf, void *tag);
void arc_buf_add_ref(arc_buf_t *buf, void *tag);
int arc_buf_remove_ref(arc_buf_t *buf, void *tag);
int arc_buf_size(arc_buf_t *buf);


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -262,6 +262,7 @@ void dmu_buf_will_fill(dmu_buf_t *db, dmu_tx_t *tx);
void dbuf_fill_done(dmu_buf_impl_t *db, dmu_tx_t *tx);
void dmu_buf_will_fill(dmu_buf_t *db, dmu_tx_t *tx);
void dmu_buf_fill_done(dmu_buf_t *db, dmu_tx_t *tx);
void dbuf_assign_arcbuf(dmu_buf_impl_t *db, arc_buf_t *buf, dmu_tx_t *tx);
dbuf_dirty_record_t *dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx);
void dbuf_clear(dmu_buf_impl_t *db);


@@ -61,6 +61,7 @@ struct zbookmark;
struct spa;
struct nvlist;
struct objset_impl;
struct arc_buf;
struct file;
typedef struct objset objset_t;
@@ -116,6 +117,8 @@ typedef enum dmu_object_type {
DMU_OT_FUID_SIZE, /* FUID table size UINT64 */
DMU_OT_NEXT_CLONES, /* ZAP */
DMU_OT_SCRUB_QUEUE, /* ZAP */
DMU_OT_USERGROUP_USED, /* ZAP */
DMU_OT_USERGROUP_QUOTA, /* ZAP */
DMU_OT_NUMTYPES
} dmu_object_type_t;
@@ -158,6 +161,9 @@ void zfs_znode_byteswap(void *buf, size_t size);
#define DMU_MAX_ACCESS (10<<20) /* 10MB */
#define DMU_MAX_DELETEBLKCNT (20480) /* ~5MB of indirect blocks */
#define DMU_USERUSED_OBJECT (-1ULL)
#define DMU_GROUPUSED_OBJECT (-2ULL)
/*
* Public routines to create, destroy, open, and close objsets.
*/
@@ -173,7 +179,8 @@ int dmu_objset_create(const char *name, dmu_objset_type_t type,
int dmu_objset_destroy(const char *name);
int dmu_snapshots_destroy(char *fsname, char *snapname);
int dmu_objset_rollback(objset_t *os);
int dmu_objset_snapshot(char *fsname, char *snapname, boolean_t recursive);
int dmu_objset_snapshot(char *fsname, char *snapname, struct nvlist *props,
boolean_t recursive);
int dmu_objset_rename(const char *name, const char *newname,
boolean_t recursive);
int dmu_objset_find(char *name, int func(char *, void *), void *arg,
@@ -399,6 +406,11 @@ void *dmu_buf_get_user(dmu_buf_t *db);
*/
void dmu_buf_will_dirty(dmu_buf_t *db, dmu_tx_t *tx);
/*
* Tells if the given dbuf is freeable.
*/
boolean_t dmu_buf_freeable(dmu_buf_t *);
/*
* You must create a transaction, then hold the objects which you will
* (or might) modify as part of this transaction. Then you must assign
@@ -424,7 +436,7 @@ dmu_tx_t *dmu_tx_create(objset_t *os);
void dmu_tx_hold_write(dmu_tx_t *tx, uint64_t object, uint64_t off, int len);
void dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off,
uint64_t len);
void dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, char *name);
void dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name);
void dmu_tx_hold_bonus(dmu_tx_t *tx, uint64_t object);
void dmu_tx_abort(dmu_tx_t *tx);
int dmu_tx_assign(dmu_tx_t *tx, uint64_t txg_how);
@@ -447,8 +459,10 @@ int dmu_free_object(objset_t *os, uint64_t object);
* Canfail routines will return 0 on success, or an errno if there is a
* nonrecoverable I/O error.
*/
#define DMU_READ_PREFETCH 0 /* prefetch */
#define DMU_READ_NO_PREFETCH 1 /* don't prefetch */
int dmu_read(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
void *buf);
void *buf, uint32_t flags);
void dmu_write(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
const void *buf, dmu_tx_t *tx);
int dmu_read_uio(objset_t *os, uint64_t object, struct uio *uio, uint64_t size);
@@ -456,6 +470,10 @@ int dmu_write_uio(objset_t *os, uint64_t object, struct uio *uio, uint64_t size,
dmu_tx_t *tx);
int dmu_write_pages(objset_t *os, uint64_t object, uint64_t offset,
uint64_t size, struct page *pp, dmu_tx_t *tx);
struct arc_buf *dmu_request_arcbuf(dmu_buf_t *handle, int size);
void dmu_return_arcbuf(struct arc_buf *buf);
void dmu_assign_arcbuf(dmu_buf_t *handle, uint64_t offset, struct arc_buf *buf,
dmu_tx_t *tx);
extern int zfs_prefetch_disable;
@@ -562,6 +580,12 @@ extern int dmu_snapshot_realname(objset_t *os, char *name, char *real,
int maxlen, boolean_t *conflict);
extern int dmu_dir_list_next(objset_t *os, int namelen, char *name,
uint64_t *idp, uint64_t *offp);
typedef void objset_used_cb_t(objset_t *os, dmu_object_type_t bonustype,
void *oldbonus, void *newbonus, uint64_t oldused, uint64_t newused,
dmu_tx_t *tx);
extern void dmu_objset_register_type(dmu_objset_type_t ost,
objset_used_cb_t *cb);
extern void dmu_objset_set_user(objset_t *os, void *user_ptr);
extern void *dmu_objset_get_user(objset_t *os);


@@ -42,12 +42,20 @@ struct dsl_dataset;
struct dmu_tx;
struct objset_impl;
#define OBJSET_PHYS_SIZE 2048
#define OBJSET_OLD_PHYS_SIZE 1024
#define OBJSET_FLAG_USERACCOUNTING_COMPLETE (1ULL<<0)
typedef struct objset_phys {
dnode_phys_t os_meta_dnode;
zil_header_t os_zil_header;
uint64_t os_type;
char os_pad[1024 - sizeof (dnode_phys_t) - sizeof (zil_header_t) -
sizeof (uint64_t)];
uint64_t os_flags;
char os_pad[OBJSET_PHYS_SIZE - sizeof (dnode_phys_t)*3 -
sizeof (zil_header_t) - sizeof (uint64_t)*2];
dnode_phys_t os_userused_dnode;
dnode_phys_t os_groupused_dnode;
} objset_phys_t;
struct objset {
@@ -62,6 +70,8 @@ typedef struct objset_impl {
arc_buf_t *os_phys_buf;
objset_phys_t *os_phys;
dnode_t *os_meta_dnode;
dnode_t *os_userused_dnode;
dnode_t *os_groupused_dnode;
zilog_t *os_zil;
objset_t os;
uint8_t os_checksum; /* can change, under dsl_dir's locks */
@@ -74,6 +84,8 @@ typedef struct objset_impl {
struct dmu_tx *os_synctx; /* XXX sketchy */
blkptr_t *os_rootbp;
zil_header_t os_zil_header;
list_t os_synced_dnodes;
uint64_t os_flags;
/* Protected by os_obj_lock */
kmutex_t os_obj_lock;
@@ -92,6 +104,7 @@
} objset_impl_t;
#define DMU_META_DNODE_OBJECT 0
#define DMU_OBJECT_IS_SPECIAL(obj) ((int64_t)(obj) <= 0)
#define DMU_OS_IS_L2CACHEABLE(os) \
((os)->os_secondary_cache == ZFS_CACHE_ALL || \
@@ -106,7 +119,8 @@ int dmu_objset_create(const char *name, dmu_objset_type_t type,
void (*func)(objset_t *os, void *arg, cred_t *cr, dmu_tx_t *tx), void *arg);
int dmu_objset_destroy(const char *name);
int dmu_objset_rollback(objset_t *os);
int dmu_objset_snapshot(char *fsname, char *snapname, boolean_t recursive);
int dmu_objset_snapshot(char *fsname, char *snapname, nvlist_t *props,
boolean_t recursive);
void dmu_objset_stats(objset_t *os, nvlist_t *nv);
void dmu_objset_fast_stat(objset_t *os, dmu_objset_stats_t *stat);
void dmu_objset_space(objset_t *os, uint64_t *refdbytesp, uint64_t *availbytesp,
@@ -127,6 +141,10 @@ objset_impl_t *dmu_objset_create_impl(spa_t *spa, struct dsl_dataset *ds,
int dmu_objset_open_impl(spa_t *spa, struct dsl_dataset *ds, blkptr_t *bp,
objset_impl_t **osip);
void dmu_objset_evict(struct dsl_dataset *ds, void *arg);
void dmu_objset_do_userquota_callbacks(objset_impl_t *os, dmu_tx_t *tx);
boolean_t dmu_objset_userused_enabled(objset_impl_t *os);
int dmu_objset_userspace_upgrade(objset_t *os);
boolean_t dmu_objset_userspace_present(objset_t *os);
#ifdef __cplusplus
}


@@ -98,7 +98,8 @@ enum dnode_dirtycontext {
};
/* Is dn_used in bytes? if not, it's in multiples of SPA_MINBLOCKSIZE */
#define DNODE_FLAG_USED_BYTES (1<<0)
#define DNODE_FLAG_USED_BYTES (1<<0)
#define DNODE_FLAG_USERUSED_ACCOUNTED (1<<1)
typedef struct dnode_phys {
uint8_t dn_type; /* dmu_object_type_t */
@@ -131,10 +132,7 @@ typedef struct dnode {
*/
krwlock_t dn_struct_rwlock;
/*
* Our link on dataset's dd_dnodes list.
* Protected by dd_accounting_mtx.
*/
/* Our link on dn_objset->os_dnodes list; protected by os_lock. */
list_node_t dn_link;
/* immutable: */
@@ -191,6 +189,9 @@ typedef struct dnode {
/* parent IO for current sync write */
zio_t *dn_zio;
/* used in syncing context */
dnode_phys_t *dn_oldphys;
/* holds prefetch structure */
struct zfetch dn_zfetch;
} dnode_t;


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -195,7 +195,7 @@ void dsl_dataset_sync(dsl_dataset_t *os, zio_t *zio, dmu_tx_t *tx);
void dsl_dataset_block_born(dsl_dataset_t *ds, blkptr_t *bp, dmu_tx_t *tx);
int dsl_dataset_block_kill(dsl_dataset_t *ds, blkptr_t *bp, zio_t *pio,
dmu_tx_t *tx);
int dsl_dataset_block_freeable(dsl_dataset_t *ds, uint64_t blk_birth);
boolean_t dsl_dataset_block_freeable(dsl_dataset_t *ds, uint64_t blk_birth);
uint64_t dsl_dataset_prev_snap_txg(dsl_dataset_t *ds);
void dsl_dataset_dirty(dsl_dataset_t *ds, dmu_tx_t *tx);


@@ -19,15 +19,13 @@
* CDDL HEADER END
*/
/*
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _SYS_DSL_DELEG_H
#define _SYS_DSL_DELEG_H
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/dmu.h>
#include <sys/dsl_pool.h>
#include <sys/zfs_context.h>
@@ -51,6 +49,10 @@ extern "C" {
#define ZFS_DELEG_PERM_ALLOW "allow"
#define ZFS_DELEG_PERM_USERPROP "userprop"
#define ZFS_DELEG_PERM_VSCAN "vscan"
#define ZFS_DELEG_PERM_USERQUOTA "userquota"
#define ZFS_DELEG_PERM_GROUPQUOTA "groupquota"
#define ZFS_DELEG_PERM_USERUSED "userused"
#define ZFS_DELEG_PERM_GROUPUSED "groupused"
/*
* Note: the names of properties that are marked delegatable are also


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -107,7 +107,6 @@ int dsl_dir_open_obj(dsl_pool_t *dp, uint64_t ddobj,
const char *tail, void *tag, dsl_dir_t **);
void dsl_dir_name(dsl_dir_t *dd, char *buf);
int dsl_dir_namelen(dsl_dir_t *dd);
int dsl_dir_is_private(dsl_dir_t *dd);
uint64_t dsl_dir_create_sync(dsl_pool_t *dp, dsl_dir_t *pds,
const char *name, dmu_tx_t *tx);
dsl_checkfunc_t dsl_dir_destroy_check;


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/


@@ -19,18 +19,17 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _SYS_DSL_PROP_H
#define _SYS_DSL_PROP_H
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/dmu.h>
#include <sys/dsl_pool.h>
#include <sys/zfs_context.h>
#include <sys/dsl_synctask.h>
#ifdef __cplusplus
extern "C" {
@@ -66,8 +65,10 @@ int dsl_prop_get_ds(struct dsl_dataset *ds, const char *propname,
int dsl_prop_get_dd(struct dsl_dir *dd, const char *propname,
int intsz, int numints, void *buf, char *setpoint);
dsl_syncfunc_t dsl_props_set_sync;
int dsl_prop_set(const char *ddname, const char *propname,
int intsz, int numints, const void *buf);
int dsl_props_set(const char *dsname, nvlist_t *nvl);
void dsl_prop_set_uint64_sync(dsl_dir_t *dd, const char *name, uint64_t val,
cred_t *cr, dmu_tx_t *tx);


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -39,6 +39,8 @@ extern "C" {
typedef struct metaslab_class metaslab_class_t;
typedef struct metaslab_group metaslab_group_t;
extern space_map_ops_t *zfs_metaslab_ops;
extern metaslab_t *metaslab_init(metaslab_group_t *mg, space_map_obj_t *smo,
uint64_t start, uint64_t size, uint64_t txg);
extern void metaslab_fini(metaslab_t *msp);
@@ -55,7 +57,7 @@ extern void metaslab_free(spa_t *spa, const blkptr_t *bp, uint64_t txg,
boolean_t now);
extern int metaslab_claim(spa_t *spa, const blkptr_t *bp, uint64_t txg);
extern metaslab_class_t *metaslab_class_create(void);
extern metaslab_class_t *metaslab_class_create(space_map_ops_t *ops);
extern void metaslab_class_destroy(metaslab_class_t *mc);
extern void metaslab_class_add(metaslab_class_t *mc, metaslab_group_t *mg);
extern void metaslab_class_remove(metaslab_class_t *mc, metaslab_group_t *mg);


@@ -19,15 +19,13 @@
* CDDL HEADER END
*/
/*
* Copyright 2006 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _SYS_METASLAB_IMPL_H
#define _SYS_METASLAB_IMPL_H
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/metaslab.h>
#include <sys/space_map.h>
#include <sys/vdev.h>
@@ -41,6 +39,7 @@ extern "C" {
struct metaslab_class {
metaslab_group_t *mc_rotor;
uint64_t mc_allocated;
space_map_ops_t *mc_ops;
};
struct metaslab_group {


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -208,8 +208,8 @@ typedef struct blkptr {
#define DVA_SET_GANG(dva, x) BF64_SET((dva)->dva_word[1], 63, 1, x)
#define BP_GET_LSIZE(bp) \
(BP_IS_HOLE(bp) ? 0 : \
BF64_GET_SB((bp)->blk_prop, 0, 16, SPA_MINBLOCKSHIFT, 1))
BF64_GET_SB((bp)->blk_prop, 0, 16, SPA_MINBLOCKSHIFT, 1)
#define BP_SET_LSIZE(bp, x) \
BF64_SET_SB((bp)->blk_prop, 0, 16, SPA_MINBLOCKSHIFT, 1, x)
@@ -329,7 +329,7 @@ extern int spa_check_rootconf(char *devpath, char *devid,
extern boolean_t spa_rootdev_validate(nvlist_t *nv);
extern int spa_import_rootpool(char *devpath, char *devid);
extern int spa_import(const char *pool, nvlist_t *config, nvlist_t *props);
extern int spa_import_faulted(const char *, nvlist_t *, nvlist_t *);
extern int spa_import_verbatim(const char *, nvlist_t *, nvlist_t *);
extern nvlist_t *spa_tryimport(nvlist_t *tryconfig);
extern int spa_destroy(char *pool);
extern int spa_export(char *pool, nvlist_t **oldconfig, boolean_t force,
@@ -352,9 +352,11 @@ extern void spa_inject_delref(spa_t *spa);
extern int spa_vdev_add(spa_t *spa, nvlist_t *nvroot);
extern int spa_vdev_attach(spa_t *spa, uint64_t guid, nvlist_t *nvroot,
int replacing);
extern int spa_vdev_detach(spa_t *spa, uint64_t guid, int replace_done);
extern int spa_vdev_detach(spa_t *spa, uint64_t guid, uint64_t pguid,
int replace_done);
extern int spa_vdev_remove(spa_t *spa, uint64_t guid, boolean_t unspare);
extern int spa_vdev_setpath(spa_t *spa, uint64_t guid, const char *newpath);
extern int spa_vdev_setfru(spa_t *spa, uint64_t guid, const char *newfru);
/* spare state (which is global across all pools) */
extern void spa_spare_add(vdev_t *vd);
@@ -476,6 +478,10 @@ extern boolean_t spa_has_spare(spa_t *, uint64_t guid);
extern uint64_t bp_get_dasize(spa_t *spa, const blkptr_t *bp);
extern boolean_t spa_has_slogs(spa_t *spa);
extern boolean_t spa_is_root(spa_t *spa);
extern boolean_t spa_writeable(spa_t *spa);
extern int spa_mode(spa_t *spa);
extern uint64_t zfs_strtonum(const char *str, char **nptr);
#define strtonum(str, nptr) zfs_strtonum((str), (nptr))
/* history logging */
typedef enum history_log_type {
@@ -529,6 +535,7 @@ extern void spa_boot_init();
extern int spa_prop_set(spa_t *spa, nvlist_t *nvp);
extern int spa_prop_get(spa_t *spa, nvlist_t **nvp);
extern void spa_prop_clear_bootfs(spa_t *spa, uint64_t obj, dmu_tx_t *tx);
extern void spa_configfile_set(spa_t *, nvlist_t *, boolean_t);
/* asynchronous event notification */
extern void spa_event_notify(spa_t *spa, vdev_t *vdev, const char *name);
@@ -546,7 +553,7 @@ _NOTE(CONSTCOND) } while (0)
#define dprintf_bp(bp, fmt, ...)
#endif
extern int spa_mode; /* mode, e.g. FREAD | FWRITE */
extern int spa_mode_global; /* mode, e.g. FREAD | FWRITE */
#ifdef __cplusplus
}


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -105,6 +105,7 @@ struct spa {
int spa_inject_ref; /* injection references */
uint8_t spa_sync_on; /* sync threads are running */
spa_load_state_t spa_load_state; /* current load operation */
boolean_t spa_load_verbatim; /* load the given config? */
taskq_t *spa_zio_taskq[ZIO_TYPES][ZIO_TASKQ_TYPES];
dsl_pool_t *spa_dsl_pool;
metaslab_class_t *spa_normal_class; /* normal data class */
@@ -141,9 +142,6 @@ struct spa {
int spa_async_suspended; /* async tasks suspended */
kcondvar_t spa_async_cv; /* wait for thread_exit() */
uint16_t spa_async_tasks; /* async task mask */
kmutex_t spa_async_root_lock; /* protects async root count */
uint64_t spa_async_root_count; /* number of async root zios */
kcondvar_t spa_async_root_cv; /* notify when count == 0 */
char *spa_root; /* alternate root directory */
uint64_t spa_ena; /* spa-wide ereport ENA */
boolean_t spa_last_open_failed; /* true if last open failed */
@@ -163,13 +161,14 @@ struct spa {
uint64_t spa_failmode; /* failure mode for the pool */
uint64_t spa_delegation; /* delegation on/off */
list_t spa_config_list; /* previous cache file(s) */
zio_t *spa_async_zio_root; /* root of all async I/O */
zio_t *spa_suspend_zio_root; /* root of all suspended I/O */
kmutex_t spa_suspend_lock; /* protects suspend_zio_root */
kcondvar_t spa_suspend_cv; /* notification of resume */
uint8_t spa_suspended; /* pool is suspended */
boolean_t spa_import_faulted; /* allow faulted vdevs */
boolean_t spa_is_root; /* pool is root */
int spa_minref; /* num refs when first opened */
int spa_mode; /* FREAD | FWRITE */
spa_log_state_t spa_log_state; /* log state */
/*
* spa_refcnt & spa_config_lock must be the last elements


@@ -19,15 +19,13 @@
* CDDL HEADER END
*/
/*
* Copyright 2006 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _SYS_SPACE_MAP_H
#define _SYS_SPACE_MAP_H
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/avl.h>
#include <sys/dmu.h>
@@ -48,16 +46,24 @@ typedef struct space_map {
uint8_t sm_loading; /* map loading? */
kcondvar_t sm_load_cv; /* map load completion */
space_map_ops_t *sm_ops; /* space map block picker ops vector */
avl_tree_t *sm_pp_root; /* picker-private AVL tree */
void *sm_ppd; /* picker-private data */
kmutex_t *sm_lock; /* pointer to lock that protects map */
} space_map_t;
typedef struct space_seg {
avl_node_t ss_node; /* AVL node */
avl_node_t ss_pp_node; /* AVL picker-private node */
uint64_t ss_start; /* starting offset of this segment */
uint64_t ss_end; /* ending offset (non-inclusive) */
} space_seg_t;
typedef struct space_ref {
avl_node_t sr_node; /* AVL node */
uint64_t sr_offset; /* offset (start or end) */
int64_t sr_refcnt; /* associated reference count */
} space_ref_t;
typedef struct space_map_obj {
uint64_t smo_object; /* on-disk space map object */
uint64_t smo_objsize; /* size of the object */
@@ -70,6 +76,7 @@ struct space_map_ops {
uint64_t (*smop_alloc)(space_map_t *sm, uint64_t size);
void (*smop_claim)(space_map_t *sm, uint64_t start, uint64_t size);
void (*smop_free)(space_map_t *sm, uint64_t start, uint64_t size);
uint64_t (*smop_max)(space_map_t *sm);
};
/*
@@ -133,13 +140,12 @@ extern void space_map_create(space_map_t *sm, uint64_t start, uint64_t size,
extern void space_map_destroy(space_map_t *sm);
extern void space_map_add(space_map_t *sm, uint64_t start, uint64_t size);
extern void space_map_remove(space_map_t *sm, uint64_t start, uint64_t size);
extern int space_map_contains(space_map_t *sm, uint64_t start, uint64_t size);
extern boolean_t space_map_contains(space_map_t *sm,
uint64_t start, uint64_t size);
extern void space_map_vacate(space_map_t *sm,
space_map_func_t *func, space_map_t *mdest);
extern void space_map_walk(space_map_t *sm,
space_map_func_t *func, space_map_t *mdest);
extern void space_map_excise(space_map_t *sm, uint64_t start, uint64_t size);
extern void space_map_union(space_map_t *smd, space_map_t *sms);
extern void space_map_load_wait(space_map_t *sm);
extern int space_map_load(space_map_t *sm, space_map_ops_t *ops,
@@ -149,12 +155,22 @@ extern void space_map_unload(space_map_t *sm);
extern uint64_t space_map_alloc(space_map_t *sm, uint64_t size);
extern void space_map_claim(space_map_t *sm, uint64_t start, uint64_t size);
extern void space_map_free(space_map_t *sm, uint64_t start, uint64_t size);
extern uint64_t space_map_maxsize(space_map_t *sm);
extern void space_map_sync(space_map_t *sm, uint8_t maptype,
space_map_obj_t *smo, objset_t *os, dmu_tx_t *tx);
extern void space_map_truncate(space_map_obj_t *smo,
objset_t *os, dmu_tx_t *tx);
extern void space_map_ref_create(avl_tree_t *t);
extern void space_map_ref_destroy(avl_tree_t *t);
extern void space_map_ref_add_seg(avl_tree_t *t,
uint64_t start, uint64_t end, int64_t refcnt);
extern void space_map_ref_add_map(avl_tree_t *t,
space_map_t *sm, int64_t refcnt);
extern void space_map_ref_generate_map(avl_tree_t *t,
space_map_t *sm, int64_t minref);
#ifdef __cplusplus
}
#endif


@@ -19,21 +19,24 @@
* CDDL HEADER END
*/
/*
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _SYS_UBERBLOCK_IMPL_H
#define _SYS_UBERBLOCK_IMPL_H
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/uberblock.h>
#ifdef __cplusplus
extern "C" {
#endif
/*
* For zdb use and debugging purposes only
*/
extern uint64_t ub_max_txg;
/*
* The uberblock version is incremented whenever an incompatible on-disk
* format change is made to the SPA, DMU, or ZAP.


@@ -36,6 +36,14 @@
extern "C" {
#endif
typedef enum vdev_dtl_type {
DTL_MISSING, /* 0% replication: no copies of the data */
DTL_PARTIAL, /* less than 100% replication: some copies missing */
DTL_SCRUB, /* unable to fully repair during scrub/resilver */
DTL_OUTAGE, /* temporarily missing (used to attempt detach) */
DTL_TYPES
} vdev_dtl_type_t;
extern boolean_t zfs_nocacheflush;
extern int vdev_open(vdev_t *);
@@ -50,10 +58,14 @@ extern zio_t *vdev_probe(vdev_t *vd, zio_t *pio);
extern boolean_t vdev_is_bootable(vdev_t *vd);
extern vdev_t *vdev_lookup_top(spa_t *spa, uint64_t vdev);
extern vdev_t *vdev_lookup_by_guid(vdev_t *vd, uint64_t guid);
extern void vdev_dtl_dirty(space_map_t *sm, uint64_t txg, uint64_t size);
extern int vdev_dtl_contains(space_map_t *sm, uint64_t txg, uint64_t size);
extern void vdev_dtl_dirty(vdev_t *vd, vdev_dtl_type_t d,
uint64_t txg, uint64_t size);
extern boolean_t vdev_dtl_contains(vdev_t *vd, vdev_dtl_type_t d,
uint64_t txg, uint64_t size);
extern boolean_t vdev_dtl_empty(vdev_t *vd, vdev_dtl_type_t d);
extern void vdev_dtl_reassess(vdev_t *vd, uint64_t txg, uint64_t scrub_txg,
int scrub_done);
extern boolean_t vdev_dtl_required(vdev_t *vd);
extern boolean_t vdev_resilver_needed(vdev_t *vd,
uint64_t *minp, uint64_t *maxp);


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -123,8 +123,7 @@ struct vdev {
vdev_t *vdev_parent; /* parent vdev */
vdev_t **vdev_child; /* array of children */
uint64_t vdev_children; /* number of children */
space_map_t vdev_dtl_map; /* dirty time log in-core state */
space_map_t vdev_dtl_scrub; /* DTL for scrub repair writes */
space_map_t vdev_dtl[DTL_TYPES]; /* in-core dirty time logs */
vdev_stat_t vdev_stat; /* virtual device statistics */
/*
@@ -149,7 +148,7 @@ struct vdev {
* Leaf vdev state.
*/
uint64_t vdev_psize; /* physical device capacity */
space_map_obj_t vdev_dtl; /* dirty time log on-disk state */
space_map_obj_t vdev_dtl_smo; /* dirty time log space map obj */
txg_node_t vdev_dtl_node; /* per-txg dirty DTL linkage */
uint64_t vdev_wholedisk; /* true if this is a whole disk */
uint64_t vdev_offline; /* persistent offline state */
@@ -160,6 +159,7 @@ struct vdev {
char *vdev_path; /* vdev path (if any) */
char *vdev_devid; /* vdev devid (if any) */
char *vdev_physpath; /* vdev device path (if any) */
char *vdev_fru; /* physical FRU location */
uint64_t vdev_not_present; /* not present during import */
uint64_t vdev_unspare; /* unspare when resilvering done */
hrtime_t vdev_last_try; /* last reopen time */
@@ -189,8 +189,9 @@ struct vdev {
kmutex_t vdev_probe_lock; /* protects vdev_probe_zio */
};
#define VDEV_SKIP_SIZE (8 << 10)
#define VDEV_BOOT_HEADER_SIZE (8 << 10)
#define VDEV_PAD_SIZE (8 << 10)
/* 2 padding areas (vl_pad1 and vl_pad2) to skip */
#define VDEV_SKIP_SIZE VDEV_PAD_SIZE * 2
#define VDEV_PHYS_SIZE (112 << 10)
#define VDEV_UBERBLOCK_RING (128 << 10)
@@ -202,26 +203,14 @@ struct vdev {
offsetof(vdev_label_t, vl_uberblock[(n) << VDEV_UBERBLOCK_SHIFT(vd)])
#define VDEV_UBERBLOCK_SIZE(vd) (1ULL << VDEV_UBERBLOCK_SHIFT(vd))
/* ZFS boot block */
#define VDEV_BOOT_MAGIC 0x2f5b007b10cULL
#define VDEV_BOOT_VERSION 1 /* version number */
typedef struct vdev_boot_header {
uint64_t vb_magic; /* VDEV_BOOT_MAGIC */
uint64_t vb_version; /* VDEV_BOOT_VERSION */
uint64_t vb_offset; /* start offset (bytes) */
uint64_t vb_size; /* size (bytes) */
char vb_pad[VDEV_BOOT_HEADER_SIZE - 4 * sizeof (uint64_t)];
} vdev_boot_header_t;
typedef struct vdev_phys {
char vp_nvlist[VDEV_PHYS_SIZE - sizeof (zio_block_tail_t)];
zio_block_tail_t vp_zbt;
} vdev_phys_t;
typedef struct vdev_label {
char vl_pad[VDEV_SKIP_SIZE]; /* 8K */
vdev_boot_header_t vl_boot_header; /* 8K */
char vl_pad1[VDEV_PAD_SIZE]; /* 8K */
char vl_pad2[VDEV_PAD_SIZE]; /* 8K */
vdev_phys_t vl_vdev_phys; /* 112K */
char vl_uberblock[VDEV_UBERBLOCK_RING]; /* 128K */
} vdev_label_t; /* 256K total */


@@ -186,6 +186,9 @@ int zap_lookup_norm(objset_t *ds, uint64_t zapobj, const char *name,
matchtype_t mt, char *realname, int rn_len,
boolean_t *normalization_conflictp);
int zap_count_write(objset_t *os, uint64_t zapobj, const char *name,
int add, uint64_t *towrite, uint64_t *tooverwrite);
/*
* Create an attribute with the given name and value.
*


@@ -19,15 +19,13 @@
* CDDL HEADER END
*/
/*
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _SYS_ZAP_IMPL_H
#define _SYS_ZAP_IMPL_H
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/zap.h>
#include <sys/zfs_context.h>
#include <sys/avl.h>
@@ -195,6 +193,8 @@ int fzap_count(zap_t *zap, uint64_t *count);
int fzap_lookup(zap_name_t *zn,
uint64_t integer_size, uint64_t num_integers, void *buf,
char *realname, int rn_len, boolean_t *normalization_conflictp);
int fzap_count_write(zap_name_t *zn, int add, uint64_t *towrite,
uint64_t *tooverwrite);
int fzap_add(zap_name_t *zn, uint64_t integer_size, uint64_t num_integers,
const void *val, dmu_tx_t *tx);
int fzap_update(zap_name_t *zn,


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -113,8 +113,6 @@ typedef struct zfs_acl_phys {
uint8_t z_ace_data[ZFS_ACE_SPACE]; /* space for embedded ACEs */
} zfs_acl_phys_t;
typedef struct acl_ops {
uint32_t (*ace_mask_get) (void *acep); /* get access mask */
void (*ace_mask_set) (void *acep,
@@ -160,12 +158,21 @@ typedef struct zfs_acl {
zfs_acl_node_t *z_curr_node; /* current node iterator is handling */
list_t z_acl; /* chunks of ACE data */
acl_ops_t z_ops; /* ACL operations */
boolean_t z_has_fuids; /* FUIDs present in ACL? */
} zfs_acl_t;
#define ACL_DATA_ALLOCED 0x1
#define ZFS_ACL_SIZE(aclcnt) (sizeof (ace_t) * (aclcnt))
struct zfs_fuid_info;
typedef struct zfs_acl_ids {
uint64_t z_fuid; /* file owner fuid */
uint64_t z_fgid; /* file group owner fuid */
uint64_t z_mode; /* mode to set on create */
zfs_acl_t *z_aclp; /* ACL to create with file */
struct zfs_fuid_info *z_fuidp; /* for tracking fuids for log */
} zfs_acl_ids_t;
/*
* Property values for acl_mode and acl_inherit.
*
@@ -182,11 +189,12 @@ typedef struct zfs_acl {
struct znode;
struct zfsvfs;
struct zfs_fuid_info;
#ifdef _KERNEL
void zfs_perm_init(struct znode *, struct znode *, int, vattr_t *,
dmu_tx_t *, cred_t *, zfs_acl_t *, zfs_fuid_info_t **);
int zfs_acl_ids_create(struct znode *, int, vattr_t *,
cred_t *, vsecattr_t *, zfs_acl_ids_t *);
void zfs_acl_ids_free(zfs_acl_ids_t *);
boolean_t zfs_acl_ids_overquota(struct zfsvfs *, zfs_acl_ids_t *);
int zfs_getacl(struct znode *, vsecattr_t *, boolean_t, cred_t *);
int zfs_setacl(struct znode *, vsecattr_t *, boolean_t, cred_t *);
void zfs_acl_rele(void *);
@@ -201,9 +209,9 @@ int zfs_zaccess_delete(struct znode *, struct znode *, cred_t *);
int zfs_zaccess_rename(struct znode *, struct znode *,
struct znode *, struct znode *, cred_t *cr);
void zfs_acl_free(zfs_acl_t *);
int zfs_vsec_2_aclp(struct zfsvfs *, vtype_t, vsecattr_t *, zfs_acl_t **);
int zfs_aclset_common(struct znode *, zfs_acl_t *, cred_t *,
struct zfs_fuid_info **, dmu_tx_t *);
int zfs_vsec_2_aclp(struct zfsvfs *, vtype_t, vsecattr_t *, cred_t *,
struct zfs_fuid_info **, zfs_acl_t **);
int zfs_aclset_common(struct znode *, zfs_acl_t *, cred_t *, dmu_tx_t *);
#endif


@@ -134,4 +134,6 @@ extern struct mtx zfs_debug_mtx;
} \
} while (0)
#define sys_shutdown rebooting
#endif /* _SYS_ZFS_CONTEXT_H */


@@ -19,15 +19,13 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _ZFS_CTLDIR_H
#define _ZFS_CTLDIR_H
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/vnode.h>
#include <sys/zfs_vfsops.h>
#include <sys/zfs_znode.h>
@@ -63,6 +61,7 @@ int zfsctl_lookup_objset(vfs_t *vfsp, uint64_t objsetid, zfsvfs_t **zfsvfsp);
#define ZFSCTL_INO_ROOT 0x1
#define ZFSCTL_INO_SNAPDIR 0x2
#define ZFSCTL_INO_SHARES 0x3
#ifdef __cplusplus
}


@@ -49,7 +49,6 @@ extern "C" {
/* mknode flags */
#define IS_ROOT_NODE 0x01 /* create a root node */
#define IS_XATTR 0x02 /* create an extended attribute node */
#define IS_REPLAY 0x04 /* we are replaying intent log */
extern int zfs_dirent_lock(zfs_dirlock_t **, znode_t *, char *, znode_t **,
int, int *, pathname_t *);
@@ -60,7 +59,7 @@ extern int zfs_link_destroy(zfs_dirlock_t *, znode_t *, dmu_tx_t *, int,
extern int zfs_dirlook(znode_t *, char *, vnode_t **, int, int *,
pathname_t *);
extern void zfs_mknode(znode_t *, vattr_t *, dmu_tx_t *, cred_t *,
uint_t, znode_t **, int, zfs_acl_t *, zfs_fuid_info_t **);
uint_t, znode_t **, int, zfs_acl_ids_t *);
extern void zfs_rmnode(znode_t *);
extern void zfs_dl_name_switch(zfs_dirlock_t *dl, char *new, char **old);
extern boolean_t zfs_dirempty(znode_t *);


@@ -19,15 +19,13 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _SYS_FS_ZFS_FUID_H
#define _SYS_FS_ZFS_FUID_H
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/types.h>
#ifdef _KERNEL
#include <sys/kidmap.h>
@@ -51,11 +49,11 @@ typedef enum {
* Estimate space needed for one more fuid table entry.
* for now assume its current size + 1K
*/
#define FUID_SIZE_ESTIMATE(z) (z->z_fuid_size + (SPA_MINBLOCKSIZE << 1))
#define FUID_SIZE_ESTIMATE(z) ((z)->z_fuid_size + (SPA_MINBLOCKSIZE << 1))
#define FUID_INDEX(x) (x >> 32)
#define FUID_RID(x) (x & 0xffffffff)
#define FUID_ENCODE(idx, rid) ((idx << 32) | rid)
#define FUID_INDEX(x) ((x) >> 32)
#define FUID_RID(x) ((x) & 0xffffffff)
#define FUID_ENCODE(idx, rid) (((uint64_t)(idx) << 32) | (rid))
/*
* FUIDs cause problems for the intent log
* we need to replay the creation of the FUID,
@@ -104,17 +102,23 @@ struct znode;
extern uid_t zfs_fuid_map_id(zfsvfs_t *, uint64_t, cred_t *, zfs_fuid_type_t);
extern void zfs_fuid_destroy(zfsvfs_t *);
extern uint64_t zfs_fuid_create_cred(zfsvfs_t *, zfs_fuid_type_t,
dmu_tx_t *, cred_t *, zfs_fuid_info_t **);
cred_t *, zfs_fuid_info_t **);
extern uint64_t zfs_fuid_create(zfsvfs_t *, uint64_t, cred_t *, zfs_fuid_type_t,
dmu_tx_t *, zfs_fuid_info_t **);
extern void zfs_fuid_map_ids(struct znode *zp, cred_t *cr, uid_t *uid,
uid_t *gid);
zfs_fuid_info_t **);
extern void zfs_fuid_map_ids(struct znode *zp, cred_t *cr,
uid_t *uid, uid_t *gid);
extern zfs_fuid_info_t *zfs_fuid_info_alloc(void);
extern void zfs_fuid_info_free();
extern void zfs_fuid_info_free(zfs_fuid_info_t *);
extern boolean_t zfs_groupmember(zfsvfs_t *, uint64_t, cred_t *);
void zfs_fuid_sync(zfsvfs_t *, dmu_tx_t *);
extern int zfs_fuid_find_by_domain(zfsvfs_t *, const char *domain,
char **retdomain, boolean_t addok);
extern const char *zfs_fuid_find_by_idx(zfsvfs_t *zfsvfs, uint32_t idx);
extern void zfs_fuid_txhold(zfsvfs_t *zfsvfs, dmu_tx_t *tx);
#endif
char *zfs_fuid_idx_domain(avl_tree_t *, uint32_t);
void zfs_fuid_avl_tree_create(avl_tree_t *, avl_tree_t *);
uint64_t zfs_fuid_table_load(objset_t *, uint64_t, avl_tree_t *, avl_tree_t *);
void zfs_fuid_table_destroy(avl_tree_t *, avl_tree_t *);


@@ -169,6 +169,13 @@ typedef struct zfs_cmd {
zinject_record_t zc_inject_record;
} zfs_cmd_t;
typedef struct zfs_useracct {
char zu_domain[256];
uid_t zu_rid;
uint32_t zu_pad;
uint64_t zu_space;
} zfs_useracct_t;
#define ZVOL_MAX_MINOR (1 << 16)
#define ZFS_MIN_MINOR (ZVOL_MAX_MINOR + 1)


@@ -19,15 +19,13 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _SYS_FS_ZFS_VFSOPS_H
#define _SYS_FS_ZFS_VFSOPS_H
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/list.h>
#include <sys/vfs.h>
#include <sys/zil.h>
@@ -47,13 +45,13 @@ struct zfsvfs {
uint64_t z_root; /* id of root znode */
uint64_t z_unlinkedobj; /* id of unlinked zapobj */
uint64_t z_max_blksz; /* maximum block size for files */
uint64_t z_assign; /* TXG_NOWAIT or set by zil_replay() */
uint64_t z_fuid_obj; /* fuid table object number */
uint64_t z_fuid_size; /* fuid table size */
avl_tree_t z_fuid_idx; /* fuid tree keyed by index */
avl_tree_t z_fuid_domain; /* fuid tree keyed by domain */
krwlock_t z_fuid_lock; /* fuid lock */
boolean_t z_fuid_loaded; /* fuid tables are loaded */
boolean_t z_fuid_dirty; /* need to sync fuid table ? */
struct zfs_fuid_info *z_fuid_replay; /* fuid info for replay */
zilog_t *z_log; /* intent log pointer */
uint_t z_acl_mode; /* acl chmod/mode behavior */
@@ -72,8 +70,13 @@ struct zfsvfs {
boolean_t z_issnap; /* true if this is a snapshot */
boolean_t z_vscan; /* virus scan on/off */
boolean_t z_use_fuids; /* version allows fuids */
kmutex_t z_online_recv_lock; /* recv in prog grabs as WRITER */
boolean_t z_replay; /* set during ZIL replay */
kmutex_t z_online_recv_lock; /* held while recv in progress */
uint64_t z_version; /* ZPL version */
uint64_t z_shares_dir; /* hidden shares dir */
kmutex_t z_lock;
uint64_t z_userquota_obj;
uint64_t z_groupquota_obj;
#define ZFS_OBJ_MTX_SZ 64
kmutex_t z_hold_mtx[ZFS_OBJ_MTX_SZ]; /* znode hold locks */
};
@@ -131,6 +134,17 @@ extern int zfs_super_owner;
extern int zfs_suspend_fs(zfsvfs_t *zfsvfs, char *osname, int *mode);
extern int zfs_resume_fs(zfsvfs_t *zfsvfs, const char *osname, int mode);
extern int zfs_userspace_one(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
const char *domain, uint64_t rid, uint64_t *valuep);
extern int zfs_userspace_many(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
uint64_t *cookiep, void *vbuf, uint64_t *bufsizep);
extern int zfs_set_userquota(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
const char *domain, uint64_t rid, uint64_t quota);
extern boolean_t zfs_usergroup_overquota(zfsvfs_t *zfsvfs,
boolean_t isgroup, uint64_t fuid);
extern int zfs_set_version(zfsvfs_t *zfsvfs, uint64_t newvers);
extern int zfsvfs_create(const char *name, int mode, zfsvfs_t **zvp);
extern void zfsvfs_free(zfsvfs_t *zfsvfs);
#ifdef __cplusplus
}


@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -100,6 +100,7 @@ extern "C" {
#define ZFS_ROOT_OBJ "ROOT"
#define ZPL_VERSION_STR "VERSION"
#define ZFS_FUID_TABLES "FUID"
#define ZFS_SHARES_DIR "SHARES"
#define ZFS_MAX_BLOCKSIZE (SPA_MAXBLOCKSIZE)
@@ -186,7 +187,6 @@ typedef struct znode {
vnode_t *z_vnode;
uint64_t z_id; /* object ID for this znode */
kmutex_t z_lock; /* znode modification lock */
krwlock_t z_map_lock; /* page map lock */
krwlock_t z_parent_lock; /* parent lock for directories */
krwlock_t z_name_lock; /* "master" lock for dirent locks */
zfs_dirlock_t *z_dirlocks; /* directory entry lock list */
@@ -338,7 +338,6 @@ extern void zfs_remove_op_tables();
extern int zfs_create_op_tables();
extern dev_t zfs_cmpldev(uint64_t);
extern int zfs_get_zplprop(objset_t *os, zfs_prop_t prop, uint64_t *value);
extern int zfs_set_version(const char *name, uint64_t newvers);
extern int zfs_get_stats(objset_t *os, nvlist_t *nv);
extern void zfs_znode_dmu_fini(znode_t *);
@@ -367,6 +366,7 @@ extern void zfs_log_acl(zilog_t *zilog, dmu_tx_t *tx, znode_t *zp,
#endif
extern void zfs_xvattr_set(znode_t *zp, xvattr_t *xvap);
extern void zfs_upgrade(zfsvfs_t *zfsvfs, dmu_tx_t *tx);
extern int zfs_create_share_dir(zfsvfs_t *zfsvfs, dmu_tx_t *tx);
extern zil_get_data_t zfs_get_data;
extern zil_replay_func_t *zfs_replay_vector[TX_MAX_TYPE];


@@ -139,7 +139,8 @@ typedef enum zil_create {
#define TX_MKDIR_ACL 17 /* mkdir with ACL */
#define TX_MKDIR_ATTR 18 /* mkdir with attr */
#define TX_MKDIR_ACL_ATTR 19 /* mkdir with ACL + attrs */
#define TX_MAX_TYPE 20 /* Max transaction type */
#define TX_WRITE2 20 /* dmu_sync EALREADY write */
#define TX_MAX_TYPE 21 /* Max transaction type */
/*
* The transactions for mkdir, symlink, remove, rmdir, link, and rename
@@ -341,7 +342,6 @@ typedef void zil_parse_blk_func_t(zilog_t *zilog, blkptr_t *bp, void *arg,
typedef void zil_parse_lr_func_t(zilog_t *zilog, lr_t *lr, void *arg,
uint64_t txg);
typedef int zil_replay_func_t();
typedef void zil_replay_cleaner_t();
typedef int zil_get_data_t(void *arg, lr_write_t *lr, char *dbuf, zio_t *zio);
extern uint64_t zil_parse(zilog_t *zilog, zil_parse_blk_func_t *parse_blk_func,
@@ -356,9 +356,8 @@ extern void zil_free(zilog_t *zilog);
extern zilog_t *zil_open(objset_t *os, zil_get_data_t *get_data);
extern void zil_close(zilog_t *zilog);
extern void zil_replay(objset_t *os, void *arg, uint64_t *txgp,
zil_replay_func_t *replay_func[TX_MAX_TYPE],
zil_replay_cleaner_t *replay_cleaner);
extern void zil_replay(objset_t *os, void *arg,
zil_replay_func_t *replay_func[TX_MAX_TYPE]);
extern void zil_destroy(zilog_t *zilog, boolean_t keep_first);
extern void zil_rollback_destroy(zilog_t *zilog, dmu_tx_t *tx);
@@ -378,6 +377,7 @@ extern int zil_suspend(zilog_t *zilog);
extern void zil_resume(zilog_t *zilog);
extern void zil_add_block(zilog_t *zilog, blkptr_t *bp);
extern void zil_get_replay_data(zilog_t *zilog, lr_write_t *lr);
extern int zil_disable;


@@ -19,15 +19,13 @@
* CDDL HEADER END
*/
/*
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#ifndef _SYS_ZIL_IMPL_H
#define _SYS_ZIL_IMPL_H
#pragma ident "%Z%%M% %I% %E% SMI"
#include <sys/zil.h>
#include <sys/dmu_objset.h>
@@ -74,13 +72,14 @@ struct zilog {
uint64_t zl_commit_seq; /* committed up to this number */
uint64_t zl_lr_seq; /* log record sequence number */
uint64_t zl_destroy_txg; /* txg of last zil_destroy() */
uint64_t zl_replay_seq[TXG_SIZE]; /* seq of last replayed rec */
uint64_t zl_replayed_seq[TXG_SIZE]; /* last replayed rec seq */
uint64_t zl_replaying_seq; /* current replay seq number */
uint32_t zl_suspend; /* log suspend count */
kcondvar_t zl_cv_writer; /* log writer thread completion */
kcondvar_t zl_cv_suspend; /* log suspend completion */
uint8_t zl_suspending; /* log is currently suspending */
uint8_t zl_keep_first; /* keep first log block in destroy */
uint8_t zl_stop_replay; /* don't replay any further */
uint8_t zl_replay; /* replaying records while set */
uint8_t zl_stop_sync; /* for debugging */
uint8_t zl_writer; /* boolean: write setup in progress */
uint8_t zl_log_error; /* boolean: log write error */


@@ -20,7 +20,7 @@
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -132,12 +132,15 @@ enum zio_compress {
#define ZIO_FLAG_IO_RETRY 0x00400
#define ZIO_FLAG_IO_REWRITE 0x00800
#define ZIO_FLAG_PROBE 0x01000
#define ZIO_FLAG_SELF_HEAL 0x01000
#define ZIO_FLAG_RESILVER 0x02000
#define ZIO_FLAG_SCRUB 0x04000
#define ZIO_FLAG_SCRUB_THREAD 0x08000
#define ZIO_FLAG_GANG_CHILD 0x10000
#define ZIO_FLAG_PROBE 0x10000
#define ZIO_FLAG_GANG_CHILD 0x20000
#define ZIO_FLAG_RAW 0x40000
#define ZIO_FLAG_GODFATHER 0x80000
#define ZIO_FLAG_GANG_INHERIT \
(ZIO_FLAG_CANFAIL | \
@@ -146,6 +149,7 @@ enum zio_compress {
ZIO_FLAG_DONT_RETRY | \
ZIO_FLAG_DONT_CACHE | \
ZIO_FLAG_DONT_AGGREGATE | \
ZIO_FLAG_SELF_HEAL | \
ZIO_FLAG_RESILVER | \
ZIO_FLAG_SCRUB | \
ZIO_FLAG_SCRUB_THREAD)
@@ -156,6 +160,14 @@ enum zio_compress {
ZIO_FLAG_IO_RETRY | \
ZIO_FLAG_PROBE)
#define ZIO_FLAG_AGG_INHERIT \
(ZIO_FLAG_DONT_AGGREGATE | \
ZIO_FLAG_IO_REPAIR | \
ZIO_FLAG_SELF_HEAL | \
ZIO_FLAG_RESILVER | \
ZIO_FLAG_SCRUB | \
ZIO_FLAG_SCRUB_THREAD)
#define ZIO_PIPELINE_CONTINUE 0x100
#define ZIO_PIPELINE_STOP 0x101
@@ -254,6 +266,13 @@ typedef int zio_pipe_stage_t(zio_t *zio);
#define ZIO_REEXECUTE_NOW 0x01
#define ZIO_REEXECUTE_SUSPEND 0x02
typedef struct zio_link {
zio_t *zl_parent;
zio_t *zl_child;
list_node_t zl_parent_node;
list_node_t zl_child_node;
} zio_link_t;
struct zio {
/* Core information about this I/O */
zbookmark_t io_bookmark;
@@ -263,15 +282,14 @@ struct zio {
int io_cmd;
uint8_t io_priority;
uint8_t io_reexecute;
uint8_t io_async_root;
uint8_t io_state[ZIO_WAIT_TYPES];
uint64_t io_txg;
spa_t *io_spa;
blkptr_t *io_bp;
blkptr_t io_bp_copy;
zio_t *io_parent;
zio_t *io_child;
zio_t *io_sibling_prev;
zio_t *io_sibling_next;
list_t io_parent_list;
list_t io_child_list;
zio_link_t *io_walk_link;
zio_t *io_logical;
zio_transform_t *io_transform_stack;
@@ -294,8 +312,6 @@ struct zio {
avl_node_t io_offset_node;
avl_node_t io_deadline_node;
avl_tree_t *io_vdev_tree;
zio_t *io_delegate_list;
zio_t *io_delegate_next;
/* Internal pipeline state */
int io_flags;
@@ -308,6 +324,7 @@ struct zio {
int io_child_error[ZIO_CHILD_TYPES];
uint64_t io_children[ZIO_CHILD_TYPES][ZIO_WAIT_TYPES];
uint64_t *io_stall;
zio_t *io_gang_leader;
zio_gang_node_t *io_gang_tree;
void *io_executor;
void *io_waiter;
@@ -323,7 +340,7 @@ struct zio {
#endif
};
extern zio_t *zio_null(zio_t *pio, spa_t *spa,
extern zio_t *zio_null(zio_t *pio, spa_t *spa, vdev_t *vd,
zio_done_func_t *done, void *private, int flags);
extern zio_t *zio_root(spa_t *spa,
@@ -371,6 +388,11 @@ extern void zio_nowait(zio_t *zio);
extern void zio_execute(zio_t *zio);
extern void zio_interrupt(zio_t *zio);
extern zio_t *zio_walk_parents(zio_t *cio);
extern zio_t *zio_walk_children(zio_t *pio);
extern zio_t *zio_unique_parent(zio_t *cio);
extern void zio_add_child(zio_t *pio, zio_t *cio);
extern void *zio_buf_alloc(size_t size);
extern void zio_buf_free(void *buf, size_t size);
extern void *zio_data_buf_alloc(size_t size);
@@ -397,7 +419,7 @@ extern uint8_t zio_checksum_select(uint8_t child, uint8_t parent);
extern uint8_t zio_compress_select(uint8_t child, uint8_t parent);
extern void zio_suspend(spa_t *spa, zio_t *zio);
extern void zio_resume(spa_t *spa);
extern int zio_resume(spa_t *spa);
extern void zio_resume_wait(spa_t *spa);
/*


@ -20,7 +20,7 @@
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@ -327,8 +327,10 @@ vdev_alloc_common(spa_t *spa, uint_t id, uint64_t guid, vdev_ops_t *ops)
mutex_init(&vd->vdev_dtl_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&vd->vdev_stat_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&vd->vdev_probe_lock, NULL, MUTEX_DEFAULT, NULL);
space_map_create(&vd->vdev_dtl_map, 0, -1ULL, 0, &vd->vdev_dtl_lock);
space_map_create(&vd->vdev_dtl_scrub, 0, -1ULL, 0, &vd->vdev_dtl_lock);
for (int t = 0; t < DTL_TYPES; t++) {
space_map_create(&vd->vdev_dtl[t], 0, -1ULL, 0,
&vd->vdev_dtl_lock);
}
txg_list_create(&vd->vdev_ms_list,
offsetof(struct metaslab, ms_txg_node));
txg_list_create(&vd->vdev_dtl_list,
@ -444,6 +446,8 @@ vdev_alloc(spa_t *spa, vdev_t **vdp, nvlist_t *nv, vdev_t *parent, uint_t id,
if (nvlist_lookup_string(nv, ZPOOL_CONFIG_PHYS_PATH,
&vd->vdev_physpath) == 0)
vd->vdev_physpath = spa_strdup(vd->vdev_physpath);
if (nvlist_lookup_string(nv, ZPOOL_CONFIG_FRU, &vd->vdev_fru) == 0)
vd->vdev_fru = spa_strdup(vd->vdev_fru);
/*
* Set the whole_disk property. If it's not specified, leave the value
@ -457,9 +461,8 @@ vdev_alloc(spa_t *spa, vdev_t **vdp, nvlist_t *nv, vdev_t *parent, uint_t id,
* Look for the 'not present' flag. This will only be set if the device
* was not present at the time of import.
*/
if (!spa->spa_import_faulted)
(void) nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
&vd->vdev_not_present);
(void) nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
&vd->vdev_not_present);
/*
* Get the alignment requirement.
@ -485,7 +488,7 @@ vdev_alloc(spa_t *spa, vdev_t **vdp, nvlist_t *nv, vdev_t *parent, uint_t id,
(alloctype == VDEV_ALLOC_LOAD || alloctype == VDEV_ALLOC_L2CACHE)) {
if (alloctype == VDEV_ALLOC_LOAD) {
(void) nvlist_lookup_uint64(nv, ZPOOL_CONFIG_DTL,
&vd->vdev_dtl.smo_object);
&vd->vdev_dtl_smo.smo_object);
(void) nvlist_lookup_uint64(nv, ZPOOL_CONFIG_UNSPARE,
&vd->vdev_unspare);
}
@ -569,6 +572,8 @@ vdev_free(vdev_t *vd)
spa_strfree(vd->vdev_devid);
if (vd->vdev_physpath)
spa_strfree(vd->vdev_physpath);
if (vd->vdev_fru)
spa_strfree(vd->vdev_fru);
if (vd->vdev_isspare)
spa_spare_remove(vd);
@ -577,12 +582,14 @@ vdev_free(vdev_t *vd)
txg_list_destroy(&vd->vdev_ms_list);
txg_list_destroy(&vd->vdev_dtl_list);
mutex_enter(&vd->vdev_dtl_lock);
space_map_unload(&vd->vdev_dtl_map);
space_map_destroy(&vd->vdev_dtl_map);
space_map_vacate(&vd->vdev_dtl_scrub, NULL, NULL);
space_map_destroy(&vd->vdev_dtl_scrub);
for (int t = 0; t < DTL_TYPES; t++) {
space_map_unload(&vd->vdev_dtl[t]);
space_map_destroy(&vd->vdev_dtl[t]);
}
mutex_exit(&vd->vdev_dtl_lock);
mutex_destroy(&vd->vdev_dtl_lock);
mutex_destroy(&vd->vdev_stat_lock);
mutex_destroy(&vd->vdev_probe_lock);
@ -720,14 +727,18 @@ vdev_remove_parent(vdev_t *cvd)
vdev_remove_child(mvd, cvd);
vdev_remove_child(pvd, mvd);
/*
* If cvd will replace mvd as a top-level vdev, preserve mvd's guid.
* Otherwise, we could have detached an offline device, and when we
* go to import the pool we'll think we have two top-level vdevs,
* instead of a different version of the same top-level vdev.
*/
if (mvd->vdev_top == mvd)
cvd->vdev_guid = cvd->vdev_guid_sum = mvd->vdev_guid;
if (mvd->vdev_top == mvd) {
uint64_t guid_delta = mvd->vdev_guid - cvd->vdev_guid;
cvd->vdev_guid += guid_delta;
cvd->vdev_guid_sum += guid_delta;
}
cvd->vdev_id = mvd->vdev_id;
vdev_add_child(pvd, cvd);
vdev_top_update(cvd->vdev_top, cvd->vdev_top);
@ -779,7 +790,8 @@ vdev_metaslab_init(vdev_t *vd, uint64_t txg)
if (txg == 0) {
uint64_t object = 0;
error = dmu_read(mos, vd->vdev_ms_array,
m * sizeof (uint64_t), sizeof (uint64_t), &object);
m * sizeof (uint64_t), sizeof (uint64_t), &object,
DMU_READ_PREFETCH);
if (error)
return (error);
if (object != 0) {
@ -819,22 +831,22 @@ typedef struct vdev_probe_stats {
boolean_t vps_readable;
boolean_t vps_writeable;
int vps_flags;
zio_t *vps_root;
vdev_t *vps_vd;
} vdev_probe_stats_t;
static void
vdev_probe_done(zio_t *zio)
{
spa_t *spa = zio->io_spa;
vdev_t *vd = zio->io_vd;
vdev_probe_stats_t *vps = zio->io_private;
vdev_t *vd = vps->vps_vd;
ASSERT(vd->vdev_probe_zio != NULL);
if (zio->io_type == ZIO_TYPE_READ) {
ASSERT(zio->io_vd == vd);
if (zio->io_error == 0)
vps->vps_readable = 1;
if (zio->io_error == 0 && (spa_mode & FWRITE)) {
zio_nowait(zio_write_phys(vps->vps_root, vd,
if (zio->io_error == 0 && spa_writeable(spa)) {
zio_nowait(zio_write_phys(vd->vdev_probe_zio, vd,
zio->io_offset, zio->io_size, zio->io_data,
ZIO_CHECKSUM_OFF, vdev_probe_done, vps,
ZIO_PRIORITY_SYNC_WRITE, vps->vps_flags, B_TRUE));
@ -842,26 +854,34 @@ vdev_probe_done(zio_t *zio)
zio_buf_free(zio->io_data, zio->io_size);
}
} else if (zio->io_type == ZIO_TYPE_WRITE) {
ASSERT(zio->io_vd == vd);
if (zio->io_error == 0)
vps->vps_writeable = 1;
zio_buf_free(zio->io_data, zio->io_size);
} else if (zio->io_type == ZIO_TYPE_NULL) {
ASSERT(zio->io_vd == NULL);
ASSERT(zio == vps->vps_root);
zio_t *pio;
vd->vdev_cant_read |= !vps->vps_readable;
vd->vdev_cant_write |= !vps->vps_writeable;
if (vdev_readable(vd) &&
(vdev_writeable(vd) || !(spa_mode & FWRITE))) {
(vdev_writeable(vd) || !spa_writeable(spa))) {
zio->io_error = 0;
} else {
ASSERT(zio->io_error != 0);
zfs_ereport_post(FM_EREPORT_ZFS_PROBE_FAILURE,
zio->io_spa, vd, NULL, 0, 0);
spa, vd, NULL, 0, 0);
zio->io_error = ENXIO;
}
mutex_enter(&vd->vdev_probe_lock);
ASSERT(vd->vdev_probe_zio == zio);
vd->vdev_probe_zio = NULL;
mutex_exit(&vd->vdev_probe_lock);
while ((pio = zio_walk_parents(zio)) != NULL)
if (!vdev_accessible(vd, pio))
pio->io_error = ENXIO;
kmem_free(vps, sizeof (*vps));
}
}
@ -872,53 +892,90 @@ vdev_probe_done(zio_t *zio)
* but the first (which we leave alone in case it contains a VTOC).
*/
zio_t *
vdev_probe(vdev_t *vd, zio_t *pio)
vdev_probe(vdev_t *vd, zio_t *zio)
{
spa_t *spa = vd->vdev_spa;
vdev_probe_stats_t *vps;
zio_t *zio;
vps = kmem_zalloc(sizeof (*vps), KM_SLEEP);
vps->vps_flags = ZIO_FLAG_CANFAIL | ZIO_FLAG_PROBE |
ZIO_FLAG_DONT_CACHE | ZIO_FLAG_DONT_AGGREGATE | ZIO_FLAG_DONT_RETRY;
if (spa_config_held(spa, SCL_ZIO, RW_WRITER)) {
/*
* vdev_cant_read and vdev_cant_write can only transition
* from TRUE to FALSE when we have the SCL_ZIO lock as writer;
* otherwise they can only transition from FALSE to TRUE.
* This ensures that any zio looking at these values can
* assume that failures persist for the life of the I/O.
* That's important because when a device has intermittent
* connectivity problems, we want to ensure that they're
* ascribed to the device (ENXIO) and not the zio (EIO).
*
* Since we hold SCL_ZIO as writer here, clear both values
* so the probe can reevaluate from first principles.
*/
vps->vps_flags |= ZIO_FLAG_CONFIG_WRITER;
vd->vdev_cant_read = B_FALSE;
vd->vdev_cant_write = B_FALSE;
}
vdev_probe_stats_t *vps = NULL;
zio_t *pio;
ASSERT(vd->vdev_ops->vdev_op_leaf);
zio = zio_null(pio, spa, vdev_probe_done, vps, vps->vps_flags);
/*
* Don't probe the probe.
*/
if (zio && (zio->io_flags & ZIO_FLAG_PROBE))
return (NULL);
vps->vps_root = zio;
vps->vps_vd = vd;
/*
* To prevent 'probe storms' when a device fails, we create
* just one probe i/o at a time. All zios that want to probe
* this vdev will become parents of the probe io.
*/
mutex_enter(&vd->vdev_probe_lock);
if ((pio = vd->vdev_probe_zio) == NULL) {
vps = kmem_zalloc(sizeof (*vps), KM_SLEEP);
vps->vps_flags = ZIO_FLAG_CANFAIL | ZIO_FLAG_PROBE |
ZIO_FLAG_DONT_CACHE | ZIO_FLAG_DONT_AGGREGATE |
ZIO_FLAG_DONT_RETRY;
if (spa_config_held(spa, SCL_ZIO, RW_WRITER)) {
/*
* vdev_cant_read and vdev_cant_write can only
* transition from TRUE to FALSE when we have the
* SCL_ZIO lock as writer; otherwise they can only
* transition from FALSE to TRUE. This ensures that
* any zio looking at these values can assume that
* failures persist for the life of the I/O. That's
* important because when a device has intermittent
* connectivity problems, we want to ensure that
* they're ascribed to the device (ENXIO) and not
* the zio (EIO).
*
* Since we hold SCL_ZIO as writer here, clear both
* values so the probe can reevaluate from first
* principles.
*/
vps->vps_flags |= ZIO_FLAG_CONFIG_WRITER;
vd->vdev_cant_read = B_FALSE;
vd->vdev_cant_write = B_FALSE;
}
vd->vdev_probe_zio = pio = zio_null(NULL, spa, vd,
vdev_probe_done, vps,
vps->vps_flags | ZIO_FLAG_DONT_PROPAGATE);
if (zio != NULL) {
vd->vdev_probe_wanted = B_TRUE;
spa_async_request(spa, SPA_ASYNC_PROBE);
}
}
if (zio != NULL)
zio_add_child(zio, pio);
mutex_exit(&vd->vdev_probe_lock);
if (vps == NULL) {
ASSERT(zio != NULL);
return (NULL);
}
for (int l = 1; l < VDEV_LABELS; l++) {
zio_nowait(zio_read_phys(zio, vd,
zio_nowait(zio_read_phys(pio, vd,
vdev_label_offset(vd->vdev_psize, l,
offsetof(vdev_label_t, vl_pad)),
VDEV_SKIP_SIZE, zio_buf_alloc(VDEV_SKIP_SIZE),
offsetof(vdev_label_t, vl_pad2)),
VDEV_PAD_SIZE, zio_buf_alloc(VDEV_PAD_SIZE),
ZIO_CHECKSUM_OFF, vdev_probe_done, vps,
ZIO_PRIORITY_SYNC_READ, vps->vps_flags, B_TRUE));
}
return (zio);
if (zio == NULL)
return (pio);
zio_nowait(pio);
return (NULL);
}
/*
@ -927,12 +984,15 @@ vdev_probe(vdev_t *vd, zio_t *pio)
int
vdev_open(vdev_t *vd)
{
spa_t *spa = vd->vdev_spa;
int error;
int c;
uint64_t osize = 0;
uint64_t asize, psize;
uint64_t ashift = 0;
ASSERT(spa_config_held(spa, SCL_STATE_ALL, RW_WRITER) == SCL_STATE_ALL);
ASSERT(vd->vdev_state == VDEV_STATE_CLOSED ||
vd->vdev_state == VDEV_STATE_CANT_OPEN ||
vd->vdev_state == VDEV_STATE_OFFLINE);
@ -1066,16 +1126,12 @@ vdev_open(vdev_t *vd)
/*
* If a leaf vdev has a DTL, and seems healthy, then kick off a
* resilver. But don't do this if we are doing a reopen for a
* scrub, since this would just restart the scrub we are already
* doing.
* resilver. But don't do this if we are doing a reopen for a scrub,
* since this would just restart the scrub we are already doing.
*/
if (vd->vdev_children == 0 && !vd->vdev_spa->spa_scrub_reopen) {
mutex_enter(&vd->vdev_dtl_lock);
if (vd->vdev_dtl_map.sm_space != 0 && vdev_writeable(vd))
spa_async_request(vd->vdev_spa, SPA_ASYNC_RESILVER);
mutex_exit(&vd->vdev_dtl_lock);
}
if (vd->vdev_ops->vdev_op_leaf && !spa->spa_scrub_reopen &&
vdev_resilver_needed(vd, NULL, NULL))
spa_async_request(spa, SPA_ASYNC_RESILVER);
return (0);
}
@ -1154,7 +1210,12 @@ vdev_validate(vdev_t *vd)
nvlist_free(label);
if (spa->spa_load_state == SPA_LOAD_OPEN &&
/*
* If spa->spa_load_verbatim is true, no need to check the
* state of the pool.
*/
if (!spa->spa_load_verbatim &&
spa->spa_load_state == SPA_LOAD_OPEN &&
state != POOL_STATE_ACTIVE)
return (EBADF);
@ -1176,6 +1237,10 @@ vdev_validate(vdev_t *vd)
void
vdev_close(vdev_t *vd)
{
spa_t *spa = vd->vdev_spa;
ASSERT(spa_config_held(spa, SCL_STATE_ALL, RW_WRITER) == SCL_STATE_ALL);
vd->vdev_ops->vdev_op_close(vd);
vdev_cache_purge(vd);
@ -1212,6 +1277,7 @@ vdev_reopen(vdev_t *vd)
if (vd->vdev_aux) {
(void) vdev_validate_aux(vd);
if (vdev_readable(vd) && vdev_writeable(vd) &&
vd->vdev_aux == &spa->spa_l2cache &&
!l2arc_vdev_present(vd)) {
uint64_t size = vdev_get_rsize(vd);
l2arc_add_vdev(spa, vd,
@ -1294,34 +1360,88 @@ vdev_dirty(vdev_t *vd, int flags, void *arg, uint64_t txg)
(void) txg_list_add(&vd->vdev_spa->spa_vdev_txg_list, vd, txg);
}
/*
* DTLs.
*
* A vdev's DTL (dirty time log) is the set of transaction groups for which
* the vdev has less than perfect replication. There are four kinds of DTL:
*
* DTL_MISSING: txgs for which the vdev has no valid copies of the data
*
* DTL_PARTIAL: txgs for which data is available, but not fully replicated
*
* DTL_SCRUB: the txgs that could not be repaired by the last scrub; upon
* scrub completion, DTL_SCRUB replaces DTL_MISSING in the range of
* txgs that was scrubbed.
*
* DTL_OUTAGE: txgs which cannot currently be read, whether due to
* persistent errors or just some device being offline.
* Unlike the other three, the DTL_OUTAGE map is not generally
* maintained; it's only computed when needed, typically to
* determine whether a device can be detached.
*
* For leaf vdevs, DTL_MISSING and DTL_PARTIAL are identical: the device
* either has the data or it doesn't.
*
* For interior vdevs such as mirror and RAID-Z the picture is more complex.
* A vdev's DTL_PARTIAL is the union of its children's DTL_PARTIALs, because
* if any child is less than fully replicated, then so is its parent.
* A vdev's DTL_MISSING is a modified union of its children's DTL_MISSINGs,
* comprising only those txgs which appear in 'maxfaults' or more children;
* those are the txgs we don't have enough replication to read. For example,
* double-parity RAID-Z can tolerate up to two missing devices (maxfaults == 2);
* thus, its DTL_MISSING consists of the set of txgs that appear in more than
* two child DTL_MISSING maps.
*
* It should be clear from the above that to compute the DTLs and outage maps
* for all vdevs, it suffices to know just the leaf vdevs' DTL_MISSING maps.
* Therefore, that is all we keep on disk. When loading the pool, or after
* a configuration change, we generate all other DTLs from first principles.
*/
void
vdev_dtl_dirty(space_map_t *sm, uint64_t txg, uint64_t size)
vdev_dtl_dirty(vdev_t *vd, vdev_dtl_type_t t, uint64_t txg, uint64_t size)
{
space_map_t *sm = &vd->vdev_dtl[t];
ASSERT(t < DTL_TYPES);
ASSERT(vd != vd->vdev_spa->spa_root_vdev);
mutex_enter(sm->sm_lock);
if (!space_map_contains(sm, txg, size))
space_map_add(sm, txg, size);
mutex_exit(sm->sm_lock);
}
int
vdev_dtl_contains(space_map_t *sm, uint64_t txg, uint64_t size)
boolean_t
vdev_dtl_contains(vdev_t *vd, vdev_dtl_type_t t, uint64_t txg, uint64_t size)
{
int dirty;
space_map_t *sm = &vd->vdev_dtl[t];
boolean_t dirty = B_FALSE;
/*
* Quick test without the lock -- covers the common case that
* there are no dirty time segments.
*/
if (sm->sm_space == 0)
return (0);
ASSERT(t < DTL_TYPES);
ASSERT(vd != vd->vdev_spa->spa_root_vdev);
mutex_enter(sm->sm_lock);
dirty = space_map_contains(sm, txg, size);
if (sm->sm_space != 0)
dirty = space_map_contains(sm, txg, size);
mutex_exit(sm->sm_lock);
return (dirty);
}
boolean_t
vdev_dtl_empty(vdev_t *vd, vdev_dtl_type_t t)
{
space_map_t *sm = &vd->vdev_dtl[t];
boolean_t empty;
mutex_enter(sm->sm_lock);
empty = (sm->sm_space == 0);
mutex_exit(sm->sm_lock);
return (empty);
}
/*
* Reassess DTLs after a config change or scrub completion.
*/
@ -1329,11 +1449,19 @@ void
vdev_dtl_reassess(vdev_t *vd, uint64_t txg, uint64_t scrub_txg, int scrub_done)
{
spa_t *spa = vd->vdev_spa;
int c;
avl_tree_t reftree;
int minref;
ASSERT(spa_config_held(spa, SCL_CONFIG, RW_READER));
ASSERT(spa_config_held(spa, SCL_ALL, RW_READER) != 0);
if (vd->vdev_children == 0) {
for (int c = 0; c < vd->vdev_children; c++)
vdev_dtl_reassess(vd->vdev_child[c], txg,
scrub_txg, scrub_done);
if (vd == spa->spa_root_vdev)
return;
if (vd->vdev_ops->vdev_op_leaf) {
mutex_enter(&vd->vdev_dtl_lock);
if (scrub_txg != 0 &&
(spa->spa_scrub_started || spa->spa_scrub_errors == 0)) {
@ -1344,12 +1472,38 @@ vdev_dtl_reassess(vdev_t *vd, uint64_t txg, uint64_t scrub_txg, int scrub_done)
* will be valid, so excise the old region and
* fold in the scrub dtl. Otherwise, leave the
* dtl as-is if there was an error.
*
* There's a little trick here: to excise the beginning
* of the DTL_MISSING map, we put it into a reference
* tree and then add a segment with refcnt -1 that
* covers the range [0, scrub_txg). This means
* that each txg in that range has refcnt -1 or 0.
* We then add DTL_SCRUB with a refcnt of 2, so that
* entries in the range [0, scrub_txg) will have a
* positive refcnt -- either 1 or 2. We then convert
* the reference tree into the new DTL_MISSING map.
*/
space_map_excise(&vd->vdev_dtl_map, 0, scrub_txg);
space_map_union(&vd->vdev_dtl_map, &vd->vdev_dtl_scrub);
space_map_ref_create(&reftree);
space_map_ref_add_map(&reftree,
&vd->vdev_dtl[DTL_MISSING], 1);
space_map_ref_add_seg(&reftree, 0, scrub_txg, -1);
space_map_ref_add_map(&reftree,
&vd->vdev_dtl[DTL_SCRUB], 2);
space_map_ref_generate_map(&reftree,
&vd->vdev_dtl[DTL_MISSING], 1);
space_map_ref_destroy(&reftree);
}
space_map_vacate(&vd->vdev_dtl[DTL_PARTIAL], NULL, NULL);
space_map_walk(&vd->vdev_dtl[DTL_MISSING],
space_map_add, &vd->vdev_dtl[DTL_PARTIAL]);
if (scrub_done)
space_map_vacate(&vd->vdev_dtl_scrub, NULL, NULL);
space_map_vacate(&vd->vdev_dtl[DTL_SCRUB], NULL, NULL);
space_map_vacate(&vd->vdev_dtl[DTL_OUTAGE], NULL, NULL);
if (!vdev_readable(vd))
space_map_add(&vd->vdev_dtl[DTL_OUTAGE], 0, -1ULL);
else
space_map_walk(&vd->vdev_dtl[DTL_MISSING],
space_map_add, &vd->vdev_dtl[DTL_OUTAGE]);
mutex_exit(&vd->vdev_dtl_lock);
if (txg != 0)
@ -1357,35 +1511,36 @@ vdev_dtl_reassess(vdev_t *vd, uint64_t txg, uint64_t scrub_txg, int scrub_done)
return;
}
/*
* Make sure the DTLs are always correct under the scrub lock.
*/
if (vd == spa->spa_root_vdev)
mutex_enter(&spa->spa_scrub_lock);
mutex_enter(&vd->vdev_dtl_lock);
space_map_vacate(&vd->vdev_dtl_map, NULL, NULL);
space_map_vacate(&vd->vdev_dtl_scrub, NULL, NULL);
mutex_exit(&vd->vdev_dtl_lock);
for (c = 0; c < vd->vdev_children; c++) {
vdev_t *cvd = vd->vdev_child[c];
vdev_dtl_reassess(cvd, txg, scrub_txg, scrub_done);
mutex_enter(&vd->vdev_dtl_lock);
space_map_union(&vd->vdev_dtl_map, &cvd->vdev_dtl_map);
space_map_union(&vd->vdev_dtl_scrub, &cvd->vdev_dtl_scrub);
mutex_exit(&vd->vdev_dtl_lock);
for (int t = 0; t < DTL_TYPES; t++) {
/* account for child's outage in parent's missing map */
int s = (t == DTL_MISSING) ? DTL_OUTAGE : t;
if (t == DTL_SCRUB)
continue; /* leaf vdevs only */
if (t == DTL_PARTIAL)
minref = 1; /* i.e. non-zero */
else if (vd->vdev_nparity != 0)
minref = vd->vdev_nparity + 1; /* RAID-Z */
else
minref = vd->vdev_children; /* any kind of mirror */
space_map_ref_create(&reftree);
for (int c = 0; c < vd->vdev_children; c++) {
vdev_t *cvd = vd->vdev_child[c];
mutex_enter(&cvd->vdev_dtl_lock);
space_map_ref_add_map(&reftree, &cvd->vdev_dtl[s], 1);
mutex_exit(&cvd->vdev_dtl_lock);
}
space_map_ref_generate_map(&reftree, &vd->vdev_dtl[t], minref);
space_map_ref_destroy(&reftree);
}
if (vd == spa->spa_root_vdev)
mutex_exit(&spa->spa_scrub_lock);
mutex_exit(&vd->vdev_dtl_lock);
}
static int
vdev_dtl_load(vdev_t *vd)
{
spa_t *spa = vd->vdev_spa;
space_map_obj_t *smo = &vd->vdev_dtl;
space_map_obj_t *smo = &vd->vdev_dtl_smo;
objset_t *mos = spa->spa_meta_objset;
dmu_buf_t *db;
int error;
@ -1403,7 +1558,8 @@ vdev_dtl_load(vdev_t *vd)
dmu_buf_rele(db, FTAG);
mutex_enter(&vd->vdev_dtl_lock);
error = space_map_load(&vd->vdev_dtl_map, NULL, SM_ALLOC, smo, mos);
error = space_map_load(&vd->vdev_dtl[DTL_MISSING],
NULL, SM_ALLOC, smo, mos);
mutex_exit(&vd->vdev_dtl_lock);
return (error);
@ -1413,8 +1569,8 @@ void
vdev_dtl_sync(vdev_t *vd, uint64_t txg)
{
spa_t *spa = vd->vdev_spa;
space_map_obj_t *smo = &vd->vdev_dtl;
space_map_t *sm = &vd->vdev_dtl_map;
space_map_obj_t *smo = &vd->vdev_dtl_smo;
space_map_t *sm = &vd->vdev_dtl[DTL_MISSING];
objset_t *mos = spa->spa_meta_objset;
space_map_t smsync;
kmutex_t smlock;
@ -1471,6 +1627,37 @@ vdev_dtl_sync(vdev_t *vd, uint64_t txg)
dmu_tx_commit(tx);
}
/*
* Determine whether the specified vdev can be offlined/detached/removed
* without losing data.
*/
boolean_t
vdev_dtl_required(vdev_t *vd)
{
spa_t *spa = vd->vdev_spa;
vdev_t *tvd = vd->vdev_top;
uint8_t cant_read = vd->vdev_cant_read;
boolean_t required;
ASSERT(spa_config_held(spa, SCL_STATE_ALL, RW_WRITER) == SCL_STATE_ALL);
if (vd == spa->spa_root_vdev || vd == tvd)
return (B_TRUE);
/*
* Temporarily mark the device as unreadable, and then determine
* whether this results in any DTL outages in the top-level vdev.
* If not, we can safely offline/detach/remove the device.
*/
vd->vdev_cant_read = B_TRUE;
vdev_dtl_reassess(tvd, 0, 0, B_FALSE);
required = !vdev_dtl_empty(tvd, DTL_OUTAGE);
vd->vdev_cant_read = cant_read;
vdev_dtl_reassess(tvd, 0, 0, B_FALSE);
return (required);
}
/*
* Determine if resilver is needed, and if so the txg range.
*/
@ -1483,19 +1670,19 @@ vdev_resilver_needed(vdev_t *vd, uint64_t *minp, uint64_t *maxp)
if (vd->vdev_children == 0) {
mutex_enter(&vd->vdev_dtl_lock);
if (vd->vdev_dtl_map.sm_space != 0 && vdev_writeable(vd)) {
if (vd->vdev_dtl[DTL_MISSING].sm_space != 0 &&
vdev_writeable(vd)) {
space_seg_t *ss;
ss = avl_first(&vd->vdev_dtl_map.sm_root);
ss = avl_first(&vd->vdev_dtl[DTL_MISSING].sm_root);
thismin = ss->ss_start - 1;
ss = avl_last(&vd->vdev_dtl_map.sm_root);
ss = avl_last(&vd->vdev_dtl[DTL_MISSING].sm_root);
thismax = ss->ss_end;
needed = B_TRUE;
}
mutex_exit(&vd->vdev_dtl_lock);
} else {
int c;
for (c = 0; c < vd->vdev_children; c++) {
for (int c = 0; c < vd->vdev_children; c++) {
vdev_t *cvd = vd->vdev_child[c];
uint64_t cmin, cmax;
@ -1517,12 +1704,10 @@ vdev_resilver_needed(vdev_t *vd, uint64_t *minp, uint64_t *maxp)
void
vdev_load(vdev_t *vd)
{
int c;
/*
* Recursively load all children.
*/
for (c = 0; c < vd->vdev_children; c++)
for (int c = 0; c < vd->vdev_children; c++)
vdev_load(vd->vdev_child[c]);
/*
@ -1742,11 +1927,7 @@ vdev_online(spa_t *spa, uint64_t guid, uint64_t flags, vdev_state_t *newstate)
vd->vdev_parent->vdev_child[0] == vd)
vd->vdev_unspare = B_TRUE;
(void) spa_vdev_state_exit(spa, vd, 0);
VERIFY3U(spa_scrub(spa, POOL_SCRUB_RESILVER), ==, 0);
return (0);
return (spa_vdev_state_exit(spa, vd, 0));
}
int
@ -1767,13 +1948,10 @@ vdev_offline(spa_t *spa, uint64_t guid, uint64_t flags)
*/
if (!vd->vdev_offline) {
/*
* If this device's top-level vdev has a non-empty DTL,
* don't allow the device to be offlined.
*
* XXX -- make this more precise by allowing the offline
* as long as the remaining devices don't have any DTL holes.
* If this device has the only valid copy of some data,
* don't allow it to be offlined.
*/
if (vd->vdev_top->vdev_dtl_map.sm_space != 0)
if (vd->vdev_aux == NULL && vdev_dtl_required(vd))
return (spa_vdev_state_exit(spa, NULL, EBUSY));
/*
@ -1783,7 +1961,7 @@ vdev_offline(spa_t *spa, uint64_t guid, uint64_t flags)
*/
vd->vdev_offline = B_TRUE;
vdev_reopen(vd->vdev_top);
if (vdev_is_dead(vd->vdev_top) && vd->vdev_aux == NULL) {
if (vd->vdev_aux == NULL && vdev_is_dead(vd->vdev_top)) {
vd->vdev_offline = B_FALSE;
vdev_reopen(vd->vdev_top);
return (spa_vdev_state_exit(spa, NULL, EBUSY));
@ -1863,13 +2041,17 @@ vdev_writeable(vdev_t *vd)
boolean_t
vdev_allocatable(vdev_t *vd)
{
uint64_t state = vd->vdev_state;
/*
* We currently allow allocations from vdevs which maybe in the
* We currently allow allocations from vdevs which may be in the
* process of reopening (i.e. VDEV_STATE_CLOSED). If the device
* fails to reopen then we'll catch it later when we're holding
* the proper locks.
* the proper locks. Note that we have to get the vdev state
* in a local variable because although it changes atomically,
* we're asking two separate questions about it.
*/
return (!(vdev_is_dead(vd) && vd->vdev_state != VDEV_STATE_CLOSED) &&
return (!(state < VDEV_STATE_DEGRADED && state != VDEV_STATE_CLOSED) &&
!vd->vdev_cant_write);
}
@ -1939,7 +2121,8 @@ vdev_clear_stats(vdev_t *vd)
void
vdev_stat_update(zio_t *zio, uint64_t psize)
{
vdev_t *rvd = zio->io_spa->spa_root_vdev;
spa_t *spa = zio->io_spa;
vdev_t *rvd = spa->spa_root_vdev;
vdev_t *vd = zio->io_vd ? zio->io_vd : rvd;
vdev_t *pvd;
uint64_t txg = zio->io_txg;
@ -1972,21 +2155,23 @@ vdev_stat_update(zio_t *zio, uint64_t psize)
return;
ASSERT(vd == zio->io_vd);
if (!(flags & ZIO_FLAG_IO_BYPASS)) {
mutex_enter(&vd->vdev_stat_lock);
vs->vs_ops[type]++;
vs->vs_bytes[type] += psize;
mutex_exit(&vd->vdev_stat_lock);
}
if (flags & ZIO_FLAG_IO_BYPASS)
return;
mutex_enter(&vd->vdev_stat_lock);
if (flags & ZIO_FLAG_IO_REPAIR) {
ASSERT(zio->io_delegate_list == NULL);
mutex_enter(&vd->vdev_stat_lock);
if (flags & ZIO_FLAG_SCRUB_THREAD)
vs->vs_scrub_repaired += psize;
else
if (flags & ZIO_FLAG_SELF_HEAL)
vs->vs_self_healed += psize;
mutex_exit(&vd->vdev_stat_lock);
}
vs->vs_ops[type]++;
vs->vs_bytes[type] += psize;
mutex_exit(&vd->vdev_stat_lock);
return;
}
@ -1994,29 +2179,49 @@ vdev_stat_update(zio_t *zio, uint64_t psize)
return;
mutex_enter(&vd->vdev_stat_lock);
if (type == ZIO_TYPE_READ) {
if (type == ZIO_TYPE_READ && !vdev_is_dead(vd)) {
if (zio->io_error == ECKSUM)
vs->vs_checksum_errors++;
else
vs->vs_read_errors++;
}
if (type == ZIO_TYPE_WRITE)
if (type == ZIO_TYPE_WRITE && !vdev_is_dead(vd))
vs->vs_write_errors++;
mutex_exit(&vd->vdev_stat_lock);
if (type == ZIO_TYPE_WRITE && txg != 0 && vd->vdev_children == 0) {
if (flags & ZIO_FLAG_SCRUB_THREAD) {
ASSERT(flags & ZIO_FLAG_IO_REPAIR);
for (pvd = vd; pvd != NULL; pvd = pvd->vdev_parent)
vdev_dtl_dirty(&pvd->vdev_dtl_scrub, txg, 1);
}
if (!(flags & ZIO_FLAG_IO_REPAIR)) {
if (vdev_dtl_contains(&vd->vdev_dtl_map, txg, 1))
if (type == ZIO_TYPE_WRITE && txg != 0 &&
(!(flags & ZIO_FLAG_IO_REPAIR) ||
(flags & ZIO_FLAG_SCRUB_THREAD))) {
/*
* This is either a normal write (not a repair), or it's a
* repair induced by the scrub thread. In the normal case,
* we commit the DTL change in the same txg as the block
* was born. In the scrub-induced repair case, we know that
* scrubs run in first-pass syncing context, so we commit
* the DTL change in spa->spa_syncing_txg.
*
* We currently do not make DTL entries for failed spontaneous
* self-healing writes triggered by normal (non-scrubbing)
* reads, because we have no transactional context in which to
* do so -- and it's not clear that it'd be desirable anyway.
*/
if (vd->vdev_ops->vdev_op_leaf) {
uint64_t commit_txg = txg;
if (flags & ZIO_FLAG_SCRUB_THREAD) {
ASSERT(flags & ZIO_FLAG_IO_REPAIR);
ASSERT(spa_sync_pass(spa) == 1);
vdev_dtl_dirty(vd, DTL_SCRUB, txg, 1);
commit_txg = spa->spa_syncing_txg;
}
ASSERT(commit_txg >= spa->spa_syncing_txg);
if (vdev_dtl_contains(vd, DTL_MISSING, txg, 1))
return;
vdev_dirty(vd->vdev_top, VDD_DTL, vd, txg);
for (pvd = vd; pvd != NULL; pvd = pvd->vdev_parent)
vdev_dtl_dirty(&pvd->vdev_dtl_map, txg, 1);
for (pvd = vd; pvd != rvd; pvd = pvd->vdev_parent)
vdev_dtl_dirty(pvd, DTL_PARTIAL, txg, 1);
vdev_dirty(vd->vdev_top, VDD_DTL, vd, commit_txg);
}
if (vd != rvd)
vdev_dtl_dirty(vd, DTL_MISSING, txg, 1);
}
}
@ -2111,8 +2316,8 @@ vdev_config_dirty(vdev_t *vd)
int c;
/*
* If this is an aux vdev (as with l2cache devices), then we update the
* vdev config manually and set the sync flag.
* If this is an aux vdev (as with l2cache and spare devices), then we
* update the vdev config manually and set the sync flag.
*/
if (vd->vdev_aux != NULL) {
spa_aux_vdev_t *sav = vd->vdev_aux;
@ -2134,8 +2339,11 @@ vdev_config_dirty(vdev_t *vd)
sav->sav_sync = B_TRUE;
VERIFY(nvlist_lookup_nvlist_array(sav->sav_config,
ZPOOL_CONFIG_L2CACHE, &aux, &naux) == 0);
if (nvlist_lookup_nvlist_array(sav->sav_config,
ZPOOL_CONFIG_L2CACHE, &aux, &naux) != 0) {
VERIFY(nvlist_lookup_nvlist_array(sav->sav_config,
ZPOOL_CONFIG_SPARES, &aux, &naux) == 0);
}
ASSERT(c < naux);
@ -2229,7 +2437,8 @@ vdev_state_clean(vdev_t *vd)
void
vdev_propagate_state(vdev_t *vd)
{
vdev_t *rvd = vd->vdev_spa->spa_root_vdev;
spa_t *spa = vd->vdev_spa;
vdev_t *rvd = spa->spa_root_vdev;
int degraded = 0, faulted = 0;
int corrupted = 0;
int c;
@ -2240,7 +2449,7 @@ vdev_propagate_state(vdev_t *vd)
child = vd->vdev_child[c];
if (!vdev_readable(child) ||
(!vdev_writeable(child) && (spa_mode & FWRITE))) {
(!vdev_writeable(child) && spa_writeable(spa))) {
/*
* Root special: if there is a top-level log
* device, treat the root vdev as if it were
@ -2340,7 +2549,6 @@ vdev_set_state(vdev_t *vd, boolean_t isopen, vdev_state_t state, vdev_aux_t aux)
* an error.
*/
if (spa->spa_load_state == SPA_LOAD_IMPORT &&
!spa->spa_import_faulted &&
vd->vdev_ops->vdev_op_leaf)
vd->vdev_not_present = 1;
@ -2399,8 +2607,8 @@ vdev_set_state(vdev_t *vd, boolean_t isopen, vdev_state_t state, vdev_aux_t aux)
vd->vdev_removed = B_FALSE;
}
if (!isopen)
vdev_propagate_state(vd);
if (!isopen && vd->vdev_parent)
vdev_propagate_state(vd->vdev_parent);
}
/*


@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@ -215,23 +215,23 @@ vdev_cache_hit(vdev_cache_t *vc, vdev_cache_entry_t *ve, zio_t *zio)
* Fill a previously allocated cache entry with data.
*/
static void
vdev_cache_fill(zio_t *zio)
vdev_cache_fill(zio_t *fio)
{
vdev_t *vd = zio->io_vd;
vdev_t *vd = fio->io_vd;
vdev_cache_t *vc = &vd->vdev_cache;
vdev_cache_entry_t *ve = zio->io_private;
zio_t *dio;
vdev_cache_entry_t *ve = fio->io_private;
zio_t *pio;
ASSERT(zio->io_size == VCBS);
ASSERT(fio->io_size == VCBS);
/*
* Add data to the cache.
*/
mutex_enter(&vc->vc_lock);
ASSERT(ve->ve_fill_io == zio);
ASSERT(ve->ve_offset == zio->io_offset);
ASSERT(ve->ve_data == zio->io_data);
ASSERT(ve->ve_fill_io == fio);
ASSERT(ve->ve_offset == fio->io_offset);
ASSERT(ve->ve_data == fio->io_data);
ve->ve_fill_io = NULL;
@ -240,20 +240,13 @@ vdev_cache_fill(zio_t *zio)
* any reads that were queued up before the missed update are still
* valid, so we can satisfy them from this line before we evict it.
*/
for (dio = zio->io_delegate_list; dio; dio = dio->io_delegate_next)
vdev_cache_hit(vc, ve, dio);
while ((pio = zio_walk_parents(fio)) != NULL)
vdev_cache_hit(vc, ve, pio);
if (zio->io_error || ve->ve_missed_update)
if (fio->io_error || ve->ve_missed_update)
vdev_cache_evict(vc, ve);
mutex_exit(&vc->vc_lock);
while ((dio = zio->io_delegate_list) != NULL) {
zio->io_delegate_list = dio->io_delegate_next;
dio->io_delegate_next = NULL;
dio->io_error = zio->io_error;
zio_execute(dio);
}
}
/*
@ -296,9 +289,8 @@ vdev_cache_read(zio_t *zio)
}
if ((fio = ve->ve_fill_io) != NULL) {
zio->io_delegate_next = fio->io_delegate_list;
fio->io_delegate_list = zio;
zio_vdev_io_bypass(zio);
zio_add_child(zio, fio);
mutex_exit(&vc->vc_lock);
VDCSTAT_BUMP(vdc_stat_delegations);
return (0);
@ -308,7 +300,6 @@ vdev_cache_read(zio_t *zio)
zio_vdev_io_bypass(zio);
mutex_exit(&vc->vc_lock);
zio_execute(zio);
VDCSTAT_BUMP(vdc_stat_hits);
return (0);
}
@ -325,8 +316,8 @@ vdev_cache_read(zio_t *zio)
ZIO_FLAG_DONT_CACHE, vdev_cache_fill, ve);
ve->ve_fill_io = fio;
fio->io_delegate_list = zio;
zio_vdev_io_bypass(zio);
zio_add_child(zio, fio);
mutex_exit(&vc->vc_lock);
zio_nowait(fio);


@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@ -47,6 +47,7 @@ typedef struct vdev_disk_buf {
static int
vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
{
spa_t *spa = vd->vdev_spa;
vdev_disk_t *dvd;
struct dk_minfo dkm;
int error;
@ -95,7 +96,7 @@ vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
error = EINVAL; /* presume failure */
if (vd->vdev_path != NULL && !spa_is_root(vd->vdev_spa)) {
if (vd->vdev_path != NULL && !spa_is_root(spa)) {
ddi_devid_t devid;
if (vd->vdev_wholedisk == -1ULL) {
@ -105,18 +106,18 @@ vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
(void) snprintf(buf, len, "%ss0", vd->vdev_path);
if (ldi_open_by_name(buf, spa_mode, kcred,
if (ldi_open_by_name(buf, spa_mode(spa), kcred,
&lh, zfs_li) == 0) {
spa_strfree(vd->vdev_path);
vd->vdev_path = buf;
vd->vdev_wholedisk = 1ULL;
(void) ldi_close(lh, spa_mode, kcred);
(void) ldi_close(lh, spa_mode(spa), kcred);
} else {
kmem_free(buf, len);
}
}
error = ldi_open_by_name(vd->vdev_path, spa_mode, kcred,
error = ldi_open_by_name(vd->vdev_path, spa_mode(spa), kcred,
&dvd->vd_lh, zfs_li);
/*
@ -126,7 +127,8 @@ vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
ldi_get_devid(dvd->vd_lh, &devid) == 0) {
if (ddi_devid_compare(devid, dvd->vd_devid) != 0) {
error = EINVAL;
(void) ldi_close(dvd->vd_lh, spa_mode, kcred);
(void) ldi_close(dvd->vd_lh, spa_mode(spa),
kcred);
dvd->vd_lh = NULL;
}
ddi_devid_free(devid);
@@ -146,7 +148,7 @@ vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
 	 */
 	if (error != 0 && vd->vdev_devid != NULL)
 		error = ldi_open_by_devid(dvd->vd_devid, dvd->vd_minor,
-		    spa_mode, kcred, &dvd->vd_lh, zfs_li);
+		    spa_mode(spa), kcred, &dvd->vd_lh, zfs_li);

 	/*
 	 * If all else fails, then try opening by physical path (if available)
@@ -156,8 +158,8 @@ vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
 	 */
 	if (error) {
 		if (vd->vdev_physpath != NULL &&
-		    (dev = ddi_pathname_to_dev_t(vd->vdev_physpath)) != ENODEV)
-			error = ldi_open_by_dev(&dev, OTYP_BLK, spa_mode,
+		    (dev = ddi_pathname_to_dev_t(vd->vdev_physpath)) != NODEV)
+			error = ldi_open_by_dev(&dev, OTYP_BLK, spa_mode(spa),
 			    kcred, &dvd->vd_lh, zfs_li);

 		/*
@@ -165,10 +167,9 @@ vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
 		 * as above. This hasn't been used in a very long time and we
 		 * don't need to propagate its oddities to this edge condition.
 		 */
-		if (error && vd->vdev_path != NULL &&
-		    !spa_is_root(vd->vdev_spa))
-			error = ldi_open_by_name(vd->vdev_path, spa_mode, kcred,
-			    &dvd->vd_lh, zfs_li);
+		if (error && vd->vdev_path != NULL && !spa_is_root(spa))
+			error = ldi_open_by_name(vd->vdev_path, spa_mode(spa),
+			    kcred, &dvd->vd_lh, zfs_li);
 	}

 	if (error) {
@@ -253,7 +254,7 @@ vdev_disk_close(vdev_t *vd)
 		ddi_devid_free(dvd->vd_devid);

 	if (dvd->vd_lh != NULL)
-		(void) ldi_close(dvd->vd_lh, spa_mode, kcred);
+		(void) ldi_close(dvd->vd_lh, spa_mode(vd->vdev_spa), kcred);

 	kmem_free(dvd, sizeof (vdev_disk_t));
 	vd->vdev_tsd = NULL;
@@ -469,7 +470,7 @@ vdev_disk_read_rootlabel(char *devpath, char *devid, nvlist_t **config)
 	if (devid != NULL && ddi_devid_str_decode(devid, &tmpdevid,
 	    &minor_name) == 0) {
 		error = ldi_open_by_devid(tmpdevid, minor_name,
-		    spa_mode, kcred, &vd_lh, zfs_li);
+		    FREAD, kcred, &vd_lh, zfs_li);
 		ddi_devid_free(tmpdevid);
 		ddi_devid_str_free(minor_name);
 	}
@@ -492,8 +493,7 @@ vdev_disk_read_rootlabel(char *devpath, char *devid, nvlist_t **config)
 		/* read vdev label */
 		offset = vdev_label_offset(size, l, 0);
 		if (vdev_disk_physio(vd_lh, (caddr_t)label,
-		    VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE +
-		    VDEV_PHYS_SIZE, offset, B_READ) != 0)
+		    VDEV_SKIP_SIZE + VDEV_PHYS_SIZE, offset, B_READ) != 0)
 			continue;

 		if (nvlist_unpack(label->vl_vdev_phys.vp_nvlist,

--- vdev_file.c
+++ vdev_file.c

@@ -61,7 +61,7 @@ vdev_file_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
 	 */
 	ASSERT(vd->vdev_path != NULL && vd->vdev_path[0] == '/');
 	error = vn_openat(vd->vdev_path + 1, UIO_SYSSPACE,
-	    spa_mode | FOFFMAX, 0, &vp, 0, 0, rootdir, -1);
+	    spa_mode(vd->vdev_spa) | FOFFMAX, 0, &vp, 0, 0, rootdir, -1);

 	if (error) {
 		vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED;
@@ -75,7 +75,7 @@ vdev_file_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
 	 * Make sure it's a regular file.
 	 */
 	if (vp->v_type != VREG) {
-		(void) VOP_CLOSE(vp, spa_mode, 1, 0, kcred, NULL);
+		(void) VOP_CLOSE(vp, spa_mode(vd->vdev_spa), 1, 0, kcred, NULL);
 		vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED;
 		return (ENODEV);
 	}
@@ -90,7 +90,7 @@ vdev_file_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
 	VOP_UNLOCK(vp, 0);
 	VFS_UNLOCK_GIANT(vfslocked);
 	if (error) {
-		(void) VOP_CLOSE(vp, spa_mode, 1, 0, kcred, NULL);
+		(void) VOP_CLOSE(vp, spa_mode(vd->vdev_spa), 1, 0, kcred, NULL);
 		vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED;
 		return (error);
 	}
@@ -110,7 +110,8 @@ vdev_file_close(vdev_t *vd)
 		return;

 	if (vf->vf_vnode != NULL)
-		(void) VOP_CLOSE(vf->vf_vnode, spa_mode, 1, 0, kcred, NULL);
+		(void) VOP_CLOSE(vf->vf_vnode, spa_mode(vd->vdev_spa), 1, 0,
+		    kcred, NULL);

 	kmem_free(vf, sizeof (vdev_file_t));
 	vd->vdev_tsd = NULL;
 }

Some files were not shown because too many files have changed in this diff.