Update vendor-sys/opensolaris to last OpenSolaris state (13149:b23a4dab3d50)
Add ZFS bits to vendor-sys/opensolaris

Obtained from: https://hg.openindiana.org/upstream/oracle/onnv-gate
parent e27f30edd4
commit 39f5422299
384
OPENSOLARIS.LICENSE
Normal file
@@ -0,0 +1,384 @@
Unless otherwise noted, all files in this distribution are released
under the Common Development and Distribution License (CDDL).
Exceptions are noted within the associated source files.

--------------------------------------------------------------------


COMMON DEVELOPMENT AND DISTRIBUTION LICENSE Version 1.0

1. Definitions.

    1.1. "Contributor" means each individual or entity that creates
         or contributes to the creation of Modifications.

    1.2. "Contributor Version" means the combination of the Original
         Software, prior Modifications used by a Contributor (if any),
         and the Modifications made by that particular Contributor.

    1.3. "Covered Software" means (a) the Original Software, or (b)
         Modifications, or (c) the combination of files containing
         Original Software with files containing Modifications, in
         each case including portions thereof.

    1.4. "Executable" means the Covered Software in any form other
         than Source Code.

    1.5. "Initial Developer" means the individual or entity that first
         makes Original Software available under this License.

    1.6. "Larger Work" means a work which combines Covered Software or
         portions thereof with code not governed by the terms of this
         License.

    1.7. "License" means this document.

    1.8. "Licensable" means having the right to grant, to the maximum
         extent possible, whether at the time of the initial grant or
         subsequently acquired, any and all of the rights conveyed
         herein.

    1.9. "Modifications" means the Source Code and Executable form of
         any of the following:

        A. Any file that results from an addition to, deletion from or
           modification of the contents of a file containing Original
           Software or previous Modifications;

        B. Any new file that contains any part of the Original
           Software or previous Modifications; or

        C. Any new file that is contributed or otherwise made
           available under the terms of this License.

    1.10. "Original Software" means the Source Code and Executable
          form of computer software code that is originally released
          under this License.

    1.11. "Patent Claims" means any patent claim(s), now owned or
          hereafter acquired, including without limitation, method,
          process, and apparatus claims, in any patent Licensable by
          grantor.

    1.12. "Source Code" means (a) the common form of computer software
          code in which modifications are made and (b) associated
          documentation included in or with such code.

    1.13. "You" (or "Your") means an individual or a legal entity
          exercising rights under, and complying with all of the terms
          of, this License.  For legal entities, "You" includes any
          entity which controls, is controlled by, or is under common
          control with You.  For purposes of this definition,
          "control" means (a) the power, direct or indirect, to cause
          the direction or management of such entity, whether by
          contract or otherwise, or (b) ownership of more than fifty
          percent (50%) of the outstanding shares or beneficial
          ownership of such entity.

2. License Grants.

    2.1. The Initial Developer Grant.

    Conditioned upon Your compliance with Section 3.1 below and
    subject to third party intellectual property claims, the Initial
    Developer hereby grants You a world-wide, royalty-free,
    non-exclusive license:

        (a) under intellectual property rights (other than patent or
            trademark) Licensable by Initial Developer, to use,
            reproduce, modify, display, perform, sublicense and
            distribute the Original Software (or portions thereof),
            with or without Modifications, and/or as part of a Larger
            Work; and

        (b) under Patent Claims infringed by the making, using or
            selling of Original Software, to make, have made, use,
            practice, sell, and offer for sale, and/or otherwise
            dispose of the Original Software (or portions thereof).

        (c) The licenses granted in Sections 2.1(a) and (b) are
            effective on the date Initial Developer first distributes
            or otherwise makes the Original Software available to a
            third party under the terms of this License.

        (d) Notwithstanding Section 2.1(b) above, no patent license is
            granted: (1) for code that You delete from the Original
            Software, or (2) for infringements caused by: (i) the
            modification of the Original Software, or (ii) the
            combination of the Original Software with other software
            or devices.

    2.2. Contributor Grant.

    Conditioned upon Your compliance with Section 3.1 below and
    subject to third party intellectual property claims, each
    Contributor hereby grants You a world-wide, royalty-free,
    non-exclusive license:

        (a) under intellectual property rights (other than patent or
            trademark) Licensable by Contributor to use, reproduce,
            modify, display, perform, sublicense and distribute the
            Modifications created by such Contributor (or portions
            thereof), either on an unmodified basis, with other
            Modifications, as Covered Software and/or as part of a
            Larger Work; and

        (b) under Patent Claims infringed by the making, using, or
            selling of Modifications made by that Contributor either
            alone and/or in combination with its Contributor Version
            (or portions of such combination), to make, use, sell,
            offer for sale, have made, and/or otherwise dispose of:
            (1) Modifications made by that Contributor (or portions
            thereof); and (2) the combination of Modifications made by
            that Contributor with its Contributor Version (or portions
            of such combination).

        (c) The licenses granted in Sections 2.2(a) and 2.2(b) are
            effective on the date Contributor first distributes or
            otherwise makes the Modifications available to a third
            party.

        (d) Notwithstanding Section 2.2(b) above, no patent license is
            granted: (1) for any code that Contributor has deleted
            from the Contributor Version; (2) for infringements caused
            by: (i) third party modifications of Contributor Version,
            or (ii) the combination of Modifications made by that
            Contributor with other software (except as part of the
            Contributor Version) or other devices; or (3) under Patent
            Claims infringed by Covered Software in the absence of
            Modifications made by that Contributor.

3. Distribution Obligations.

    3.1. Availability of Source Code.

    Any Covered Software that You distribute or otherwise make
    available in Executable form must also be made available in Source
    Code form and that Source Code form must be distributed only under
    the terms of this License.  You must include a copy of this
    License with every copy of the Source Code form of the Covered
    Software You distribute or otherwise make available.  You must
    inform recipients of any such Covered Software in Executable form
    as to how they can obtain such Covered Software in Source Code
    form in a reasonable manner on or through a medium customarily
    used for software exchange.

    3.2. Modifications.

    The Modifications that You create or to which You contribute are
    governed by the terms of this License.  You represent that You
    believe Your Modifications are Your original creation(s) and/or
    You have sufficient rights to grant the rights conveyed by this
    License.

    3.3. Required Notices.

    You must include a notice in each of Your Modifications that
    identifies You as the Contributor of the Modification.  You may
    not remove or alter any copyright, patent or trademark notices
    contained within the Covered Software, or any notices of licensing
    or any descriptive text giving attribution to any Contributor or
    the Initial Developer.

    3.4. Application of Additional Terms.

    You may not offer or impose any terms on any Covered Software in
    Source Code form that alters or restricts the applicable version
    of this License or the recipients' rights hereunder.  You may
    choose to offer, and to charge a fee for, warranty, support,
    indemnity or liability obligations to one or more recipients of
    Covered Software.  However, you may do so only on Your own behalf,
    and not on behalf of the Initial Developer or any Contributor.
    You must make it absolutely clear that any such warranty, support,
    indemnity or liability obligation is offered by You alone, and You
    hereby agree to indemnify the Initial Developer and every
    Contributor for any liability incurred by the Initial Developer or
    such Contributor as a result of warranty, support, indemnity or
    liability terms You offer.

    3.5. Distribution of Executable Versions.

    You may distribute the Executable form of the Covered Software
    under the terms of this License or under the terms of a license of
    Your choice, which may contain terms different from this License,
    provided that You are in compliance with the terms of this License
    and that the license for the Executable form does not attempt to
    limit or alter the recipient's rights in the Source Code form from
    the rights set forth in this License.  If You distribute the
    Covered Software in Executable form under a different license, You
    must make it absolutely clear that any terms which differ from
    this License are offered by You alone, not by the Initial
    Developer or Contributor.  You hereby agree to indemnify the
    Initial Developer and every Contributor for any liability incurred
    by the Initial Developer or such Contributor as a result of any
    such terms You offer.

    3.6. Larger Works.

    You may create a Larger Work by combining Covered Software with
    other code not governed by the terms of this License and
    distribute the Larger Work as a single product.  In such a case,
    You must make sure the requirements of this License are fulfilled
    for the Covered Software.

4. Versions of the License.

    4.1. New Versions.

    Sun Microsystems, Inc. is the initial license steward and may
    publish revised and/or new versions of this License from time to
    time.  Each version will be given a distinguishing version number.
    Except as provided in Section 4.3, no one other than the license
    steward has the right to modify this License.

    4.2. Effect of New Versions.

    You may always continue to use, distribute or otherwise make the
    Covered Software available under the terms of the version of the
    License under which You originally received the Covered Software.
    If the Initial Developer includes a notice in the Original
    Software prohibiting it from being distributed or otherwise made
    available under any subsequent version of the License, You must
    distribute and make the Covered Software available under the terms
    of the version of the License under which You originally received
    the Covered Software.  Otherwise, You may also choose to use,
    distribute or otherwise make the Covered Software available under
    the terms of any subsequent version of the License published by
    the license steward.

    4.3. Modified Versions.

    When You are an Initial Developer and You want to create a new
    license for Your Original Software, You may create and use a
    modified version of this License if You: (a) rename the license
    and remove any references to the name of the license steward
    (except to note that the license differs from this License); and
    (b) otherwise make it clear that the license contains terms which
    differ from this License.

5. DISCLAIMER OF WARRANTY.

    COVERED SOFTWARE IS PROVIDED UNDER THIS LICENSE ON AN "AS IS"
    BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
    INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE COVERED
    SOFTWARE IS FREE OF DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR
    PURPOSE OR NON-INFRINGING.  THE ENTIRE RISK AS TO THE QUALITY AND
    PERFORMANCE OF THE COVERED SOFTWARE IS WITH YOU.  SHOULD ANY
    COVERED SOFTWARE PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE
    INITIAL DEVELOPER OR ANY OTHER CONTRIBUTOR) ASSUME THE COST OF ANY
    NECESSARY SERVICING, REPAIR OR CORRECTION.  THIS DISCLAIMER OF
    WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE.  NO USE OF
    ANY COVERED SOFTWARE IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS
    DISCLAIMER.

6. TERMINATION.

    6.1. This License and the rights granted hereunder will terminate
    automatically if You fail to comply with terms herein and fail to
    cure such breach within 30 days of becoming aware of the breach.
    Provisions which, by their nature, must remain in effect beyond
    the termination of this License shall survive.

    6.2. If You assert a patent infringement claim (excluding
    declaratory judgment actions) against Initial Developer or a
    Contributor (the Initial Developer or Contributor against whom You
    assert such claim is referred to as "Participant") alleging that
    the Participant Software (meaning the Contributor Version where
    the Participant is a Contributor or the Original Software where
    the Participant is the Initial Developer) directly or indirectly
    infringes any patent, then any and all rights granted directly or
    indirectly to You by such Participant, the Initial Developer (if
    the Initial Developer is not the Participant) and all Contributors
    under Sections 2.1 and/or 2.2 of this License shall, upon 60 days
    notice from Participant terminate prospectively and automatically
    at the expiration of such 60 day notice period, unless if within
    such 60 day period You withdraw Your claim with respect to the
    Participant Software against such Participant either unilaterally
    or pursuant to a written agreement with Participant.

    6.3. In the event of termination under Sections 6.1 or 6.2 above,
    all end user licenses that have been validly granted by You or any
    distributor hereunder prior to termination (excluding licenses
    granted to You by any distributor) shall survive termination.

7. LIMITATION OF LIABILITY.

    UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER TORT
    (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL YOU, THE
    INITIAL DEVELOPER, ANY OTHER CONTRIBUTOR, OR ANY DISTRIBUTOR OF
    COVERED SOFTWARE, OR ANY SUPPLIER OF ANY OF SUCH PARTIES, BE
    LIABLE TO ANY PERSON FOR ANY INDIRECT, SPECIAL, INCIDENTAL, OR
    CONSEQUENTIAL DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT
    LIMITATION, DAMAGES FOR LOST PROFITS, LOSS OF GOODWILL, WORK
    STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER
    COMMERCIAL DAMAGES OR LOSSES, EVEN IF SUCH PARTY SHALL HAVE BEEN
    INFORMED OF THE POSSIBILITY OF SUCH DAMAGES.  THIS LIMITATION OF
    LIABILITY SHALL NOT APPLY TO LIABILITY FOR DEATH OR PERSONAL
    INJURY RESULTING FROM SUCH PARTY'S NEGLIGENCE TO THE EXTENT
    APPLICABLE LAW PROHIBITS SUCH LIMITATION.  SOME JURISDICTIONS DO
    NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR
    CONSEQUENTIAL DAMAGES, SO THIS EXCLUSION AND LIMITATION MAY NOT
    APPLY TO YOU.

8. U.S. GOVERNMENT END USERS.

    The Covered Software is a "commercial item," as that term is
    defined in 48 C.F.R. 2.101 (Oct. 1995), consisting of "commercial
    computer software" (as that term is defined at 48
    C.F.R. 252.227-7014(a)(1)) and "commercial computer software
    documentation" as such terms are used in 48 C.F.R. 12.212
    (Sept. 1995).  Consistent with 48 C.F.R. 12.212 and 48
    C.F.R. 227.7202-1 through 227.7202-4 (June 1995), all
    U.S. Government End Users acquire Covered Software with only those
    rights set forth herein.  This U.S. Government Rights clause is in
    lieu of, and supersedes, any other FAR, DFAR, or other clause or
    provision that addresses Government rights in computer software
    under this License.

9. MISCELLANEOUS.

    This License represents the complete agreement concerning subject
    matter hereof.  If any provision of this License is held to be
    unenforceable, such provision shall be reformed only to the extent
    necessary to make it enforceable.  This License shall be governed
    by the law of the jurisdiction specified in a notice contained
    within the Original Software (except to the extent applicable law,
    if any, provides otherwise), excluding such jurisdiction's
    conflict-of-law provisions.  Any litigation relating to this
    License shall be subject to the jurisdiction of the courts located
    in the jurisdiction and venue specified in a notice contained
    within the Original Software, with the losing party responsible
    for costs, including, without limitation, court costs and
    reasonable attorneys' fees and expenses.  The application of the
    United Nations Convention on Contracts for the International Sale
    of Goods is expressly excluded.  Any law or regulation which
    provides that the language of a contract shall be construed
    against the drafter shall not apply to this License.  You agree
    that You alone are responsible for compliance with the United
    States export administration regulations (and the export control
    laws and regulation of any other countries) when You use,
    distribute or otherwise make available any Covered Software.

10. RESPONSIBILITY FOR CLAIMS.

    As between Initial Developer and the Contributors, each party is
    responsible for claims and damages arising, directly or
    indirectly, out of its utilization of rights under this License
    and You agree to work with Initial Developer and Contributors to
    distribute such responsibility on an equitable basis.  Nothing
    herein is intended or shall be deemed to constitute any admission
    of liability.

--------------------------------------------------------------------

NOTICE PURSUANT TO SECTION 9 OF THE COMMON DEVELOPMENT AND
DISTRIBUTION LICENSE (CDDL)

For Covered Software in this distribution, this License shall
be governed by the laws of the State of California (excluding
conflict-of-law provisions).

Any litigation relating to this License shall be subject to the
jurisdiction of the Federal Courts of the Northern District of
California and the state courts of the State of California, with
venue lying in Santa Clara County, California.
1755
common/acl/acl_common.c
Normal file
File diff suppressed because it is too large
59
common/acl/acl_common.h
Normal file
@@ -0,0 +1,59 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
 */

#ifndef	_ACL_COMMON_H
#define	_ACL_COMMON_H

#include <sys/types.h>
#include <sys/acl.h>
#include <sys/stat.h>

#ifdef	__cplusplus
extern "C" {
#endif

extern ace_t trivial_acl[6];

extern int acltrivial(const char *);
extern void adjust_ace_pair(ace_t *pair, mode_t mode);
extern void adjust_ace_pair_common(void *, size_t, size_t, mode_t);
extern int ace_trivial(ace_t *acep, int aclcnt);
extern int ace_trivial_common(void *, int,
    uint64_t (*walk)(void *, uint64_t, int aclcnt, uint16_t *, uint16_t *,
    uint32_t *mask));
extern acl_t *acl_alloc(acl_type_t);
extern void acl_free(acl_t *aclp);
extern int acl_translate(acl_t *aclp, int target_flavor,
    int isdir, uid_t owner, gid_t group);
void ksort(caddr_t v, int n, int s, int (*f)());
int cmp2acls(void *a, void *b);
int acl_trivial_create(mode_t mode, ace_t **acl, int *count);
void acl_trivial_access_masks(mode_t mode, uint32_t *allow0, uint32_t *deny1,
    uint32_t *deny2, uint32_t *owner, uint32_t *group, uint32_t *everyone);

#ifdef	__cplusplus
}
#endif

#endif /* _ACL_COMMON_H */
573
common/atomic/amd64/atomic.s
Normal file
@@ -0,0 +1,573 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved.
 */

	.file	"atomic.s"

#include <sys/asm_linkage.h>

#if defined(_KERNEL)
	/*
	 * Legacy kernel interfaces; they will go away (eventually).
	 */
	ANSI_PRAGMA_WEAK2(cas8,atomic_cas_8,function)
	ANSI_PRAGMA_WEAK2(cas32,atomic_cas_32,function)
	ANSI_PRAGMA_WEAK2(cas64,atomic_cas_64,function)
	ANSI_PRAGMA_WEAK2(caslong,atomic_cas_ulong,function)
	ANSI_PRAGMA_WEAK2(casptr,atomic_cas_ptr,function)
	ANSI_PRAGMA_WEAK2(atomic_and_long,atomic_and_ulong,function)
	ANSI_PRAGMA_WEAK2(atomic_or_long,atomic_or_ulong,function)
#endif

	ENTRY(atomic_inc_8)
	ALTENTRY(atomic_inc_uchar)
	lock
	incb	(%rdi)
	ret
	SET_SIZE(atomic_inc_uchar)
	SET_SIZE(atomic_inc_8)

	ENTRY(atomic_inc_16)
	ALTENTRY(atomic_inc_ushort)
	lock
	incw	(%rdi)
	ret
	SET_SIZE(atomic_inc_ushort)
	SET_SIZE(atomic_inc_16)

	ENTRY(atomic_inc_32)
	ALTENTRY(atomic_inc_uint)
	lock
	incl	(%rdi)
	ret
	SET_SIZE(atomic_inc_uint)
	SET_SIZE(atomic_inc_32)

	ENTRY(atomic_inc_64)
	ALTENTRY(atomic_inc_ulong)
	lock
	incq	(%rdi)
	ret
	SET_SIZE(atomic_inc_ulong)
	SET_SIZE(atomic_inc_64)

	ENTRY(atomic_inc_8_nv)
	ALTENTRY(atomic_inc_uchar_nv)
	xorl	%eax, %eax	/ clear upper bits of %eax return register
	incb	%al		/ %al = 1
	lock
	xaddb	%al, (%rdi)	/ %al = old value, (%rdi) = new value
	incb	%al		/ return new value
	ret
	SET_SIZE(atomic_inc_uchar_nv)
	SET_SIZE(atomic_inc_8_nv)

	ENTRY(atomic_inc_16_nv)
	ALTENTRY(atomic_inc_ushort_nv)
	xorl	%eax, %eax	/ clear upper bits of %eax return register
	incw	%ax		/ %ax = 1
	lock
	xaddw	%ax, (%rdi)	/ %ax = old value, (%rdi) = new value
	incw	%ax		/ return new value
	ret
	SET_SIZE(atomic_inc_ushort_nv)
	SET_SIZE(atomic_inc_16_nv)

	ENTRY(atomic_inc_32_nv)
	ALTENTRY(atomic_inc_uint_nv)
	xorl	%eax, %eax	/ %eax = 0
	incl	%eax		/ %eax = 1
	lock
	xaddl	%eax, (%rdi)	/ %eax = old value, (%rdi) = new value
	incl	%eax		/ return new value
	ret
	SET_SIZE(atomic_inc_uint_nv)
	SET_SIZE(atomic_inc_32_nv)

	ENTRY(atomic_inc_64_nv)
	ALTENTRY(atomic_inc_ulong_nv)
	xorq	%rax, %rax	/ %rax = 0
	incq	%rax		/ %rax = 1
	lock
	xaddq	%rax, (%rdi)	/ %rax = old value, (%rdi) = new value
	incq	%rax		/ return new value
	ret
	SET_SIZE(atomic_inc_ulong_nv)
	SET_SIZE(atomic_inc_64_nv)

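Review note: the `_nv` ("new value") increment routines above put 1 in the accumulator, exchange-and-add it with the target via `lock xadd` (which leaves the old value in the register), and then increment that old value to get the result. A minimal C11 sketch of the same idea follows. It assumes a compiler that provides `<stdatomic.h>`; `sketch_inc_32_nv` is a hypothetical name used only for illustration and is not part of the imported code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for atomic_inc_32_nv().
 * atomic_fetch_add() returns the OLD value, much like xadd leaving
 * the old value in %eax, so adding 1 gives the value just stored.
 */
static uint32_t
sketch_inc_32_nv(_Atomic uint32_t *target)
{
	assert(target != NULL);
	uint32_t old = atomic_fetch_add(target, 1);
	return (old + 1);
}
```

A caller would typically use the returned value as a ticket or reference count, since it reflects exactly the increment that this call performed.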
	ENTRY(atomic_dec_8)
	ALTENTRY(atomic_dec_uchar)
	lock
	decb	(%rdi)
	ret
	SET_SIZE(atomic_dec_uchar)
	SET_SIZE(atomic_dec_8)

	ENTRY(atomic_dec_16)
	ALTENTRY(atomic_dec_ushort)
	lock
	decw	(%rdi)
	ret
	SET_SIZE(atomic_dec_ushort)
	SET_SIZE(atomic_dec_16)

	ENTRY(atomic_dec_32)
	ALTENTRY(atomic_dec_uint)
	lock
	decl	(%rdi)
	ret
	SET_SIZE(atomic_dec_uint)
	SET_SIZE(atomic_dec_32)

	ENTRY(atomic_dec_64)
	ALTENTRY(atomic_dec_ulong)
	lock
	decq	(%rdi)
	ret
	SET_SIZE(atomic_dec_ulong)
	SET_SIZE(atomic_dec_64)

	ENTRY(atomic_dec_8_nv)
	ALTENTRY(atomic_dec_uchar_nv)
	xorl	%eax, %eax	/ clear upper bits of %eax return register
	decb	%al		/ %al = -1
	lock
	xaddb	%al, (%rdi)	/ %al = old value, (%rdi) = new value
	decb	%al		/ return new value
	ret
	SET_SIZE(atomic_dec_uchar_nv)
	SET_SIZE(atomic_dec_8_nv)

	ENTRY(atomic_dec_16_nv)
	ALTENTRY(atomic_dec_ushort_nv)
	xorl	%eax, %eax	/ clear upper bits of %eax return register
	decw	%ax		/ %ax = -1
	lock
	xaddw	%ax, (%rdi)	/ %ax = old value, (%rdi) = new value
	decw	%ax		/ return new value
	ret
	SET_SIZE(atomic_dec_ushort_nv)
	SET_SIZE(atomic_dec_16_nv)

	ENTRY(atomic_dec_32_nv)
	ALTENTRY(atomic_dec_uint_nv)
	xorl	%eax, %eax	/ %eax = 0
	decl	%eax		/ %eax = -1
	lock
	xaddl	%eax, (%rdi)	/ %eax = old value, (%rdi) = new value
	decl	%eax		/ return new value
	ret
	SET_SIZE(atomic_dec_uint_nv)
	SET_SIZE(atomic_dec_32_nv)

	ENTRY(atomic_dec_64_nv)
	ALTENTRY(atomic_dec_ulong_nv)
	xorq	%rax, %rax	/ %rax = 0
	decq	%rax		/ %rax = -1
	lock
	xaddq	%rax, (%rdi)	/ %rax = old value, (%rdi) = new value
	decq	%rax		/ return new value
	ret
	SET_SIZE(atomic_dec_ulong_nv)
	SET_SIZE(atomic_dec_64_nv)

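Review note: the decrement variants above reuse the `xadd` pattern by adding -1. For the 8-bit routine that means adding 0xff, which is the same as subtracting 1 modulo 256, and the `xorl` beforehand keeps the upper bits of the return register clear. The sketch below illustrates that wraparound in C11; `sketch_dec_8_nv` is a hypothetical name and the code is only an approximation of what the assembly does, not the imported implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for atomic_dec_8_nv().
 * Adding 0xff to an 8-bit value is equivalent to subtracting 1
 * modulo 256, mirroring "decb %al" followed by "lock xaddb".
 */
static uint8_t
sketch_dec_8_nv(_Atomic uint8_t *target)
{
	assert(target != NULL);
	uint8_t old = atomic_fetch_add(target, (uint8_t)0xff);
	return ((uint8_t)(old - 1));
}
```

Decrementing a counter that already holds 0 wraps to 255 rather than failing, so callers that treat the result as a reference count need their own underflow check.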
	ENTRY(atomic_add_8)
	ALTENTRY(atomic_add_char)
	lock
	addb	%sil, (%rdi)
	ret
	SET_SIZE(atomic_add_char)
	SET_SIZE(atomic_add_8)

	ENTRY(atomic_add_16)
	ALTENTRY(atomic_add_short)
	lock
	addw	%si, (%rdi)
	ret
	SET_SIZE(atomic_add_short)
	SET_SIZE(atomic_add_16)

	ENTRY(atomic_add_32)
	ALTENTRY(atomic_add_int)
	lock
	addl	%esi, (%rdi)
	ret
	SET_SIZE(atomic_add_int)
	SET_SIZE(atomic_add_32)

	ENTRY(atomic_add_64)
	ALTENTRY(atomic_add_ptr)
	ALTENTRY(atomic_add_long)
	lock
	addq	%rsi, (%rdi)
	ret
	SET_SIZE(atomic_add_long)
	SET_SIZE(atomic_add_ptr)
	SET_SIZE(atomic_add_64)

	ENTRY(atomic_or_8)
	ALTENTRY(atomic_or_uchar)
	lock
	orb	%sil, (%rdi)
	ret
	SET_SIZE(atomic_or_uchar)
	SET_SIZE(atomic_or_8)

	ENTRY(atomic_or_16)
	ALTENTRY(atomic_or_ushort)
	lock
	orw	%si, (%rdi)
	ret
	SET_SIZE(atomic_or_ushort)
	SET_SIZE(atomic_or_16)

	ENTRY(atomic_or_32)
	ALTENTRY(atomic_or_uint)
	lock
	orl	%esi, (%rdi)
	ret
	SET_SIZE(atomic_or_uint)
	SET_SIZE(atomic_or_32)

	ENTRY(atomic_or_64)
	ALTENTRY(atomic_or_ulong)
	lock
	orq	%rsi, (%rdi)
	ret
	SET_SIZE(atomic_or_ulong)
	SET_SIZE(atomic_or_64)

	ENTRY(atomic_and_8)
	ALTENTRY(atomic_and_uchar)
	lock
	andb	%sil, (%rdi)
	ret
	SET_SIZE(atomic_and_uchar)
	SET_SIZE(atomic_and_8)

	ENTRY(atomic_and_16)
	ALTENTRY(atomic_and_ushort)
	lock
	andw	%si, (%rdi)
	ret
	SET_SIZE(atomic_and_ushort)
	SET_SIZE(atomic_and_16)

	ENTRY(atomic_and_32)
	ALTENTRY(atomic_and_uint)
	lock
	andl	%esi, (%rdi)
	ret
	SET_SIZE(atomic_and_uint)
	SET_SIZE(atomic_and_32)

	ENTRY(atomic_and_64)
	ALTENTRY(atomic_and_ulong)
	lock
	andq	%rsi, (%rdi)
	ret
	SET_SIZE(atomic_and_ulong)
	SET_SIZE(atomic_and_64)

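Review note: the non-returning `atomic_or_*` and `atomic_and_*` routines above are a single locked read-modify-write each, which makes them a natural fit for setting and clearing flag bits. The following C11 sketch shows that usage pattern under stated assumptions: the flag names and the `sketch_*` helpers are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define	SKETCH_FLAG_BUSY	0x1u
#define	SKETCH_FLAG_DIRTY	0x2u

/* Like atomic_or_32(): set bits; the previous value is discarded. */
static void
sketch_set_flags(_Atomic uint32_t *flags, uint32_t bits)
{
	assert(flags != NULL);
	(void) atomic_fetch_or(flags, bits);
}

/* Like atomic_and_32() called with an inverted mask: clear bits. */
static void
sketch_clear_flags(_Atomic uint32_t *flags, uint32_t bits)
{
	assert(flags != NULL);
	(void) atomic_fetch_and(flags, ~bits);
}
```

Because neither helper reports the prior state, a caller that needs to know whether it was the one to set a bit would want a value-returning variant instead.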
	ENTRY(atomic_add_8_nv)
	ALTENTRY(atomic_add_char_nv)
	movzbl	%sil, %eax	/ %al = delta addend, clear upper bits
	lock
	xaddb	%sil, (%rdi)	/ %sil = old value, (%rdi) = sum
	addb	%sil, %al	/ new value = original value + delta
	ret
	SET_SIZE(atomic_add_char_nv)
	SET_SIZE(atomic_add_8_nv)

	ENTRY(atomic_add_16_nv)
	ALTENTRY(atomic_add_short_nv)
	movzwl	%si, %eax	/ %ax = delta addend, clean upper bits
	lock
	xaddw	%si, (%rdi)	/ %si = old value, (%rdi) = sum
	addw	%si, %ax	/ new value = original value + delta
	ret
	SET_SIZE(atomic_add_short_nv)
	SET_SIZE(atomic_add_16_nv)

	ENTRY(atomic_add_32_nv)
	ALTENTRY(atomic_add_int_nv)
	mov	%esi, %eax	/ %eax = delta addend
	lock
	xaddl	%esi, (%rdi)	/ %esi = old value, (%rdi) = sum
	add	%esi, %eax	/ new value = original value + delta
	ret
	SET_SIZE(atomic_add_int_nv)
	SET_SIZE(atomic_add_32_nv)

	ENTRY(atomic_add_64_nv)
	ALTENTRY(atomic_add_ptr_nv)
	ALTENTRY(atomic_add_long_nv)
	mov	%rsi, %rax	/ %rax = delta addend
	lock
	xaddq	%rsi, (%rdi)	/ %rsi = old value, (%rdi) = sum
	addq	%rsi, %rax	/ new value = original value + delta
	ret
	SET_SIZE(atomic_add_long_nv)
	SET_SIZE(atomic_add_ptr_nv)
	SET_SIZE(atomic_add_64_nv)

|
||||
ENTRY(atomic_and_8_nv)
|
||||
ALTENTRY(atomic_and_uchar_nv)
|
||||
movb (%rdi), %al / %al = old value
|
||||
1:
|
||||
movb %sil, %cl
|
||||
andb %al, %cl / %cl = new value
|
||||
lock
|
||||
cmpxchgb %cl, (%rdi) / try to stick it in
|
||||
jne 1b
|
||||
movzbl %cl, %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_and_uchar_nv)
|
||||
SET_SIZE(atomic_and_8_nv)
|
||||
|
||||
ENTRY(atomic_and_16_nv)
|
||||
ALTENTRY(atomic_and_ushort_nv)
|
||||
movw (%rdi), %ax / %ax = old value
|
||||
1:
|
||||
movw %si, %cx
|
||||
andw %ax, %cx / %cx = new value
|
||||
lock
|
||||
cmpxchgw %cx, (%rdi) / try to stick it in
|
||||
jne 1b
|
||||
movzwl %cx, %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_and_ushort_nv)
|
||||
SET_SIZE(atomic_and_16_nv)
|
||||
|
||||
ENTRY(atomic_and_32_nv)
|
||||
ALTENTRY(atomic_and_uint_nv)
|
||||
movl (%rdi), %eax
|
||||
1:
|
||||
movl %esi, %ecx
|
||||
andl %eax, %ecx
|
||||
lock
|
||||
cmpxchgl %ecx, (%rdi)
|
||||
jne 1b
|
||||
movl %ecx, %eax
|
||||
ret
|
||||
SET_SIZE(atomic_and_uint_nv)
|
||||
SET_SIZE(atomic_and_32_nv)
|
||||
|
||||
ENTRY(atomic_and_64_nv)
|
||||
ALTENTRY(atomic_and_ulong_nv)
|
||||
movq (%rdi), %rax
|
||||
1:
|
||||
movq %rsi, %rcx
|
||||
andq %rax, %rcx
|
||||
lock
|
||||
cmpxchgq %rcx, (%rdi)
|
||||
jne 1b
|
||||
movq %rcx, %rax
|
||||
ret
|
||||
SET_SIZE(atomic_and_ulong_nv)
|
||||
SET_SIZE(atomic_and_64_nv)
|
||||
|
||||
ENTRY(atomic_or_8_nv)
|
||||
ALTENTRY(atomic_or_uchar_nv)
|
||||
movb (%rdi), %al / %al = old value
|
||||
1:
|
||||
movb %sil, %cl
|
||||
orb %al, %cl / %cl = new value
|
||||
lock
|
||||
cmpxchgb %cl, (%rdi) / try to stick it in
|
||||
jne 1b
|
||||
movzbl %cl, %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_or_uchar_nv)
|
||||
SET_SIZE(atomic_or_8_nv)
|
||||
|
||||
ENTRY(atomic_or_16_nv)
|
||||
ALTENTRY(atomic_or_ushort_nv)
|
||||
movw (%rdi), %ax / %ax = old value
|
||||
1:
|
||||
movw %si, %cx
|
||||
orw %ax, %cx / %cx = new value
|
||||
lock
|
||||
cmpxchgw %cx, (%rdi) / try to stick it in
|
||||
jne 1b
|
||||
movzwl %cx, %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_or_ushort_nv)
|
||||
SET_SIZE(atomic_or_16_nv)
|
||||
|
||||
ENTRY(atomic_or_32_nv)
|
||||
ALTENTRY(atomic_or_uint_nv)
|
||||
movl (%rdi), %eax
|
||||
1:
|
||||
movl %esi, %ecx
|
||||
orl %eax, %ecx
|
||||
lock
|
||||
cmpxchgl %ecx, (%rdi)
|
||||
jne 1b
|
||||
movl %ecx, %eax
|
||||
ret
|
||||
SET_SIZE(atomic_or_uint_nv)
|
||||
SET_SIZE(atomic_or_32_nv)
|
||||
|
||||
ENTRY(atomic_or_64_nv)
|
||||
ALTENTRY(atomic_or_ulong_nv)
|
||||
movq (%rdi), %rax
|
||||
1:
|
||||
movq %rsi, %rcx
|
||||
orq %rax, %rcx
|
||||
lock
|
||||
cmpxchgq %rcx, (%rdi)
|
||||
jne 1b
|
||||
movq %rcx, %rax
|
||||
ret
|
||||
SET_SIZE(atomic_or_ulong_nv)
|
||||
SET_SIZE(atomic_or_64_nv)
|
||||
|
||||
ENTRY(atomic_cas_8)
|
||||
ALTENTRY(atomic_cas_uchar)
|
||||
movzbl %sil, %eax
|
||||
lock
|
||||
cmpxchgb %dl, (%rdi)
|
||||
ret
|
||||
SET_SIZE(atomic_cas_uchar)
|
||||
SET_SIZE(atomic_cas_8)
|
||||
|
||||
ENTRY(atomic_cas_16)
|
||||
ALTENTRY(atomic_cas_ushort)
|
||||
movzwl %si, %eax
|
||||
lock
|
||||
cmpxchgw %dx, (%rdi)
|
||||
ret
|
||||
SET_SIZE(atomic_cas_ushort)
|
||||
SET_SIZE(atomic_cas_16)
|
||||
|
||||
ENTRY(atomic_cas_32)
|
||||
ALTENTRY(atomic_cas_uint)
|
||||
movl %esi, %eax
|
||||
lock
|
||||
cmpxchgl %edx, (%rdi)
|
||||
ret
|
||||
SET_SIZE(atomic_cas_uint)
|
||||
SET_SIZE(atomic_cas_32)
|
||||
|
||||
ENTRY(atomic_cas_64)
|
||||
ALTENTRY(atomic_cas_ulong)
|
||||
ALTENTRY(atomic_cas_ptr)
|
||||
movq %rsi, %rax
|
||||
lock
|
||||
cmpxchgq %rdx, (%rdi)
|
||||
ret
|
||||
SET_SIZE(atomic_cas_ptr)
|
||||
SET_SIZE(atomic_cas_ulong)
|
||||
SET_SIZE(atomic_cas_64)
|
||||
|
||||
ENTRY(atomic_swap_8)
|
||||
ALTENTRY(atomic_swap_uchar)
|
||||
movzbl %sil, %eax
|
||||
lock
|
||||
xchgb %al, (%rdi)
|
||||
ret
|
||||
SET_SIZE(atomic_swap_uchar)
|
||||
SET_SIZE(atomic_swap_8)
|
||||
|
||||
ENTRY(atomic_swap_16)
|
||||
ALTENTRY(atomic_swap_ushort)
|
||||
movzwl %si, %eax
|
||||
lock
|
||||
xchgw %ax, (%rdi)
|
||||
ret
|
||||
SET_SIZE(atomic_swap_ushort)
|
||||
SET_SIZE(atomic_swap_16)
|
||||
|
||||
ENTRY(atomic_swap_32)
|
||||
ALTENTRY(atomic_swap_uint)
|
||||
movl %esi, %eax
|
||||
lock
|
||||
xchgl %eax, (%rdi)
|
||||
ret
|
||||
SET_SIZE(atomic_swap_uint)
|
||||
SET_SIZE(atomic_swap_32)
|
||||
|
||||
ENTRY(atomic_swap_64)
|
||||
ALTENTRY(atomic_swap_ulong)
|
||||
ALTENTRY(atomic_swap_ptr)
|
||||
movq %rsi, %rax
|
||||
lock
|
||||
xchgq %rax, (%rdi)
|
||||
ret
|
||||
SET_SIZE(atomic_swap_ptr)
|
||||
SET_SIZE(atomic_swap_ulong)
|
||||
SET_SIZE(atomic_swap_64)
|
||||
|
||||
ENTRY(atomic_set_long_excl)
|
||||
xorl %eax, %eax
|
||||
lock
|
||||
btsq %rsi, (%rdi)
|
||||
jnc 1f
|
||||
decl %eax / return -1
|
||||
1:
|
||||
ret
|
||||
SET_SIZE(atomic_set_long_excl)
|
||||
|
||||
ENTRY(atomic_clear_long_excl)
|
||||
xorl %eax, %eax
|
||||
lock
|
||||
btrq %rsi, (%rdi)
|
||||
jc 1f
|
||||
decl %eax / return -1
|
||||
1:
|
||||
ret
|
||||
SET_SIZE(atomic_clear_long_excl)
|
||||
|
||||
#if !defined(_KERNEL)
|
||||
|
||||
/*
|
||||
* NOTE: membar_enter and membar_exit are identical routines.
|
||||
* We define them separately, instead of using ALTENTRY
|
||||
* definitions to alias them together, so that DTrace and
|
||||
* debuggers will see a unique address for them, allowing
|
||||
* more accurate tracing.
|
||||
*/
|
||||
|
||||
ENTRY(membar_enter)
|
||||
mfence
|
||||
ret
|
||||
SET_SIZE(membar_enter)
|
||||
|
||||
ENTRY(membar_exit)
|
||||
mfence
|
||||
ret
|
||||
SET_SIZE(membar_exit)
|
||||
|
||||
ENTRY(membar_producer)
|
||||
sfence
|
||||
ret
|
||||
SET_SIZE(membar_producer)
|
||||
|
||||
ENTRY(membar_consumer)
|
||||
lfence
|
||||
ret
|
||||
SET_SIZE(membar_consumer)
|
||||
|
||||
#endif /* !_KERNEL */
|
720
common/atomic/i386/atomic.s
Normal file
@ -0,0 +1,720 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
|
||||
/*
|
||||
* Copyright 2010 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
.file "atomic.s"
|
||||
|
||||
#include <sys/asm_linkage.h>
|
||||
|
||||
#if defined(_KERNEL)
|
||||
/*
|
||||
* Legacy kernel interfaces; they will go away (eventually).
|
||||
*/
|
||||
ANSI_PRAGMA_WEAK2(cas8,atomic_cas_8,function)
|
||||
ANSI_PRAGMA_WEAK2(cas32,atomic_cas_32,function)
|
||||
ANSI_PRAGMA_WEAK2(cas64,atomic_cas_64,function)
|
||||
ANSI_PRAGMA_WEAK2(caslong,atomic_cas_ulong,function)
|
||||
ANSI_PRAGMA_WEAK2(casptr,atomic_cas_ptr,function)
|
||||
ANSI_PRAGMA_WEAK2(atomic_and_long,atomic_and_ulong,function)
|
||||
ANSI_PRAGMA_WEAK2(atomic_or_long,atomic_or_ulong,function)
|
||||
#endif
|
||||
|
||||
ENTRY(atomic_inc_8)
|
||||
ALTENTRY(atomic_inc_uchar)
|
||||
movl 4(%esp), %eax
|
||||
lock
|
||||
incb (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_inc_uchar)
|
||||
SET_SIZE(atomic_inc_8)
|
||||
|
||||
ENTRY(atomic_inc_16)
|
||||
ALTENTRY(atomic_inc_ushort)
|
||||
movl 4(%esp), %eax
|
||||
lock
|
||||
incw (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_inc_ushort)
|
||||
SET_SIZE(atomic_inc_16)
|
||||
|
||||
ENTRY(atomic_inc_32)
|
||||
ALTENTRY(atomic_inc_uint)
|
||||
ALTENTRY(atomic_inc_ulong)
|
||||
movl 4(%esp), %eax
|
||||
lock
|
||||
incl (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_inc_ulong)
|
||||
SET_SIZE(atomic_inc_uint)
|
||||
SET_SIZE(atomic_inc_32)
|
||||
|
||||
ENTRY(atomic_inc_8_nv)
|
||||
ALTENTRY(atomic_inc_uchar_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
xorl %eax, %eax / clear upper bits of %eax
|
||||
incb %al / %al = 1
|
||||
lock
|
||||
xaddb %al, (%edx) / %al = old value, inc (%edx)
|
||||
incb %al / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_inc_uchar_nv)
|
||||
SET_SIZE(atomic_inc_8_nv)
|
||||
|
||||
ENTRY(atomic_inc_16_nv)
|
||||
ALTENTRY(atomic_inc_ushort_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
xorl %eax, %eax / clear upper bits of %eax
|
||||
incw %ax / %ax = 1
|
||||
lock
|
||||
xaddw %ax, (%edx) / %ax = old value, inc (%edx)
|
||||
incw %ax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_inc_ushort_nv)
|
||||
SET_SIZE(atomic_inc_16_nv)
|
||||
|
||||
ENTRY(atomic_inc_32_nv)
|
||||
ALTENTRY(atomic_inc_uint_nv)
|
||||
ALTENTRY(atomic_inc_ulong_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
xorl %eax, %eax / %eax = 0
|
||||
incl %eax / %eax = 1
|
||||
lock
|
||||
xaddl %eax, (%edx) / %eax = old value, inc (%edx)
|
||||
incl %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_inc_ulong_nv)
|
||||
SET_SIZE(atomic_inc_uint_nv)
|
||||
SET_SIZE(atomic_inc_32_nv)
|
||||
|
||||
/*
|
||||
* NOTE: If atomic_inc_64 and atomic_inc_64_nv are ever
|
||||
* separated, you need to also edit the libc i386 platform
|
||||
* specific mapfile and remove the NODYNSORT attribute
|
||||
* from atomic_inc_64_nv.
|
||||
*/
|
||||
ENTRY(atomic_inc_64)
|
||||
ALTENTRY(atomic_inc_64_nv)
|
||||
pushl %edi
|
||||
pushl %ebx
|
||||
movl 12(%esp), %edi / %edi = target address
|
||||
movl (%edi), %eax
|
||||
movl 4(%edi), %edx / %edx:%eax = old value
|
||||
1:
|
||||
xorl %ebx, %ebx
|
||||
xorl %ecx, %ecx
|
||||
incl %ebx / %ecx:%ebx = 1
|
||||
addl %eax, %ebx
|
||||
adcl %edx, %ecx / add in the carry from inc
|
||||
lock
|
||||
cmpxchg8b (%edi) / try to stick it in
|
||||
jne 1b
|
||||
movl %ebx, %eax
|
||||
movl %ecx, %edx / return new value
|
||||
popl %ebx
|
||||
popl %edi
|
||||
ret
|
||||
SET_SIZE(atomic_inc_64_nv)
|
||||
SET_SIZE(atomic_inc_64)
|
||||
|
||||
ENTRY(atomic_dec_8)
|
||||
ALTENTRY(atomic_dec_uchar)
|
||||
movl 4(%esp), %eax
|
||||
lock
|
||||
decb (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_dec_uchar)
|
||||
SET_SIZE(atomic_dec_8)
|
||||
|
||||
ENTRY(atomic_dec_16)
|
||||
ALTENTRY(atomic_dec_ushort)
|
||||
movl 4(%esp), %eax
|
||||
lock
|
||||
decw (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_dec_ushort)
|
||||
SET_SIZE(atomic_dec_16)
|
||||
|
||||
ENTRY(atomic_dec_32)
|
||||
ALTENTRY(atomic_dec_uint)
|
||||
ALTENTRY(atomic_dec_ulong)
|
||||
movl 4(%esp), %eax
|
||||
lock
|
||||
decl (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_dec_ulong)
|
||||
SET_SIZE(atomic_dec_uint)
|
||||
SET_SIZE(atomic_dec_32)
|
||||
|
||||
ENTRY(atomic_dec_8_nv)
|
||||
ALTENTRY(atomic_dec_uchar_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
xorl %eax, %eax / zero upper bits of %eax
|
||||
decb %al / %al = -1
|
||||
lock
|
||||
xaddb %al, (%edx) / %al = old value, dec (%edx)
|
||||
decb %al / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_dec_uchar_nv)
|
||||
SET_SIZE(atomic_dec_8_nv)
|
||||
|
||||
ENTRY(atomic_dec_16_nv)
|
||||
ALTENTRY(atomic_dec_ushort_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
xorl %eax, %eax / zero upper bits of %eax
|
||||
decw %ax / %ax = -1
|
||||
lock
|
||||
xaddw %ax, (%edx) / %ax = old value, dec (%edx)
|
||||
decw %ax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_dec_ushort_nv)
|
||||
SET_SIZE(atomic_dec_16_nv)
|
||||
|
||||
ENTRY(atomic_dec_32_nv)
|
||||
ALTENTRY(atomic_dec_uint_nv)
|
||||
ALTENTRY(atomic_dec_ulong_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
xorl %eax, %eax / %eax = 0
|
||||
decl %eax / %eax = -1
|
||||
lock
|
||||
xaddl %eax, (%edx) / %eax = old value, dec (%edx)
|
||||
decl %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_dec_ulong_nv)
|
||||
SET_SIZE(atomic_dec_uint_nv)
|
||||
SET_SIZE(atomic_dec_32_nv)
|
||||
|
||||
/*
|
||||
* NOTE: If atomic_dec_64 and atomic_dec_64_nv are ever
|
||||
* separated, it is important to edit the libc i386 platform
|
||||
* specific mapfile and remove the NODYNSORT attribute
|
||||
* from atomic_dec_64_nv.
|
||||
*/
|
||||
ENTRY(atomic_dec_64)
|
||||
ALTENTRY(atomic_dec_64_nv)
|
||||
pushl %edi
|
||||
pushl %ebx
|
||||
movl 12(%esp), %edi / %edi = target address
|
||||
movl (%edi), %eax
|
||||
movl 4(%edi), %edx / %edx:%eax = old value
|
||||
1:
|
||||
xorl %ebx, %ebx
|
||||
xorl %ecx, %ecx
|
||||
not %ecx
|
||||
not %ebx / %ecx:%ebx = -1
|
||||
addl %eax, %ebx
|
||||
adcl %edx, %ecx / add in the carry from the low word
|
||||
lock
|
||||
cmpxchg8b (%edi) / try to stick it in
|
||||
jne 1b
|
||||
movl %ebx, %eax
|
||||
movl %ecx, %edx / return new value
|
||||
popl %ebx
|
||||
popl %edi
|
||||
ret
|
||||
SET_SIZE(atomic_dec_64_nv)
|
||||
SET_SIZE(atomic_dec_64)
|
||||
|
||||
ENTRY(atomic_add_8)
|
||||
ALTENTRY(atomic_add_char)
|
||||
movl 4(%esp), %eax
|
||||
movl 8(%esp), %ecx
|
||||
lock
|
||||
addb %cl, (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_add_char)
|
||||
SET_SIZE(atomic_add_8)
|
||||
|
||||
ENTRY(atomic_add_16)
|
||||
ALTENTRY(atomic_add_short)
|
||||
movl 4(%esp), %eax
|
||||
movl 8(%esp), %ecx
|
||||
lock
|
||||
addw %cx, (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_add_short)
|
||||
SET_SIZE(atomic_add_16)
|
||||
|
||||
ENTRY(atomic_add_32)
|
||||
ALTENTRY(atomic_add_int)
|
||||
ALTENTRY(atomic_add_ptr)
|
||||
ALTENTRY(atomic_add_long)
|
||||
movl 4(%esp), %eax
|
||||
movl 8(%esp), %ecx
|
||||
lock
|
||||
addl %ecx, (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_add_long)
|
||||
SET_SIZE(atomic_add_ptr)
|
||||
SET_SIZE(atomic_add_int)
|
||||
SET_SIZE(atomic_add_32)
|
||||
|
||||
ENTRY(atomic_or_8)
|
||||
ALTENTRY(atomic_or_uchar)
|
||||
movl 4(%esp), %eax
|
||||
movb 8(%esp), %cl
|
||||
lock
|
||||
orb %cl, (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_or_uchar)
|
||||
SET_SIZE(atomic_or_8)
|
||||
|
||||
ENTRY(atomic_or_16)
|
||||
ALTENTRY(atomic_or_ushort)
|
||||
movl 4(%esp), %eax
|
||||
movw 8(%esp), %cx
|
||||
lock
|
||||
orw %cx, (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_or_ushort)
|
||||
SET_SIZE(atomic_or_16)
|
||||
|
||||
ENTRY(atomic_or_32)
|
||||
ALTENTRY(atomic_or_uint)
|
||||
ALTENTRY(atomic_or_ulong)
|
||||
movl 4(%esp), %eax
|
||||
movl 8(%esp), %ecx
|
||||
lock
|
||||
orl %ecx, (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_or_ulong)
|
||||
SET_SIZE(atomic_or_uint)
|
||||
SET_SIZE(atomic_or_32)
|
||||
|
||||
ENTRY(atomic_and_8)
|
||||
ALTENTRY(atomic_and_uchar)
|
||||
movl 4(%esp), %eax
|
||||
movb 8(%esp), %cl
|
||||
lock
|
||||
andb %cl, (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_and_uchar)
|
||||
SET_SIZE(atomic_and_8)
|
||||
|
||||
ENTRY(atomic_and_16)
|
||||
ALTENTRY(atomic_and_ushort)
|
||||
movl 4(%esp), %eax
|
||||
movw 8(%esp), %cx
|
||||
lock
|
||||
andw %cx, (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_and_ushort)
|
||||
SET_SIZE(atomic_and_16)
|
||||
|
||||
ENTRY(atomic_and_32)
|
||||
ALTENTRY(atomic_and_uint)
|
||||
ALTENTRY(atomic_and_ulong)
|
||||
movl 4(%esp), %eax
|
||||
movl 8(%esp), %ecx
|
||||
lock
|
||||
andl %ecx, (%eax)
|
||||
ret
|
||||
SET_SIZE(atomic_and_ulong)
|
||||
SET_SIZE(atomic_and_uint)
|
||||
SET_SIZE(atomic_and_32)
|
||||
|
||||
ENTRY(atomic_add_8_nv)
|
||||
ALTENTRY(atomic_add_char_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
movb 8(%esp), %cl / %cl = delta
|
||||
movzbl %cl, %eax / %al = delta, zero extended
|
||||
lock
|
||||
xaddb %cl, (%edx) / %cl = old value, (%edx) = sum
|
||||
addb %cl, %al / return old value plus delta
|
||||
ret
|
||||
SET_SIZE(atomic_add_char_nv)
|
||||
SET_SIZE(atomic_add_8_nv)
|
||||
|
||||
ENTRY(atomic_add_16_nv)
|
||||
ALTENTRY(atomic_add_short_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
movw 8(%esp), %cx / %cx = delta
|
||||
movzwl %cx, %eax / %ax = delta, zero extended
|
||||
lock
|
||||
xaddw %cx, (%edx) / %cx = old value, (%edx) = sum
|
||||
addw %cx, %ax / return old value plus delta
|
||||
ret
|
||||
SET_SIZE(atomic_add_short_nv)
|
||||
SET_SIZE(atomic_add_16_nv)
|
||||
|
||||
ENTRY(atomic_add_32_nv)
|
||||
ALTENTRY(atomic_add_int_nv)
|
||||
ALTENTRY(atomic_add_ptr_nv)
|
||||
ALTENTRY(atomic_add_long_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
movl 8(%esp), %eax / %eax = delta
|
||||
movl %eax, %ecx / %ecx = delta
|
||||
lock
|
||||
xaddl %eax, (%edx) / %eax = old value, (%edx) = sum
|
||||
addl %ecx, %eax / return old value plus delta
|
||||
ret
|
||||
SET_SIZE(atomic_add_long_nv)
|
||||
SET_SIZE(atomic_add_ptr_nv)
|
||||
SET_SIZE(atomic_add_int_nv)
|
||||
SET_SIZE(atomic_add_32_nv)
|
||||
|
||||
/*
|
||||
* NOTE: If atomic_add_64 and atomic_add_64_nv are ever
|
||||
* separated, it is important to edit the libc i386 platform
|
||||
* specific mapfile and remove the NODYNSORT attribute
|
||||
* from atomic_add_64_nv.
|
||||
*/
|
||||
ENTRY(atomic_add_64)
|
||||
ALTENTRY(atomic_add_64_nv)
|
||||
pushl %edi
|
||||
pushl %ebx
|
||||
movl 12(%esp), %edi / %edi = target address
|
||||
movl (%edi), %eax
|
||||
movl 4(%edi), %edx / %edx:%eax = old value
|
||||
1:
|
||||
movl 16(%esp), %ebx
|
||||
movl 20(%esp), %ecx / %ecx:%ebx = delta
|
||||
addl %eax, %ebx
|
||||
adcl %edx, %ecx / %ecx:%ebx = new value
|
||||
lock
|
||||
cmpxchg8b (%edi) / try to stick it in
|
||||
jne 1b
|
||||
movl %ebx, %eax
|
||||
movl %ecx, %edx / return new value
|
||||
popl %ebx
|
||||
popl %edi
|
||||
ret
|
||||
SET_SIZE(atomic_add_64_nv)
|
||||
SET_SIZE(atomic_add_64)
|
||||
|
||||
ENTRY(atomic_or_8_nv)
|
||||
ALTENTRY(atomic_or_uchar_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
movb (%edx), %al / %al = old value
|
||||
1:
|
||||
movl 8(%esp), %ecx / %ecx = delta
|
||||
orb %al, %cl / %cl = new value
|
||||
lock
|
||||
cmpxchgb %cl, (%edx) / try to stick it in
|
||||
jne 1b
|
||||
movzbl %cl, %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_or_uchar_nv)
|
||||
SET_SIZE(atomic_or_8_nv)
|
||||
|
||||
ENTRY(atomic_or_16_nv)
|
||||
ALTENTRY(atomic_or_ushort_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
movw (%edx), %ax / %ax = old value
|
||||
1:
|
||||
movl 8(%esp), %ecx / %ecx = delta
|
||||
orw %ax, %cx / %cx = new value
|
||||
lock
|
||||
cmpxchgw %cx, (%edx) / try to stick it in
|
||||
jne 1b
|
||||
movzwl %cx, %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_or_ushort_nv)
|
||||
SET_SIZE(atomic_or_16_nv)
|
||||
|
||||
ENTRY(atomic_or_32_nv)
|
||||
ALTENTRY(atomic_or_uint_nv)
|
||||
ALTENTRY(atomic_or_ulong_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
movl (%edx), %eax / %eax = old value
|
||||
1:
|
||||
movl 8(%esp), %ecx / %ecx = delta
|
||||
orl %eax, %ecx / %ecx = new value
|
||||
lock
|
||||
cmpxchgl %ecx, (%edx) / try to stick it in
|
||||
jne 1b
|
||||
movl %ecx, %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_or_ulong_nv)
|
||||
SET_SIZE(atomic_or_uint_nv)
|
||||
SET_SIZE(atomic_or_32_nv)
|
||||
|
||||
/*
|
||||
* NOTE: If atomic_or_64 and atomic_or_64_nv are ever
|
||||
* separated, it is important to edit the libc i386 platform
|
||||
* specific mapfile and remove the NODYNSORT attribute
|
||||
* from atomic_or_64_nv.
|
||||
*/
|
||||
ENTRY(atomic_or_64)
|
||||
ALTENTRY(atomic_or_64_nv)
|
||||
pushl %edi
|
||||
pushl %ebx
|
||||
movl 12(%esp), %edi / %edi = target address
|
||||
movl (%edi), %eax
|
||||
movl 4(%edi), %edx / %edx:%eax = old value
|
||||
1:
|
||||
movl 16(%esp), %ebx
|
||||
movl 20(%esp), %ecx / %ecx:%ebx = delta
|
||||
orl %eax, %ebx
|
||||
orl %edx, %ecx / %ecx:%ebx = new value
|
||||
lock
|
||||
cmpxchg8b (%edi) / try to stick it in
|
||||
jne 1b
|
||||
movl %ebx, %eax
|
||||
movl %ecx, %edx / return new value
|
||||
popl %ebx
|
||||
popl %edi
|
||||
ret
|
||||
SET_SIZE(atomic_or_64_nv)
|
||||
SET_SIZE(atomic_or_64)
|
||||
|
||||
ENTRY(atomic_and_8_nv)
|
||||
ALTENTRY(atomic_and_uchar_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
movb (%edx), %al / %al = old value
|
||||
1:
|
||||
movl 8(%esp), %ecx / %ecx = delta
|
||||
andb %al, %cl / %cl = new value
|
||||
lock
|
||||
cmpxchgb %cl, (%edx) / try to stick it in
|
||||
jne 1b
|
||||
movzbl %cl, %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_and_uchar_nv)
|
||||
SET_SIZE(atomic_and_8_nv)
|
||||
|
||||
ENTRY(atomic_and_16_nv)
|
||||
ALTENTRY(atomic_and_ushort_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
movw (%edx), %ax / %ax = old value
|
||||
1:
|
||||
movl 8(%esp), %ecx / %ecx = delta
|
||||
andw %ax, %cx / %cx = new value
|
||||
lock
|
||||
cmpxchgw %cx, (%edx) / try to stick it in
|
||||
jne 1b
|
||||
movzwl %cx, %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_and_ushort_nv)
|
||||
SET_SIZE(atomic_and_16_nv)
|
||||
|
||||
ENTRY(atomic_and_32_nv)
|
||||
ALTENTRY(atomic_and_uint_nv)
|
||||
ALTENTRY(atomic_and_ulong_nv)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
movl (%edx), %eax / %eax = old value
|
||||
1:
|
||||
movl 8(%esp), %ecx / %ecx = delta
|
||||
andl %eax, %ecx / %ecx = new value
|
||||
lock
|
||||
cmpxchgl %ecx, (%edx) / try to stick it in
|
||||
jne 1b
|
||||
movl %ecx, %eax / return new value
|
||||
ret
|
||||
SET_SIZE(atomic_and_ulong_nv)
|
||||
SET_SIZE(atomic_and_uint_nv)
|
||||
SET_SIZE(atomic_and_32_nv)
|
||||
|
||||
/*
|
||||
* NOTE: If atomic_and_64 and atomic_and_64_nv are ever
|
||||
* separated, it is important to edit the libc i386 platform
|
||||
* specific mapfile and remove the NODYNSORT attribute
|
||||
* from atomic_and_64_nv.
|
||||
*/
|
||||
ENTRY(atomic_and_64)
|
||||
ALTENTRY(atomic_and_64_nv)
|
||||
pushl %edi
|
||||
pushl %ebx
|
||||
movl 12(%esp), %edi / %edi = target address
|
||||
movl (%edi), %eax
|
||||
movl 4(%edi), %edx / %edx:%eax = old value
|
||||
1:
|
||||
movl 16(%esp), %ebx
|
||||
movl 20(%esp), %ecx / %ecx:%ebx = delta
|
||||
andl %eax, %ebx
|
||||
andl %edx, %ecx / %ecx:%ebx = new value
|
||||
lock
|
||||
cmpxchg8b (%edi) / try to stick it in
|
||||
jne 1b
|
||||
movl %ebx, %eax
|
||||
movl %ecx, %edx / return new value
|
||||
popl %ebx
|
||||
popl %edi
|
||||
ret
|
||||
SET_SIZE(atomic_and_64_nv)
|
||||
SET_SIZE(atomic_and_64)
|
||||
|
||||
ENTRY(atomic_cas_8)
|
||||
ALTENTRY(atomic_cas_uchar)
|
||||
movl 4(%esp), %edx
|
||||
movzbl 8(%esp), %eax
|
||||
movb 12(%esp), %cl
|
||||
lock
|
||||
cmpxchgb %cl, (%edx)
|
||||
ret
|
||||
SET_SIZE(atomic_cas_uchar)
|
||||
SET_SIZE(atomic_cas_8)
|
||||
|
||||
ENTRY(atomic_cas_16)
|
||||
ALTENTRY(atomic_cas_ushort)
|
||||
movl 4(%esp), %edx
|
||||
movzwl 8(%esp), %eax
|
||||
movw 12(%esp), %cx
|
||||
lock
|
||||
cmpxchgw %cx, (%edx)
|
||||
ret
|
||||
SET_SIZE(atomic_cas_ushort)
|
||||
SET_SIZE(atomic_cas_16)
|
||||
|
||||
ENTRY(atomic_cas_32)
|
||||
ALTENTRY(atomic_cas_uint)
|
||||
ALTENTRY(atomic_cas_ulong)
|
||||
ALTENTRY(atomic_cas_ptr)
|
||||
movl 4(%esp), %edx
|
||||
movl 8(%esp), %eax
|
||||
movl 12(%esp), %ecx
|
||||
lock
|
||||
cmpxchgl %ecx, (%edx)
|
||||
ret
|
||||
SET_SIZE(atomic_cas_ptr)
|
||||
SET_SIZE(atomic_cas_ulong)
|
||||
SET_SIZE(atomic_cas_uint)
|
||||
SET_SIZE(atomic_cas_32)
|
||||
|
||||
ENTRY(atomic_cas_64)
|
||||
pushl %ebx
|
||||
pushl %esi
|
||||
movl 12(%esp), %esi
|
||||
movl 16(%esp), %eax
|
||||
movl 20(%esp), %edx
|
||||
movl 24(%esp), %ebx
|
||||
movl 28(%esp), %ecx
|
||||
lock
|
||||
cmpxchg8b (%esi)
|
||||
popl %esi
|
||||
popl %ebx
|
||||
ret
|
||||
SET_SIZE(atomic_cas_64)
|
||||
|
||||
ENTRY(atomic_swap_8)
|
||||
ALTENTRY(atomic_swap_uchar)
|
||||
movl 4(%esp), %edx
|
||||
movzbl 8(%esp), %eax
|
||||
lock
|
||||
xchgb %al, (%edx)
|
||||
ret
|
||||
SET_SIZE(atomic_swap_uchar)
|
||||
SET_SIZE(atomic_swap_8)
|
||||
|
||||
ENTRY(atomic_swap_16)
|
||||
ALTENTRY(atomic_swap_ushort)
|
||||
movl 4(%esp), %edx
|
||||
movzwl 8(%esp), %eax
|
||||
lock
|
||||
xchgw %ax, (%edx)
|
||||
ret
|
||||
SET_SIZE(atomic_swap_ushort)
|
||||
SET_SIZE(atomic_swap_16)
|
||||
|
||||
ENTRY(atomic_swap_32)
|
||||
ALTENTRY(atomic_swap_uint)
|
||||
ALTENTRY(atomic_swap_ptr)
|
||||
ALTENTRY(atomic_swap_ulong)
|
||||
movl 4(%esp), %edx
|
||||
movl 8(%esp), %eax
|
||||
lock
|
||||
xchgl %eax, (%edx)
|
||||
ret
|
||||
SET_SIZE(atomic_swap_ulong)
|
||||
SET_SIZE(atomic_swap_ptr)
|
||||
SET_SIZE(atomic_swap_uint)
|
||||
SET_SIZE(atomic_swap_32)
|
||||
|
||||
ENTRY(atomic_swap_64)
|
||||
pushl %esi
|
||||
pushl %ebx
|
||||
movl 12(%esp), %esi
|
||||
movl 16(%esp), %ebx
|
||||
movl 20(%esp), %ecx
|
||||
movl (%esi), %eax
|
||||
movl 4(%esi), %edx / %edx:%eax = old value
|
||||
1:
|
||||
lock
|
||||
cmpxchg8b (%esi)
|
||||
jne 1b
|
||||
popl %ebx
|
||||
popl %esi
|
||||
ret
|
||||
SET_SIZE(atomic_swap_64)
|
||||
|
||||
ENTRY(atomic_set_long_excl)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
movl 8(%esp), %ecx / %ecx = bit id
|
||||
xorl %eax, %eax
|
||||
lock
|
||||
btsl %ecx, (%edx)
|
||||
jnc 1f
|
||||
decl %eax / return -1
|
||||
1:
|
||||
ret
|
||||
SET_SIZE(atomic_set_long_excl)
|
||||
|
||||
ENTRY(atomic_clear_long_excl)
|
||||
movl 4(%esp), %edx / %edx = target address
|
||||
movl 8(%esp), %ecx / %ecx = bit id
|
||||
xorl %eax, %eax
|
||||
lock
|
||||
btrl %ecx, (%edx)
|
||||
jc 1f
|
||||
decl %eax / return -1
|
||||
1:
|
||||
ret
|
||||
SET_SIZE(atomic_clear_long_excl)
|
||||
|
||||
#if !defined(_KERNEL)
|
||||
|
||||
/*
|
||||
* NOTE: membar_enter, membar_exit, membar_producer, and
|
||||
* membar_consumer are all identical routines. We define them
|
||||
* separately, instead of using ALTENTRY definitions to alias them
|
||||
* together, so that DTrace and debuggers will see a unique address
|
||||
* for them, allowing more accurate tracing.
|
||||
*/
|
||||
|
||||
|
||||
ENTRY(membar_enter)
|
||||
lock
|
||||
xorl $0, (%esp)
|
||||
ret
|
||||
SET_SIZE(membar_enter)
|
||||
|
||||
ENTRY(membar_exit)
|
||||
lock
|
||||
xorl $0, (%esp)
|
||||
ret
|
||||
SET_SIZE(membar_exit)
|
||||
|
||||
ENTRY(membar_producer)
|
||||
lock
|
||||
xorl $0, (%esp)
|
||||
ret
|
||||
SET_SIZE(membar_producer)
|
||||
|
||||
ENTRY(membar_consumer)
|
||||
lock
|
||||
xorl $0, (%esp)
|
||||
ret
|
||||
SET_SIZE(membar_consumer)
|
||||
|
||||
#endif /* !_KERNEL */
|
801
common/atomic/sparc/atomic.s
Normal file
@ -0,0 +1,801 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
.file "atomic.s"
|
||||
|
||||
#include <sys/asm_linkage.h>
|
||||
|
||||
#if defined(_KERNEL)
|
||||
/*
|
||||
* Legacy kernel interfaces; they will go away (eventually).
|
||||
*/
|
||||
ANSI_PRAGMA_WEAK2(cas8,atomic_cas_8,function)
|
||||
ANSI_PRAGMA_WEAK2(cas32,atomic_cas_32,function)
|
||||
ANSI_PRAGMA_WEAK2(cas64,atomic_cas_64,function)
|
||||
ANSI_PRAGMA_WEAK2(caslong,atomic_cas_ulong,function)
|
||||
ANSI_PRAGMA_WEAK2(casptr,atomic_cas_ptr,function)
|
||||
ANSI_PRAGMA_WEAK2(atomic_and_long,atomic_and_ulong,function)
|
||||
ANSI_PRAGMA_WEAK2(atomic_or_long,atomic_or_ulong,function)
|
||||
ANSI_PRAGMA_WEAK2(swapl,atomic_swap_32,function)
|
||||
#endif
|
||||
|
||||
/*
|
||||
* NOTE: If atomic_inc_8 and atomic_inc_8_nv are ever
|
||||
* separated, you need to also edit the libc sparc platform
|
||||
* specific mapfile and remove the NODYNSORT attribute
|
||||
* from atomic_inc_8_nv.
|
||||
*/
|
||||
ENTRY(atomic_inc_8)
|
||||
ALTENTRY(atomic_inc_8_nv)
|
||||
ALTENTRY(atomic_inc_uchar)
|
||||
ALTENTRY(atomic_inc_uchar_nv)
|
||||
ba add_8
|
||||
add %g0, 1, %o1
|
||||
SET_SIZE(atomic_inc_uchar_nv)
|
||||
SET_SIZE(atomic_inc_uchar)
|
||||
SET_SIZE(atomic_inc_8_nv)
|
||||
SET_SIZE(atomic_inc_8)
|
||||
|
||||
/*
|
||||
* NOTE: If atomic_dec_8 and atomic_dec_8_nv are ever
|
||||
* separated, you need to also edit the libc sparc platform
|
||||
* specific mapfile and remove the NODYNSORT attribute
|
||||
* from atomic_dec_8_nv.
|
||||
*/
|
||||
ENTRY(atomic_dec_8)
|
||||
ALTENTRY(atomic_dec_8_nv)
|
||||
ALTENTRY(atomic_dec_uchar)
|
||||
ALTENTRY(atomic_dec_uchar_nv)
|
||||
ba add_8
|
||||
sub %g0, 1, %o1
|
||||
SET_SIZE(atomic_dec_uchar_nv)
|
||||
SET_SIZE(atomic_dec_uchar)
|
||||
SET_SIZE(atomic_dec_8_nv)
|
||||
SET_SIZE(atomic_dec_8)
|
||||
|
||||
/*
|
||||
* NOTE: If atomic_add_8 and atomic_add_8_nv are ever
|
||||
* separated, you need to also edit the libc sparc platform
|
||||
* specific mapfile and remove the NODYNSORT attribute
|
||||
* from atomic_add_8_nv.
|
||||
*/
|
||||
ENTRY(atomic_add_8)
|
||||
ALTENTRY(atomic_add_8_nv)
|
||||
ALTENTRY(atomic_add_char)
|
||||
ALTENTRY(atomic_add_char_nv)
|
||||
add_8:
|
||||
and %o0, 0x3, %o4 ! %o4 = byte offset, left-to-right
|
||||
	xor	%o4, 0x3, %g1	! %g1 = byte offset, right-to-left
	sll	%g1, 3, %g1	! %g1 = bit offset, right-to-left
	set	0xff, %o3	! %o3 = mask
	sll	%o3, %g1, %o3	! %o3 = shifted to bit offset
	sll	%o1, %g1, %o1	! %o1 = shifted to bit offset
	and	%o1, %o3, %o1	! %o1 = single byte value
	andn	%o0, 0x3, %o0	! %o0 = word address
	ld	[%o0], %o2	! read old value
1:
	add	%o2, %o1, %o5	! add value to the old value
	and	%o5, %o3, %o5	! clear other bits
	andn	%o2, %o3, %o4	! clear target bits
	or	%o4, %o5, %o5	! insert the new value
	cas	[%o0], %o2, %o5
	cmp	%o2, %o5
	bne,a,pn %icc, 1b
	mov	%o5, %o2	! %o2 = old value
	add	%o2, %o1, %o5
	and	%o5, %o3, %o5
	retl
	srl	%o5, %g1, %o0	! %o0 = new value
	SET_SIZE(atomic_add_char_nv)
	SET_SIZE(atomic_add_char)
	SET_SIZE(atomic_add_8_nv)
	SET_SIZE(atomic_add_8)

	/*
	 * NOTE: If atomic_inc_16 and atomic_inc_16_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_inc_16_nv.
	 */
	ENTRY(atomic_inc_16)
	ALTENTRY(atomic_inc_16_nv)
	ALTENTRY(atomic_inc_ushort)
	ALTENTRY(atomic_inc_ushort_nv)
	ba	add_16
	add	%g0, 1, %o1
	SET_SIZE(atomic_inc_ushort_nv)
	SET_SIZE(atomic_inc_ushort)
	SET_SIZE(atomic_inc_16_nv)
	SET_SIZE(atomic_inc_16)

	/*
	 * NOTE: If atomic_dec_16 and atomic_dec_16_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_dec_16_nv.
	 */
	ENTRY(atomic_dec_16)
	ALTENTRY(atomic_dec_16_nv)
	ALTENTRY(atomic_dec_ushort)
	ALTENTRY(atomic_dec_ushort_nv)
	ba	add_16
	sub	%g0, 1, %o1
	SET_SIZE(atomic_dec_ushort_nv)
	SET_SIZE(atomic_dec_ushort)
	SET_SIZE(atomic_dec_16_nv)
	SET_SIZE(atomic_dec_16)

	/*
	 * NOTE: If atomic_add_16 and atomic_add_16_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_add_16_nv.
	 */
	ENTRY(atomic_add_16)
	ALTENTRY(atomic_add_16_nv)
	ALTENTRY(atomic_add_short)
	ALTENTRY(atomic_add_short_nv)
add_16:
	and	%o0, 0x2, %o4	! %o4 = byte offset, left-to-right
	xor	%o4, 0x2, %g1	! %g1 = byte offset, right-to-left
	sll	%o4, 3, %o4	! %o4 = bit offset, left-to-right
	sll	%g1, 3, %g1	! %g1 = bit offset, right-to-left
	sethi	%hi(0xffff0000), %o3	! %o3 = mask
	srl	%o3, %o4, %o3	! %o3 = shifted to bit offset
	sll	%o1, %g1, %o1	! %o1 = shifted to bit offset
	and	%o1, %o3, %o1	! %o1 = single short value
	andn	%o0, 0x2, %o0	! %o0 = word address
	! if low-order bit is 1, we will properly get an alignment fault here
	ld	[%o0], %o2	! read old value
1:
	add	%o1, %o2, %o5	! add value to the old value
	and	%o5, %o3, %o5	! clear other bits
	andn	%o2, %o3, %o4	! clear target bits
	or	%o4, %o5, %o5	! insert the new value
	cas	[%o0], %o2, %o5
	cmp	%o2, %o5
	bne,a,pn %icc, 1b
	mov	%o5, %o2	! %o2 = old value
	add	%o1, %o2, %o5
	and	%o5, %o3, %o5
	retl
	srl	%o5, %g1, %o0	! %o0 = new value
	SET_SIZE(atomic_add_short_nv)
	SET_SIZE(atomic_add_short)
	SET_SIZE(atomic_add_16_nv)
	SET_SIZE(atomic_add_16)

	/*
	 * NOTE: If atomic_inc_32 and atomic_inc_32_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_inc_32_nv.
	 */
	ENTRY(atomic_inc_32)
	ALTENTRY(atomic_inc_32_nv)
	ALTENTRY(atomic_inc_uint)
	ALTENTRY(atomic_inc_uint_nv)
	ALTENTRY(atomic_inc_ulong)
	ALTENTRY(atomic_inc_ulong_nv)
	ba	add_32
	add	%g0, 1, %o1
	SET_SIZE(atomic_inc_ulong_nv)
	SET_SIZE(atomic_inc_ulong)
	SET_SIZE(atomic_inc_uint_nv)
	SET_SIZE(atomic_inc_uint)
	SET_SIZE(atomic_inc_32_nv)
	SET_SIZE(atomic_inc_32)

	/*
	 * NOTE: If atomic_dec_32 and atomic_dec_32_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_dec_32_nv.
	 */
	ENTRY(atomic_dec_32)
	ALTENTRY(atomic_dec_32_nv)
	ALTENTRY(atomic_dec_uint)
	ALTENTRY(atomic_dec_uint_nv)
	ALTENTRY(atomic_dec_ulong)
	ALTENTRY(atomic_dec_ulong_nv)
	ba	add_32
	sub	%g0, 1, %o1
	SET_SIZE(atomic_dec_ulong_nv)
	SET_SIZE(atomic_dec_ulong)
	SET_SIZE(atomic_dec_uint_nv)
	SET_SIZE(atomic_dec_uint)
	SET_SIZE(atomic_dec_32_nv)
	SET_SIZE(atomic_dec_32)

	/*
	 * NOTE: If atomic_add_32 and atomic_add_32_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_add_32_nv.
	 */
	ENTRY(atomic_add_32)
	ALTENTRY(atomic_add_32_nv)
	ALTENTRY(atomic_add_int)
	ALTENTRY(atomic_add_int_nv)
	ALTENTRY(atomic_add_ptr)
	ALTENTRY(atomic_add_ptr_nv)
	ALTENTRY(atomic_add_long)
	ALTENTRY(atomic_add_long_nv)
add_32:
	ld	[%o0], %o2
1:
	add	%o2, %o1, %o3
	cas	[%o0], %o2, %o3
	cmp	%o2, %o3
	bne,a,pn %icc, 1b
	mov	%o3, %o2
	retl
	add	%o2, %o1, %o0	! return new value
	SET_SIZE(atomic_add_long_nv)
	SET_SIZE(atomic_add_long)
	SET_SIZE(atomic_add_ptr_nv)
	SET_SIZE(atomic_add_ptr)
	SET_SIZE(atomic_add_int_nv)
	SET_SIZE(atomic_add_int)
	SET_SIZE(atomic_add_32_nv)
	SET_SIZE(atomic_add_32)

	/*
	 * NOTE: If atomic_inc_64 and atomic_inc_64_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_inc_64_nv.
	 */
	ENTRY(atomic_inc_64)
	ALTENTRY(atomic_inc_64_nv)
	ba	add_64
	add	%g0, 1, %o1
	SET_SIZE(atomic_inc_64_nv)
	SET_SIZE(atomic_inc_64)

	/*
	 * NOTE: If atomic_dec_64 and atomic_dec_64_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_dec_64_nv.
	 */
	ENTRY(atomic_dec_64)
	ALTENTRY(atomic_dec_64_nv)
	ba	add_64
	sub	%g0, 1, %o1
	SET_SIZE(atomic_dec_64_nv)
	SET_SIZE(atomic_dec_64)

	/*
	 * NOTE: If atomic_add_64 and atomic_add_64_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_add_64_nv.
	 */
	ENTRY(atomic_add_64)
	ALTENTRY(atomic_add_64_nv)
	sllx	%o1, 32, %o1	! upper 32 in %o1, lower in %o2
	srl	%o2, 0, %o2
	add	%o1, %o2, %o1	! convert 2 32-bit args into 1 64-bit
add_64:
	ldx	[%o0], %o2
1:
	add	%o2, %o1, %o3
	casx	[%o0], %o2, %o3
	cmp	%o2, %o3
	bne,a,pn %xcc, 1b
	mov	%o3, %o2
	add	%o2, %o1, %o1	! return lower 32-bits in %o1
	retl
	srlx	%o1, 32, %o0	! return upper 32-bits in %o0
	SET_SIZE(atomic_add_64_nv)
	SET_SIZE(atomic_add_64)

	/*
	 * NOTE: If atomic_or_8 and atomic_or_8_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_or_8_nv.
	 */
	ENTRY(atomic_or_8)
	ALTENTRY(atomic_or_8_nv)
	ALTENTRY(atomic_or_uchar)
	ALTENTRY(atomic_or_uchar_nv)
	and	%o0, 0x3, %o4	! %o4 = byte offset, left-to-right
	xor	%o4, 0x3, %g1	! %g1 = byte offset, right-to-left
	sll	%g1, 3, %g1	! %g1 = bit offset, right-to-left
	set	0xff, %o3	! %o3 = mask
	sll	%o3, %g1, %o3	! %o3 = shifted to bit offset
	sll	%o1, %g1, %o1	! %o1 = shifted to bit offset
	and	%o1, %o3, %o1	! %o1 = single byte value
	andn	%o0, 0x3, %o0	! %o0 = word address
	ld	[%o0], %o2	! read old value
1:
	or	%o2, %o1, %o5	! or in the new value
	cas	[%o0], %o2, %o5
	cmp	%o2, %o5
	bne,a,pn %icc, 1b
	mov	%o5, %o2	! %o2 = old value
	or	%o2, %o1, %o5
	and	%o5, %o3, %o5
	retl
	srl	%o5, %g1, %o0	! %o0 = new value
	SET_SIZE(atomic_or_uchar_nv)
	SET_SIZE(atomic_or_uchar)
	SET_SIZE(atomic_or_8_nv)
	SET_SIZE(atomic_or_8)

	/*
	 * NOTE: If atomic_or_16 and atomic_or_16_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_or_16_nv.
	 */
	ENTRY(atomic_or_16)
	ALTENTRY(atomic_or_16_nv)
	ALTENTRY(atomic_or_ushort)
	ALTENTRY(atomic_or_ushort_nv)
	and	%o0, 0x2, %o4	! %o4 = byte offset, left-to-right
	xor	%o4, 0x2, %g1	! %g1 = byte offset, right-to-left
	sll	%o4, 3, %o4	! %o4 = bit offset, left-to-right
	sll	%g1, 3, %g1	! %g1 = bit offset, right-to-left
	sethi	%hi(0xffff0000), %o3	! %o3 = mask
	srl	%o3, %o4, %o3	! %o3 = shifted to bit offset
	sll	%o1, %g1, %o1	! %o1 = shifted to bit offset
	and	%o1, %o3, %o1	! %o1 = single short value
	andn	%o0, 0x2, %o0	! %o0 = word address
	! if low-order bit is 1, we will properly get an alignment fault here
	ld	[%o0], %o2	! read old value
1:
	or	%o2, %o1, %o5	! or in the new value
	cas	[%o0], %o2, %o5
	cmp	%o2, %o5
	bne,a,pn %icc, 1b
	mov	%o5, %o2	! %o2 = old value
	or	%o2, %o1, %o5	! or in the new value
	and	%o5, %o3, %o5
	retl
	srl	%o5, %g1, %o0	! %o0 = new value
	SET_SIZE(atomic_or_ushort_nv)
	SET_SIZE(atomic_or_ushort)
	SET_SIZE(atomic_or_16_nv)
	SET_SIZE(atomic_or_16)

	/*
	 * NOTE: If atomic_or_32 and atomic_or_32_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_or_32_nv.
	 */
	ENTRY(atomic_or_32)
	ALTENTRY(atomic_or_32_nv)
	ALTENTRY(atomic_or_uint)
	ALTENTRY(atomic_or_uint_nv)
	ALTENTRY(atomic_or_ulong)
	ALTENTRY(atomic_or_ulong_nv)
	ld	[%o0], %o2
1:
	or	%o2, %o1, %o3
	cas	[%o0], %o2, %o3
	cmp	%o2, %o3
	bne,a,pn %icc, 1b
	mov	%o3, %o2
	retl
	or	%o2, %o1, %o0	! return new value
	SET_SIZE(atomic_or_ulong_nv)
	SET_SIZE(atomic_or_ulong)
	SET_SIZE(atomic_or_uint_nv)
	SET_SIZE(atomic_or_uint)
	SET_SIZE(atomic_or_32_nv)
	SET_SIZE(atomic_or_32)

	/*
	 * NOTE: If atomic_or_64 and atomic_or_64_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_or_64_nv.
	 */
	ENTRY(atomic_or_64)
	ALTENTRY(atomic_or_64_nv)
	sllx	%o1, 32, %o1	! upper 32 in %o1, lower in %o2
	srl	%o2, 0, %o2
	add	%o1, %o2, %o1	! convert 2 32-bit args into 1 64-bit
	ldx	[%o0], %o2
1:
	or	%o2, %o1, %o3
	casx	[%o0], %o2, %o3
	cmp	%o2, %o3
	bne,a,pn %xcc, 1b
	mov	%o3, %o2
	or	%o2, %o1, %o1	! return lower 32-bits in %o1
	retl
	srlx	%o1, 32, %o0	! return upper 32-bits in %o0
	SET_SIZE(atomic_or_64_nv)
	SET_SIZE(atomic_or_64)

	/*
	 * NOTE: If atomic_and_8 and atomic_and_8_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_and_8_nv.
	 */
	ENTRY(atomic_and_8)
	ALTENTRY(atomic_and_8_nv)
	ALTENTRY(atomic_and_uchar)
	ALTENTRY(atomic_and_uchar_nv)
	and	%o0, 0x3, %o4	! %o4 = byte offset, left-to-right
	xor	%o4, 0x3, %g1	! %g1 = byte offset, right-to-left
	sll	%g1, 3, %g1	! %g1 = bit offset, right-to-left
	set	0xff, %o3	! %o3 = mask
	sll	%o3, %g1, %o3	! %o3 = shifted to bit offset
	sll	%o1, %g1, %o1	! %o1 = shifted to bit offset
	orn	%o1, %o3, %o1	! all ones in other bytes
	andn	%o0, 0x3, %o0	! %o0 = word address
	ld	[%o0], %o2	! read old value
1:
	and	%o2, %o1, %o5	! and in the new value
	cas	[%o0], %o2, %o5
	cmp	%o2, %o5
	bne,a,pn %icc, 1b
	mov	%o5, %o2	! %o2 = old value
	and	%o2, %o1, %o5
	and	%o5, %o3, %o5
	retl
	srl	%o5, %g1, %o0	! %o0 = new value
	SET_SIZE(atomic_and_uchar_nv)
	SET_SIZE(atomic_and_uchar)
	SET_SIZE(atomic_and_8_nv)
	SET_SIZE(atomic_and_8)

	/*
	 * NOTE: If atomic_and_16 and atomic_and_16_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_and_16_nv.
	 */
	ENTRY(atomic_and_16)
	ALTENTRY(atomic_and_16_nv)
	ALTENTRY(atomic_and_ushort)
	ALTENTRY(atomic_and_ushort_nv)
	and	%o0, 0x2, %o4	! %o4 = byte offset, left-to-right
	xor	%o4, 0x2, %g1	! %g1 = byte offset, right-to-left
	sll	%o4, 3, %o4	! %o4 = bit offset, left-to-right
	sll	%g1, 3, %g1	! %g1 = bit offset, right-to-left
	sethi	%hi(0xffff0000), %o3	! %o3 = mask
	srl	%o3, %o4, %o3	! %o3 = shifted to bit offset
	sll	%o1, %g1, %o1	! %o1 = shifted to bit offset
	orn	%o1, %o3, %o1	! all ones in the other half
	andn	%o0, 0x2, %o0	! %o0 = word address
	! if low-order bit is 1, we will properly get an alignment fault here
	ld	[%o0], %o2	! read old value
1:
	and	%o2, %o1, %o5	! and in the new value
	cas	[%o0], %o2, %o5
	cmp	%o2, %o5
	bne,a,pn %icc, 1b
	mov	%o5, %o2	! %o2 = old value
	and	%o2, %o1, %o5
	and	%o5, %o3, %o5
	retl
	srl	%o5, %g1, %o0	! %o0 = new value
	SET_SIZE(atomic_and_ushort_nv)
	SET_SIZE(atomic_and_ushort)
	SET_SIZE(atomic_and_16_nv)
	SET_SIZE(atomic_and_16)

	/*
	 * NOTE: If atomic_and_32 and atomic_and_32_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_and_32_nv.
	 */
	ENTRY(atomic_and_32)
	ALTENTRY(atomic_and_32_nv)
	ALTENTRY(atomic_and_uint)
	ALTENTRY(atomic_and_uint_nv)
	ALTENTRY(atomic_and_ulong)
	ALTENTRY(atomic_and_ulong_nv)
	ld	[%o0], %o2
1:
	and	%o2, %o1, %o3
	cas	[%o0], %o2, %o3
	cmp	%o2, %o3
	bne,a,pn %icc, 1b
	mov	%o3, %o2
	retl
	and	%o2, %o1, %o0	! return new value
	SET_SIZE(atomic_and_ulong_nv)
	SET_SIZE(atomic_and_ulong)
	SET_SIZE(atomic_and_uint_nv)
	SET_SIZE(atomic_and_uint)
	SET_SIZE(atomic_and_32_nv)
	SET_SIZE(atomic_and_32)

	/*
	 * NOTE: If atomic_and_64 and atomic_and_64_nv are ever
	 * separated, you need to also edit the libc sparc platform
	 * specific mapfile and remove the NODYNSORT attribute
	 * from atomic_and_64_nv.
	 */
	ENTRY(atomic_and_64)
	ALTENTRY(atomic_and_64_nv)
	sllx	%o1, 32, %o1	! upper 32 in %o1, lower in %o2
	srl	%o2, 0, %o2
	add	%o1, %o2, %o1	! convert 2 32-bit args into 1 64-bit
	ldx	[%o0], %o2
1:
	and	%o2, %o1, %o3
	casx	[%o0], %o2, %o3
	cmp	%o2, %o3
	bne,a,pn %xcc, 1b
	mov	%o3, %o2
	and	%o2, %o1, %o1	! return lower 32-bits in %o1
	retl
	srlx	%o1, 32, %o0	! return upper 32-bits in %o0
	SET_SIZE(atomic_and_64_nv)
	SET_SIZE(atomic_and_64)

	ENTRY(atomic_cas_8)
	ALTENTRY(atomic_cas_uchar)
	and	%o0, 0x3, %o4	! %o4 = byte offset, left-to-right
	xor	%o4, 0x3, %g1	! %g1 = byte offset, right-to-left
	sll	%g1, 3, %g1	! %g1 = bit offset, right-to-left
	set	0xff, %o3	! %o3 = mask
	sll	%o3, %g1, %o3	! %o3 = shifted to bit offset
	sll	%o1, %g1, %o1	! %o1 = shifted to bit offset
	and	%o1, %o3, %o1	! %o1 = single byte value
	sll	%o2, %g1, %o2	! %o2 = shifted to bit offset
	and	%o2, %o3, %o2	! %o2 = single byte value
	andn	%o0, 0x3, %o0	! %o0 = word address
	ld	[%o0], %o4	! read old value
1:
	andn	%o4, %o3, %o4	! clear target bits
	or	%o4, %o2, %o5	! insert the new value
	or	%o4, %o1, %o4	! insert the comparison value
	cas	[%o0], %o4, %o5
	cmp	%o4, %o5	! did we succeed?
	be,pt	%icc, 2f
	and	%o5, %o3, %o4	! isolate the old value
	cmp	%o1, %o4	! should we have succeeded?
	be,a,pt	%icc, 1b	! yes, try again
	mov	%o5, %o4	! %o4 = old value
2:
	retl
	srl	%o4, %g1, %o0	! %o0 = old value
	SET_SIZE(atomic_cas_uchar)
	SET_SIZE(atomic_cas_8)

	ENTRY(atomic_cas_16)
	ALTENTRY(atomic_cas_ushort)
	and	%o0, 0x2, %o4	! %o4 = byte offset, left-to-right
	xor	%o4, 0x2, %g1	! %g1 = byte offset, right-to-left
	sll	%o4, 3, %o4	! %o4 = bit offset, left-to-right
	sll	%g1, 3, %g1	! %g1 = bit offset, right-to-left
	sethi	%hi(0xffff0000), %o3	! %o3 = mask
	srl	%o3, %o4, %o3	! %o3 = shifted to bit offset
	sll	%o1, %g1, %o1	! %o1 = shifted to bit offset
	and	%o1, %o3, %o1	! %o1 = single short value
	sll	%o2, %g1, %o2	! %o2 = shifted to bit offset
	and	%o2, %o3, %o2	! %o2 = single short value
	andn	%o0, 0x2, %o0	! %o0 = word address
	! if low-order bit is 1, we will properly get an alignment fault here
	ld	[%o0], %o4	! read old value
1:
	andn	%o4, %o3, %o4	! clear target bits
	or	%o4, %o2, %o5	! insert the new value
	or	%o4, %o1, %o4	! insert the comparison value
	cas	[%o0], %o4, %o5
	cmp	%o4, %o5	! did we succeed?
	be,pt	%icc, 2f
	and	%o5, %o3, %o4	! isolate the old value
	cmp	%o1, %o4	! should we have succeeded?
	be,a,pt	%icc, 1b	! yes, try again
	mov	%o5, %o4	! %o4 = old value
2:
	retl
	srl	%o4, %g1, %o0	! %o0 = old value
	SET_SIZE(atomic_cas_ushort)
	SET_SIZE(atomic_cas_16)

	ENTRY(atomic_cas_32)
	ALTENTRY(atomic_cas_uint)
	ALTENTRY(atomic_cas_ptr)
	ALTENTRY(atomic_cas_ulong)
	cas	[%o0], %o1, %o2
	retl
	mov	%o2, %o0
	SET_SIZE(atomic_cas_ulong)
	SET_SIZE(atomic_cas_ptr)
	SET_SIZE(atomic_cas_uint)
	SET_SIZE(atomic_cas_32)

	ENTRY(atomic_cas_64)
	sllx	%o1, 32, %o1	! cmp's upper 32 in %o1, lower in %o2
	srl	%o2, 0, %o2	! convert 2 32-bit args into 1 64-bit
	add	%o1, %o2, %o1
	sllx	%o3, 32, %o2	! newval upper 32 in %o3, lower in %o4
	srl	%o4, 0, %o4	! setup %o2 to have newval
	add	%o2, %o4, %o2
	casx	[%o0], %o1, %o2
	srl	%o2, 0, %o1	! return lower 32-bits in %o1
	retl
	srlx	%o2, 32, %o0	! return upper 32-bits in %o0
	SET_SIZE(atomic_cas_64)

	ENTRY(atomic_swap_8)
	ALTENTRY(atomic_swap_uchar)
	and	%o0, 0x3, %o4	! %o4 = byte offset, left-to-right
	xor	%o4, 0x3, %g1	! %g1 = byte offset, right-to-left
	sll	%g1, 3, %g1	! %g1 = bit offset, right-to-left
	set	0xff, %o3	! %o3 = mask
	sll	%o3, %g1, %o3	! %o3 = shifted to bit offset
	sll	%o1, %g1, %o1	! %o1 = shifted to bit offset
	and	%o1, %o3, %o1	! %o1 = single byte value
	andn	%o0, 0x3, %o0	! %o0 = word address
	ld	[%o0], %o2	! read old value
1:
	andn	%o2, %o3, %o5	! clear target bits
	or	%o5, %o1, %o5	! insert the new value
	cas	[%o0], %o2, %o5
	cmp	%o2, %o5
	bne,a,pn %icc, 1b
	mov	%o5, %o2	! %o2 = old value
	and	%o5, %o3, %o5
	retl
	srl	%o5, %g1, %o0	! %o0 = old value
	SET_SIZE(atomic_swap_uchar)
	SET_SIZE(atomic_swap_8)

	ENTRY(atomic_swap_16)
	ALTENTRY(atomic_swap_ushort)
	and	%o0, 0x2, %o4	! %o4 = byte offset, left-to-right
	xor	%o4, 0x2, %g1	! %g1 = byte offset, right-to-left
	sll	%o4, 3, %o4	! %o4 = bit offset, left-to-right
	sll	%g1, 3, %g1	! %g1 = bit offset, right-to-left
	sethi	%hi(0xffff0000), %o3	! %o3 = mask
	srl	%o3, %o4, %o3	! %o3 = shifted to bit offset
	sll	%o1, %g1, %o1	! %o1 = shifted to bit offset
	and	%o1, %o3, %o1	! %o1 = single short value
	andn	%o0, 0x2, %o0	! %o0 = word address
	! if low-order bit is 1, we will properly get an alignment fault here
	ld	[%o0], %o2	! read old value
1:
	andn	%o2, %o3, %o5	! clear target bits
	or	%o5, %o1, %o5	! insert the new value
	cas	[%o0], %o2, %o5
	cmp	%o2, %o5
	bne,a,pn %icc, 1b
	mov	%o5, %o2	! %o2 = old value
	and	%o5, %o3, %o5
	retl
	srl	%o5, %g1, %o0	! %o0 = old value
	SET_SIZE(atomic_swap_ushort)
	SET_SIZE(atomic_swap_16)

	ENTRY(atomic_swap_32)
	ALTENTRY(atomic_swap_uint)
	ALTENTRY(atomic_swap_ptr)
	ALTENTRY(atomic_swap_ulong)
	ld	[%o0], %o2
1:
	mov	%o1, %o3
	cas	[%o0], %o2, %o3
	cmp	%o2, %o3
	bne,a,pn %icc, 1b
	mov	%o3, %o2
	retl
	mov	%o3, %o0
	SET_SIZE(atomic_swap_ulong)
	SET_SIZE(atomic_swap_ptr)
	SET_SIZE(atomic_swap_uint)
	SET_SIZE(atomic_swap_32)

	ENTRY(atomic_swap_64)
	sllx	%o1, 32, %o1	! upper 32 in %o1, lower in %o2
	srl	%o2, 0, %o2
	add	%o1, %o2, %o1	! convert 2 32-bit args into 1 64-bit
	ldx	[%o0], %o2
1:
	mov	%o1, %o3
	casx	[%o0], %o2, %o3
	cmp	%o2, %o3
	bne,a,pn %xcc, 1b
	mov	%o3, %o2
	srl	%o3, 0, %o1	! return lower 32-bits in %o1
	retl
	srlx	%o3, 32, %o0	! return upper 32-bits in %o0
	SET_SIZE(atomic_swap_64)

	ENTRY(atomic_set_long_excl)
	mov	1, %o3
	slln	%o3, %o1, %o3
	ldn	[%o0], %o2
1:
	andcc	%o2, %o3, %g0	! test if the bit is set
	bnz,a,pn %ncc, 2f	! if so, then fail out
	mov	-1, %o0
	or	%o2, %o3, %o4	! set the bit, and try to commit it
	casn	[%o0], %o2, %o4
	cmp	%o2, %o4
	bne,a,pn %ncc, 1b	! failed to commit, try again
	mov	%o4, %o2
	mov	%g0, %o0
2:
	retl
	nop
	SET_SIZE(atomic_set_long_excl)

	ENTRY(atomic_clear_long_excl)
	mov	1, %o3
	slln	%o3, %o1, %o3
	ldn	[%o0], %o2
1:
	andncc	%o3, %o2, %g0	! test if the bit is clear
	bnz,a,pn %ncc, 2f	! if so, then fail out
	mov	-1, %o0
	andn	%o2, %o3, %o4	! clear the bit, and try to commit it
	casn	[%o0], %o2, %o4
	cmp	%o2, %o4
	bne,a,pn %ncc, 1b	! failed to commit, try again
	mov	%o4, %o2
	mov	%g0, %o0
2:
	retl
	nop
	SET_SIZE(atomic_clear_long_excl)

#if !defined(_KERNEL)

	/*
	 * Spitfires and Blackbirds have a problem with membars in the
	 * delay slot (SF_ERRATA_51). For safety's sake, we assume
	 * that the whole world needs the workaround.
	 */
	ENTRY(membar_enter)
	membar	#StoreLoad|#StoreStore
	retl
	nop
	SET_SIZE(membar_enter)

	ENTRY(membar_exit)
	membar	#LoadStore|#StoreStore
	retl
	nop
	SET_SIZE(membar_exit)

	ENTRY(membar_producer)
	membar	#StoreStore
	retl
	nop
	SET_SIZE(membar_producer)

	ENTRY(membar_consumer)
	membar	#LoadLoad
	retl
	nop
	SET_SIZE(membar_consumer)

#endif	/* !_KERNEL */
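The 8-bit and 16-bit routines in the assembly above share one pattern: load the aligned 32-bit word that contains the target, rebuild that word with only the target lane changed, and retry `cas` until no other writer got in between. A minimal C11 sketch of that pattern follows, under stated assumptions. `word_cas_add_8` and its `lane` parameter are hypothetical names used for illustration only and are not part of this commit; the lane is counted from the least significant byte (the "right-to-left" offset the assembly keeps in `%g1`), so the sketch does not depend on SPARC's big-endian byte numbering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the diff): atomically add `delta` to one
 * byte lane of a 32-bit word using only a 32-bit compare-and-swap.
 * Returns the new byte value, like the *_nv entry points.
 */
static uint8_t
word_cas_add_8(_Atomic uint32_t *word, unsigned lane, int8_t delta)
{
	unsigned shift;
	uint32_t mask, addend, old, next;

	assert(lane < 4);
	shift = lane * 8;			/* bit offset, right-to-left */
	mask = (uint32_t)0xff << shift;		/* selects the target byte */
	addend = ((uint32_t)(uint8_t)delta << shift) & mask;
	old = atomic_load(word);		/* read old value */

	do {
		next = (old + addend) & mask;	/* add, then clear other bits */
		next |= old & ~mask;		/* keep the untouched bytes */
		/* On failure, `old` is refreshed with the current word. */
	} while (!atomic_compare_exchange_weak(word, &old, next));

	return ((uint8_t)((next & mask) >> shift));
}
```

Masking the sum before merging it back is what keeps a carry out of the target byte from spilling into its neighbour, which is the job of the `and %o5, %o3, %o5` step in the assembly.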
1030
common/avl/avl.c
Normal file
File diff suppressed because it is too large
251
common/list/list.c
Normal file
@ -0,0 +1,251 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
 */

/*
 * Generic doubly-linked list implementation
 */

#include <sys/list.h>
#include <sys/list_impl.h>
#include <sys/types.h>
#include <sys/sysmacros.h>
#ifdef _KERNEL
#include <sys/debug.h>
#else
#include <assert.h>
#define	ASSERT(a)	assert(a)
#endif

#ifdef lint
extern list_node_t *list_d2l(list_t *list, void *obj);
#else
#define	list_d2l(a, obj) ((list_node_t *)(((char *)obj) + (a)->list_offset))
#endif
#define	list_object(a, node) ((void *)(((char *)node) - (a)->list_offset))
#define	list_empty(a) ((a)->list_head.list_next == &(a)->list_head)

#define	list_insert_after_node(list, node, object) {	\
	list_node_t *lnew = list_d2l(list, object);	\
	lnew->list_prev = (node);	\
	lnew->list_next = (node)->list_next;	\
	(node)->list_next->list_prev = lnew;	\
	(node)->list_next = lnew;	\
}

#define	list_insert_before_node(list, node, object) {	\
	list_node_t *lnew = list_d2l(list, object);	\
	lnew->list_next = (node);	\
	lnew->list_prev = (node)->list_prev;	\
	(node)->list_prev->list_next = lnew;	\
	(node)->list_prev = lnew;	\
}

#define	list_remove_node(node)	\
	(node)->list_prev->list_next = (node)->list_next;	\
	(node)->list_next->list_prev = (node)->list_prev;	\
	(node)->list_next = (node)->list_prev = NULL

void
list_create(list_t *list, size_t size, size_t offset)
{
	ASSERT(list);
	ASSERT(size > 0);
	ASSERT(size >= offset + sizeof (list_node_t));

	list->list_size = size;
	list->list_offset = offset;
	list->list_head.list_next = list->list_head.list_prev =
	    &list->list_head;
}

void
list_destroy(list_t *list)
{
	list_node_t *node = &list->list_head;

	ASSERT(list);
	ASSERT(list->list_head.list_next == node);
	ASSERT(list->list_head.list_prev == node);

	node->list_next = node->list_prev = NULL;
}

void
list_insert_after(list_t *list, void *object, void *nobject)
{
	if (object == NULL) {
		list_insert_head(list, nobject);
	} else {
		list_node_t *lold = list_d2l(list, object);
		list_insert_after_node(list, lold, nobject);
	}
}

void
list_insert_before(list_t *list, void *object, void *nobject)
{
	if (object == NULL) {
		list_insert_tail(list, nobject);
	} else {
		list_node_t *lold = list_d2l(list, object);
		list_insert_before_node(list, lold, nobject);
	}
}

void
list_insert_head(list_t *list, void *object)
{
	list_node_t *lold = &list->list_head;
	list_insert_after_node(list, lold, object);
}

void
list_insert_tail(list_t *list, void *object)
{
	list_node_t *lold = &list->list_head;
	list_insert_before_node(list, lold, object);
}

void
list_remove(list_t *list, void *object)
{
	list_node_t *lold = list_d2l(list, object);
	ASSERT(!list_empty(list));
	ASSERT(lold->list_next != NULL);
	list_remove_node(lold);
}

void *
list_remove_head(list_t *list)
{
	list_node_t *head = list->list_head.list_next;
	if (head == &list->list_head)
		return (NULL);
	list_remove_node(head);
	return (list_object(list, head));
}

void *
list_remove_tail(list_t *list)
{
	list_node_t *tail = list->list_head.list_prev;
	if (tail == &list->list_head)
		return (NULL);
	list_remove_node(tail);
	return (list_object(list, tail));
}

void *
list_head(list_t *list)
{
	if (list_empty(list))
		return (NULL);
	return (list_object(list, list->list_head.list_next));
}

void *
list_tail(list_t *list)
{
	if (list_empty(list))
		return (NULL);
	return (list_object(list, list->list_head.list_prev));
}

void *
list_next(list_t *list, void *object)
{
	list_node_t *node = list_d2l(list, object);

	if (node->list_next != &list->list_head)
		return (list_object(list, node->list_next));

	return (NULL);
}

void *
list_prev(list_t *list, void *object)
{
	list_node_t *node = list_d2l(list, object);

	if (node->list_prev != &list->list_head)
		return (list_object(list, node->list_prev));

	return (NULL);
}

/*
 * Insert src list after dst list. Empty src list thereafter.
 */
void
list_move_tail(list_t *dst, list_t *src)
{
	list_node_t *dstnode = &dst->list_head;
	list_node_t *srcnode = &src->list_head;

	ASSERT(dst->list_size == src->list_size);
	ASSERT(dst->list_offset == src->list_offset);

	if (list_empty(src))
		return;

	dstnode->list_prev->list_next = srcnode->list_next;
	srcnode->list_next->list_prev = dstnode->list_prev;
	dstnode->list_prev = srcnode->list_prev;
	srcnode->list_prev->list_next = dstnode;

	/* empty src list */
	srcnode->list_next = srcnode->list_prev = srcnode;
}

void
list_link_replace(list_node_t *lold, list_node_t *lnew)
{
	ASSERT(list_link_active(lold));
	ASSERT(!list_link_active(lnew));

	lnew->list_next = lold->list_next;
	lnew->list_prev = lold->list_prev;
	lold->list_prev->list_next = lnew;
	lold->list_next->list_prev = lnew;
	lold->list_next = lold->list_prev = NULL;
}

void
list_link_init(list_node_t *link)
{
	link->list_next = NULL;
	link->list_prev = NULL;
}

int
list_link_active(list_node_t *link)
{
	return (link->list_next != NULL);
}

int
list_is_empty(list_t *list)
{
	return (list_empty(list));
}
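The list in `list.c` is intrusive: the link node lives inside the caller's object, and `list_offset` is what lets `list_d2l()` and `list_object()` convert between an object pointer and its embedded node. The sketch below illustrates that conversion in a self-contained form. The `sk_` types and functions are hypothetical stand-ins that mirror the field names used above (the real definitions live in `<sys/list_impl.h>`, which is not shown in this diff), and `sk_job_t` is an invented payload type.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout, modelled on the fields list.c touches. */
typedef struct sk_node {
	struct sk_node *list_next;
	struct sk_node *list_prev;
} sk_node_t;

typedef struct sk_list {
	size_t list_size;
	size_t list_offset;
	sk_node_t list_head;
} sk_list_t;

/* Hypothetical payload: the link is embedded after other members. */
typedef struct sk_job {
	int job_id;
	sk_node_t job_link;
} sk_job_t;

static void
sk_list_create(sk_list_t *l, size_t size, size_t offset)
{
	assert(size >= offset + sizeof (sk_node_t));
	l->list_size = size;
	l->list_offset = offset;
	/* An empty list is a head node that points at itself. */
	l->list_head.list_next = l->list_head.list_prev = &l->list_head;
}

static void
sk_list_insert_tail(sk_list_t *l, void *obj)
{
	sk_node_t *head = &l->list_head;
	/* object -> node: add the offset (what list_d2l() does) */
	sk_node_t *n = (sk_node_t *)((char *)obj + l->list_offset);

	n->list_next = head;
	n->list_prev = head->list_prev;
	head->list_prev->list_next = n;
	head->list_prev = n;
}

static void *
sk_list_head(sk_list_t *l)
{
	if (l->list_head.list_next == &l->list_head)
		return (NULL);
	/* node -> object: subtract the offset (what list_object() does) */
	return ((char *)l->list_head.list_next - l->list_offset);
}

static void *
sk_list_next(sk_list_t *l, void *obj)
{
	sk_node_t *n = (sk_node_t *)((char *)obj + l->list_offset);

	if (n->list_next == &l->list_head)
		return (NULL);
	return ((char *)n->list_next - l->list_offset);
}
```

A caller would create the list with `sk_list_create(&l, sizeof (sk_job_t), offsetof(sk_job_t, job_link))` and then walk it with `sk_list_head()` and `sk_list_next()` until `NULL` is returned.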
3297
common/nvpair/nvpair.c
Normal file
File diff suppressed because it is too large
120
common/nvpair/nvpair_alloc_fixed.c
Normal file
@ -0,0 +1,120 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright 2006 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident	"%Z%%M%	%I%	%E% SMI"

#include <sys/stropts.h>
#include <sys/isa_defs.h>
#include <sys/nvpair.h>
#include <sys/sysmacros.h>
#if defined(_KERNEL) && !defined(_BOOT)
#include <sys/varargs.h>
#else
#include <stdarg.h>
#include <strings.h>
#endif

/*
|
||||
* This allocator is very simple.
|
||||
* - it uses a pre-allocated buffer for memory allocations.
|
||||
* - it does _not_ free memory in the pre-allocated buffer.
|
||||
*
|
||||
* The reason for the selected implemention is simplicity.
|
||||
* This allocator is designed for the usage in interrupt context when
|
||||
* the caller may not wait for free memory.
|
||||
*/
|
||||
|
||||
/* pre-allocated buffer for memory allocations */
|
||||
typedef struct nvbuf {
|
||||
uintptr_t nvb_buf; /* address of pre-allocated buffer */
|
||||
uintptr_t nvb_lim; /* limit address in the buffer */
|
||||
uintptr_t nvb_cur; /* current address in the buffer */
|
||||
} nvbuf_t;
|
||||
|
||||
/*
|
||||
* Initialize the pre-allocated buffer allocator. The caller needs to supply
|
||||
*
|
||||
* buf address of pre-allocated buffer
|
||||
* bufsz size of pre-allocated buffer
|
||||
*
|
||||
* nv_fixed_init() calculates the remaining members of nvbuf_t.
|
||||
*/
|
||||
static int
|
||||
nv_fixed_init(nv_alloc_t *nva, va_list valist)
|
||||
{
|
||||
uintptr_t base = va_arg(valist, uintptr_t);
|
||||
uintptr_t lim = base + va_arg(valist, size_t);
|
||||
nvbuf_t *nvb = (nvbuf_t *)P2ROUNDUP(base, sizeof (uintptr_t));
|
||||
|
||||
if (base == 0 || (uintptr_t)&nvb[1] > lim)
|
||||
return (EINVAL);
|
||||
|
||||
nvb->nvb_buf = (uintptr_t)&nvb[0];
|
||||
nvb->nvb_cur = (uintptr_t)&nvb[1];
|
||||
nvb->nvb_lim = lim;
|
||||
nva->nva_arg = nvb;
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
static void *
|
||||
nv_fixed_alloc(nv_alloc_t *nva, size_t size)
|
||||
{
|
||||
nvbuf_t *nvb = nva->nva_arg;
|
||||
uintptr_t new = nvb->nvb_cur;
|
||||
|
||||
if (size == 0 || new + size > nvb->nvb_lim)
|
||||
return (NULL);
|
||||
|
||||
nvb->nvb_cur = P2ROUNDUP(new + size, sizeof (uintptr_t));
|
||||
|
||||
return ((void *)new);
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
static void
|
||||
nv_fixed_free(nv_alloc_t *nva, void *buf, size_t size)
|
||||
{
|
||||
/* don't free memory in the pre-allocated buffer */
|
||||
}
|
||||
|
||||
static void
|
||||
nv_fixed_reset(nv_alloc_t *nva)
|
||||
{
|
||||
nvbuf_t *nvb = nva->nva_arg;
|
||||
|
||||
nvb->nvb_cur = (uintptr_t)&nvb[1];
|
||||
}
|
||||
|
||||
const nv_alloc_ops_t nv_fixed_ops_def = {
|
||||
nv_fixed_init, /* nv_ao_init() */
|
||||
NULL, /* nv_ao_fini() */
|
||||
nv_fixed_alloc, /* nv_ao_alloc() */
|
||||
nv_fixed_free, /* nv_ao_free() */
|
||||
nv_fixed_reset /* nv_ao_reset() */
|
||||
};
|
||||
|
||||
const nv_alloc_ops_t *nv_fixed_ops = &nv_fixed_ops_def;
|
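The fixed-buffer allocator above can be illustrated with a minimal, self-contained sketch. It does not use libnvpair; `bump_t`, `bump_init()`, `bump_alloc()`, `bump_reset()` and `ROUNDUP_P2` are hypothetical stand-ins for `nvbuf_t`, `nv_fixed_init()`, `nv_fixed_alloc()`, `nv_fixed_reset()` and `P2ROUNDUP`, written only to show the technique: the bookkeeping header is placed at the aligned start of the caller's buffer, each allocation advances a cursor rounded up to pointer size, and nothing is ever freed individually.

```c
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
#define	ROUNDUP_P2(x, align) \
	(((uintptr_t)(x) + ((align) - 1)) & ~((uintptr_t)(align) - 1))

/* Hypothetical stand-in for nvbuf_t; it lives inside the caller's buffer. */
typedef struct bump {
	uintptr_t lim;	/* one past the last usable byte */
	uintptr_t cur;	/* next address to hand out */
} bump_t;

/* Place the header in buf; return NULL if buf is NULL or too small. */
static bump_t *
bump_init(void *buf, size_t bufsz)
{
	uintptr_t base = (uintptr_t)buf;
	uintptr_t lim = base + bufsz;
	bump_t *b = (bump_t *)ROUNDUP_P2(base, sizeof (uintptr_t));

	if (base == 0 || (uintptr_t)&b[1] > lim)
		return (NULL);

	b->lim = lim;
	b->cur = (uintptr_t)&b[1];
	return (b);
}

/* Hand out size bytes, or NULL when the request is empty or does not fit. */
static void *
bump_alloc(bump_t *b, size_t size)
{
	uintptr_t p = b->cur;

	if (size == 0 || p + size > b->lim)
		return (NULL);

	b->cur = ROUNDUP_P2(p + size, sizeof (uintptr_t));
	return ((void *)p);
}

/* Discard every allocation at once by rewinding the cursor. */
static void
bump_reset(bump_t *b)
{
	b->cur = (uintptr_t)&b[1];
}
```

Because there is no per-allocation free, the only way to reclaim space in this sketch is `bump_reset()`, which matches the role `nv_fixed_reset()` plays in the file above.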
2132
common/unicode/u8_textprep.c
Normal file
File diff suppressed because it is too large
202
common/zfs/zfs_comutil.c
Normal file
@ -0,0 +1,202 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
/*
|
||||
* This file is intended for functions that ought to be common between user
|
||||
* land (libzfs) and the kernel. When many common routines need to be shared
|
||||
* then a separate file should be created.
|
||||
*/
|
||||
|
||||
#if defined(_KERNEL)
|
||||
#include <sys/systm.h>
|
||||
#else
|
||||
#include <string.h>
|
||||
#endif
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/fs/zfs.h>
|
||||
#include <sys/int_limits.h>
|
||||
#include <sys/nvpair.h>
|
||||
#include "zfs_comutil.h"
|
||||
|
||||
/*
|
||||
* Are there allocatable vdevs?
|
||||
*/
|
||||
boolean_t
|
||||
zfs_allocatable_devs(nvlist_t *nv)
|
||||
{
|
||||
uint64_t is_log;
|
||||
uint_t c;
|
||||
nvlist_t **child;
|
||||
uint_t children;
|
||||
|
||||
if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_CHILDREN,
|
||||
&child, &children) != 0) {
|
||||
return (B_FALSE);
|
||||
}
|
||||
for (c = 0; c < children; c++) {
|
||||
is_log = 0;
|
||||
(void) nvlist_lookup_uint64(child[c], ZPOOL_CONFIG_IS_LOG,
|
||||
&is_log);
|
||||
if (!is_log)
|
||||
return (B_TRUE);
|
||||
}
|
||||
return (B_FALSE);
|
||||
}
|
||||
|
||||
void
|
||||
zpool_get_rewind_policy(nvlist_t *nvl, zpool_rewind_policy_t *zrpp)
|
||||
{
|
||||
nvlist_t *policy;
|
||||
nvpair_t *elem;
|
||||
char *nm;
|
||||
|
||||
/* Defaults */
|
||||
zrpp->zrp_request = ZPOOL_NO_REWIND;
|
||||
zrpp->zrp_maxmeta = 0;
|
||||
zrpp->zrp_maxdata = UINT64_MAX;
|
||||
zrpp->zrp_txg = UINT64_MAX;
|
||||
|
||||
if (nvl == NULL)
|
||||
return;
|
||||
|
||||
elem = NULL;
|
||||
while ((elem = nvlist_next_nvpair(nvl, elem)) != NULL) {
|
||||
nm = nvpair_name(elem);
|
||||
if (strcmp(nm, ZPOOL_REWIND_POLICY) == 0) {
|
||||
if (nvpair_value_nvlist(elem, &policy) == 0)
|
||||
zpool_get_rewind_policy(policy, zrpp);
|
||||
return;
|
||||
} else if (strcmp(nm, ZPOOL_REWIND_REQUEST) == 0) {
|
||||
if (nvpair_value_uint32(elem, &zrpp->zrp_request) == 0)
|
||||
if (zrpp->zrp_request & ~ZPOOL_REWIND_POLICIES)
|
||||
zrpp->zrp_request = ZPOOL_NO_REWIND;
|
||||
} else if (strcmp(nm, ZPOOL_REWIND_REQUEST_TXG) == 0) {
|
||||
(void) nvpair_value_uint64(elem, &zrpp->zrp_txg);
|
||||
} else if (strcmp(nm, ZPOOL_REWIND_META_THRESH) == 0) {
|
||||
(void) nvpair_value_uint64(elem, &zrpp->zrp_maxmeta);
|
||||
} else if (strcmp(nm, ZPOOL_REWIND_DATA_THRESH) == 0) {
|
||||
(void) nvpair_value_uint64(elem, &zrpp->zrp_maxdata);
|
||||
}
|
||||
}
|
||||
if (zrpp->zrp_request == 0)
|
||||
zrpp->zrp_request = ZPOOL_NO_REWIND;
|
||||
}
|
||||
|
||||
typedef struct zfs_version_spa_map {
|
||||
int version_zpl;
|
||||
int version_spa;
|
||||
} zfs_version_spa_map_t;
|
||||
|
||||
/*
|
||||
* Keep this table in monotonically increasing version number order.
|
||||
*/
|
||||
static zfs_version_spa_map_t zfs_version_table[] = {
|
||||
{ZPL_VERSION_INITIAL, SPA_VERSION_INITIAL},
|
||||
{ZPL_VERSION_DIRENT_TYPE, SPA_VERSION_INITIAL},
|
||||
{ZPL_VERSION_FUID, SPA_VERSION_FUID},
|
||||
{ZPL_VERSION_USERSPACE, SPA_VERSION_USERSPACE},
|
||||
{ZPL_VERSION_SA, SPA_VERSION_SA},
|
||||
{0, 0}
|
||||
};
|
||||
|
||||
/*
|
||||
* Return the max zpl version for a corresponding spa version.
|
||||
* -1 is returned if no mapping exists.
|
||||
*/
|
||||
int
|
||||
zfs_zpl_version_map(int spa_version)
|
||||
{
|
||||
int i;
|
||||
int version = -1;
|
||||
|
||||
for (i = 0; zfs_version_table[i].version_spa; i++) {
|
||||
if (spa_version >= zfs_version_table[i].version_spa)
|
||||
version = zfs_version_table[i].version_zpl;
|
||||
}
|
||||
|
||||
return (version);
|
||||
}
|
||||
|
||||
/*
|
||||
* Return the min spa version for a corresponding zpl version.
|
||||
* -1 is returned if no mapping exists.
|
||||
*/
|
||||
int
|
||||
zfs_spa_version_map(int zpl_version)
|
||||
{
|
||||
int i;
|
||||
int version = -1;
|
||||
|
||||
for (i = 0; zfs_version_table[i].version_zpl; i++) {
|
||||
if (zfs_version_table[i].version_zpl >= zpl_version)
|
||||
return (zfs_version_table[i].version_spa);
|
||||
}
|
||||
|
||||
return (version);
|
||||
}
|
||||
|
||||
const char *zfs_history_event_names[LOG_END] = {
|
||||
"invalid event",
|
||||
"pool create",
|
||||
"vdev add",
|
||||
"pool remove",
|
||||
"pool destroy",
|
||||
"pool export",
|
||||
"pool import",
|
||||
"vdev attach",
|
||||
"vdev replace",
|
||||
"vdev detach",
|
||||
"vdev online",
|
||||
"vdev offline",
|
||||
"vdev upgrade",
|
||||
"pool clear",
|
||||
"pool scrub",
|
||||
"pool property set",
|
||||
"create",
|
||||
"clone",
|
||||
"destroy",
|
||||
"destroy_begin_sync",
|
||||
"inherit",
|
||||
"property set",
|
||||
"quota set",
|
||||
"permission update",
|
||||
"permission remove",
|
||||
"permission who remove",
|
||||
"promote",
|
||||
"receive",
|
||||
"rename",
|
||||
"reservation set",
|
||||
"replay_inc_sync",
|
||||
"replay_full_sync",
|
||||
"rollback",
|
||||
"snapshot",
|
||||
"filesystem version upgrade",
|
||||
"refquota set",
|
||||
"refreservation set",
|
||||
"pool scrub done",
|
||||
"user hold",
|
||||
"user release",
|
||||
"pool split",
|
||||
};
|
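The two lookups over zfs_version_table in zfs_comutil.c can be sketched in isolation as follows. This is an illustration only: the numeric version pairs are assumed placeholder values rather than the constants from sys/fs/zfs.h, and `version_table`, `zpl_for_spa()` and `spa_for_zpl()` are hypothetical names. The point it shows is why the table has to stay in increasing order: the first lookup keeps the last row whose spa version does not exceed the argument, while the second returns the first row whose zpl version is at least the argument.

```c
#include <stddef.h>

typedef struct vmap {
	int zpl;	/* filesystem (ZPL) version */
	int spa;	/* pool (SPA) version that introduced it */
} vmap_t;

/* Assumed example values, sorted in increasing order; {0, 0} ends the table. */
static const vmap_t version_table[] = {
	{1, 1},
	{2, 1},
	{3, 9},
	{4, 15},
	{5, 24},
	{0, 0}
};

/* Highest zpl version usable on a pool of version spa, or -1 if none. */
static int
zpl_for_spa(int spa)
{
	int version = -1;
	size_t i;

	for (i = 0; version_table[i].spa != 0; i++) {
		if (spa >= version_table[i].spa)
			version = version_table[i].zpl;
	}
	return (version);
}

/* Lowest spa version that supports zpl, or -1 if zpl is beyond the table. */
static int
spa_for_zpl(int zpl)
{
	size_t i;

	for (i = 0; version_table[i].zpl != 0; i++) {
		if (version_table[i].zpl >= zpl)
			return (version_table[i].spa);
	}
	return (-1);
}
```

With these assumed values, a pool at version 14 maps to zpl version 3, since the row for zpl version 4 needs pool version 15.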
46
common/zfs/zfs_comutil.h
Normal file
@ -0,0 +1,46 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _ZFS_COMUTIL_H
|
||||
#define _ZFS_COMUTIL_H
|
||||
|
||||
#include <sys/fs/zfs.h>
|
||||
#include <sys/types.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
extern boolean_t zfs_allocatable_devs(nvlist_t *);
|
||||
extern void zpool_get_rewind_policy(nvlist_t *, zpool_rewind_policy_t *);
|
||||
|
||||
extern int zfs_zpl_version_map(int spa_version);
|
||||
extern int zfs_spa_version_map(int zpl_version);
|
||||
extern const char *zfs_history_event_names[LOG_END];
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _ZFS_COMUTIL_H */
|
237
common/zfs/zfs_deleg.c
Normal file
@ -0,0 +1,237 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#if defined(_KERNEL)
|
||||
#include <sys/systm.h>
|
||||
#include <sys/sunddi.h>
|
||||
#include <sys/ctype.h>
|
||||
#else
|
||||
#include <stdio.h>
|
||||
#include <unistd.h>
|
||||
#include <strings.h>
|
||||
#include <libnvpair.h>
|
||||
#include <ctype.h>
|
||||
#endif
|
||||
/* XXX includes zfs_context.h, so why bother with the above? */
|
||||
#include <sys/dsl_deleg.h>
|
||||
#include "zfs_prop.h"
|
||||
#include "zfs_deleg.h"
|
||||
#include "zfs_namecheck.h"
|
||||
|
||||
/*
|
||||
* permission table
|
||||
*
|
||||
* Keep this table in sorted order
|
||||
*
|
||||
* This table is used for displaying all permissions for
|
||||
* zfs allow
|
||||
*/
|
||||
|
||||
zfs_deleg_perm_tab_t zfs_deleg_perm_tab[] = {
|
||||
{ZFS_DELEG_PERM_ALLOW, ZFS_DELEG_NOTE_ALLOW},
|
||||
{ZFS_DELEG_PERM_CLONE, ZFS_DELEG_NOTE_CLONE },
|
||||
{ZFS_DELEG_PERM_CREATE, ZFS_DELEG_NOTE_CREATE },
|
||||
{ZFS_DELEG_PERM_DESTROY, ZFS_DELEG_NOTE_DESTROY },
|
||||
{ZFS_DELEG_PERM_MOUNT, ZFS_DELEG_NOTE_MOUNT },
|
||||
{ZFS_DELEG_PERM_PROMOTE, ZFS_DELEG_NOTE_PROMOTE },
|
||||
{ZFS_DELEG_PERM_RECEIVE, ZFS_DELEG_NOTE_RECEIVE },
|
||||
{ZFS_DELEG_PERM_RENAME, ZFS_DELEG_NOTE_RENAME },
|
||||
{ZFS_DELEG_PERM_ROLLBACK, ZFS_DELEG_NOTE_ROLLBACK },
|
||||
{ZFS_DELEG_PERM_SNAPSHOT, ZFS_DELEG_NOTE_SNAPSHOT },
|
||||
{ZFS_DELEG_PERM_SHARE, ZFS_DELEG_NOTE_SHARE },
|
||||
{ZFS_DELEG_PERM_SEND, ZFS_DELEG_NOTE_NONE },
|
||||
{ZFS_DELEG_PERM_USERPROP, ZFS_DELEG_NOTE_USERPROP },
|
||||
{ZFS_DELEG_PERM_USERQUOTA, ZFS_DELEG_NOTE_USERQUOTA },
|
||||
{ZFS_DELEG_PERM_GROUPQUOTA, ZFS_DELEG_NOTE_GROUPQUOTA },
|
||||
{ZFS_DELEG_PERM_USERUSED, ZFS_DELEG_NOTE_USERUSED },
|
||||
{ZFS_DELEG_PERM_GROUPUSED, ZFS_DELEG_NOTE_GROUPUSED },
|
||||
{ZFS_DELEG_PERM_HOLD, ZFS_DELEG_NOTE_HOLD },
|
||||
{ZFS_DELEG_PERM_RELEASE, ZFS_DELEG_NOTE_RELEASE },
|
||||
{ZFS_DELEG_PERM_DIFF, ZFS_DELEG_NOTE_DIFF},
|
||||
{NULL, ZFS_DELEG_NOTE_NONE }
|
||||
};
|
||||
|
||||
static int
|
||||
zfs_valid_permission_name(const char *perm)
|
||||
{
|
||||
if (zfs_deleg_canonicalize_perm(perm))
|
||||
return (0);
|
||||
|
||||
return (permset_namecheck(perm, NULL, NULL));
|
||||
}
|
||||
|
||||
const char *
|
||||
zfs_deleg_canonicalize_perm(const char *perm)
|
||||
{
|
||||
int i;
|
||||
zfs_prop_t prop;
|
||||
|
||||
for (i = 0; zfs_deleg_perm_tab[i].z_perm != NULL; i++) {
|
||||
if (strcmp(perm, zfs_deleg_perm_tab[i].z_perm) == 0)
|
||||
return (perm);
|
||||
}
|
||||
|
||||
prop = zfs_name_to_prop(perm);
|
||||
if (prop != ZPROP_INVAL && zfs_prop_delegatable(prop))
|
||||
return (zfs_prop_to_name(prop));
|
||||
return (NULL);
|
||||
|
||||
}
|
||||
|
||||
static int
|
||||
zfs_validate_who(char *who)
|
||||
{
|
||||
char *p;
|
||||
|
||||
if (who[2] != ZFS_DELEG_FIELD_SEP_CHR)
|
||||
return (-1);
|
||||
|
||||
switch (who[0]) {
|
||||
case ZFS_DELEG_USER:
|
||||
case ZFS_DELEG_GROUP:
|
||||
case ZFS_DELEG_USER_SETS:
|
||||
case ZFS_DELEG_GROUP_SETS:
|
||||
if (who[1] != ZFS_DELEG_LOCAL && who[1] != ZFS_DELEG_DESCENDENT)
|
||||
return (-1);
|
||||
for (p = &who[3]; *p; p++)
|
||||
if (!isdigit(*p))
|
||||
return (-1);
|
||||
break;
|
||||
|
||||
case ZFS_DELEG_NAMED_SET:
|
||||
case ZFS_DELEG_NAMED_SET_SETS:
|
||||
if (who[1] != ZFS_DELEG_NA)
|
||||
return (-1);
|
||||
return (permset_namecheck(&who[3], NULL, NULL));
|
||||
|
||||
case ZFS_DELEG_CREATE:
|
||||
case ZFS_DELEG_CREATE_SETS:
|
||||
if (who[1] != ZFS_DELEG_NA)
|
||||
return (-1);
|
||||
if (who[3] != '\0')
|
||||
return (-1);
|
||||
break;
|
||||
|
||||
case ZFS_DELEG_EVERYONE:
|
||||
case ZFS_DELEG_EVERYONE_SETS:
|
||||
if (who[1] != ZFS_DELEG_LOCAL && who[1] != ZFS_DELEG_DESCENDENT)
|
||||
return (-1);
|
||||
if (who[3] != '\0')
|
||||
return (-1);
|
||||
break;
|
||||
|
||||
default:
|
||||
return (-1);
|
||||
}
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
int
|
||||
zfs_deleg_verify_nvlist(nvlist_t *nvp)
|
||||
{
|
||||
nvpair_t *who, *perm_name;
|
||||
nvlist_t *perms;
|
||||
int error;
|
||||
|
||||
if (nvp == NULL)
|
||||
return (-1);
|
||||
|
||||
who = nvlist_next_nvpair(nvp, NULL);
|
||||
if (who == NULL)
|
||||
return (-1);
|
||||
|
||||
do {
|
||||
if (zfs_validate_who(nvpair_name(who)))
|
||||
return (-1);
|
||||
|
||||
error = nvlist_lookup_nvlist(nvp, nvpair_name(who), &perms);
|
||||
|
||||
if (error && error != ENOENT)
|
||||
return (-1);
|
||||
if (error == ENOENT)
|
||||
continue;
|
||||
|
||||
perm_name = nvlist_next_nvpair(perms, NULL);
|
||||
if (perm_name == NULL) {
|
||||
return (-1);
|
||||
}
|
||||
do {
|
||||
error = zfs_valid_permission_name(
|
||||
nvpair_name(perm_name));
|
||||
if (error)
|
||||
return (-1);
|
||||
} while ((perm_name = nvlist_next_nvpair(perms, perm_name)) != NULL);
|
||||
} while ((who = nvlist_next_nvpair(nvp, who)) != NULL);
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*
|
||||
* Construct the base attribute name. The base attribute names
|
||||
* are the "key" to locate the jump objects which contain the actual
|
||||
* permissions. The base attribute names are encoded based on
|
||||
* type of entry and whether it is a local or descendent permission.
|
||||
*
|
||||
* Arguments:
|
||||
* attr - attribute name return string, attribute is assumed to be
|
||||
* ZFS_MAX_DELEG_NAME long.
|
||||
* type - type of entry to construct
|
||||
* inheritchr - inheritance type (local, descendent, or NA for create and
|
||||
* permission set definitions)
|
||||
* data - is either a permission set name or a 64 bit uid/gid.
|
||||
*/
|
||||
void
|
||||
zfs_deleg_whokey(char *attr, zfs_deleg_who_type_t type,
|
||||
char inheritchr, void *data)
|
||||
{
|
||||
int len = ZFS_MAX_DELEG_NAME;
|
||||
uint64_t *id = data;
|
||||
|
||||
switch (type) {
|
||||
case ZFS_DELEG_USER:
|
||||
case ZFS_DELEG_GROUP:
|
||||
case ZFS_DELEG_USER_SETS:
|
||||
case ZFS_DELEG_GROUP_SETS:
|
||||
(void) snprintf(attr, len, "%c%c%c%lld", type, inheritchr,
|
||||
ZFS_DELEG_FIELD_SEP_CHR, (longlong_t)*id);
|
||||
break;
|
||||
case ZFS_DELEG_NAMED_SET_SETS:
|
||||
case ZFS_DELEG_NAMED_SET:
|
||||
(void) snprintf(attr, len, "%c-%c%s", type,
|
||||
ZFS_DELEG_FIELD_SEP_CHR, (char *)data);
|
||||
break;
|
||||
case ZFS_DELEG_CREATE:
|
||||
case ZFS_DELEG_CREATE_SETS:
|
||||
(void) snprintf(attr, len, "%c-%c", type,
|
||||
ZFS_DELEG_FIELD_SEP_CHR);
|
||||
break;
|
||||
case ZFS_DELEG_EVERYONE:
|
||||
case ZFS_DELEG_EVERYONE_SETS:
|
||||
(void) snprintf(attr, len, "%c%c%c", type, inheritchr,
|
||||
ZFS_DELEG_FIELD_SEP_CHR);
|
||||
break;
|
||||
default:
|
||||
ASSERT(!"bad zfs_deleg_who_type_t");
|
||||
}
|
||||
}
|
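The key layout that zfs_deleg_whokey() produces for uid/gid entries, and that zfs_validate_who() checks, can be sketched as a small round trip. This is a standalone illustration, not the file's code: `make_id_key()` and `is_id_key()` are hypothetical helpers, the `'u'` type character used in the example is an assumption about the value of `ZFS_DELEG_USER`, and the explicit length check is an addition made for the sketch. The separator `'$'` and the inheritance characters `'l'` and `'d'` are taken from zfs_deleg.h.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define	FIELD_SEP	'$'	/* same value as ZFS_DELEG_FIELD_SEP_CHR */

/* Build "<type><inherit>$<id>", for example "ul$1234". */
static void
make_id_key(char *attr, size_t len, char type, char inherit, long long id)
{
	(void) snprintf(attr, len, "%c%c%c%lld", type, inherit, FIELD_SEP, id);
}

/*
 * Return 1 if who has the shape "<type><l|d>$" followed only by digits,
 * else 0.  The type character itself is not checked in this sketch.
 */
static int
is_id_key(const char *who)
{
	const char *p;

	if (strlen(who) < 3 || who[2] != FIELD_SEP)
		return (0);
	if (who[1] != 'l' && who[1] != 'd')
		return (0);
	for (p = &who[3]; *p != '\0'; p++) {
		if (!isdigit((unsigned char)*p))
			return (0);
	}
	return (1);
}
```

Keeping the type and inheritance flags at fixed offsets 0 and 1, with the separator at offset 2, is what lets the validator index the string directly instead of parsing it.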
85
common/zfs/zfs_deleg.h
Normal file
@ -0,0 +1,85 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _ZFS_DELEG_H
|
||||
#define _ZFS_DELEG_H
|
||||
|
||||
#include <sys/fs/zfs.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#define ZFS_DELEG_SET_NAME_CHR '@' /* set name lead char */
|
||||
#define ZFS_DELEG_FIELD_SEP_CHR '$' /* field separator */
|
||||
|
||||
/*
|
||||
* Max name length for a delegation attribute
|
||||
*/
|
||||
#define ZFS_MAX_DELEG_NAME 128
|
||||
|
||||
#define ZFS_DELEG_LOCAL 'l'
|
||||
#define ZFS_DELEG_DESCENDENT 'd'
|
||||
#define ZFS_DELEG_NA '-'
|
||||
|
||||
typedef enum {
|
||||
ZFS_DELEG_NOTE_CREATE,
|
||||
ZFS_DELEG_NOTE_DESTROY,
|
||||
ZFS_DELEG_NOTE_SNAPSHOT,
|
||||
ZFS_DELEG_NOTE_ROLLBACK,
|
||||
ZFS_DELEG_NOTE_CLONE,
|
||||
ZFS_DELEG_NOTE_PROMOTE,
|
||||
ZFS_DELEG_NOTE_RENAME,
|
||||
ZFS_DELEG_NOTE_RECEIVE,
|
||||
ZFS_DELEG_NOTE_ALLOW,
|
||||
ZFS_DELEG_NOTE_USERPROP,
|
||||
ZFS_DELEG_NOTE_MOUNT,
|
||||
ZFS_DELEG_NOTE_SHARE,
|
||||
ZFS_DELEG_NOTE_USERQUOTA,
|
||||
ZFS_DELEG_NOTE_GROUPQUOTA,
|
||||
ZFS_DELEG_NOTE_USERUSED,
|
||||
ZFS_DELEG_NOTE_GROUPUSED,
|
||||
ZFS_DELEG_NOTE_HOLD,
|
||||
ZFS_DELEG_NOTE_RELEASE,
|
||||
ZFS_DELEG_NOTE_DIFF,
|
||||
ZFS_DELEG_NOTE_NONE
|
||||
} zfs_deleg_note_t;
|
||||
|
||||
typedef struct zfs_deleg_perm_tab {
|
||||
char *z_perm;
|
||||
zfs_deleg_note_t z_note;
|
||||
} zfs_deleg_perm_tab_t;
|
||||
|
||||
extern zfs_deleg_perm_tab_t zfs_deleg_perm_tab[];
|
||||
|
||||
int zfs_deleg_verify_nvlist(nvlist_t *nvlist);
|
||||
void zfs_deleg_whokey(char *attr, zfs_deleg_who_type_t type,
|
||||
char checkflag, void *data);
|
||||
const char *zfs_deleg_canonicalize_perm(const char *perm);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _ZFS_DELEG_H */
|
246
common/zfs/zfs_fletcher.c
Normal file
@ -0,0 +1,246 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
/*
|
||||
* Fletcher Checksums
|
||||
* ------------------
|
||||
*
|
||||
* ZFS's 2nd and 4th order Fletcher checksums are defined by the following
|
||||
* recurrence relations:
|
||||
*
|
||||
* a_i = a_(i-1) + f_(i-1)
|
||||
*
|
||||
* b_i = b_(i-1) + a_i
|
||||
*
|
||||
* c_i = c_(i-1) + b_i (fletcher-4 only)
|
||||
*
|
||||
* d_i = d_(i-1) + c_i (fletcher-4 only)
|
||||
*
|
||||
* Where
|
||||
* a_0 = b_0 = c_0 = d_0 = 0
|
||||
* and
|
||||
* f_0 .. f_(n-1) are the input data.
|
||||
*
|
||||
* Using standard techniques, these translate into the following series:
|
||||
*
|
||||
* a_n = SUM(i = 1..n) f_(n-i)
|
||||
* b_n = SUM(i = 1..n) i * f_(n-i)
|
||||
* c_n = SUM(i = 1..n) (i*(i+1) / 2) * f_(n-i)
|
||||
* d_n = SUM(i = 1..n) (i*(i+1)*(i+2) / 6) * f_(n-i)
|
||||
*
|
||||
* For fletcher-2, the f_is are 64-bit, and [ab]_i are 64-bit accumulators.
|
||||
* Since the additions are done mod (2^64), errors in the high bits may not
|
||||
* be noticed. For this reason, fletcher-2 is deprecated.
|
||||
*
|
||||
* For fletcher-4, the f_is are 32-bit, and [abcd]_i are 64-bit accumulators.
|
||||
* A conservative estimate of how big the buffer can get before we overflow
|
||||
* can be computed using f_i = 0xffffffff for all i:
|
||||
*
|
||||
* % bc
|
||||
* f=2^32-1;d=0; for (i = 1; d<2^64; i++) { d += f*i*(i+1)*(i+2)/6 }; (i-1)*4
|
||||
* 2264
|
||||
* quit
|
||||
* %
|
||||
*
|
||||
* So blocks of up to 2k will not overflow. Our largest block size is
|
||||
* 128k, which has 32k 4-byte words, so we can compute the largest possible
|
||||
* accumulators, then divide by 2^64 to figure the max amount of overflow:
|
||||
*
|
||||
* % bc
|
||||
* a=b=c=d=0; f=2^32-1; for (i=1; i<=32*1024; i++) { a+=f; b+=a; c+=b; d+=c }
|
||||
* a/2^64;b/2^64;c/2^64;d/2^64
|
||||
* 0
|
||||
* 0
|
||||
* 1365
|
||||
* 11186858
|
||||
* quit
|
||||
* %
|
||||
*
|
||||
* So a and b cannot overflow. To make sure each bit of input has some
|
||||
* effect on the contents of c and d, we can look at what the factors of
|
||||
* the coefficients in the equations for c_n and d_n are. The number of 2s
|
||||
* in the factors determines the lowest set bit in the multiplier. Running
|
||||
* through the cases for n*(n+1)/2 reveals that the highest power of 2 is
|
||||
* 2^14, and for n*(n+1)*(n+2)/6 it is 2^15. So while some data may overflow
|
||||
* the 64-bit accumulators, every bit of every f_i affects every accumulator,
|
||||
* even for 128k blocks.
|
||||
*
|
||||
* If we wanted to make a stronger version of fletcher4 (fletcher4c?),
|
||||
* we could do our calculations mod (2^32 - 1) by adding in the carries
|
||||
* periodically, and store the number of carries in the top 32-bits.
|
||||
*
|
||||
* --------------------
|
||||
* Checksum Performance
|
||||
* --------------------
|
||||
*
|
||||
* There are two interesting components to checksum performance: cached and
|
||||
* uncached performance. With cached data, fletcher-2 is about four times
|
||||
* faster than fletcher-4. With uncached data, the performance difference is
|
||||
* negligible, since the cost of a cache fill dominates the processing time.
|
||||
* Even though fletcher-4 is slower than fletcher-2, it is still a pretty
|
||||
* efficient pass over the data.
|
||||
*
|
||||
* In normal operation, the data which is being checksummed is in a buffer
|
||||
* which has been filled either by:
|
||||
*
|
||||
* 1. a compression step, which will be mostly cached, or
|
||||
* 2. a bcopy() or copyin(), which will be uncached (because the
|
||||
* copy is cache-bypassing).
|
||||
*
|
||||
* For both cached and uncached data, both fletcher checksums are much faster
|
||||
* than sha-256, and slower than 'off', which doesn't touch the data at all.
|
||||
*/
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/sysmacros.h>
|
||||
#include <sys/byteorder.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/spa.h>
|
||||
|
||||
void
|
||||
fletcher_2_native(const void *buf, uint64_t size, zio_cksum_t *zcp)
|
||||
{
|
||||
const uint64_t *ip = buf;
|
||||
const uint64_t *ipend = ip + (size / sizeof (uint64_t));
|
||||
uint64_t a0, b0, a1, b1;
|
||||
|
||||
for (a0 = b0 = a1 = b1 = 0; ip < ipend; ip += 2) {
|
||||
a0 += ip[0];
|
||||
a1 += ip[1];
|
||||
b0 += a0;
|
||||
b1 += a1;
|
||||
}
|
||||
|
||||
ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1);
|
||||
}
|
||||
|
||||
void
|
||||
fletcher_2_byteswap(const void *buf, uint64_t size, zio_cksum_t *zcp)
|
||||
{
|
||||
const uint64_t *ip = buf;
|
||||
const uint64_t *ipend = ip + (size / sizeof (uint64_t));
|
||||
uint64_t a0, b0, a1, b1;
|
||||
|
||||
for (a0 = b0 = a1 = b1 = 0; ip < ipend; ip += 2) {
|
||||
a0 += BSWAP_64(ip[0]);
|
||||
a1 += BSWAP_64(ip[1]);
|
||||
b0 += a0;
|
||||
b1 += a1;
|
||||
}
|
||||
|
||||
ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1);
|
||||
}
|
||||
|
||||
void
|
||||
fletcher_4_native(const void *buf, uint64_t size, zio_cksum_t *zcp)
|
||||
{
|
||||
const uint32_t *ip = buf;
|
||||
const uint32_t *ipend = ip + (size / sizeof (uint32_t));
|
||||
uint64_t a, b, c, d;
|
||||
|
||||
for (a = b = c = d = 0; ip < ipend; ip++) {
|
||||
a += ip[0];
|
||||
b += a;
|
||||
c += b;
|
||||
d += c;
|
||||
}
|
||||
|
||||
ZIO_SET_CHECKSUM(zcp, a, b, c, d);
|
||||
}
|
||||
|
||||
void
|
||||
fletcher_4_byteswap(const void *buf, uint64_t size, zio_cksum_t *zcp)
|
||||
{
|
||||
const uint32_t *ip = buf;
|
||||
const uint32_t *ipend = ip + (size / sizeof (uint32_t));
|
||||
uint64_t a, b, c, d;
|
||||
|
||||
for (a = b = c = d = 0; ip < ipend; ip++) {
|
||||
a += BSWAP_32(ip[0]);
|
||||
b += a;
|
||||
c += b;
|
||||
d += c;
|
||||
}
|
||||
|
||||
ZIO_SET_CHECKSUM(zcp, a, b, c, d);
|
||||
}
|
||||
|
||||
void
|
||||
fletcher_4_incremental_native(const void *buf, uint64_t size,
|
||||
zio_cksum_t *zcp)
|
||||
{
|
||||
const uint32_t *ip = buf;
|
||||
const uint32_t *ipend = ip + (size / sizeof (uint32_t));
|
||||
uint64_t a, b, c, d;
|
||||
|
||||
a = zcp->zc_word[0];
|
||||
b = zcp->zc_word[1];
|
||||
c = zcp->zc_word[2];
|
||||
d = zcp->zc_word[3];
|
||||
|
||||
for (; ip < ipend; ip++) {
|
||||
a += ip[0];
|
||||
b += a;
|
||||
c += b;
|
||||
d += c;
|
||||
}
|
||||
|
||||
ZIO_SET_CHECKSUM(zcp, a, b, c, d);
|
||||
}
|
||||
|
||||
void
|
||||
fletcher_4_incremental_byteswap(const void *buf, uint64_t size,
|
||||
zio_cksum_t *zcp)
|
||||
{
|
||||
const uint32_t *ip = buf;
|
||||
const uint32_t *ipend = ip + (size / sizeof (uint32_t));
|
||||
uint64_t a, b, c, d;
|
||||
|
||||
a = zcp->zc_word[0];
|
||||
b = zcp->zc_word[1];
|
||||
c = zcp->zc_word[2];
|
||||
d = zcp->zc_word[3];
|
||||
|
||||
for (; ip < ipend; ip++) {
|
||||
a += BSWAP_32(ip[0]);
|
||||
b += a;
|
||||
c += b;
|
||||
d += c;
|
||||
}
|
||||
|
||||
ZIO_SET_CHECKSUM(zcp, a, b, c, d);
|
||||
}
|
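The relationship between the one-shot and incremental fletcher-4 routines above can be sketched without the ZFS headers. In this illustration `cksum_t` is a hypothetical stand-in for `zio_cksum_t`, and `f4_incremental()` and `f4_native()` are simplified names that take a word count instead of a byte size. The sketch shows that the incremental form is the same loop resumed from the four stored accumulators, so checksumming a buffer in pieces gives the same result as checksumming it in one pass.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for zio_cksum_t: four 64-bit accumulators. */
typedef struct cksum {
	uint64_t word[4];
} cksum_t;

/* Continue a fletcher-4 sum from the state already stored in *zc. */
static void
f4_incremental(const uint32_t *ip, size_t nwords, cksum_t *zc)
{
	uint64_t a = zc->word[0];
	uint64_t b = zc->word[1];
	uint64_t c = zc->word[2];
	uint64_t d = zc->word[3];
	size_t i;

	for (i = 0; i < nwords; i++) {
		a += ip[i];
		b += a;
		c += b;
		d += c;
	}

	zc->word[0] = a;
	zc->word[1] = b;
	zc->word[2] = c;
	zc->word[3] = d;
}

/* One-shot form: start from the all-zero state. */
static void
f4_native(const uint32_t *ip, size_t nwords, cksum_t *zc)
{
	zc->word[0] = zc->word[1] = zc->word[2] = zc->word[3] = 0;
	f4_incremental(ip, nwords, zc);
}
```

For the four input words 1, 2, 3, 4 the accumulators end at a = 10, b = 20, c = 35, d = 56, which agrees with the series form given in the comment block of zfs_fletcher.c (for example b = 1*4 + 2*3 + 3*2 + 4*1 = 20).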
53
common/zfs/zfs_fletcher.h
Normal file
@ -0,0 +1,53 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _ZFS_FLETCHER_H
|
||||
#define _ZFS_FLETCHER_H
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/spa.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/*
|
||||
* fletcher checksum functions
|
||||
*/
|
||||
|
||||
void fletcher_2_native(const void *, uint64_t, zio_cksum_t *);
|
||||
void fletcher_2_byteswap(const void *, uint64_t, zio_cksum_t *);
|
||||
void fletcher_4_native(const void *, uint64_t, zio_cksum_t *);
|
||||
void fletcher_4_byteswap(const void *, uint64_t, zio_cksum_t *);
|
||||
void fletcher_4_incremental_native(const void *, uint64_t,
|
||||
zio_cksum_t *);
|
||||
void fletcher_4_incremental_byteswap(const void *, uint64_t,
|
||||
zio_cksum_t *);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _ZFS_FLETCHER_H */
|
345
common/zfs/zfs_namecheck.c
Normal file
@ -0,0 +1,345 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

/*
 * Common name validation routines for ZFS. These routines are shared by the
 * userland code as well as the ioctl() layer to ensure that we don't
 * inadvertently expose a hole through direct ioctl()s that never gets tested.
 * In userland, however, we want significantly more information about _why_ the
 * name is invalid. In the kernel, we only care whether it's valid or not.
 * Each routine therefore takes a 'namecheck_err_t' which describes exactly why
 * the name failed to validate.
 *
 * Each function returns 0 on success, -1 on error.
 */

#if defined(_KERNEL)
#include <sys/systm.h>
#else
#include <string.h>
#endif

#include <sys/param.h>
#include <sys/nvpair.h>
#include "zfs_namecheck.h"
#include "zfs_deleg.h"

static int
valid_char(char c)
{
	return ((c >= 'a' && c <= 'z') ||
	    (c >= 'A' && c <= 'Z') ||
	    (c >= '0' && c <= '9') ||
	    c == '-' || c == '_' || c == '.' || c == ':' || c == ' ');
}

/*
 * Snapshot names must be made up of alphanumeric characters plus the following
 * characters:
 *
 * 	[-_.: ]
 */
int
snapshot_namecheck(const char *path, namecheck_err_t *why, char *what)
{
	const char *loc;

	if (strlen(path) >= MAXNAMELEN) {
		if (why)
			*why = NAME_ERR_TOOLONG;
		return (-1);
	}

	if (path[0] == '\0') {
		if (why)
			*why = NAME_ERR_EMPTY_COMPONENT;
		return (-1);
	}

	for (loc = path; *loc; loc++) {
		if (!valid_char(*loc)) {
			if (why) {
				*why = NAME_ERR_INVALCHAR;
				*what = *loc;
			}
			return (-1);
		}
	}
	return (0);
}


/*
 * Permission set names must start with the character '@' followed by the
 * same character restrictions as snapshot names, except that the name
 * cannot exceed 64 characters.
 */
int
permset_namecheck(const char *path, namecheck_err_t *why, char *what)
{
	if (strlen(path) >= ZFS_PERMSET_MAXLEN) {
		if (why)
			*why = NAME_ERR_TOOLONG;
		return (-1);
	}

	if (path[0] != '@') {
		if (why) {
			*why = NAME_ERR_NO_AT;
			*what = path[0];
		}
		return (-1);
	}

	return (snapshot_namecheck(&path[1], why, what));
}

/*
 * Dataset names must be of the following form:
 *
 * 	[component][/]*[component][@component]
 *
 * Where each component is made up of alphanumeric characters plus the following
 * characters:
 *
 * 	[-_.:%]
 *
 * We allow '%' here as we use that character internally to create unique
 * names for temporary clones (for online recv).
 */
int
dataset_namecheck(const char *path, namecheck_err_t *why, char *what)
{
	const char *loc, *end;
	int found_snapshot;

	/*
	 * Make sure the name is not too long.
	 *
	 * ZFS_MAXNAMELEN is the maximum dataset length used in the userland
	 * which is the same as MAXNAMELEN used in the kernel.
	 * If ZFS_MAXNAMELEN value is changed, make sure to cleanup all
	 * places using MAXNAMELEN.
	 */

	if (strlen(path) >= MAXNAMELEN) {
		if (why)
			*why = NAME_ERR_TOOLONG;
		return (-1);
	}

	/* Explicitly check for a leading slash. */
	if (path[0] == '/') {
		if (why)
			*why = NAME_ERR_LEADING_SLASH;
		return (-1);
	}

	if (path[0] == '\0') {
		if (why)
			*why = NAME_ERR_EMPTY_COMPONENT;
		return (-1);
	}

	loc = path;
	found_snapshot = 0;
	for (;;) {
		/* Find the end of this component */
		end = loc;
		while (*end != '/' && *end != '@' && *end != '\0')
			end++;

		if (*end == '\0' && end[-1] == '/') {
			/* trailing slashes are not allowed */
			if (why)
				*why = NAME_ERR_TRAILING_SLASH;
			return (-1);
		}

		/* Zero-length components are not allowed */
		if (loc == end) {
			if (why) {
				/*
				 * Make sure this is really a zero-length
				 * component and not a '@@'.
				 */
				if (*end == '@' && found_snapshot) {
					*why = NAME_ERR_MULTIPLE_AT;
				} else {
					*why = NAME_ERR_EMPTY_COMPONENT;
				}
			}

			return (-1);
		}

		/* Validate the contents of this component */
		while (loc != end) {
			if (!valid_char(*loc) && *loc != '%') {
				if (why) {
					*why = NAME_ERR_INVALCHAR;
					*what = *loc;
				}
				return (-1);
			}
			loc++;
		}

		/* If we've reached the end of the string, we're OK */
		if (*end == '\0')
			return (0);

		if (*end == '@') {
			/*
			 * If we've found an @ symbol, indicate that we're in
			 * the snapshot component, and report a second '@'
			 * character as an error.
			 */
			if (found_snapshot) {
				if (why)
					*why = NAME_ERR_MULTIPLE_AT;
				return (-1);
			}

			found_snapshot = 1;
		}

		/*
		 * If there is a '/' in a snapshot name
		 * then report an error
		 */
		if (*end == '/' && found_snapshot) {
			if (why)
				*why = NAME_ERR_TRAILING_SLASH;
			return (-1);
		}

		/* Update to the next component */
		loc = end + 1;
	}
}


/*
 * mountpoint names must be of the following form:
 *
 *	/[component][/]*[component][/]
 */
int
mountpoint_namecheck(const char *path, namecheck_err_t *why)
{
	const char *start, *end;

	/*
	 * Make sure none of the mountpoint component names are too long.
	 * If a component name is too long then the mkdir of the mountpoint
	 * will fail but then the mountpoint property will be set to a value
	 * that can never be mounted. Better to fail before setting the prop.
	 * Extra slashes are OK, they will be tossed by the mountpoint mkdir.
	 */

	if (path == NULL || *path != '/') {
		if (why)
			*why = NAME_ERR_LEADING_SLASH;
		return (-1);
	}

	/* Skip leading slash */
	start = &path[1];
	do {
		end = start;
		while (*end != '/' && *end != '\0')
			end++;

		if (end - start >= MAXNAMELEN) {
			if (why)
				*why = NAME_ERR_TOOLONG;
			return (-1);
		}
		start = end + 1;

	} while (*end != '\0');

	return (0);
}

/*
 * For pool names, we have the same set of valid characters as described in
 * dataset names, with the additional restriction that the pool name must begin
 * with a letter. The pool names 'raidz' and 'mirror' are also reserved names
 * that cannot be used.
 */
int
pool_namecheck(const char *pool, namecheck_err_t *why, char *what)
{
	const char *c;

	/*
	 * Make sure the name is not too long.
	 *
	 * ZPOOL_MAXNAMELEN is the maximum pool length used in the userland
	 * which is the same as MAXNAMELEN used in the kernel.
	 * If ZPOOL_MAXNAMELEN value is changed, make sure to cleanup all
	 * places using MAXNAMELEN.
	 */
	if (strlen(pool) >= MAXNAMELEN) {
		if (why)
			*why = NAME_ERR_TOOLONG;
		return (-1);
	}

	c = pool;
	while (*c != '\0') {
		if (!valid_char(*c)) {
			if (why) {
				*why = NAME_ERR_INVALCHAR;
				*what = *c;
			}
			return (-1);
		}
		c++;
	}

	if (!(*pool >= 'a' && *pool <= 'z') &&
	    !(*pool >= 'A' && *pool <= 'Z')) {
		if (why)
			*why = NAME_ERR_NOLETTER;
		return (-1);
	}

	if (strcmp(pool, "mirror") == 0 || strcmp(pool, "raidz") == 0) {
		if (why)
			*why = NAME_ERR_RESERVED;
		return (-1);
	}

	if (pool[0] == 'c' && (pool[1] >= '0' && pool[1] <= '9')) {
		if (why)
			*why = NAME_ERR_DISKLIKE;
		return (-1);
	}

	return (0);
}
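
The pool-name rules enforced by pool_namecheck() above can be tried out in isolation. What follows is a minimal, self-contained sketch and not the vendor code: `demo_valid_char` and `demo_pool_name_ok` are hypothetical names, the MAXNAMELEN length check is left out, and no reason code is reported. The rules themselves (allowed characters, leading letter, the reserved names `mirror` and `raidz`, and disk-like `c[0-9]` prefixes) are copied from the function above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the file-local valid_char() helper. */
static int
demo_valid_char(char c)
{
	return ((c >= 'a' && c <= 'z') ||
	    (c >= 'A' && c <= 'Z') ||
	    (c >= '0' && c <= '9') ||
	    c == '-' || c == '_' || c == '.' || c == ':' || c == ' ');
}

/*
 * Hypothetical sketch: returns 0 if 'pool' passes the character, leading
 * letter, reserved-name and disk-like checks, -1 otherwise. Unlike
 * pool_namecheck(), it does not say why and does not check the length.
 */
static int
demo_pool_name_ok(const char *pool)
{
	const char *c;

	assert(pool != NULL);

	/* Every character must come from the allowed set. */
	for (c = pool; *c != '\0'; c++) {
		if (!demo_valid_char(*c))
			return (-1);
	}

	/* Must begin with a letter; this also rejects the empty string. */
	if (!(*pool >= 'a' && *pool <= 'z') &&
	    !(*pool >= 'A' && *pool <= 'Z'))
		return (-1);

	/* The entire name must not be one of the reserved words. */
	if (strcmp(pool, "mirror") == 0 || strcmp(pool, "raidz") == 0)
		return (-1);

	/* Names of the form c[0-9].* look like disk names. */
	if (pool[0] == 'c' && pool[1] >= '0' && pool[1] <= '9')
		return (-1);

	return (0);
}
```

Under these assumptions, `demo_pool_name_ok("tank")` and `demo_pool_name_ok("cache")` return 0, while `"mirror"`, `"c0t0d0"`, `"1pool"` and `"my/pool"` are rejected. Because only the whole string is compared against the reserved words, a name such as `"mirror2"` is accepted by this sketch.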
58
common/zfs/zfs_namecheck.h
Normal file
@ -0,0 +1,58 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

#ifndef _ZFS_NAMECHECK_H
#define _ZFS_NAMECHECK_H

#ifdef __cplusplus
extern "C" {
#endif

typedef enum {
	NAME_ERR_LEADING_SLASH, /* name begins with leading slash */
	NAME_ERR_EMPTY_COMPONENT, /* name contains an empty component */
	NAME_ERR_TRAILING_SLASH, /* name ends with a slash */
	NAME_ERR_INVALCHAR, /* invalid character found */
	NAME_ERR_MULTIPLE_AT, /* multiple '@' characters found */
	NAME_ERR_NOLETTER, /* pool doesn't begin with a letter */
	NAME_ERR_RESERVED, /* entire name is reserved */
	NAME_ERR_DISKLIKE, /* reserved disk name (c[0-9].*) */
	NAME_ERR_TOOLONG, /* name is too long */
	NAME_ERR_NO_AT, /* permission set is missing '@' */
} namecheck_err_t;

#define ZFS_PERMSET_MAXLEN 64

int pool_namecheck(const char *, namecheck_err_t *, char *);
int dataset_namecheck(const char *, namecheck_err_t *, char *);
int mountpoint_namecheck(const char *, namecheck_err_t *);
int snapshot_namecheck(const char *, namecheck_err_t *, char *);
int permset_namecheck(const char *, namecheck_err_t *, char *);

#ifdef __cplusplus
}
#endif

#endif /* _ZFS_NAMECHECK_H */
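
The header above only defines the reason codes; how a userland caller turns them into diagnostics is not part of this file. As a hedged illustration, the sketch below shows one way a caller might do so. The helpers `demo_namecheck_strerror` and `demo_describe` and all message strings are hypothetical, and the enum is repeated only so that the block compiles on its own.

```c
#include <assert.h>
#include <stdio.h>

/* Repeated from zfs_namecheck.h so the sketch is self-contained. */
typedef enum {
	NAME_ERR_LEADING_SLASH,
	NAME_ERR_EMPTY_COMPONENT,
	NAME_ERR_TRAILING_SLASH,
	NAME_ERR_INVALCHAR,
	NAME_ERR_MULTIPLE_AT,
	NAME_ERR_NOLETTER,
	NAME_ERR_RESERVED,
	NAME_ERR_DISKLIKE,
	NAME_ERR_TOOLONG,
	NAME_ERR_NO_AT
} namecheck_err_t;

/* Hypothetical helper: the message text is illustrative only. */
static const char *
demo_namecheck_strerror(namecheck_err_t why)
{
	switch (why) {
	case NAME_ERR_LEADING_SLASH:
		return ("name begins with a slash");
	case NAME_ERR_EMPTY_COMPONENT:
		return ("name contains an empty component");
	case NAME_ERR_TRAILING_SLASH:
		return ("name ends with a slash");
	case NAME_ERR_INVALCHAR:
		return ("invalid character in name");
	case NAME_ERR_MULTIPLE_AT:
		return ("multiple '@' characters in name");
	case NAME_ERR_NOLETTER:
		return ("pool name does not begin with a letter");
	case NAME_ERR_RESERVED:
		return ("name is reserved");
	case NAME_ERR_DISKLIKE:
		return ("name looks like a disk name");
	case NAME_ERR_TOOLONG:
		return ("name is too long");
	case NAME_ERR_NO_AT:
		return ("permission set is missing '@'");
	}
	return ("unknown error");
}

/*
 * Hypothetical helper: format a diagnostic into 'buf'. This sketch uses
 * 'what' only for NAME_ERR_INVALCHAR, where the check routines store the
 * offending character.
 */
static void
demo_describe(namecheck_err_t why, char what, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (why == NAME_ERR_INVALCHAR) {
		(void) snprintf(buf, len, "invalid character '%c' in name",
		    what);
	} else {
		(void) snprintf(buf, len, "%s", demo_namecheck_strerror(why));
	}
}
```

A caller would pass the `why` and `what` values filled in by one of the check routines after it returns -1. Keeping the text in the caller matches the split described in zfs_namecheck.c, where the kernel only needs the pass or fail result.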
595
common/zfs/zfs_prop.c
Normal file
@ -0,0 +1,595 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
 */

/* Portions Copyright 2010 Robert Milkowski */

#include <sys/zio.h>
#include <sys/spa.h>
#include <sys/u8_textprep.h>
#include <sys/zfs_acl.h>
#include <sys/zfs_ioctl.h>
#include <sys/zfs_znode.h>

#include "zfs_prop.h"
#include "zfs_deleg.h"

#if defined(_KERNEL)
#include <sys/systm.h>
#else
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#endif

static zprop_desc_t zfs_prop_table[ZFS_NUM_PROPS];

/* Note this is indexed by zfs_userquota_prop_t, keep the order the same */
const char *zfs_userquota_prop_prefixes[] = {
	"userused@",
	"userquota@",
	"groupused@",
	"groupquota@"
};

zprop_desc_t *
zfs_prop_get_table(void)
{
	return (zfs_prop_table);
}

void
zfs_prop_init(void)
{
	static zprop_index_t checksum_table[] = {
		{ "on", ZIO_CHECKSUM_ON },
		{ "off", ZIO_CHECKSUM_OFF },
		{ "fletcher2", ZIO_CHECKSUM_FLETCHER_2 },
		{ "fletcher4", ZIO_CHECKSUM_FLETCHER_4 },
		{ "sha256", ZIO_CHECKSUM_SHA256 },
		{ NULL }
	};

	static zprop_index_t dedup_table[] = {
		{ "on", ZIO_CHECKSUM_ON },
		{ "off", ZIO_CHECKSUM_OFF },
		{ "verify", ZIO_CHECKSUM_ON | ZIO_CHECKSUM_VERIFY },
		{ "sha256", ZIO_CHECKSUM_SHA256 },
		{ "sha256,verify",
		    ZIO_CHECKSUM_SHA256 | ZIO_CHECKSUM_VERIFY },
		{ NULL }
	};

	static zprop_index_t compress_table[] = {
		{ "on", ZIO_COMPRESS_ON },
		{ "off", ZIO_COMPRESS_OFF },
		{ "lzjb", ZIO_COMPRESS_LZJB },
		{ "gzip", ZIO_COMPRESS_GZIP_6 }, /* gzip default */
		{ "gzip-1", ZIO_COMPRESS_GZIP_1 },
		{ "gzip-2", ZIO_COMPRESS_GZIP_2 },
		{ "gzip-3", ZIO_COMPRESS_GZIP_3 },
		{ "gzip-4", ZIO_COMPRESS_GZIP_4 },
		{ "gzip-5", ZIO_COMPRESS_GZIP_5 },
		{ "gzip-6", ZIO_COMPRESS_GZIP_6 },
		{ "gzip-7", ZIO_COMPRESS_GZIP_7 },
		{ "gzip-8", ZIO_COMPRESS_GZIP_8 },
		{ "gzip-9", ZIO_COMPRESS_GZIP_9 },
		{ "zle", ZIO_COMPRESS_ZLE },
		{ NULL }
	};

	static zprop_index_t snapdir_table[] = {
		{ "hidden", ZFS_SNAPDIR_HIDDEN },
		{ "visible", ZFS_SNAPDIR_VISIBLE },
		{ NULL }
	};

	static zprop_index_t acl_inherit_table[] = {
		{ "discard", ZFS_ACL_DISCARD },
		{ "noallow", ZFS_ACL_NOALLOW },
		{ "restricted", ZFS_ACL_RESTRICTED },
		{ "passthrough", ZFS_ACL_PASSTHROUGH },
		{ "secure", ZFS_ACL_RESTRICTED }, /* backward compatibility */
		{ "passthrough-x", ZFS_ACL_PASSTHROUGH_X },
		{ NULL }
	};

	static zprop_index_t case_table[] = {
		{ "sensitive", ZFS_CASE_SENSITIVE },
		{ "insensitive", ZFS_CASE_INSENSITIVE },
		{ "mixed", ZFS_CASE_MIXED },
		{ NULL }
	};

	static zprop_index_t copies_table[] = {
		{ "1", 1 },
		{ "2", 2 },
		{ "3", 3 },
		{ NULL }
	};

	/*
	 * Use the unique flags we have to send to u8_strcmp() and/or
	 * u8_textprep() to represent the various normalization property
	 * values.
	 */
	static zprop_index_t normalize_table[] = {
		{ "none", 0 },
		{ "formD", U8_TEXTPREP_NFD },
		{ "formKC", U8_TEXTPREP_NFKC },
		{ "formC", U8_TEXTPREP_NFC },
		{ "formKD", U8_TEXTPREP_NFKD },
		{ NULL }
	};

	static zprop_index_t version_table[] = {
		{ "1", 1 },
		{ "2", 2 },
		{ "3", 3 },
		{ "4", 4 },
		{ "5", 5 },
		{ "current", ZPL_VERSION },
		{ NULL }
	};

	static zprop_index_t boolean_table[] = {
		{ "off", 0 },
		{ "on", 1 },
		{ NULL }
	};

	static zprop_index_t logbias_table[] = {
		{ "latency", ZFS_LOGBIAS_LATENCY },
		{ "throughput", ZFS_LOGBIAS_THROUGHPUT },
		{ NULL }
	};

	static zprop_index_t canmount_table[] = {
		{ "off", ZFS_CANMOUNT_OFF },
		{ "on", ZFS_CANMOUNT_ON },
		{ "noauto", ZFS_CANMOUNT_NOAUTO },
		{ NULL }
	};

	static zprop_index_t cache_table[] = {
		{ "none", ZFS_CACHE_NONE },
		{ "metadata", ZFS_CACHE_METADATA },
		{ "all", ZFS_CACHE_ALL },
		{ NULL }
	};

	static zprop_index_t sync_table[] = {
		{ "standard", ZFS_SYNC_STANDARD },
		{ "always", ZFS_SYNC_ALWAYS },
		{ "disabled", ZFS_SYNC_DISABLED },
		{ NULL }
	};

	/* inherit index properties */
	zprop_register_index(ZFS_PROP_SYNC, "sync", ZFS_SYNC_STANDARD,
	    PROP_INHERIT, ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME,
	    "standard | always | disabled", "SYNC",
	    sync_table);
	zprop_register_index(ZFS_PROP_CHECKSUM, "checksum",
	    ZIO_CHECKSUM_DEFAULT, PROP_INHERIT, ZFS_TYPE_FILESYSTEM |
	    ZFS_TYPE_VOLUME,
	    "on | off | fletcher2 | fletcher4 | sha256", "CHECKSUM",
	    checksum_table);
	zprop_register_index(ZFS_PROP_DEDUP, "dedup", ZIO_CHECKSUM_OFF,
	    PROP_INHERIT, ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME,
	    "on | off | verify | sha256[,verify]", "DEDUP",
	    dedup_table);
	zprop_register_index(ZFS_PROP_COMPRESSION, "compression",
	    ZIO_COMPRESS_DEFAULT, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME,
	    "on | off | lzjb | gzip | gzip-[1-9] | zle", "COMPRESS",
	    compress_table);
	zprop_register_index(ZFS_PROP_SNAPDIR, "snapdir", ZFS_SNAPDIR_HIDDEN,
	    PROP_INHERIT, ZFS_TYPE_FILESYSTEM,
	    "hidden | visible", "SNAPDIR", snapdir_table);
	zprop_register_index(ZFS_PROP_ACLINHERIT, "aclinherit",
	    ZFS_ACL_RESTRICTED, PROP_INHERIT, ZFS_TYPE_FILESYSTEM,
	    "discard | noallow | restricted | passthrough | passthrough-x",
	    "ACLINHERIT", acl_inherit_table);
	zprop_register_index(ZFS_PROP_COPIES, "copies", 1, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME,
	    "1 | 2 | 3", "COPIES", copies_table);
	zprop_register_index(ZFS_PROP_PRIMARYCACHE, "primarycache",
	    ZFS_CACHE_ALL, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT | ZFS_TYPE_VOLUME,
	    "all | none | metadata", "PRIMARYCACHE", cache_table);
	zprop_register_index(ZFS_PROP_SECONDARYCACHE, "secondarycache",
	    ZFS_CACHE_ALL, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT | ZFS_TYPE_VOLUME,
	    "all | none | metadata", "SECONDARYCACHE", cache_table);
	zprop_register_index(ZFS_PROP_LOGBIAS, "logbias", ZFS_LOGBIAS_LATENCY,
	    PROP_INHERIT, ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME,
	    "latency | throughput", "LOGBIAS", logbias_table);

	/* inherit index (boolean) properties */
	zprop_register_index(ZFS_PROP_ATIME, "atime", 1, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM, "on | off", "ATIME", boolean_table);
	zprop_register_index(ZFS_PROP_DEVICES, "devices", 1, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT, "on | off", "DEVICES",
	    boolean_table);
	zprop_register_index(ZFS_PROP_EXEC, "exec", 1, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT, "on | off", "EXEC",
	    boolean_table);
	zprop_register_index(ZFS_PROP_SETUID, "setuid", 1, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT, "on | off", "SETUID",
	    boolean_table);
	zprop_register_index(ZFS_PROP_READONLY, "readonly", 0, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME, "on | off", "RDONLY",
	    boolean_table);
	zprop_register_index(ZFS_PROP_ZONED, "zoned", 0, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM, "on | off", "ZONED", boolean_table);
	zprop_register_index(ZFS_PROP_XATTR, "xattr", 1, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT, "on | off", "XATTR",
	    boolean_table);
	zprop_register_index(ZFS_PROP_VSCAN, "vscan", 0, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM, "on | off", "VSCAN",
	    boolean_table);
	zprop_register_index(ZFS_PROP_NBMAND, "nbmand", 0, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT, "on | off", "NBMAND",
	    boolean_table);

	/* default index properties */
	zprop_register_index(ZFS_PROP_VERSION, "version", 0, PROP_DEFAULT,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT,
	    "1 | 2 | 3 | 4 | current", "VERSION", version_table);
	zprop_register_index(ZFS_PROP_CANMOUNT, "canmount", ZFS_CANMOUNT_ON,
	    PROP_DEFAULT, ZFS_TYPE_FILESYSTEM, "on | off | noauto",
	    "CANMOUNT", canmount_table);

	/* readonly index (boolean) properties */
	zprop_register_index(ZFS_PROP_MOUNTED, "mounted", 0, PROP_READONLY,
	    ZFS_TYPE_FILESYSTEM, "yes | no", "MOUNTED", boolean_table);
	zprop_register_index(ZFS_PROP_DEFER_DESTROY, "defer_destroy", 0,
	    PROP_READONLY, ZFS_TYPE_SNAPSHOT, "yes | no", "DEFER_DESTROY",
	    boolean_table);

	/* set once index properties */
	zprop_register_index(ZFS_PROP_NORMALIZE, "normalization", 0,
	    PROP_ONETIME, ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT,
	    "none | formC | formD | formKC | formKD", "NORMALIZATION",
	    normalize_table);
	zprop_register_index(ZFS_PROP_CASE, "casesensitivity",
	    ZFS_CASE_SENSITIVE, PROP_ONETIME, ZFS_TYPE_FILESYSTEM |
	    ZFS_TYPE_SNAPSHOT,
	    "sensitive | insensitive | mixed", "CASE", case_table);

	/* set once index (boolean) properties */
	zprop_register_index(ZFS_PROP_UTF8ONLY, "utf8only", 0, PROP_ONETIME,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT,
	    "on | off", "UTF8ONLY", boolean_table);

	/* string properties */
	zprop_register_string(ZFS_PROP_ORIGIN, "origin", NULL, PROP_READONLY,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME, "<snapshot>", "ORIGIN");
	zprop_register_string(ZFS_PROP_MOUNTPOINT, "mountpoint", "/",
	    PROP_INHERIT, ZFS_TYPE_FILESYSTEM, "<path> | legacy | none",
	    "MOUNTPOINT");
	zprop_register_string(ZFS_PROP_SHARENFS, "sharenfs", "off",
	    PROP_INHERIT, ZFS_TYPE_FILESYSTEM, "on | off | share(1M) options",
	    "SHARENFS");
	zprop_register_string(ZFS_PROP_TYPE, "type", NULL, PROP_READONLY,
	    ZFS_TYPE_DATASET, "filesystem | volume | snapshot", "TYPE");
	zprop_register_string(ZFS_PROP_SHARESMB, "sharesmb", "off",
	    PROP_INHERIT, ZFS_TYPE_FILESYSTEM,
	    "on | off | sharemgr(1M) options", "SHARESMB");
	zprop_register_string(ZFS_PROP_MLSLABEL, "mlslabel",
	    ZFS_MLSLABEL_DEFAULT, PROP_INHERIT, ZFS_TYPE_DATASET,
	    "<sensitivity label>", "MLSLABEL");

	/* readonly number properties */
	zprop_register_number(ZFS_PROP_USED, "used", 0, PROP_READONLY,
	    ZFS_TYPE_DATASET, "<size>", "USED");
	zprop_register_number(ZFS_PROP_AVAILABLE, "available", 0, PROP_READONLY,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME, "<size>", "AVAIL");
	zprop_register_number(ZFS_PROP_REFERENCED, "referenced", 0,
	    PROP_READONLY, ZFS_TYPE_DATASET, "<size>", "REFER");
	zprop_register_number(ZFS_PROP_COMPRESSRATIO, "compressratio", 0,
	    PROP_READONLY, ZFS_TYPE_DATASET,
	    "<1.00x or higher if compressed>", "RATIO");
	zprop_register_number(ZFS_PROP_VOLBLOCKSIZE, "volblocksize",
	    ZVOL_DEFAULT_BLOCKSIZE, PROP_ONETIME,
	    ZFS_TYPE_VOLUME, "512 to 128k, power of 2", "VOLBLOCK");
	zprop_register_number(ZFS_PROP_USEDSNAP, "usedbysnapshots", 0,
	    PROP_READONLY, ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME, "<size>",
	    "USEDSNAP");
	zprop_register_number(ZFS_PROP_USEDDS, "usedbydataset", 0,
	    PROP_READONLY, ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME, "<size>",
	    "USEDDS");
	zprop_register_number(ZFS_PROP_USEDCHILD, "usedbychildren", 0,
	    PROP_READONLY, ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME, "<size>",
	    "USEDCHILD");
	zprop_register_number(ZFS_PROP_USEDREFRESERV, "usedbyrefreservation", 0,
	    PROP_READONLY,
	    ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME, "<size>", "USEDREFRESERV");
	zprop_register_number(ZFS_PROP_USERREFS, "userrefs", 0, PROP_READONLY,
	    ZFS_TYPE_SNAPSHOT, "<count>", "USERREFS");

	/* default number properties */
	zprop_register_number(ZFS_PROP_QUOTA, "quota", 0, PROP_DEFAULT,
	    ZFS_TYPE_FILESYSTEM, "<size> | none", "QUOTA");
	zprop_register_number(ZFS_PROP_RESERVATION, "reservation", 0,
	    PROP_DEFAULT, ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME,
	    "<size> | none", "RESERV");
	zprop_register_number(ZFS_PROP_VOLSIZE, "volsize", 0, PROP_DEFAULT,
	    ZFS_TYPE_VOLUME, "<size>", "VOLSIZE");
	zprop_register_number(ZFS_PROP_REFQUOTA, "refquota", 0, PROP_DEFAULT,
	    ZFS_TYPE_FILESYSTEM, "<size> | none", "REFQUOTA");
	zprop_register_number(ZFS_PROP_REFRESERVATION, "refreservation", 0,
	    PROP_DEFAULT, ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME,
	    "<size> | none", "REFRESERV");

	/* inherit number properties */
	zprop_register_number(ZFS_PROP_RECORDSIZE, "recordsize",
	    SPA_MAXBLOCKSIZE, PROP_INHERIT,
	    ZFS_TYPE_FILESYSTEM, "512 to 128k, power of 2", "RECSIZE");

	/* hidden properties */
	zprop_register_hidden(ZFS_PROP_CREATETXG, "createtxg", PROP_TYPE_NUMBER,
	    PROP_READONLY, ZFS_TYPE_DATASET, "CREATETXG");
	zprop_register_hidden(ZFS_PROP_NUMCLONES, "numclones", PROP_TYPE_NUMBER,
	    PROP_READONLY, ZFS_TYPE_SNAPSHOT, "NUMCLONES");
	zprop_register_hidden(ZFS_PROP_NAME, "name", PROP_TYPE_STRING,
	    PROP_READONLY, ZFS_TYPE_DATASET, "NAME");
	zprop_register_hidden(ZFS_PROP_ISCSIOPTIONS, "iscsioptions",
	    PROP_TYPE_STRING, PROP_INHERIT, ZFS_TYPE_VOLUME, "ISCSIOPTIONS");
	zprop_register_hidden(ZFS_PROP_STMF_SHAREINFO, "stmf_sbd_lu",
	    PROP_TYPE_STRING, PROP_INHERIT, ZFS_TYPE_VOLUME,
	    "STMF_SBD_LU");
	zprop_register_hidden(ZFS_PROP_GUID, "guid", PROP_TYPE_NUMBER,
	    PROP_READONLY, ZFS_TYPE_DATASET, "GUID");
	zprop_register_hidden(ZFS_PROP_USERACCOUNTING, "useraccounting",
	    PROP_TYPE_NUMBER, PROP_READONLY, ZFS_TYPE_DATASET,
	    "USERACCOUNTING");
	zprop_register_hidden(ZFS_PROP_UNIQUE, "unique", PROP_TYPE_NUMBER,
	    PROP_READONLY, ZFS_TYPE_DATASET, "UNIQUE");
	zprop_register_hidden(ZFS_PROP_OBJSETID, "objsetid", PROP_TYPE_NUMBER,
	    PROP_READONLY, ZFS_TYPE_DATASET, "OBJSETID");

	/*
	 * Property to be removed once libbe is integrated
	 */
	zprop_register_hidden(ZFS_PROP_PRIVATE, "priv_prop",
	    PROP_TYPE_NUMBER, PROP_READONLY, ZFS_TYPE_FILESYSTEM,
	    "PRIV_PROP");

	/* oddball properties */
	zprop_register_impl(ZFS_PROP_CREATION, "creation", PROP_TYPE_NUMBER, 0,
	    NULL, PROP_READONLY, ZFS_TYPE_DATASET,
	    "<date>", "CREATION", B_FALSE, B_TRUE, NULL);
}

boolean_t
zfs_prop_delegatable(zfs_prop_t prop)
{
	zprop_desc_t *pd = &zfs_prop_table[prop];

	/* The mlslabel property is never delegatable. */
	if (prop == ZFS_PROP_MLSLABEL)
		return (B_FALSE);

	return (pd->pd_attr != PROP_READONLY);
}

/*
 * Given a zfs dataset property name, returns the corresponding property ID.
 */
zfs_prop_t
zfs_name_to_prop(const char *propname)
{
	return (zprop_name_to_prop(propname, ZFS_TYPE_DATASET));
}

/*
 * For user property names, we allow all lowercase alphanumeric characters, plus
 * a few useful punctuation characters.
 */
static int
valid_char(char c)
{
	return ((c >= 'a' && c <= 'z') ||
	    (c >= '0' && c <= '9') ||
	    c == '-' || c == '_' || c == '.' || c == ':');
}

/*
 * Returns true if this is a valid user-defined property (one with a ':').
 */
boolean_t
zfs_prop_user(const char *name)
{
	int i;
	char c;
	boolean_t foundsep = B_FALSE;

	for (i = 0; i < strlen(name); i++) {
		c = name[i];
		if (!valid_char(c))
			return (B_FALSE);
		if (c == ':')
			foundsep = B_TRUE;
	}

	if (!foundsep)
		return (B_FALSE);

	return (B_TRUE);
}

/*
 * Returns true if this is a valid userspace-type property (one with a '@').
 * Note that after the @, any character is valid (eg, another @, for SID
 * user@domain).
 */
boolean_t
zfs_prop_userquota(const char *name)
{
	zfs_userquota_prop_t prop;

	for (prop = 0; prop < ZFS_NUM_USERQUOTA_PROPS; prop++) {
		if (strncmp(name, zfs_userquota_prop_prefixes[prop],
		    strlen(zfs_userquota_prop_prefixes[prop])) == 0) {
			return (B_TRUE);
		}
	}

	return (B_FALSE);
}

/*
 * Tables of index types, plus functions to convert between the user view
 * (strings) and internal representation (uint64_t).
 */
int
zfs_prop_string_to_index(zfs_prop_t prop, const char *string, uint64_t *index)
{
	return (zprop_string_to_index(prop, string, index, ZFS_TYPE_DATASET));
}

int
zfs_prop_index_to_string(zfs_prop_t prop, uint64_t index, const char **string)
{
	return (zprop_index_to_string(prop, index, string, ZFS_TYPE_DATASET));
}

uint64_t
zfs_prop_random_value(zfs_prop_t prop, uint64_t seed)
{
	return (zprop_random_value(prop, seed, ZFS_TYPE_DATASET));
}

/*
 * Returns TRUE if the property applies to any of the given dataset types.
 */
boolean_t
zfs_prop_valid_for_type(int prop, zfs_type_t types)
{
	return (zprop_valid_for_type(prop, types));
}

zprop_type_t
zfs_prop_get_type(zfs_prop_t prop)
{
	return (zfs_prop_table[prop].pd_proptype);
}

/*
 * Returns TRUE if the property is readonly.
 */
boolean_t
zfs_prop_readonly(zfs_prop_t prop)
{
	return (zfs_prop_table[prop].pd_attr == PROP_READONLY ||
	    zfs_prop_table[prop].pd_attr == PROP_ONETIME);
}

/*
 * Returns TRUE if the property is only allowed to be set once.
 */
boolean_t
zfs_prop_setonce(zfs_prop_t prop)
{
	return (zfs_prop_table[prop].pd_attr == PROP_ONETIME);
}

const char *
zfs_prop_default_string(zfs_prop_t prop)
{
	return (zfs_prop_table[prop].pd_strdefault);
}

uint64_t
zfs_prop_default_numeric(zfs_prop_t prop)
{
	return (zfs_prop_table[prop].pd_numdefault);
}

/*
 * Given a dataset property ID, returns the corresponding name.
 * Assuming the zfs dataset property ID is valid.
 */
const char *
zfs_prop_to_name(zfs_prop_t prop)
{
	return (zfs_prop_table[prop].pd_name);
}

/*
 * Returns TRUE if the property is inheritable.
 */
boolean_t
zfs_prop_inheritable(zfs_prop_t prop)
{
	return (zfs_prop_table[prop].pd_attr == PROP_INHERIT ||
	    zfs_prop_table[prop].pd_attr == PROP_ONETIME);
}
|
||||
|
||||
#ifndef _KERNEL
|
||||
|
||||
/*
|
||||
* Returns a string describing the set of acceptable values for the given
|
||||
* zfs property, or NULL if it cannot be set.
|
||||
*/
|
||||
const char *
|
||||
zfs_prop_values(zfs_prop_t prop)
|
||||
{
|
||||
return (zfs_prop_table[prop].pd_values);
|
||||
}
|
||||
|
||||
/*
|
||||
* Returns TRUE if this property is a string type. Note that index types
|
||||
* (compression, checksum) are treated as strings in userland, even though they
|
||||
* are stored numerically on disk.
|
||||
*/
|
||||
int
|
||||
zfs_prop_is_string(zfs_prop_t prop)
|
||||
{
|
||||
return (zfs_prop_table[prop].pd_proptype == PROP_TYPE_STRING ||
|
||||
zfs_prop_table[prop].pd_proptype == PROP_TYPE_INDEX);
|
||||
}
|
||||
|
||||
/*
|
||||
* Returns the column header for the given property. Used only in
|
||||
* 'zfs list -o', but centralized here with the other property information.
|
||||
*/
|
||||
const char *
|
||||
zfs_prop_column_name(zfs_prop_t prop)
|
||||
{
|
||||
return (zfs_prop_table[prop].pd_colname);
|
||||
}
|
||||
|
||||
/*
|
||||
* Returns whether the given property should be displayed right-justified for
|
||||
* 'zfs list'.
|
||||
*/
|
||||
boolean_t
|
||||
zfs_prop_align_right(zfs_prop_t prop)
|
||||
{
|
||||
return (zfs_prop_table[prop].pd_rightalign);
|
||||
}
|
||||
|
||||
#endif
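The attribute accessors in this file come down to comparisons on `pd_attr`, with `PROP_ONETIME` counting as both read-only and inheritable. The following is a minimal standalone sketch of that logic, not code from this commit: the `demo_*` names and the four-entry table are hypothetical, and the `assert` only illustrates the "property ID is valid" assumption stated in the comments above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same enumerators as zprop_attr_t in zfs_prop.h. */
typedef enum {
	PROP_DEFAULT,
	PROP_READONLY,
	PROP_INHERIT,
	PROP_ONETIME
} demo_attr_t;

/* Hypothetical stand-in for zfs_prop_table[]: one attribute per property. */
static const demo_attr_t demo_table[] = {
	PROP_DEFAULT, PROP_READONLY, PROP_INHERIT, PROP_ONETIME
};
#define	DEMO_NUM_PROPS	(sizeof (demo_table) / sizeof (demo_table[0]))

static demo_attr_t
demo_attr(size_t prop)
{
	assert(prop < DEMO_NUM_PROPS);	/* callers must pass a valid ID */
	return (demo_table[prop]);
}

/* ONETIME properties become read-only after creation. */
static bool
demo_readonly(size_t prop)
{
	return (demo_attr(prop) == PROP_READONLY ||
	    demo_attr(prop) == PROP_ONETIME);
}

static bool
demo_setonce(size_t prop)
{
	return (demo_attr(prop) == PROP_ONETIME);
}

/* ONETIME properties can also be inherited if not set at creation. */
static bool
demo_inheritable(size_t prop)
{
	return (demo_attr(prop) == PROP_INHERIT ||
	    demo_attr(prop) == PROP_ONETIME);
}
```

With this table, property 3 (ONETIME) satisfies all three predicates, while property 0 (DEFAULT) satisfies none.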
|
129
common/zfs/zfs_prop.h
Normal file
@ -0,0 +1,129 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2010 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _ZFS_PROP_H
|
||||
#define _ZFS_PROP_H
|
||||
|
||||
#include <sys/fs/zfs.h>
|
||||
#include <sys/types.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/*
|
||||
* For index types (e.g. compression and checksum), we want the numeric value
|
||||
* in the kernel, but the string value in userland.
|
||||
*/
|
||||
typedef enum {
|
||||
PROP_TYPE_NUMBER, /* numeric value */
|
||||
PROP_TYPE_STRING, /* string value */
|
||||
PROP_TYPE_INDEX /* numeric value indexed by string */
|
||||
} zprop_type_t;
|
||||
|
||||
typedef enum {
|
||||
PROP_DEFAULT,
|
||||
PROP_READONLY,
|
||||
PROP_INHERIT,
|
||||
/*
|
||||
* ONETIME properties are a sort of conglomeration of READONLY
|
||||
* and INHERIT. They can be set only during object creation,
|
||||
* after that they are READONLY. If not explicitly set during
|
||||
* creation, they can be inherited.
|
||||
*/
|
||||
PROP_ONETIME
|
||||
} zprop_attr_t;
|
||||
|
||||
typedef struct zfs_index {
|
||||
const char *pi_name;
|
||||
uint64_t pi_value;
|
||||
} zprop_index_t;
|
||||
|
||||
typedef struct {
|
||||
const char *pd_name; /* human-readable property name */
|
||||
int pd_propnum; /* property number */
|
||||
zprop_type_t pd_proptype; /* string, boolean, index, number */
|
||||
const char *pd_strdefault; /* default for strings */
|
||||
uint64_t pd_numdefault; /* for boolean / index / number */
|
||||
zprop_attr_t pd_attr; /* default, readonly, inherit */
|
||||
int pd_types; /* bitfield of valid dataset types */
|
||||
/* fs | vol | snap; or pool */
|
||||
const char *pd_values; /* string telling acceptable values */
|
||||
const char *pd_colname; /* column header for "zfs list" */
|
||||
boolean_t pd_rightalign; /* column alignment for "zfs list" */
|
||||
boolean_t pd_visible; /* do we list this property with the */
|
||||
/* "zfs get" help message */
|
||||
const zprop_index_t *pd_table; /* for index properties, a table */
|
||||
/* defining the possible values */
|
||||
size_t pd_table_size; /* number of entries in pd_table[] */
|
||||
} zprop_desc_t;
|
||||
|
||||
/*
|
||||
* zfs dataset property functions
|
||||
*/
|
||||
void zfs_prop_init(void);
|
||||
zprop_type_t zfs_prop_get_type(zfs_prop_t);
|
||||
boolean_t zfs_prop_delegatable(zfs_prop_t prop);
|
||||
zprop_desc_t *zfs_prop_get_table(void);
|
||||
|
||||
/*
|
||||
* zpool property functions
|
||||
*/
|
||||
void zpool_prop_init(void);
|
||||
zprop_type_t zpool_prop_get_type(zpool_prop_t);
|
||||
zprop_desc_t *zpool_prop_get_table(void);
|
||||
|
||||
/*
|
||||
* Common routines to initialize property tables
|
||||
*/
|
||||
void zprop_register_impl(int, const char *, zprop_type_t, uint64_t,
|
||||
const char *, zprop_attr_t, int, const char *, const char *,
|
||||
boolean_t, boolean_t, const zprop_index_t *);
|
||||
void zprop_register_string(int, const char *, const char *,
|
||||
zprop_attr_t attr, int, const char *, const char *);
|
||||
void zprop_register_number(int, const char *, uint64_t, zprop_attr_t, int,
|
||||
const char *, const char *);
|
||||
void zprop_register_index(int, const char *, uint64_t, zprop_attr_t, int,
|
||||
const char *, const char *, const zprop_index_t *);
|
||||
void zprop_register_hidden(int, const char *, zprop_type_t, zprop_attr_t,
|
||||
int, const char *);
|
||||
|
||||
/*
|
||||
* Common routines for zfs and zpool property management
|
||||
*/
|
||||
int zprop_iter_common(zprop_func, void *, boolean_t, boolean_t, zfs_type_t);
|
||||
int zprop_name_to_prop(const char *, zfs_type_t);
|
||||
int zprop_string_to_index(int, const char *, uint64_t *, zfs_type_t);
|
||||
int zprop_index_to_string(int, uint64_t, const char **, zfs_type_t);
|
||||
uint64_t zprop_random_value(int, uint64_t, zfs_type_t);
|
||||
const char *zprop_values(int, zfs_type_t);
|
||||
size_t zprop_width(int, boolean_t *, zfs_type_t);
|
||||
boolean_t zprop_valid_for_type(int, zfs_type_t);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _ZFS_PROP_H */
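The `pd_table` and `pd_table_size` fields describe an index table terminated by an entry whose name is NULL. The sketch below shows one way such a table can be measured and sampled, under stated assumptions: `demo_index_t`, `demo_table_size` and `demo_pick` are hypothetical names, and the `assert` before the modulo is an addition made here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for zprop_index_t. */
typedef struct {
	const char *pi_name;
	uint64_t pi_value;
} demo_index_t;

/* Counts entries up to, but not including, the { NULL } sentinel. */
static size_t
demo_table_size(const demo_index_t *tbl)
{
	size_t n = 0;

	while (tbl != NULL && (tbl++)->pi_name != NULL)
		n++;
	return (n);
}

/* Maps an arbitrary seed onto one of the table's values. */
static uint64_t
demo_pick(const demo_index_t *tbl, uint64_t seed)
{
	size_t n;

	if (tbl == NULL)
		return (seed);	/* not an index property: pass seed through */
	n = demo_table_size(tbl);
	assert(n > 0);		/* guards the modulo below */
	return (tbl[seed % n].pi_value);
}
```

For a two-entry "off"/"on" table, `demo_table_size()` returns 2 and any odd seed selects the "on" value.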
|
202
common/zfs/zpool_prop.c
Normal file
@ -0,0 +1,202 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/zio.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/zfs_acl.h>
|
||||
#include <sys/zfs_ioctl.h>
|
||||
#include <sys/fs/zfs.h>
|
||||
|
||||
#include "zfs_prop.h"
|
||||
|
||||
#if defined(_KERNEL)
|
||||
#include <sys/systm.h>
|
||||
#else
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <ctype.h>
|
||||
#endif
|
||||
|
||||
static zprop_desc_t zpool_prop_table[ZPOOL_NUM_PROPS];
|
||||
|
||||
zprop_desc_t *
|
||||
zpool_prop_get_table(void)
|
||||
{
|
||||
return (zpool_prop_table);
|
||||
}
|
||||
|
||||
void
|
||||
zpool_prop_init(void)
|
||||
{
|
||||
static zprop_index_t boolean_table[] = {
|
||||
{ "off", 0},
|
||||
{ "on", 1},
|
||||
{ NULL }
|
||||
};
|
||||
|
||||
static zprop_index_t failuremode_table[] = {
|
||||
{ "wait", ZIO_FAILURE_MODE_WAIT },
|
||||
{ "continue", ZIO_FAILURE_MODE_CONTINUE },
|
||||
{ "panic", ZIO_FAILURE_MODE_PANIC },
|
||||
{ NULL }
|
||||
};
|
||||
|
||||
/* string properties */
|
||||
zprop_register_string(ZPOOL_PROP_ALTROOT, "altroot", NULL, PROP_DEFAULT,
|
||||
ZFS_TYPE_POOL, "<path>", "ALTROOT");
|
||||
zprop_register_string(ZPOOL_PROP_BOOTFS, "bootfs", NULL, PROP_DEFAULT,
|
||||
ZFS_TYPE_POOL, "<filesystem>", "BOOTFS");
|
||||
zprop_register_string(ZPOOL_PROP_CACHEFILE, "cachefile", NULL,
|
||||
PROP_DEFAULT, ZFS_TYPE_POOL, "<file> | none", "CACHEFILE");
|
||||
|
||||
/* readonly number properties */
|
||||
zprop_register_number(ZPOOL_PROP_SIZE, "size", 0, PROP_READONLY,
|
||||
ZFS_TYPE_POOL, "<size>", "SIZE");
|
||||
zprop_register_number(ZPOOL_PROP_FREE, "free", 0, PROP_READONLY,
|
||||
ZFS_TYPE_POOL, "<size>", "FREE");
|
||||
zprop_register_number(ZPOOL_PROP_ALLOCATED, "allocated", 0,
|
||||
PROP_READONLY, ZFS_TYPE_POOL, "<size>", "ALLOC");
|
||||
zprop_register_number(ZPOOL_PROP_CAPACITY, "capacity", 0, PROP_READONLY,
|
||||
ZFS_TYPE_POOL, "<size>", "CAP");
|
||||
zprop_register_number(ZPOOL_PROP_GUID, "guid", 0, PROP_READONLY,
|
||||
ZFS_TYPE_POOL, "<guid>", "GUID");
|
||||
zprop_register_number(ZPOOL_PROP_HEALTH, "health", 0, PROP_READONLY,
|
||||
ZFS_TYPE_POOL, "<state>", "HEALTH");
|
||||
zprop_register_number(ZPOOL_PROP_DEDUPRATIO, "dedupratio", 0,
|
||||
PROP_READONLY, ZFS_TYPE_POOL, "<1.00x or higher if deduped>",
|
||||
"DEDUP");
|
||||
|
||||
/* default number properties */
|
||||
zprop_register_number(ZPOOL_PROP_VERSION, "version", SPA_VERSION,
|
||||
PROP_DEFAULT, ZFS_TYPE_POOL, "<version>", "VERSION");
|
||||
zprop_register_number(ZPOOL_PROP_DEDUPDITTO, "dedupditto", 0,
|
||||
PROP_DEFAULT, ZFS_TYPE_POOL, "<threshold (min 100)>", "DEDUPDITTO");
|
||||
|
||||
/* default index (boolean) properties */
|
||||
zprop_register_index(ZPOOL_PROP_DELEGATION, "delegation", 1,
|
||||
PROP_DEFAULT, ZFS_TYPE_POOL, "on | off", "DELEGATION",
|
||||
boolean_table);
|
||||
zprop_register_index(ZPOOL_PROP_AUTOREPLACE, "autoreplace", 0,
|
||||
PROP_DEFAULT, ZFS_TYPE_POOL, "on | off", "REPLACE", boolean_table);
|
||||
zprop_register_index(ZPOOL_PROP_LISTSNAPS, "listsnapshots", 0,
|
||||
PROP_DEFAULT, ZFS_TYPE_POOL, "on | off", "LISTSNAPS",
|
||||
boolean_table);
|
||||
zprop_register_index(ZPOOL_PROP_AUTOEXPAND, "autoexpand", 0,
|
||||
PROP_DEFAULT, ZFS_TYPE_POOL, "on | off", "EXPAND", boolean_table);
|
||||
zprop_register_index(ZPOOL_PROP_READONLY, "readonly", 0,
|
||||
PROP_DEFAULT, ZFS_TYPE_POOL, "on | off", "RDONLY", boolean_table);
|
||||
|
||||
/* default index properties */
|
||||
zprop_register_index(ZPOOL_PROP_FAILUREMODE, "failmode",
|
||||
ZIO_FAILURE_MODE_WAIT, PROP_DEFAULT, ZFS_TYPE_POOL,
|
||||
"wait | continue | panic", "FAILMODE", failuremode_table);
|
||||
|
||||
/* hidden properties */
|
||||
zprop_register_hidden(ZPOOL_PROP_NAME, "name", PROP_TYPE_STRING,
|
||||
PROP_READONLY, ZFS_TYPE_POOL, "NAME");
|
||||
}
|
||||
|
||||
/*
|
||||
* Given a pool property name, returns the corresponding property ID.
|
||||
*/
|
||||
zpool_prop_t
|
||||
zpool_name_to_prop(const char *propname)
|
||||
{
|
||||
return (zprop_name_to_prop(propname, ZFS_TYPE_POOL));
|
||||
}
|
||||
|
||||
/*
|
||||
* Given a pool property ID, returns the corresponding name.
|
||||
* Assumes the pool property ID is valid.
|
||||
*/
|
||||
const char *
|
||||
zpool_prop_to_name(zpool_prop_t prop)
|
||||
{
|
||||
return (zpool_prop_table[prop].pd_name);
|
||||
}
|
||||
|
||||
zprop_type_t
|
||||
zpool_prop_get_type(zpool_prop_t prop)
|
||||
{
|
||||
return (zpool_prop_table[prop].pd_proptype);
|
||||
}
|
||||
|
||||
boolean_t
|
||||
zpool_prop_readonly(zpool_prop_t prop)
|
||||
{
|
||||
return (zpool_prop_table[prop].pd_attr == PROP_READONLY);
|
||||
}
|
||||
|
||||
const char *
|
||||
zpool_prop_default_string(zpool_prop_t prop)
|
||||
{
|
||||
return (zpool_prop_table[prop].pd_strdefault);
|
||||
}
|
||||
|
||||
uint64_t
|
||||
zpool_prop_default_numeric(zpool_prop_t prop)
|
||||
{
|
||||
return (zpool_prop_table[prop].pd_numdefault);
|
||||
}
|
||||
|
||||
int
|
||||
zpool_prop_string_to_index(zpool_prop_t prop, const char *string,
|
||||
uint64_t *index)
|
||||
{
|
||||
return (zprop_string_to_index(prop, string, index, ZFS_TYPE_POOL));
|
||||
}
|
||||
|
||||
int
|
||||
zpool_prop_index_to_string(zpool_prop_t prop, uint64_t index,
|
||||
const char **string)
|
||||
{
|
||||
return (zprop_index_to_string(prop, index, string, ZFS_TYPE_POOL));
|
||||
}
|
||||
|
||||
uint64_t
|
||||
zpool_prop_random_value(zpool_prop_t prop, uint64_t seed)
|
||||
{
|
||||
return (zprop_random_value(prop, seed, ZFS_TYPE_POOL));
|
||||
}
|
||||
|
||||
#ifndef _KERNEL
|
||||
|
||||
const char *
|
||||
zpool_prop_values(zpool_prop_t prop)
|
||||
{
|
||||
return (zpool_prop_table[prop].pd_values);
|
||||
}
|
||||
|
||||
const char *
|
||||
zpool_prop_column_name(zpool_prop_t prop)
|
||||
{
|
||||
return (zpool_prop_table[prop].pd_colname);
|
||||
}
|
||||
|
||||
boolean_t
|
||||
zpool_prop_align_right(zpool_prop_t prop)
|
||||
{
|
||||
return (zpool_prop_table[prop].pd_rightalign);
|
||||
}
|
||||
#endif
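The `zpool_prop_string_to_index()` and `zpool_prop_index_to_string()` wrappers translate between names such as "wait" and their numeric values by walking a sentinel-terminated table. A minimal sketch of that walk follows; the `demo_*` names are hypothetical, and the numeric values in `demo_failmode` are placeholders chosen for illustration rather than the real `ZIO_FAILURE_MODE_*` constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced stand-in for zprop_index_t. */
typedef struct {
	const char *pi_name;
	uint64_t pi_value;
} demo_index_t;

/* Placeholder values; the last entry is the NULL-name sentinel. */
static const demo_index_t demo_failmode[] = {
	{ "wait", 0 },
	{ "continue", 1 },
	{ "panic", 2 },
	{ NULL, 0 }
};

/* Returns 0 and stores the value on a match, -1 otherwise. */
static int
demo_string_to_index(const demo_index_t *tbl, const char *s, uint64_t *out)
{
	assert(tbl != NULL && s != NULL && out != NULL);
	for (size_t i = 0; tbl[i].pi_name != NULL; i++) {
		if (strcmp(s, tbl[i].pi_name) == 0) {
			*out = tbl[i].pi_value;
			return (0);
		}
	}
	return (-1);
}

/* Returns 0 and stores the name on a match, -1 otherwise. */
static int
demo_index_to_string(const demo_index_t *tbl, uint64_t v, const char **out)
{
	assert(tbl != NULL && out != NULL);
	for (size_t i = 0; tbl[i].pi_name != NULL; i++) {
		if (tbl[i].pi_value == v) {
			*out = tbl[i].pi_name;
			return (0);
		}
	}
	return (-1);
}
```

Both directions leave the output argument untouched on failure, so callers should check the return value before using it.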
|
426
common/zfs/zprop_common.c
Normal file
@ -0,0 +1,426 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2010 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
/*
|
||||
* Common routines used by zfs and zpool property management.
|
||||
*/
|
||||
|
||||
#include <sys/zio.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/zfs_acl.h>
|
||||
#include <sys/zfs_ioctl.h>
|
||||
#include <sys/zfs_znode.h>
|
||||
#include <sys/fs/zfs.h>
|
||||
|
||||
#include "zfs_prop.h"
|
||||
#include "zfs_deleg.h"
|
||||
|
||||
#if defined(_KERNEL)
|
||||
#include <sys/systm.h>
|
||||
#include <util/qsort.h>
|
||||
#else
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <ctype.h>
|
||||
#endif
|
||||
|
||||
static zprop_desc_t *
|
||||
zprop_get_proptable(zfs_type_t type)
|
||||
{
|
||||
if (type == ZFS_TYPE_POOL)
|
||||
return (zpool_prop_get_table());
|
||||
else
|
||||
return (zfs_prop_get_table());
|
||||
}
|
||||
|
||||
static int
|
||||
zprop_get_numprops(zfs_type_t type)
|
||||
{
|
||||
if (type == ZFS_TYPE_POOL)
|
||||
return (ZPOOL_NUM_PROPS);
|
||||
else
|
||||
return (ZFS_NUM_PROPS);
|
||||
}
|
||||
|
||||
void
|
||||
zprop_register_impl(int prop, const char *name, zprop_type_t type,
|
||||
uint64_t numdefault, const char *strdefault, zprop_attr_t attr,
|
||||
int objset_types, const char *values, const char *colname,
|
||||
boolean_t rightalign, boolean_t visible, const zprop_index_t *idx_tbl)
|
||||
{
|
||||
zprop_desc_t *prop_tbl = zprop_get_proptable(objset_types);
|
||||
zprop_desc_t *pd;
|
||||
|
||||
pd = &prop_tbl[prop];
|
||||
|
||||
ASSERT(pd->pd_name == NULL || pd->pd_name == name);
|
||||
ASSERT(name != NULL);
|
||||
ASSERT(colname != NULL);
|
||||
|
||||
pd->pd_name = name;
|
||||
pd->pd_propnum = prop;
|
||||
pd->pd_proptype = type;
|
||||
pd->pd_numdefault = numdefault;
|
||||
pd->pd_strdefault = strdefault;
|
||||
pd->pd_attr = attr;
|
||||
pd->pd_types = objset_types;
|
||||
pd->pd_values = values;
|
||||
pd->pd_colname = colname;
|
||||
pd->pd_rightalign = rightalign;
|
||||
pd->pd_visible = visible;
|
||||
pd->pd_table = idx_tbl;
|
||||
pd->pd_table_size = 0;
|
||||
while (idx_tbl && (idx_tbl++)->pi_name != NULL)
|
||||
pd->pd_table_size++;
|
||||
}
|
||||
|
||||
void
|
||||
zprop_register_string(int prop, const char *name, const char *def,
|
||||
zprop_attr_t attr, int objset_types, const char *values,
|
||||
const char *colname)
|
||||
{
|
||||
zprop_register_impl(prop, name, PROP_TYPE_STRING, 0, def, attr,
|
||||
objset_types, values, colname, B_FALSE, B_TRUE, NULL);
|
||||
|
||||
}
|
||||
|
||||
void
|
||||
zprop_register_number(int prop, const char *name, uint64_t def,
|
||||
zprop_attr_t attr, int objset_types, const char *values,
|
||||
const char *colname)
|
||||
{
|
||||
zprop_register_impl(prop, name, PROP_TYPE_NUMBER, def, NULL, attr,
|
||||
objset_types, values, colname, B_TRUE, B_TRUE, NULL);
|
||||
}
|
||||
|
||||
void
|
||||
zprop_register_index(int prop, const char *name, uint64_t def,
|
||||
zprop_attr_t attr, int objset_types, const char *values,
|
||||
const char *colname, const zprop_index_t *idx_tbl)
|
||||
{
|
||||
zprop_register_impl(prop, name, PROP_TYPE_INDEX, def, NULL, attr,
|
||||
objset_types, values, colname, B_TRUE, B_TRUE, idx_tbl);
|
||||
}
|
||||
|
||||
void
|
||||
zprop_register_hidden(int prop, const char *name, zprop_type_t type,
|
||||
zprop_attr_t attr, int objset_types, const char *colname)
|
||||
{
|
||||
zprop_register_impl(prop, name, type, 0, NULL, attr,
|
||||
objset_types, NULL, colname, B_FALSE, B_FALSE, NULL);
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* A comparison function we can use to order indexes into property tables.
|
||||
*/
|
||||
static int
|
||||
zprop_compare(const void *arg1, const void *arg2)
|
||||
{
|
||||
const zprop_desc_t *p1 = *((zprop_desc_t **)arg1);
|
||||
const zprop_desc_t *p2 = *((zprop_desc_t **)arg2);
|
||||
boolean_t p1ro, p2ro;
|
||||
|
||||
p1ro = (p1->pd_attr == PROP_READONLY);
|
||||
p2ro = (p2->pd_attr == PROP_READONLY);
|
||||
|
||||
if (p1ro == p2ro)
|
||||
return (strcmp(p1->pd_name, p2->pd_name));
|
||||
|
||||
return (p1ro ? -1 : 1);
|
||||
}
|
||||
|
||||
/*
|
||||
* Iterate over all properties in the given property table, calling back
|
||||
* into the specified function for each property. We will continue to
|
||||
* iterate until we either reach the end or the callback function returns
|
||||
* something other than ZPROP_CONT.
|
||||
*/
|
||||
int
|
||||
zprop_iter_common(zprop_func func, void *cb, boolean_t show_all,
|
||||
boolean_t ordered, zfs_type_t type)
|
||||
{
|
||||
int i, num_props, size, prop;
|
||||
zprop_desc_t *prop_tbl;
|
||||
zprop_desc_t **order;
|
||||
|
||||
prop_tbl = zprop_get_proptable(type);
|
||||
num_props = zprop_get_numprops(type);
|
||||
size = num_props * sizeof (zprop_desc_t *);
|
||||
|
||||
#if defined(_KERNEL)
|
||||
order = kmem_alloc(size, KM_SLEEP);
|
||||
#else
|
||||
if ((order = malloc(size)) == NULL)
|
||||
return (ZPROP_CONT);
|
||||
#endif
|
||||
|
||||
for (int j = 0; j < num_props; j++)
|
||||
order[j] = &prop_tbl[j];
|
||||
|
||||
if (ordered) {
|
||||
qsort((void *)order, num_props, sizeof (zprop_desc_t *),
|
||||
zprop_compare);
|
||||
}
|
||||
|
||||
prop = ZPROP_CONT;
|
||||
for (i = 0; i < num_props; i++) {
|
||||
if ((order[i]->pd_visible || show_all) &&
|
||||
(func(order[i]->pd_propnum, cb) != ZPROP_CONT)) {
|
||||
prop = order[i]->pd_propnum;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
#if defined(_KERNEL)
|
||||
kmem_free(order, size);
|
||||
#else
|
||||
free(order);
|
||||
#endif
|
||||
return (prop);
|
||||
}
|
||||
|
||||
static boolean_t
|
||||
propname_match(const char *p, size_t len, zprop_desc_t *prop_entry)
|
||||
{
|
||||
const char *propname = prop_entry->pd_name;
|
||||
#ifndef _KERNEL
|
||||
const char *colname = prop_entry->pd_colname;
|
||||
int c;
|
||||
#endif
|
||||
|
||||
if (len == strlen(propname) &&
|
||||
strncmp(p, propname, len) == 0)
|
||||
return (B_TRUE);
|
||||
|
||||
#ifndef _KERNEL
|
||||
if (colname == NULL || len != strlen(colname))
|
||||
return (B_FALSE);
|
||||
|
||||
for (c = 0; c < len; c++)
|
||||
if (p[c] != tolower(colname[c]))
|
||||
break;
|
||||
|
||||
return (colname[c] == '\0');
|
||||
#else
|
||||
return (B_FALSE);
|
||||
#endif
|
||||
}
|
||||
|
||||
typedef struct name_to_prop_cb {
|
||||
const char *propname;
|
||||
zprop_desc_t *prop_tbl;
|
||||
} name_to_prop_cb_t;
|
||||
|
||||
static int
|
||||
zprop_name_to_prop_cb(int prop, void *cb_data)
|
||||
{
|
||||
name_to_prop_cb_t *data = cb_data;
|
||||
|
||||
if (propname_match(data->propname, strlen(data->propname),
|
||||
&data->prop_tbl[prop]))
|
||||
return (prop);
|
||||
|
||||
return (ZPROP_CONT);
|
||||
}
|
||||
|
||||
int
|
||||
zprop_name_to_prop(const char *propname, zfs_type_t type)
|
||||
{
|
||||
int prop;
|
||||
name_to_prop_cb_t cb_data;
|
||||
|
||||
cb_data.propname = propname;
|
||||
cb_data.prop_tbl = zprop_get_proptable(type);
|
||||
|
||||
prop = zprop_iter_common(zprop_name_to_prop_cb, &cb_data,
|
||||
B_TRUE, B_FALSE, type);
|
||||
|
||||
return (prop == ZPROP_CONT ? ZPROP_INVAL : prop);
|
||||
}
|
||||
|
||||
int
|
||||
zprop_string_to_index(int prop, const char *string, uint64_t *index,
|
||||
zfs_type_t type)
|
||||
{
|
||||
zprop_desc_t *prop_tbl;
|
||||
const zprop_index_t *idx_tbl;
|
||||
int i;
|
||||
|
||||
if (prop == ZPROP_INVAL || prop == ZPROP_CONT)
|
||||
return (-1);
|
||||
|
||||
ASSERT(prop < zprop_get_numprops(type));
|
||||
prop_tbl = zprop_get_proptable(type);
|
||||
if ((idx_tbl = prop_tbl[prop].pd_table) == NULL)
|
||||
return (-1);
|
||||
|
||||
for (i = 0; idx_tbl[i].pi_name != NULL; i++) {
|
||||
if (strcmp(string, idx_tbl[i].pi_name) == 0) {
|
||||
*index = idx_tbl[i].pi_value;
|
||||
return (0);
|
||||
}
|
||||
}
|
||||
|
||||
return (-1);
|
||||
}
|
||||
|
||||
int
|
||||
zprop_index_to_string(int prop, uint64_t index, const char **string,
|
||||
zfs_type_t type)
|
||||
{
|
||||
zprop_desc_t *prop_tbl;
|
||||
const zprop_index_t *idx_tbl;
|
||||
int i;
|
||||
|
||||
if (prop == ZPROP_INVAL || prop == ZPROP_CONT)
|
||||
return (-1);
|
||||
|
||||
ASSERT(prop < zprop_get_numprops(type));
|
||||
prop_tbl = zprop_get_proptable(type);
|
||||
if ((idx_tbl = prop_tbl[prop].pd_table) == NULL)
|
||||
return (-1);
|
||||
|
||||
for (i = 0; idx_tbl[i].pi_name != NULL; i++) {
|
||||
if (idx_tbl[i].pi_value == index) {
|
||||
*string = idx_tbl[i].pi_name;
|
||||
return (0);
|
||||
}
|
||||
}
|
||||
|
||||
return (-1);
|
||||
}
|
||||
|
||||
/*
|
||||
* Return a random valid property value. Used by ztest.
|
||||
*/
|
||||
uint64_t
|
||||
zprop_random_value(int prop, uint64_t seed, zfs_type_t type)
|
||||
{
|
||||
zprop_desc_t *prop_tbl;
|
||||
const zprop_index_t *idx_tbl;
|
||||
|
||||
ASSERT((uint_t)prop < zprop_get_numprops(type));
|
||||
prop_tbl = zprop_get_proptable(type);
|
||||
idx_tbl = prop_tbl[prop].pd_table;
|
||||
|
||||
if (idx_tbl == NULL)
|
||||
return (seed);
|
||||
|
||||
return (idx_tbl[seed % prop_tbl[prop].pd_table_size].pi_value);
|
||||
}
|
||||
|
||||
const char *
|
||||
zprop_values(int prop, zfs_type_t type)
|
||||
{
|
||||
zprop_desc_t *prop_tbl;
|
||||
|
||||
ASSERT(prop != ZPROP_INVAL && prop != ZPROP_CONT);
|
||||
ASSERT(prop < zprop_get_numprops(type));
|
||||
|
||||
prop_tbl = zprop_get_proptable(type);
|
||||
|
||||
return (prop_tbl[prop].pd_values);
|
||||
}
|
||||
|
||||
/*
|
||||
* Returns TRUE if the property applies to any of the given dataset types.
|
||||
*/
|
||||
boolean_t
|
||||
zprop_valid_for_type(int prop, zfs_type_t type)
|
||||
{
|
||||
zprop_desc_t *prop_tbl;
|
||||
|
||||
if (prop == ZPROP_INVAL || prop == ZPROP_CONT)
|
||||
return (B_FALSE);
|
||||
|
||||
ASSERT(prop < zprop_get_numprops(type));
|
||||
prop_tbl = zprop_get_proptable(type);
|
||||
return ((prop_tbl[prop].pd_types & type) != 0);
|
||||
}
|
||||
|
||||
#ifndef _KERNEL
|
||||
|
||||
/*
|
||||
* Determines the minimum width for the column, and indicates whether it's fixed
|
||||
* or not. Only string columns are non-fixed.
|
||||
*/
|
||||
size_t
|
||||
zprop_width(int prop, boolean_t *fixed, zfs_type_t type)
|
||||
{
|
||||
zprop_desc_t *prop_tbl, *pd;
|
||||
const zprop_index_t *idx;
|
||||
size_t ret;
|
||||
int i;
|
||||
|
||||
ASSERT(prop != ZPROP_INVAL && prop != ZPROP_CONT);
|
||||
ASSERT(prop < zprop_get_numprops(type));
|
||||
|
||||
prop_tbl = zprop_get_proptable(type);
|
||||
pd = &prop_tbl[prop];
|
||||
|
||||
*fixed = B_TRUE;
|
||||
|
||||
/*
|
||||
* Start with the width of the column name.
|
||||
*/
|
||||
ret = strlen(pd->pd_colname);
|
||||
|
||||
/*
|
||||
* For fixed-width values, make sure the width is large enough to hold
|
||||
* any possible value.
|
||||
*/
|
||||
switch (pd->pd_proptype) {
|
||||
case PROP_TYPE_NUMBER:
|
||||
/*
|
||||
* The maximum length of a human-readable number is 5 characters
|
||||
* ("20.4M", for example).
|
||||
*/
|
||||
if (ret < 5)
|
||||
ret = 5;
|
||||
/*
|
||||
* 'creation' is handled specially because it's a number
|
||||
* internally, but displayed as a date string.
|
||||
*/
|
||||
if (prop == ZFS_PROP_CREATION)
|
||||
*fixed = B_FALSE;
|
||||
break;
|
||||
case PROP_TYPE_INDEX:
|
||||
idx = prop_tbl[prop].pd_table;
|
||||
for (i = 0; idx[i].pi_name != NULL; i++) {
|
||||
if (strlen(idx[i].pi_name) > ret)
|
||||
ret = strlen(idx[i].pi_name);
|
||||
}
|
||||
break;
|
||||
|
||||
case PROP_TYPE_STRING:
|
||||
*fixed = B_FALSE;
|
||||
break;
|
||||
}
|
||||
|
||||
return (ret);
|
||||
}
|
||||
|
||||
#endif
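The ordered iteration in this file sorts an array of pointers into the property table so that read-only properties come first and names are alphabetical within each group. The sketch below reproduces that ordering on a reduced descriptor; `demo_desc_t`, `demo_compare` and `demo_sort` are hypothetical names, and the `readonly` flag stands in for the `pd_attr == PROP_READONLY` test.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reduced stand-in for zprop_desc_t. */
typedef struct {
	const char *name;
	int readonly;	/* nonzero if the attribute is PROP_READONLY */
} demo_desc_t;

/* qsort() callback: each argument points at a descriptor pointer. */
static int
demo_compare(const void *a, const void *b)
{
	const demo_desc_t *p1 = *(const demo_desc_t *const *)a;
	const demo_desc_t *p2 = *(const demo_desc_t *const *)b;

	if (!p1->readonly == !p2->readonly)
		return (strcmp(p1->name, p2->name));
	return (p1->readonly ? -1 : 1);
}

/* Fills order[] with pointers into tbl[] and sorts the pointers. */
static void
demo_sort(const demo_desc_t *tbl, const demo_desc_t **order, size_t n)
{
	assert(tbl != NULL && order != NULL);
	for (size_t i = 0; i < n; i++)
		order[i] = &tbl[i];
	qsort(order, n, sizeof (order[0]), demo_compare);
}
```

Sorting pointers rather than the descriptors themselves leaves the table in place, so property numbers remain valid indexes into it.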
|
2007
uts/common/Makefile.files
Normal file
File diff suppressed because it is too large
@ -20,12 +20,9 @@
|
||||
*/
|
||||
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#pragma ident "%Z%%M% %I% %E% SMI"
|
||||
|
||||
/*
|
||||
* DTrace - Dynamic Tracing for Solaris
|
||||
*
|
||||
@ -186,7 +183,9 @@ static dtrace_ecb_t *dtrace_ecb_create_cache; /* cached created ECB */
|
||||
static dtrace_genid_t dtrace_probegen; /* current probe generation */
|
||||
static dtrace_helpers_t *dtrace_deferred_pid; /* deferred helper list */
|
||||
static dtrace_enabling_t *dtrace_retained; /* list of retained enablings */
|
||||
static dtrace_genid_t dtrace_retained_gen; /* current retained enab gen */
|
||||
static dtrace_dynvar_t dtrace_dynhash_sink; /* end of dynamic hash chains */
|
||||
static int dtrace_dynvar_failclean; /* dynvars failed to clean */
|
||||
|
||||
/*
|
||||
* DTrace Locking
|
||||
@ -240,10 +239,16 @@ static void
|
||||
dtrace_nullop(void)
|
||||
{}
|
||||
|
||||
static int
|
||||
dtrace_enable_nullop(void)
|
||||
{
|
||||
return (0);
|
||||
}
|
||||
|
||||
static dtrace_pops_t dtrace_provider_ops = {
|
||||
(void (*)(void *, const dtrace_probedesc_t *))dtrace_nullop,
|
||||
(void (*)(void *, struct modctl *))dtrace_nullop,
|
||||
(void (*)(void *, dtrace_id_t, void *))dtrace_nullop,
|
||||
(int (*)(void *, dtrace_id_t, void *))dtrace_enable_nullop,
|
||||
(void (*)(void *, dtrace_id_t, void *))dtrace_nullop,
|
||||
(void (*)(void *, dtrace_id_t, void *))dtrace_nullop,
|
||||
(void (*)(void *, dtrace_id_t, void *))dtrace_nullop,
|
||||
@ -427,6 +432,7 @@ dtrace_load##bits(uintptr_t addr) \
|
||||
#define DTRACE_DYNHASH_SINK 1
|
||||
#define DTRACE_DYNHASH_VALID 2
|
||||
|
||||
#define DTRACE_MATCH_FAIL -1
|
||||
#define DTRACE_MATCH_NEXT 0
|
||||
#define DTRACE_MATCH_DONE 1
|
||||
#define DTRACE_ANCHORED(probe) ((probe)->dtpr_func[0] != '\0')
|
||||
@ -1182,12 +1188,12 @@ dtrace_dynvar_clean(dtrace_dstate_t *dstate)
|
||||
{
|
||||
dtrace_dynvar_t *dirty;
|
||||
dtrace_dstate_percpu_t *dcpu;
|
||||
int i, work = 0;
|
||||
dtrace_dynvar_t **rinsep;
|
||||
int i, j, work = 0;
|
||||
|
||||
for (i = 0; i < NCPU; i++) {
|
||||
dcpu = &dstate->dtds_percpu[i];
|
||||
|
||||
ASSERT(dcpu->dtdsc_rinsing == NULL);
|
||||
rinsep = &dcpu->dtdsc_rinsing;
|
||||
|
||||
/*
|
||||
* If the dirty list is NULL, there is no dirty work to do.
|
||||
@ -1195,14 +1201,62 @@ dtrace_dynvar_clean(dtrace_dstate_t *dstate)
|
||||
if (dcpu->dtdsc_dirty == NULL)
|
||||
continue;
|
||||
|
||||
/*
|
||||
* If the clean list is non-NULL, then we're not going to do
|
||||
* any work for this CPU -- it means that there has not been
|
||||
* a dtrace_dynvar() allocation on this CPU (or from this CPU)
|
||||
* since the last time we cleaned house.
|
||||
*/
|
||||
if (dcpu->dtdsc_clean != NULL)
|
||||
if (dcpu->dtdsc_rinsing != NULL) {
|
||||
/*
|
||||
* If the rinsing list is non-NULL, then it is because
|
||||
* this CPU was selected to accept another CPU's
|
||||
* dirty list -- and since that time, dirty buffers
|
||||
* have accumulated. This is a highly unlikely
|
||||
* condition, but we choose to ignore the dirty
|
||||
* buffers -- they'll be picked up by a future cleanse.
|
||||
*/
|
||||
continue;
|
||||
}
|
||||
|
||||
if (dcpu->dtdsc_clean != NULL) {
|
||||
/*
|
||||
* If the clean list is non-NULL, then we're in a
|
||||
* situation where a CPU has done deallocations (we
|
||||
* have a non-NULL dirty list) but no allocations (we
|
||||
* also have a non-NULL clean list). We can't simply
|
||||
* move the dirty list into the clean list on this
|
||||
* CPU, yet we also don't want to allow this condition
|
||||
* to persist, lest a short clean list prevent a
|
||||
* massive dirty list from being cleaned (which in
|
||||
* turn could lead to otherwise avoidable dynamic
|
||||
* drops). To deal with this, we look for some CPU
|
||||
* with a NULL clean list, NULL dirty list, and NULL
|
||||
* rinsing list -- and then we borrow this CPU to
|
||||
* rinse our dirty list.
|
||||
*/
|
||||
for (j = 0; j < NCPU; j++) {
|
||||
dtrace_dstate_percpu_t *rinser;
|
||||
|
||||
rinser = &dstate->dtds_percpu[j];
|
||||
|
||||
if (rinser->dtdsc_rinsing != NULL)
|
||||
continue;
|
||||
|
||||
if (rinser->dtdsc_dirty != NULL)
|
||||
continue;
|
||||
|
||||
if (rinser->dtdsc_clean != NULL)
|
||||
continue;
|
||||
|
||||
rinsep = &rinser->dtdsc_rinsing;
|
||||
break;
|
||||
}
|
||||
|
||||
if (j == NCPU) {
|
||||
/*
|
||||
* We were unable to find another CPU that
|
||||
* could accept this dirty list -- we are
|
||||
* therefore unable to clean it now.
|
||||
*/
|
||||
dtrace_dynvar_failclean++;
|
||||
continue;
|
||||
}
|
||||
}
|
||||
|
||||
work = 1;
|
||||
|
||||
@ -1219,7 +1273,7 @@ dtrace_dynvar_clean(dtrace_dstate_t *dstate)
|
||||
* on a hash chain, either the dirty list or the
|
||||
* rinsing list for some CPU must be non-NULL.)
|
||||
*/
|
||||
dcpu->dtdsc_rinsing = dirty;
|
||||
*rinsep = dirty;
|
||||
dtrace_membar_producer();
|
||||
} while (dtrace_casptr(&dcpu->dtdsc_dirty,
|
||||
dirty, NULL) != dirty);
|
||||
@ -1650,7 +1704,7 @@ retry:
|
||||
ASSERT(clean->dtdv_hashval == DTRACE_DYNHASH_FREE);
|
||||
|
||||
/*
|
||||
* Now we'll move the clean list to the free list.
|
||||
* Now we'll move the clean list to our free list.
|
||||
* It's impossible for this to fail: the only way
|
||||
* the free list can be updated is through this
|
||||
* code path, and only one CPU can own the clean list.
|
||||
@ -1663,6 +1717,7 @@ retry:
|
||||
* owners of the clean lists out before resetting
|
||||
* the clean lists.
|
||||
*/
|
||||
dcpu = &dstate->dtds_percpu[me];
|
||||
rval = dtrace_casptr(&dcpu->dtdsc_free, NULL, clean);
|
||||
ASSERT(rval == NULL);
|
||||
goto retry;
|
||||
@ -3600,7 +3655,7 @@ dtrace_dif_subr(uint_t subr, uint_t rd, uint64_t *regs,
|
||||
int64_t index = (int64_t)tupregs[1].dttk_value;
|
||||
int64_t remaining = (int64_t)tupregs[2].dttk_value;
|
||||
size_t len = dtrace_strlen((char *)s, size);
|
||||
int64_t i = 0;
|
||||
int64_t i;
|
||||
|
||||
if (!dtrace_canload(s, len + 1, mstate, vstate)) {
|
||||
regs[rd] = NULL;
|
||||
@ -6655,7 +6710,7 @@ dtrace_match(const dtrace_probekey_t *pkp, uint32_t priv, uid_t uid,
|
||||
{
|
||||
dtrace_probe_t template, *probe;
|
||||
dtrace_hash_t *hash = NULL;
|
||||
int len, best = INT_MAX, nmatched = 0;
|
||||
int len, rc, best = INT_MAX, nmatched = 0;
|
||||
dtrace_id_t i;
|
||||
|
||||
ASSERT(MUTEX_HELD(&dtrace_lock));
|
||||
@@ -6667,7 +6722,8 @@ dtrace_match(const dtrace_probekey_t *pkp, uint32_t priv, uid_t uid,
|
||||
if (pkp->dtpk_id != DTRACE_IDNONE) {
|
||||
if ((probe = dtrace_probe_lookup_id(pkp->dtpk_id)) != NULL &&
|
||||
dtrace_match_probe(probe, pkp, priv, uid, zoneid) > 0) {
|
||||
(void) (*matched)(probe, arg);
|
||||
if ((*matched)(probe, arg) == DTRACE_MATCH_FAIL)
|
||||
return (DTRACE_MATCH_FAIL);
|
||||
nmatched++;
|
||||
}
|
||||
return (nmatched);
|
||||
@@ -6714,8 +6770,12 @@ dtrace_match(const dtrace_probekey_t *pkp, uint32_t priv, uid_t uid,
|
||||
|
||||
nmatched++;
|
||||
|
||||
if ((*matched)(probe, arg) != DTRACE_MATCH_NEXT)
|
||||
if ((rc = (*matched)(probe, arg)) !=
|
||||
DTRACE_MATCH_NEXT) {
|
||||
if (rc == DTRACE_MATCH_FAIL)
|
||||
return (DTRACE_MATCH_FAIL);
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
return (nmatched);
|
||||
@@ -6734,8 +6794,11 @@ dtrace_match(const dtrace_probekey_t *pkp, uint32_t priv, uid_t uid,
|
||||
|
||||
nmatched++;
|
||||
|
||||
if ((*matched)(probe, arg) != DTRACE_MATCH_NEXT)
|
||||
if ((rc = (*matched)(probe, arg)) != DTRACE_MATCH_NEXT) {
|
||||
if (rc == DTRACE_MATCH_FAIL)
|
||||
return (DTRACE_MATCH_FAIL);
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
return (nmatched);
|
||||
@@ -6955,7 +7018,7 @@ dtrace_unregister(dtrace_provider_id_t id)
|
||||
dtrace_probe_t *probe, *first = NULL;
|
||||
|
||||
if (old->dtpv_pops.dtps_enable ==
|
||||
(void (*)(void *, dtrace_id_t, void *))dtrace_nullop) {
|
||||
(int (*)(void *, dtrace_id_t, void *))dtrace_enable_nullop) {
|
||||
/*
|
||||
* If DTrace itself is the provider, we're called with locks
|
||||
* already held.
|
||||
@@ -7101,7 +7164,7 @@ dtrace_invalidate(dtrace_provider_id_t id)
|
||||
dtrace_provider_t *pvp = (dtrace_provider_t *)id;
|
||||
|
||||
ASSERT(pvp->dtpv_pops.dtps_enable !=
|
||||
(void (*)(void *, dtrace_id_t, void *))dtrace_nullop);
|
||||
(int (*)(void *, dtrace_id_t, void *))dtrace_enable_nullop);
|
||||
|
||||
mutex_enter(&dtrace_provider_lock);
|
||||
mutex_enter(&dtrace_lock);
|
||||
@@ -7142,7 +7205,7 @@ dtrace_condense(dtrace_provider_id_t id)
|
||||
* Make sure this isn't the dtrace provider itself.
|
||||
*/
|
||||
ASSERT(prov->dtpv_pops.dtps_enable !=
|
||||
(void (*)(void *, dtrace_id_t, void *))dtrace_nullop);
|
||||
(int (*)(void *, dtrace_id_t, void *))dtrace_enable_nullop);
|
||||
|
||||
mutex_enter(&dtrace_provider_lock);
|
||||
mutex_enter(&dtrace_lock);
|
||||
@@ -8103,7 +8166,7 @@ dtrace_difo_validate(dtrace_difo_t *dp, dtrace_vstate_t *vstate, uint_t nregs,
|
||||
break;
|
||||
|
||||
default:
|
||||
err += efunc(dp->dtdo_len - 1, "bad return size");
|
||||
err += efunc(dp->dtdo_len - 1, "bad return size\n");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -9096,7 +9159,7 @@ dtrace_ecb_add(dtrace_state_t *state, dtrace_probe_t *probe)
|
||||
return (ecb);
|
||||
}
|
||||
|
||||
static void
|
||||
static int
|
||||
dtrace_ecb_enable(dtrace_ecb_t *ecb)
|
||||
{
|
||||
dtrace_probe_t *probe = ecb->dte_probe;
|
||||
@@ -9109,7 +9172,7 @@ dtrace_ecb_enable(dtrace_ecb_t *ecb)
|
||||
/*
|
||||
* This is the NULL probe -- there's nothing to do.
|
||||
*/
|
||||
return;
|
||||
return (0);
|
||||
}
|
||||
|
||||
if (probe->dtpr_ecb == NULL) {
|
||||
@@ -9123,8 +9186,8 @@ dtrace_ecb_enable(dtrace_ecb_t *ecb)
|
||||
if (ecb->dte_predicate != NULL)
|
||||
probe->dtpr_predcache = ecb->dte_predicate->dtp_cacheid;
|
||||
|
||||
prov->dtpv_pops.dtps_enable(prov->dtpv_arg,
|
||||
probe->dtpr_id, probe->dtpr_arg);
|
||||
return (prov->dtpv_pops.dtps_enable(prov->dtpv_arg,
|
||||
probe->dtpr_id, probe->dtpr_arg));
|
||||
} else {
|
||||
/*
|
||||
* This probe is already active. Swing the last pointer to
|
||||
@@ -9137,6 +9200,7 @@ dtrace_ecb_enable(dtrace_ecb_t *ecb)
|
||||
probe->dtpr_predcache = 0;
|
||||
|
||||
dtrace_sync();
|
||||
return (0);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -9920,7 +9984,9 @@ dtrace_ecb_create_enable(dtrace_probe_t *probe, void *arg)
|
||||
if ((ecb = dtrace_ecb_create(state, probe, enab)) == NULL)
|
||||
return (DTRACE_MATCH_DONE);
|
||||
|
||||
dtrace_ecb_enable(ecb);
|
||||
if (dtrace_ecb_enable(ecb) < 0)
|
||||
return (DTRACE_MATCH_FAIL);
|
||||
|
||||
return (DTRACE_MATCH_NEXT);
|
||||
}
|
||||
|
||||
@@ -10557,6 +10623,7 @@ dtrace_enabling_destroy(dtrace_enabling_t *enab)
|
||||
ASSERT(enab->dten_vstate->dtvs_state != NULL);
|
||||
ASSERT(enab->dten_vstate->dtvs_state->dts_nretained > 0);
|
||||
enab->dten_vstate->dtvs_state->dts_nretained--;
|
||||
dtrace_retained_gen++;
|
||||
}
|
||||
|
||||
if (enab->dten_prev == NULL) {
|
||||
@@ -10599,6 +10666,7 @@ dtrace_enabling_retain(dtrace_enabling_t *enab)
|
||||
return (ENOSPC);
|
||||
|
||||
state->dts_nretained++;
|
||||
dtrace_retained_gen++;
|
||||
|
||||
if (dtrace_retained == NULL) {
|
||||
dtrace_retained = enab;
|
||||
@@ -10713,7 +10781,7 @@ static int
|
||||
dtrace_enabling_match(dtrace_enabling_t *enab, int *nmatched)
|
||||
{
|
||||
int i = 0;
|
||||
int matched = 0;
|
||||
int total_matched = 0, matched = 0;
|
||||
|
||||
ASSERT(MUTEX_HELD(&cpu_lock));
|
||||
ASSERT(MUTEX_HELD(&dtrace_lock));
|
||||
@@ -10724,7 +10792,14 @@ dtrace_enabling_match(dtrace_enabling_t *enab, int *nmatched)
|
||||
enab->dten_current = ep;
|
||||
enab->dten_error = 0;
|
||||
|
||||
matched += dtrace_probe_enable(&ep->dted_probe, enab);
|
||||
/*
|
||||
* If a provider failed to enable a probe then get out and
|
||||
* let the consumer know we failed.
|
||||
*/
|
||||
if ((matched = dtrace_probe_enable(&ep->dted_probe, enab)) < 0)
|
||||
return (EBUSY);
|
||||
|
||||
total_matched += matched;
|
||||
|
||||
if (enab->dten_error != 0) {
|
||||
/*
|
||||
@@ -10752,7 +10827,7 @@ dtrace_enabling_match(dtrace_enabling_t *enab, int *nmatched)
|
||||
|
||||
enab->dten_probegen = dtrace_probegen;
|
||||
if (nmatched != NULL)
|
||||
*nmatched = matched;
|
||||
*nmatched = total_matched;
|
||||
|
||||
return (0);
|
||||
}
|
||||
@@ -10766,13 +10841,22 @@ dtrace_enabling_matchall(void)
|
||||
mutex_enter(&dtrace_lock);
|
||||
|
||||
/*
|
||||
* Because we can be called after dtrace_detach() has been called, we
|
||||
* cannot assert that there are retained enablings. We can safely
|
||||
* load from dtrace_retained, however: the taskq_destroy() at the
|
||||
* end of dtrace_detach() will block pending our completion.
|
||||
* Iterate over all retained enablings to see if any probes match
|
||||
* against them. We only perform this operation on enablings for which
|
||||
* we have sufficient permissions by virtue of being in the global zone
|
||||
* or in the same zone as the DTrace client. Because we can be called
|
||||
* after dtrace_detach() has been called, we cannot assert that there
|
||||
* are retained enablings. We can safely load from dtrace_retained,
|
||||
* however: the taskq_destroy() at the end of dtrace_detach() will
|
||||
* block pending our completion.
|
||||
*/
|
||||
for (enab = dtrace_retained; enab != NULL; enab = enab->dten_next)
|
||||
(void) dtrace_enabling_match(enab, NULL);
|
||||
for (enab = dtrace_retained; enab != NULL; enab = enab->dten_next) {
|
||||
cred_t *cr = enab->dten_vstate->dtvs_state->dts_cred.dcr_cred;
|
||||
|
||||
if (INGLOBALZONE(curproc) ||
|
||||
cr != NULL && getzoneid() == crgetzoneid(cr))
|
||||
(void) dtrace_enabling_match(enab, NULL);
|
||||
}
|
||||
|
||||
mutex_exit(&dtrace_lock);
|
||||
mutex_exit(&cpu_lock);
|
||||
@@ -10830,6 +10914,7 @@ dtrace_enabling_provide(dtrace_provider_t *prv)
|
||||
{
|
||||
int i, all = 0;
|
||||
dtrace_probedesc_t desc;
|
||||
dtrace_genid_t gen;
|
||||
|
||||
ASSERT(MUTEX_HELD(&dtrace_lock));
|
||||
ASSERT(MUTEX_HELD(&dtrace_provider_lock));
|
||||
@@ -10840,15 +10925,25 @@ dtrace_enabling_provide(dtrace_provider_t *prv)
|
||||
}
|
||||
|
||||
do {
|
||||
dtrace_enabling_t *enab = dtrace_retained;
|
||||
dtrace_enabling_t *enab;
|
||||
void *parg = prv->dtpv_arg;
|
||||
|
||||
for (; enab != NULL; enab = enab->dten_next) {
|
||||
retry:
|
||||
gen = dtrace_retained_gen;
|
||||
for (enab = dtrace_retained; enab != NULL;
|
||||
enab = enab->dten_next) {
|
||||
for (i = 0; i < enab->dten_ndesc; i++) {
|
||||
desc = enab->dten_desc[i]->dted_probe;
|
||||
mutex_exit(&dtrace_lock);
|
||||
prv->dtpv_pops.dtps_provide(parg, &desc);
|
||||
mutex_enter(&dtrace_lock);
|
||||
/*
|
||||
* Process the retained enablings again if
|
||||
* they have changed while we weren't holding
|
||||
* dtrace_lock.
|
||||
*/
|
||||
if (gen != dtrace_retained_gen)
|
||||
goto retry;
|
||||
}
|
||||
}
|
||||
} while (all && (prv = prv->dtpv_next) != NULL);
|
||||
@@ -10970,7 +11065,8 @@ dtrace_dof_copyin(uintptr_t uarg, int *errp)
|
||||
|
||||
dof = kmem_alloc(hdr.dofh_loadsz, KM_SLEEP);
|
||||
|
||||
if (copyin((void *)uarg, dof, hdr.dofh_loadsz) != 0) {
|
||||
if (copyin((void *)uarg, dof, hdr.dofh_loadsz) != 0 ||
|
||||
dof->dofh_loadsz != hdr.dofh_loadsz) {
|
||||
kmem_free(dof, hdr.dofh_loadsz);
|
||||
*errp = EFAULT;
|
||||
return (NULL);
|
||||
@@ -11698,6 +11794,13 @@ dtrace_dof_slurp(dof_hdr_t *dof, dtrace_vstate_t *vstate, cred_t *cr,
|
||||
}
|
||||
}
|
||||
|
||||
if (DOF_SEC_ISLOADABLE(sec->dofs_type) &&
|
||||
!(sec->dofs_flags & DOF_SECF_LOAD)) {
|
||||
dtrace_dof_error(dof, "loadable section with load "
|
||||
"flag unset");
|
||||
return (-1);
|
||||
}
|
||||
|
||||
if (!(sec->dofs_flags & DOF_SECF_LOAD))
|
||||
continue; /* just ignore non-loadable sections */
|
||||
|
||||
@@ -14390,7 +14493,8 @@ dtrace_open(dev_t *devp, int flag, int otyp, cred_t *cred_p)
|
||||
* If this wasn't an open with the "helper" minor, then it must be
|
||||
* the "dtrace" minor.
|
||||
*/
|
||||
ASSERT(getminor(*devp) == DTRACEMNRN_DTRACE);
|
||||
if (getminor(*devp) != DTRACEMNRN_DTRACE)
|
||||
return (ENXIO);
|
||||
|
||||
/*
|
||||
* If no DTRACE_PRIV_* bits are set in the credential, then the
|
||||
@@ -14427,7 +14531,7 @@ dtrace_open(dev_t *devp, int flag, int otyp, cred_t *cred_p)
|
||||
mutex_exit(&cpu_lock);
|
||||
|
||||
if (state == NULL) {
|
||||
if (--dtrace_opens == 0)
|
||||
if (--dtrace_opens == 0 && dtrace_anon.dta_enabling == NULL)
|
||||
(void) kdi_dtrace_set(KDI_DTSET_DTRACE_DEACTIVATE);
|
||||
mutex_exit(&dtrace_lock);
|
||||
return (EAGAIN);
|
||||
@@ -14463,7 +14567,12 @@ dtrace_close(dev_t dev, int flag, int otyp, cred_t *cred_p)
|
||||
|
||||
dtrace_state_destroy(state);
|
||||
ASSERT(dtrace_opens > 0);
|
||||
if (--dtrace_opens == 0)
|
||||
|
||||
/*
|
||||
* Only relinquish control of the kernel debugger interface when there
|
||||
* are no consumers and no anonymous enablings.
|
||||
*/
|
||||
if (--dtrace_opens == 0 && dtrace_anon.dta_enabling == NULL)
|
||||
(void) kdi_dtrace_set(KDI_DTSET_DTRACE_DEACTIVATE);
|
||||
|
||||
mutex_exit(&dtrace_lock);
|
||||
@@ -15458,7 +15567,8 @@ static struct dev_ops dtrace_ops = {
|
||||
nodev, /* reset */
|
||||
&dtrace_cb_ops, /* driver operations */
|
||||
NULL, /* bus operations */
|
||||
nodev /* dev power */
|
||||
nodev, /* dev power */
|
||||
ddi_quiesce_not_needed, /* quiesce */
|
||||
};
|
||||
|
||||
static struct modldrv modldrv = {
|
||||
|
@@ -20,11 +20,10 @@
|
||||
*/
|
||||
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#pragma ident "%Z%%M% %I% %E% SMI"
|
||||
|
||||
#include <sys/atomic.h>
|
||||
#include <sys/errno.h>
|
||||
@@ -876,7 +875,7 @@ fasttrap_disable_callbacks(void)
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
static void
|
||||
static int
|
||||
fasttrap_pid_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
{
|
||||
fasttrap_probe_t *probe = parg;
|
||||
@@ -904,7 +903,7 @@ fasttrap_pid_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
* provider can't go away while we're in this code path.
|
||||
*/
|
||||
if (probe->ftp_prov->ftp_retired)
|
||||
return;
|
||||
return (0);
|
||||
|
||||
/*
|
||||
* If we can't find the process, it may be that we're in the context of
|
||||
@@ -913,7 +912,7 @@ fasttrap_pid_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
*/
|
||||
if ((p = sprlock(probe->ftp_pid)) == NULL) {
|
||||
if ((curproc->p_flag & SFORKING) == 0)
|
||||
return;
|
||||
return (0);
|
||||
|
||||
mutex_enter(&pidlock);
|
||||
p = prfind(probe->ftp_pid);
|
||||
@@ -975,7 +974,7 @@ fasttrap_pid_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
* drop our reference on the trap table entry.
|
||||
*/
|
||||
fasttrap_disable_callbacks();
|
||||
return;
|
||||
return (0);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -983,6 +982,7 @@ fasttrap_pid_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
sprunlock(p);
|
||||
|
||||
probe->ftp_enabled = 1;
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
@@ -1946,7 +1946,8 @@ fasttrap_ioctl(dev_t dev, int cmd, intptr_t arg, int md, cred_t *cr, int *rv)
|
||||
|
||||
probe = kmem_alloc(size, KM_SLEEP);
|
||||
|
||||
if (copyin(uprobe, probe, size) != 0) {
|
||||
if (copyin(uprobe, probe, size) != 0 ||
|
||||
probe->ftps_noffs != noffs) {
|
||||
kmem_free(probe, size);
|
||||
return (EFAULT);
|
||||
}
|
||||
@@ -2044,13 +2045,6 @@ err:
|
||||
tp->ftt_proc->ftpc_acount != 0)
|
||||
break;
|
||||
|
||||
/*
|
||||
* The count of active providers can only be
|
||||
* decremented (i.e. to zero) during exec, exit, and
|
||||
* removal of a meta provider so it should be
|
||||
* impossible to drop the count during this operation().
|
||||
*/
|
||||
ASSERT(tp->ftt_proc->ftpc_acount != 0);
|
||||
tp = tp->ftt_next;
|
||||
}
|
||||
|
||||
@@ -2346,7 +2340,8 @@ static struct dev_ops fasttrap_ops = {
|
||||
nodev, /* reset */
|
||||
&fasttrap_cb_ops, /* driver operations */
|
||||
NULL, /* bus operations */
|
||||
nodev /* dev power */
|
||||
nodev, /* dev power */
|
||||
ddi_quiesce_not_needed, /* quiesce */
|
||||
};
|
||||
|
||||
/*
|
||||
|
@@ -19,11 +19,10 @@
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#pragma ident "%Z%%M% %I% %E% SMI"
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/param.h>
|
||||
@@ -84,7 +83,7 @@ static kmutex_t lockstat_test; /* for testing purposes only */
|
||||
static dtrace_provider_id_t lockstat_id;
|
||||
|
||||
/*ARGSUSED*/
|
||||
static void
|
||||
static int
|
||||
lockstat_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
{
|
||||
lockstat_probe_t *probe = parg;
|
||||
@@ -103,6 +102,7 @@ lockstat_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
*/
|
||||
mutex_enter(&lockstat_test);
|
||||
mutex_exit(&lockstat_test);
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
@@ -310,11 +310,13 @@ static struct dev_ops lockstat_ops = {
|
||||
nulldev, /* reset */
|
||||
&lockstat_cb_ops, /* cb_ops */
|
||||
NULL, /* bus_ops */
|
||||
NULL, /* power */
|
||||
ddi_quiesce_not_needed, /* quiesce */
|
||||
};
|
||||
|
||||
static struct modldrv modldrv = {
|
||||
&mod_driverops, /* Type of module. This one is a driver */
|
||||
"Lock Statistics %I%", /* name of module */
|
||||
"Lock Statistics", /* name of module */
|
||||
&lockstat_ops, /* driver ops */
|
||||
};
|
||||
|
||||
|
@@ -19,11 +19,10 @@
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#pragma ident "%Z%%M% %I% %E% SMI"
|
||||
|
||||
#include <sys/errno.h>
|
||||
#include <sys/stat.h>
|
||||
@@ -361,7 +360,7 @@ profile_offline(void *arg, cpu_t *cpu, void *oarg)
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
static void
|
||||
static int
|
||||
profile_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
{
|
||||
profile_probe_t *prof = parg;
|
||||
@@ -391,6 +390,7 @@ profile_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
} else {
|
||||
prof->prof_cyclic = cyclic_add_omni(&omni);
|
||||
}
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
@@ -539,7 +539,8 @@ static struct dev_ops profile_ops = {
|
||||
nodev, /* reset */
|
||||
&profile_cb_ops, /* driver operations */
|
||||
NULL, /* bus operations */
|
||||
nodev /* dev power */
|
||||
nodev, /* dev power */
|
||||
ddi_quiesce_not_needed, /* quiesce */
|
||||
};
|
||||
|
||||
/*
|
||||
|
@@ -19,12 +19,9 @@
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
* Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#pragma ident "%Z%%M% %I% %E% SMI"
|
||||
|
||||
#include <sys/sdt_impl.h>
|
||||
|
||||
static dtrace_pattr_t vtrace_attr = {
|
||||
@@ -43,6 +40,14 @@ static dtrace_pattr_t info_attr = {
|
||||
{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_ISA },
|
||||
};
|
||||
|
||||
static dtrace_pattr_t fc_attr = {
|
||||
{ DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
|
||||
{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
|
||||
{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
|
||||
{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_ISA },
|
||||
{ DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
|
||||
};
|
||||
|
||||
static dtrace_pattr_t fpu_attr = {
|
||||
{ DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
|
||||
{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
|
||||
@@ -83,6 +88,14 @@ static dtrace_pattr_t xpv_attr = {
|
||||
{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_PLATFORM },
|
||||
};
|
||||
|
||||
static dtrace_pattr_t iscsi_attr = {
|
||||
{ DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
|
||||
{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
|
||||
{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
|
||||
{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_ISA },
|
||||
{ DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
|
||||
};
|
||||
|
||||
sdt_provider_t sdt_providers[] = {
|
||||
{ "vtrace", "__vtrace_", &vtrace_attr, 0 },
|
||||
{ "sysinfo", "__cpu_sysinfo_", &info_attr, 0 },
|
||||
@@ -91,11 +104,17 @@ sdt_provider_t sdt_providers[] = {
|
||||
{ "sched", "__sched_", &stab_attr, 0 },
|
||||
{ "proc", "__proc_", &stab_attr, 0 },
|
||||
{ "io", "__io_", &stab_attr, 0 },
|
||||
{ "ip", "__ip_", &stab_attr, 0 },
|
||||
{ "tcp", "__tcp_", &stab_attr, 0 },
|
||||
{ "udp", "__udp_", &stab_attr, 0 },
|
||||
{ "mib", "__mib_", &stab_attr, 0 },
|
||||
{ "fsinfo", "__fsinfo_", &fsinfo_attr, 0 },
|
||||
{ "iscsi", "__iscsi_", &iscsi_attr, 0 },
|
||||
{ "nfsv3", "__nfsv3_", &stab_attr, 0 },
|
||||
{ "nfsv4", "__nfsv4_", &stab_attr, 0 },
|
||||
{ "xpv", "__xpv_", &xpv_attr, 0 },
|
||||
{ "fc", "__fc_", &fc_attr, 0 },
|
||||
{ "srp", "__srp_", &fc_attr, 0 },
|
||||
{ "sysevent", "__sysevent_", &stab_attr, 0 },
|
||||
{ "sdt", NULL, &sdt_attr, 0 },
|
||||
{ NULL }
|
||||
@@ -169,6 +188,73 @@ sdt_argdesc_t sdt_args[] = {
|
||||
{ "fsinfo", NULL, 0, 0, "vnode_t *", "fileinfo_t *" },
|
||||
{ "fsinfo", NULL, 1, 1, "int", "int" },
|
||||
|
||||
{ "iscsi", "async-send", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "async-send", 1, 1, "iscsi_async_evt_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "login-command", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "login-command", 1, 1, "iscsi_login_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "login-response", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "login-response", 1, 1, "iscsi_login_rsp_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "logout-command", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "logout-command", 1, 1, "iscsi_logout_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "logout-response", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "logout-response", 1, 1, "iscsi_logout_rsp_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "data-request", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "data-request", 1, 1, "iscsi_rtt_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "data-send", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "data-send", 1, 1, "iscsi_data_rsp_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "data-receive", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "data-receive", 1, 1, "iscsi_data_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "nop-send", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "nop-send", 1, 1, "iscsi_nop_in_hdr_t *", "iscsiinfo_t *" },
|
||||
{ "iscsi", "nop-receive", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "nop-receive", 1, 1, "iscsi_nop_out_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "scsi-command", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "scsi-command", 1, 1, "iscsi_scsi_cmd_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "scsi-command", 2, 2, "scsi_task_t *", "scsicmd_t *" },
|
||||
{ "iscsi", "scsi-response", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "scsi-response", 1, 1, "iscsi_scsi_rsp_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "task-command", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "task-command", 1, 1, "iscsi_scsi_task_mgt_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "task-response", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "task-response", 1, 1, "iscsi_scsi_task_mgt_rsp_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "text-command", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "text-command", 1, 1, "iscsi_text_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "text-response", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "text-response", 1, 1, "iscsi_text_rsp_hdr_t *",
|
||||
"iscsiinfo_t *" },
|
||||
{ "iscsi", "xfer-start", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "xfer-start", 1, 0, "idm_conn_t *", "iscsiinfo_t *" },
|
||||
{ "iscsi", "xfer-start", 2, 1, "uintptr_t", "xferinfo_t *" },
|
||||
{ "iscsi", "xfer-start", 3, 2, "uint32_t"},
|
||||
{ "iscsi", "xfer-start", 4, 3, "uintptr_t"},
|
||||
{ "iscsi", "xfer-start", 5, 4, "uint32_t"},
|
||||
{ "iscsi", "xfer-start", 6, 5, "uint32_t"},
|
||||
{ "iscsi", "xfer-start", 7, 6, "uint32_t"},
|
||||
{ "iscsi", "xfer-start", 8, 7, "int"},
|
||||
{ "iscsi", "xfer-done", 0, 0, "idm_conn_t *", "conninfo_t *" },
|
||||
{ "iscsi", "xfer-done", 1, 0, "idm_conn_t *", "iscsiinfo_t *" },
|
||||
{ "iscsi", "xfer-done", 2, 1, "uintptr_t", "xferinfo_t *" },
|
||||
{ "iscsi", "xfer-done", 3, 2, "uint32_t"},
|
||||
{ "iscsi", "xfer-done", 4, 3, "uintptr_t"},
|
||||
{ "iscsi", "xfer-done", 5, 4, "uint32_t"},
|
||||
{ "iscsi", "xfer-done", 6, 5, "uint32_t"},
|
||||
{ "iscsi", "xfer-done", 7, 6, "uint32_t"},
|
||||
{ "iscsi", "xfer-done", 8, 7, "int"},
|
||||
|
||||
{ "nfsv3", "op-getattr-start", 0, 0, "struct svc_req *",
|
||||
"conninfo_t *" },
|
||||
{ "nfsv3", "op-getattr-start", 1, 1, "nfsv3oparg_t *",
|
||||
@@ -788,6 +874,75 @@ sdt_argdesc_t sdt_args[] = {
|
||||
"nfsv4cbinfo_t *" },
|
||||
{ "nfsv4", "cb-recall-done", 2, 2, "CB_RECALL4res *" },
|
||||
|
||||
{ "ip", "send", 0, 0, "mblk_t *", "pktinfo_t *" },
|
||||
{ "ip", "send", 1, 1, "conn_t *", "csinfo_t *" },
|
||||
{ "ip", "send", 2, 2, "void_ip_t *", "ipinfo_t *" },
|
||||
{ "ip", "send", 3, 3, "__dtrace_ipsr_ill_t *", "ifinfo_t *" },
|
||||
{ "ip", "send", 4, 4, "ipha_t *", "ipv4info_t *" },
|
||||
{ "ip", "send", 5, 5, "ip6_t *", "ipv6info_t *" },
|
||||
{ "ip", "send", 6, 6, "int" }, /* used by __dtrace_ipsr_ill_t */
|
||||
{ "ip", "receive", 0, 0, "mblk_t *", "pktinfo_t *" },
|
||||
{ "ip", "receive", 1, 1, "conn_t *", "csinfo_t *" },
|
||||
{ "ip", "receive", 2, 2, "void_ip_t *", "ipinfo_t *" },
|
||||
{ "ip", "receive", 3, 3, "__dtrace_ipsr_ill_t *", "ifinfo_t *" },
|
||||
{ "ip", "receive", 4, 4, "ipha_t *", "ipv4info_t *" },
|
||||
{ "ip", "receive", 5, 5, "ip6_t *", "ipv6info_t *" },
|
||||
{ "ip", "receive", 6, 6, "int" }, /* used by __dtrace_ipsr_ill_t */
|
||||
|
||||
{ "tcp", "connect-established", 0, 0, "mblk_t *", "pktinfo_t *" },
|
||||
{ "tcp", "connect-established", 1, 1, "ip_xmit_attr_t *",
|
||||
"csinfo_t *" },
|
||||
{ "tcp", "connect-established", 2, 2, "void_ip_t *", "ipinfo_t *" },
|
||||
{ "tcp", "connect-established", 3, 3, "tcp_t *", "tcpsinfo_t *" },
|
||||
{ "tcp", "connect-established", 4, 4, "tcph_t *", "tcpinfo_t *" },
|
||||
{ "tcp", "connect-refused", 0, 0, "mblk_t *", "pktinfo_t *" },
|
||||
{ "tcp", "connect-refused", 1, 1, "ip_xmit_attr_t *", "csinfo_t *" },
|
||||
{ "tcp", "connect-refused", 2, 2, "void_ip_t *", "ipinfo_t *" },
|
||||
{ "tcp", "connect-refused", 3, 3, "tcp_t *", "tcpsinfo_t *" },
|
||||
{ "tcp", "connect-refused", 4, 4, "tcph_t *", "tcpinfo_t *" },
|
||||
{ "tcp", "connect-request", 0, 0, "mblk_t *", "pktinfo_t *" },
|
||||
{ "tcp", "connect-request", 1, 1, "ip_xmit_attr_t *", "csinfo_t *" },
|
||||
{ "tcp", "connect-request", 2, 2, "void_ip_t *", "ipinfo_t *" },
|
||||
{ "tcp", "connect-request", 3, 3, "tcp_t *", "tcpsinfo_t *" },
|
||||
{ "tcp", "connect-request", 4, 4, "tcph_t *", "tcpinfo_t *" },
|
||||
{ "tcp", "accept-established", 0, 0, "mblk_t *", "pktinfo_t *" },
|
||||
{ "tcp", "accept-established", 1, 1, "ip_xmit_attr_t *", "csinfo_t *" },
|
||||
{ "tcp", "accept-established", 2, 2, "void_ip_t *", "ipinfo_t *" },
|
||||
{ "tcp", "accept-established", 3, 3, "tcp_t *", "tcpsinfo_t *" },
|
||||
{ "tcp", "accept-established", 4, 4, "tcph_t *", "tcpinfo_t *" },
|
||||
{ "tcp", "accept-refused", 0, 0, "mblk_t *", "pktinfo_t *" },
|
||||
{ "tcp", "accept-refused", 1, 1, "ip_xmit_attr_t *", "csinfo_t *" },
|
||||
{ "tcp", "accept-refused", 2, 2, "void_ip_t *", "ipinfo_t *" },
|
||||
{ "tcp", "accept-refused", 3, 3, "tcp_t *", "tcpsinfo_t *" },
|
||||
{ "tcp", "accept-refused", 4, 4, "tcph_t *", "tcpinfo_t *" },
|
||||
{ "tcp", "state-change", 0, 0, "void", "void" },
|
||||
{ "tcp", "state-change", 1, 1, "ip_xmit_attr_t *", "csinfo_t *" },
|
||||
{ "tcp", "state-change", 2, 2, "void", "void" },
|
||||
{ "tcp", "state-change", 3, 3, "tcp_t *", "tcpsinfo_t *" },
|
||||
{ "tcp", "state-change", 4, 4, "void", "void" },
|
||||
{ "tcp", "state-change", 5, 5, "int32_t", "tcplsinfo_t *" },
|
||||
{ "tcp", "send", 0, 0, "mblk_t *", "pktinfo_t *" },
|
||||
{ "tcp", "send", 1, 1, "ip_xmit_attr_t *", "csinfo_t *" },
|
||||
{ "tcp", "send", 2, 2, "__dtrace_tcp_void_ip_t *", "ipinfo_t *" },
|
||||
{ "tcp", "send", 3, 3, "tcp_t *", "tcpsinfo_t *" },
|
||||
{ "tcp", "send", 4, 4, "__dtrace_tcp_tcph_t *", "tcpinfo_t *" },
|
||||
{ "tcp", "receive", 0, 0, "mblk_t *", "pktinfo_t *" },
|
||||
{ "tcp", "receive", 1, 1, "ip_xmit_attr_t *", "csinfo_t *" },
|
||||
{ "tcp", "receive", 2, 2, "__dtrace_tcp_void_ip_t *", "ipinfo_t *" },
|
||||
{ "tcp", "receive", 3, 3, "tcp_t *", "tcpsinfo_t *" },
|
||||
{ "tcp", "receive", 4, 4, "__dtrace_tcp_tcph_t *", "tcpinfo_t *" },
|
||||
|
||||
{ "udp", "send", 0, 0, "mblk_t *", "pktinfo_t *" },
|
||||
{ "udp", "send", 1, 1, "ip_xmit_attr_t *", "csinfo_t *" },
|
||||
{ "udp", "send", 2, 2, "void_ip_t *", "ipinfo_t *" },
|
||||
{ "udp", "send", 3, 3, "udp_t *", "udpsinfo_t *" },
|
||||
{ "udp", "send", 4, 4, "udpha_t *", "udpinfo_t *" },
|
||||
{ "udp", "receive", 0, 0, "mblk_t *", "pktinfo_t *" },
|
||||
{ "udp", "receive", 1, 1, "ip_xmit_attr_t *", "csinfo_t *" },
|
||||
{ "udp", "receive", 2, 2, "void_ip_t *", "ipinfo_t *" },
|
||||
{ "udp", "receive", 3, 3, "udp_t *", "udpsinfo_t *" },
|
||||
{ "udp", "receive", 4, 4, "udpha_t *", "udpinfo_t *" },
|
||||
|
||||
{ "sysevent", "post", 0, 0, "evch_bind_t *", "syseventchaninfo_t *" },
|
||||
{ "sysevent", "post", 1, 1, "sysevent_impl_t *", "syseventinfo_t *" },
|
||||
|
||||
@@ -848,6 +1003,154 @@ sdt_argdesc_t sdt_args[] = {
|
||||
{ "xpv", "setvcpucontext-end", 0, 0, "int" },
|
||||
{ "xpv", "setvcpucontext-start", 0, 0, "domid_t" },
|
||||
{ "xpv", "setvcpucontext-start", 1, 1, "vcpu_guest_context_t *" },
|
||||
|
||||
{ "srp", "service-up", 0, 0, "srpt_session_t *", "conninfo_t *" },
|
||||
{ "srp", "service-up", 1, 0, "srpt_session_t *", "srp_portinfo_t *" },
|
||||
{ "srp", "service-down", 0, 0, "srpt_session_t *", "conninfo_t *" },
|
||||
{ "srp", "service-down", 1, 0, "srpt_session_t *",
|
||||
"srp_portinfo_t *" },
|
||||
{ "srp", "login-command", 0, 0, "srpt_session_t *", "conninfo_t *" },
|
||||
{ "srp", "login-command", 1, 0, "srpt_session_t *",
|
||||
"srp_portinfo_t *" },
|
||||
{ "srp", "login-command", 2, 1, "srp_login_req_t *",
|
||||
"srp_logininfo_t *" },
|
||||
{ "srp", "login-response", 0, 0, "srpt_session_t *", "conninfo_t *" },
|
||||
{ "srp", "login-response", 1, 0, "srpt_session_t *",
|
||||
"srp_portinfo_t *" },
|
||||
{ "srp", "login-response", 2, 1, "srp_login_rsp_t *",
|
||||
"srp_logininfo_t *" },
|
||||
{ "srp", "login-response", 3, 2, "srp_login_rej_t *" },
|
||||
{ "srp", "logout-command", 0, 0, "srpt_channel_t *", "conninfo_t *" },
|
||||
{ "srp", "logout-command", 1, 0, "srpt_channel_t *",
|
||||
"srp_portinfo_t *" },
|
||||
{ "srp", "task-command", 0, 0, "srpt_channel_t *", "conninfo_t *" },
|
||||
{ "srp", "task-command", 1, 0, "srpt_channel_t *",
|
||||
"srp_portinfo_t *" },
|
||||
{ "srp", "task-command", 2, 1, "srp_cmd_req_t *", "srp_taskinfo_t *" },
|
||||
{ "srp", "task-response", 0, 0, "srpt_channel_t *", "conninfo_t *" },
|
||||
{ "srp", "task-response", 1, 0, "srpt_channel_t *",
|
||||
"srp_portinfo_t *" },
|
||||
{ "srp", "task-response", 2, 1, "srp_rsp_t *", "srp_taskinfo_t *" },
|
||||
{ "srp", "task-response", 3, 2, "scsi_task_t *" },
|
||||
{ "srp", "task-response", 4, 3, "int8_t" },
|
||||
{ "srp", "scsi-command", 0, 0, "srpt_channel_t *", "conninfo_t *" },
|
||||
{ "srp", "scsi-command", 1, 0, "srpt_channel_t *",
|
||||
"srp_portinfo_t *" },
|
||||
{ "srp", "scsi-command", 2, 1, "scsi_task_t *", "scsicmd_t *" },
|
||||
{ "srp", "scsi-command", 3, 2, "srp_cmd_req_t *", "srp_taskinfo_t *" },
|
||||
{ "srp", "scsi-response", 0, 0, "srpt_channel_t *", "conninfo_t *" },
|
||||
{ "srp", "scsi-response", 1, 0, "srpt_channel_t *",
|
||||
"srp_portinfo_t *" },
|
||||
{ "srp", "scsi-response", 2, 1, "srp_rsp_t *", "srp_taskinfo_t *" },
|
||||
{ "srp", "scsi-response", 3, 2, "scsi_task_t *" },
|
||||
{ "srp", "scsi-response", 4, 3, "int8_t" },
|
||||
{ "srp", "xfer-start", 0, 0, "srpt_channel_t *", "conninfo_t *" },
|
||||
{ "srp", "xfer-start", 1, 0, "srpt_channel_t *",
|
||||
"srp_portinfo_t *" },
|
||||
{ "srp", "xfer-start", 2, 1, "ibt_wr_ds_t *", "xferinfo_t *" },
|
||||
{ "srp", "xfer-start", 3, 2, "srpt_iu_t *", "srp_taskinfo_t *" },
|
||||
{ "srp", "xfer-start", 4, 3, "ibt_send_wr_t *"},
|
||||
{ "srp", "xfer-start", 5, 4, "uint32_t" },
|
||||
{ "srp", "xfer-start", 6, 5, "uint32_t" },
|
||||
{ "srp", "xfer-start", 7, 6, "uint32_t" },
|
||||
{ "srp", "xfer-start", 8, 7, "uint32_t" },
|
||||
{ "srp", "xfer-done", 0, 0, "srpt_channel_t *", "conninfo_t *" },
|
||||
{ "srp", "xfer-done", 1, 0, "srpt_channel_t *",
|
||||
"srp_portinfo_t *" },
|
||||
{ "srp", "xfer-done", 2, 1, "ibt_wr_ds_t *", "xferinfo_t *" },
|
||||
{ "srp", "xfer-done", 3, 2, "srpt_iu_t *", "srp_taskinfo_t *" },
|
||||
{ "srp", "xfer-done", 4, 3, "ibt_send_wr_t *"},
|
||||
{ "srp", "xfer-done", 5, 4, "uint32_t" },
|
||||
{ "srp", "xfer-done", 6, 5, "uint32_t" },
|
||||
{ "srp", "xfer-done", 7, 6, "uint32_t" },
|
||||
{ "srp", "xfer-done", 8, 7, "uint32_t" },
|
||||
|
||||
{ "fc", "link-up", 0, 0, "fct_i_local_port_t *", "conninfo_t *" },
|
||||
{ "fc", "link-down", 0, 0, "fct_i_local_port_t *", "conninfo_t *" },
|
||||
{ "fc", "fabric-login-start", 0, 0, "fct_i_local_port_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "fabric-login-start", 1, 0, "fct_i_local_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "fabric-login-end", 0, 0, "fct_i_local_port_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "fabric-login-end", 1, 0, "fct_i_local_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "rport-login-start", 0, 0, "fct_cmd_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "rport-login-start", 1, 1, "fct_local_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "rport-login-start", 2, 2, "fct_i_remote_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "rport-login-start", 3, 3, "int", "int" },
|
||||
{ "fc", "rport-login-end", 0, 0, "fct_cmd_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "rport-login-end", 1, 1, "fct_local_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "rport-login-end", 2, 2, "fct_i_remote_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "rport-login-end", 3, 3, "int", "int" },
|
||||
{ "fc", "rport-login-end", 4, 4, "int", "int" },
|
||||
{ "fc", "rport-logout-start", 0, 0, "fct_cmd_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "rport-logout-start", 1, 1, "fct_local_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "rport-logout-start", 2, 2, "fct_i_remote_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "rport-logout-start", 3, 3, "int", "int" },
|
||||
{ "fc", "rport-logout-end", 0, 0, "fct_cmd_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "rport-logout-end", 1, 1, "fct_local_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "rport-logout-end", 2, 2, "fct_i_remote_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "rport-logout-end", 3, 3, "int", "int" },
|
||||
{ "fc", "scsi-command", 0, 0, "fct_cmd_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "scsi-command", 1, 1, "fct_i_local_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "scsi-command", 2, 2, "scsi_task_t *",
|
||||
"scsicmd_t *" },
|
||||
{ "fc", "scsi-command", 3, 3, "fct_i_remote_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "scsi-response", 0, 0, "fct_cmd_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "scsi-response", 1, 1, "fct_i_local_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "scsi-response", 2, 2, "scsi_task_t *",
|
||||
"scsicmd_t *" },
|
||||
{ "fc", "scsi-response", 3, 3, "fct_i_remote_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "xfer-start", 0, 0, "fct_cmd_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "xfer-start", 1, 1, "fct_i_local_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "xfer-start", 2, 2, "scsi_task_t *",
|
||||
"scsicmd_t *" },
|
||||
{ "fc", "xfer-start", 3, 3, "fct_i_remote_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "xfer-start", 4, 4, "stmf_data_buf_t *",
|
||||
"fc_xferinfo_t *" },
|
||||
{ "fc", "xfer-done", 0, 0, "fct_cmd_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "xfer-done", 1, 1, "fct_i_local_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "xfer-done", 2, 2, "scsi_task_t *",
|
||||
"scsicmd_t *" },
|
||||
{ "fc", "xfer-done", 3, 3, "fct_i_remote_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "xfer-done", 4, 4, "stmf_data_buf_t *",
|
||||
"fc_xferinfo_t *" },
|
||||
{ "fc", "rscn-receive", 0, 0, "fct_i_local_port_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "rscn-receive", 1, 1, "int", "int"},
|
||||
{ "fc", "abts-receive", 0, 0, "fct_cmd_t *",
|
||||
"conninfo_t *" },
|
||||
{ "fc", "abts-receive", 1, 1, "fct_i_local_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
{ "fc", "abts-receive", 2, 2, "fct_i_remote_port_t *",
|
||||
"fc_port_info_t *" },
|
||||
|
||||
|
||||
{ NULL }
|
||||
};
|
||||
|
||||
|
@ -19,11 +19,10 @@
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2006 Sun Microsystems, Inc. All rights reserved.
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#pragma ident "%Z%%M% %I% %E% SMI"
|
||||
|
||||
#include <sys/dtrace.h>
|
||||
#include <sys/systrace.h>
|
||||
@ -141,7 +140,7 @@ systrace_destroy(void *arg, dtrace_id_t id, void *parg)
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
static void
|
||||
static int
|
||||
systrace_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
{
|
||||
int sysnum = SYSTRACE_SYSNUM((uintptr_t)parg);
|
||||
@ -162,7 +161,7 @@ systrace_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
|
||||
if (enabled) {
|
||||
ASSERT(sysent[sysnum].sy_callc == dtrace_systrace_syscall);
|
||||
return;
|
||||
return (0);
|
||||
}
|
||||
|
||||
(void) casptr(&sysent[sysnum].sy_callc,
|
||||
@ -173,6 +172,7 @@ systrace_enable(void *arg, dtrace_id_t id, void *parg)
|
||||
(void *)systrace_sysent32[sysnum].stsy_underlying,
|
||||
(void *)dtrace_systrace_syscall32);
|
||||
#endif
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
@ -336,7 +336,8 @@ static struct dev_ops systrace_ops = {
|
||||
nodev, /* reset */
|
||||
&systrace_cb_ops, /* driver operations */
|
||||
NULL, /* bus operations */
|
||||
nodev /* dev power */
|
||||
nodev, /* dev power */
|
||||
ddi_quiesce_not_needed, /* quiesce */
|
||||
};
|
||||
|
||||
/*
|
||||
|
1178
uts/common/fs/gfs.c
Normal file
File diff suppressed because it is too large
4536
uts/common/fs/vnode.c
Normal file
File diff suppressed because it is too large
4658
uts/common/fs/zfs/arc.c
Normal file
File diff suppressed because it is too large
69
uts/common/fs/zfs/bplist.c
Normal file
@ -0,0 +1,69 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/bplist.h>
|
||||
#include <sys/zfs_context.h>
|
||||
|
||||
|
||||
void
|
||||
bplist_create(bplist_t *bpl)
|
||||
{
|
||||
mutex_init(&bpl->bpl_lock, NULL, MUTEX_DEFAULT, NULL);
|
||||
list_create(&bpl->bpl_list, sizeof (bplist_entry_t),
|
||||
offsetof(bplist_entry_t, bpe_node));
|
||||
}
|
||||
|
||||
void
|
||||
bplist_destroy(bplist_t *bpl)
|
||||
{
|
||||
list_destroy(&bpl->bpl_list);
|
||||
mutex_destroy(&bpl->bpl_lock);
|
||||
}
|
||||
|
||||
void
|
||||
bplist_append(bplist_t *bpl, const blkptr_t *bp)
|
||||
{
|
||||
bplist_entry_t *bpe = kmem_alloc(sizeof (*bpe), KM_SLEEP);
|
||||
|
||||
mutex_enter(&bpl->bpl_lock);
|
||||
bpe->bpe_blk = *bp;
|
||||
list_insert_tail(&bpl->bpl_list, bpe);
|
||||
mutex_exit(&bpl->bpl_lock);
|
||||
}
|
||||
|
||||
void
|
||||
bplist_iterate(bplist_t *bpl, bplist_itor_t *func, void *arg, dmu_tx_t *tx)
|
||||
{
|
||||
bplist_entry_t *bpe;
|
||||
|
||||
mutex_enter(&bpl->bpl_lock);
|
||||
while (bpe = list_head(&bpl->bpl_list)) {
|
||||
list_remove(&bpl->bpl_list, bpe);
|
||||
mutex_exit(&bpl->bpl_lock);
|
||||
func(arg, &bpe->bpe_blk, tx);
|
||||
kmem_free(bpe, sizeof (*bpe));
|
||||
mutex_enter(&bpl->bpl_lock);
|
||||
}
|
||||
mutex_exit(&bpl->bpl_lock);
|
||||
}
|
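The drain loop in bplist_iterate() above drops bpl_lock around each callback, so the callback never runs with the list lock held. A minimal user-space sketch of that pattern follows. The toy_* names and the integer "lock" are hypothetical stand-ins for kmutex_t and list_t, chosen only to keep the example self-contained; they are not the kernel API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kmutex_t and list_t; not the kernel API. */
typedef struct toy_entry {
	struct toy_entry *next;
	int value;
} toy_entry_t;

typedef struct toy_list {
	int locked;		/* 1 while the toy "mutex" is held */
	toy_entry_t *head;
	toy_entry_t *tail;
} toy_list_t;

typedef void toy_itor_t(void *arg, int value);

static void
toy_lock(toy_list_t *l)
{
	assert(!l->locked);	/* catches recursive acquisition */
	l->locked = 1;
}

static void
toy_unlock(toy_list_t *l)
{
	assert(l->locked);
	l->locked = 0;
}

/* Append one value at the tail, under the lock. */
void
toy_append(toy_list_t *l, int value)
{
	toy_entry_t *e = malloc(sizeof (*e));

	assert(e != NULL);
	e->next = NULL;
	e->value = value;

	toy_lock(l);
	if (l->tail != NULL)
		l->tail->next = e;
	else
		l->head = e;
	l->tail = e;
	toy_unlock(l);
}

/* Remove entries one at a time; the callback runs with the lock dropped. */
void
toy_drain(toy_list_t *l, toy_itor_t *func, void *arg)
{
	toy_entry_t *e;

	toy_lock(l);
	while ((e = l->head) != NULL) {
		l->head = e->next;
		if (l->head == NULL)
			l->tail = NULL;
		toy_unlock(l);
		func(arg, e->value);
		free(e);
		toy_lock(l);
	}
	toy_unlock(l);
}

/* Example callback: adds each value to the int that arg points to. */
void
toy_sum_cb(void *arg, int value)
{
	*(int *)arg += value;
}
```

Because each entry is unlinked before the lock is released, a callback that called toy_append() on the same list would not trip the lock assertion, and the new entry would be picked up by a later pass of the loop.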
495
uts/common/fs/zfs/bpobj.c
Normal file
@ -0,0 +1,495 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/bpobj.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/refcount.h>
|
||||
|
||||
uint64_t
|
||||
bpobj_alloc(objset_t *os, int blocksize, dmu_tx_t *tx)
|
||||
{
|
||||
int size;
|
||||
|
||||
if (spa_version(dmu_objset_spa(os)) < SPA_VERSION_BPOBJ_ACCOUNT)
|
||||
size = BPOBJ_SIZE_V0;
|
||||
else if (spa_version(dmu_objset_spa(os)) < SPA_VERSION_DEADLISTS)
|
||||
size = BPOBJ_SIZE_V1;
|
||||
else
|
||||
size = sizeof (bpobj_phys_t);
|
||||
|
||||
return (dmu_object_alloc(os, DMU_OT_BPOBJ, blocksize,
|
||||
DMU_OT_BPOBJ_HDR, size, tx));
|
||||
}
|
||||
|
||||
void
|
||||
bpobj_free(objset_t *os, uint64_t obj, dmu_tx_t *tx)
|
||||
{
|
||||
int64_t i;
|
||||
bpobj_t bpo;
|
||||
dmu_object_info_t doi;
|
||||
int epb;
|
||||
dmu_buf_t *dbuf = NULL;
|
||||
|
||||
VERIFY3U(0, ==, bpobj_open(&bpo, os, obj));
|
||||
|
||||
mutex_enter(&bpo.bpo_lock);
|
||||
|
||||
if (!bpo.bpo_havesubobj || bpo.bpo_phys->bpo_subobjs == 0)
|
||||
goto out;
|
||||
|
||||
VERIFY3U(0, ==, dmu_object_info(os, bpo.bpo_phys->bpo_subobjs, &doi));
|
||||
epb = doi.doi_data_block_size / sizeof (uint64_t);
|
||||
|
||||
for (i = bpo.bpo_phys->bpo_num_subobjs - 1; i >= 0; i--) {
|
||||
uint64_t *objarray;
|
||||
uint64_t offset, blkoff;
|
||||
|
||||
offset = i * sizeof (uint64_t);
|
||||
blkoff = P2PHASE(i, epb);
|
||||
|
||||
if (dbuf == NULL || dbuf->db_offset > offset) {
|
||||
if (dbuf)
|
||||
dmu_buf_rele(dbuf, FTAG);
|
||||
VERIFY3U(0, ==, dmu_buf_hold(os,
|
||||
bpo.bpo_phys->bpo_subobjs, offset, FTAG, &dbuf, 0));
|
||||
}
|
||||
|
||||
ASSERT3U(offset, >=, dbuf->db_offset);
|
||||
ASSERT3U(offset, <, dbuf->db_offset + dbuf->db_size);
|
||||
|
||||
objarray = dbuf->db_data;
|
||||
bpobj_free(os, objarray[blkoff], tx);
|
||||
}
|
||||
if (dbuf) {
|
||||
dmu_buf_rele(dbuf, FTAG);
|
||||
dbuf = NULL;
|
||||
}
|
||||
VERIFY3U(0, ==, dmu_object_free(os, bpo.bpo_phys->bpo_subobjs, tx));
|
||||
|
||||
out:
|
||||
mutex_exit(&bpo.bpo_lock);
|
||||
bpobj_close(&bpo);
|
||||
|
||||
VERIFY3U(0, ==, dmu_object_free(os, obj, tx));
|
||||
}
|
||||
|
||||
int
|
||||
bpobj_open(bpobj_t *bpo, objset_t *os, uint64_t object)
|
||||
{
|
||||
dmu_object_info_t doi;
|
||||
int err;
|
||||
|
||||
err = dmu_object_info(os, object, &doi);
|
||||
if (err)
|
||||
return (err);
|
||||
|
||||
bzero(bpo, sizeof (*bpo));
|
||||
mutex_init(&bpo->bpo_lock, NULL, MUTEX_DEFAULT, NULL);
|
||||
|
||||
ASSERT(bpo->bpo_dbuf == NULL);
|
||||
ASSERT(bpo->bpo_phys == NULL);
|
||||
ASSERT(object != 0);
|
||||
ASSERT3U(doi.doi_type, ==, DMU_OT_BPOBJ);
|
||||
ASSERT3U(doi.doi_bonus_type, ==, DMU_OT_BPOBJ_HDR);
|
||||
|
||||
err = dmu_bonus_hold(os, object, bpo, &bpo->bpo_dbuf);
|
||||
if (err)
|
||||
return (err);
|
||||
|
||||
bpo->bpo_os = os;
|
||||
bpo->bpo_object = object;
|
||||
bpo->bpo_epb = doi.doi_data_block_size >> SPA_BLKPTRSHIFT;
|
||||
bpo->bpo_havecomp = (doi.doi_bonus_size > BPOBJ_SIZE_V0);
|
||||
bpo->bpo_havesubobj = (doi.doi_bonus_size > BPOBJ_SIZE_V1);
|
||||
bpo->bpo_phys = bpo->bpo_dbuf->db_data;
|
||||
return (0);
|
||||
}
|
||||
|
||||
void
|
||||
bpobj_close(bpobj_t *bpo)
|
||||
{
|
||||
/* Lame workaround for closing a bpobj that was never opened. */
|
||||
if (bpo->bpo_object == 0)
|
||||
return;
|
||||
|
||||
dmu_buf_rele(bpo->bpo_dbuf, bpo);
|
||||
if (bpo->bpo_cached_dbuf != NULL)
|
||||
dmu_buf_rele(bpo->bpo_cached_dbuf, bpo);
|
||||
bpo->bpo_dbuf = NULL;
|
||||
bpo->bpo_phys = NULL;
|
||||
bpo->bpo_cached_dbuf = NULL;
|
||||
bpo->bpo_object = 0;
|
||||
|
||||
mutex_destroy(&bpo->bpo_lock);
|
||||
}
|
||||
|
||||
static int
|
||||
bpobj_iterate_impl(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx,
|
||||
boolean_t free)
|
||||
{
|
||||
dmu_object_info_t doi;
|
||||
int epb;
|
||||
int64_t i;
|
||||
int err = 0;
|
||||
dmu_buf_t *dbuf = NULL;
|
||||
|
||||
mutex_enter(&bpo->bpo_lock);
|
||||
|
||||
if (free)
|
||||
dmu_buf_will_dirty(bpo->bpo_dbuf, tx);
|
||||
|
||||
for (i = bpo->bpo_phys->bpo_num_blkptrs - 1; i >= 0; i--) {
|
||||
blkptr_t *bparray;
|
||||
blkptr_t *bp;
|
||||
uint64_t offset, blkoff;
|
||||
|
||||
offset = i * sizeof (blkptr_t);
|
||||
blkoff = P2PHASE(i, bpo->bpo_epb);
|
||||
|
||||
if (dbuf == NULL || dbuf->db_offset > offset) {
|
||||
if (dbuf)
|
||||
dmu_buf_rele(dbuf, FTAG);
|
||||
err = dmu_buf_hold(bpo->bpo_os, bpo->bpo_object, offset,
|
||||
FTAG, &dbuf, 0);
|
||||
if (err)
|
||||
break;
|
||||
}
|
||||
|
||||
ASSERT3U(offset, >=, dbuf->db_offset);
|
||||
ASSERT3U(offset, <, dbuf->db_offset + dbuf->db_size);
|
||||
|
||||
bparray = dbuf->db_data;
|
||||
bp = &bparray[blkoff];
|
||||
err = func(arg, bp, tx);
|
||||
if (err)
|
||||
break;
|
||||
if (free) {
|
||||
bpo->bpo_phys->bpo_bytes -=
|
||||
bp_get_dsize_sync(dmu_objset_spa(bpo->bpo_os), bp);
|
||||
ASSERT3S(bpo->bpo_phys->bpo_bytes, >=, 0);
|
||||
if (bpo->bpo_havecomp) {
|
||||
bpo->bpo_phys->bpo_comp -= BP_GET_PSIZE(bp);
|
||||
bpo->bpo_phys->bpo_uncomp -= BP_GET_UCSIZE(bp);
|
||||
}
|
||||
bpo->bpo_phys->bpo_num_blkptrs--;
|
||||
ASSERT3S(bpo->bpo_phys->bpo_num_blkptrs, >=, 0);
|
||||
}
|
||||
}
|
||||
if (dbuf) {
|
||||
dmu_buf_rele(dbuf, FTAG);
|
||||
dbuf = NULL;
|
||||
}
|
||||
if (free) {
|
||||
i++;
|
||||
VERIFY3U(0, ==, dmu_free_range(bpo->bpo_os, bpo->bpo_object,
|
||||
i * sizeof (blkptr_t), -1ULL, tx));
|
||||
}
|
||||
if (err || !bpo->bpo_havesubobj || bpo->bpo_phys->bpo_subobjs == 0)
|
||||
goto out;
|
||||
|
||||
ASSERT(bpo->bpo_havecomp);
|
||||
err = dmu_object_info(bpo->bpo_os, bpo->bpo_phys->bpo_subobjs, &doi);
|
||||
if (err) {
|
||||
mutex_exit(&bpo->bpo_lock);
|
||||
return (err);
|
||||
}
|
||||
epb = doi.doi_data_block_size / sizeof (uint64_t);
|
||||
|
||||
for (i = bpo->bpo_phys->bpo_num_subobjs - 1; i >= 0; i--) {
|
||||
uint64_t *objarray;
|
||||
uint64_t offset, blkoff;
|
||||
bpobj_t sublist;
|
||||
uint64_t used_before, comp_before, uncomp_before;
|
||||
uint64_t used_after, comp_after, uncomp_after;
|
||||
|
||||
offset = i * sizeof (uint64_t);
|
||||
blkoff = P2PHASE(i, epb);
|
||||
|
||||
if (dbuf == NULL || dbuf->db_offset > offset) {
|
||||
if (dbuf)
|
||||
dmu_buf_rele(dbuf, FTAG);
|
||||
err = dmu_buf_hold(bpo->bpo_os,
|
||||
bpo->bpo_phys->bpo_subobjs, offset, FTAG, &dbuf, 0);
|
||||
if (err)
|
||||
break;
|
||||
}
|
||||
|
||||
ASSERT3U(offset, >=, dbuf->db_offset);
|
||||
ASSERT3U(offset, <, dbuf->db_offset + dbuf->db_size);
|
||||
|
||||
objarray = dbuf->db_data;
|
||||
err = bpobj_open(&sublist, bpo->bpo_os, objarray[blkoff]);
|
||||
if (err)
|
||||
break;
|
||||
if (free) {
|
||||
err = bpobj_space(&sublist,
|
||||
&used_before, &comp_before, &uncomp_before);
|
||||
if (err)
|
||||
break;
|
||||
}
|
||||
err = bpobj_iterate_impl(&sublist, func, arg, tx, free);
|
||||
if (free) {
|
||||
VERIFY3U(0, ==, bpobj_space(&sublist,
|
||||
&used_after, &comp_after, &uncomp_after));
|
||||
bpo->bpo_phys->bpo_bytes -= used_before - used_after;
|
||||
ASSERT3S(bpo->bpo_phys->bpo_bytes, >=, 0);
|
||||
bpo->bpo_phys->bpo_comp -= comp_before - comp_after;
|
||||
bpo->bpo_phys->bpo_uncomp -=
|
||||
uncomp_before - uncomp_after;
|
||||
}
|
||||
|
||||
bpobj_close(&sublist);
|
||||
if (err)
|
||||
break;
|
||||
if (free) {
|
||||
err = dmu_object_free(bpo->bpo_os,
|
||||
objarray[blkoff], tx);
|
||||
if (err)
|
||||
break;
|
||||
bpo->bpo_phys->bpo_num_subobjs--;
|
||||
ASSERT3S(bpo->bpo_phys->bpo_num_subobjs, >=, 0);
|
||||
}
|
||||
}
|
||||
if (dbuf) {
|
||||
dmu_buf_rele(dbuf, FTAG);
|
||||
dbuf = NULL;
|
||||
}
|
||||
if (free) {
|
||||
VERIFY3U(0, ==, dmu_free_range(bpo->bpo_os,
|
||||
bpo->bpo_phys->bpo_subobjs,
|
||||
(i + 1) * sizeof (uint64_t), -1ULL, tx));
|
||||
}
|
||||
|
||||
out:
|
||||
/* If there are no entries, there should be no bytes. */
|
||||
ASSERT(bpo->bpo_phys->bpo_num_blkptrs > 0 ||
|
||||
(bpo->bpo_havesubobj && bpo->bpo_phys->bpo_num_subobjs > 0) ||
|
||||
bpo->bpo_phys->bpo_bytes == 0);
|
||||
|
||||
mutex_exit(&bpo->bpo_lock);
|
||||
return (err);
|
||||
}
|
||||
|
||||
/*
|
||||
* Iterate and remove the entries. If func returns nonzero, iteration
|
||||
* will stop and that entry will not be removed.
|
||||
*/
|
||||
int
|
||||
bpobj_iterate(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx)
|
||||
{
|
||||
return (bpobj_iterate_impl(bpo, func, arg, tx, B_TRUE));
|
||||
}
|
||||
|
||||
/*
|
||||
* Iterate the entries. If func returns nonzero, iteration will stop.
|
||||
*/
|
||||
int
|
||||
bpobj_iterate_nofree(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx)
|
||||
{
|
||||
return (bpobj_iterate_impl(bpo, func, arg, tx, B_FALSE));
|
||||
}
|
||||
|
||||
void
|
||||
bpobj_enqueue_subobj(bpobj_t *bpo, uint64_t subobj, dmu_tx_t *tx)
|
||||
{
|
||||
bpobj_t subbpo;
|
||||
uint64_t used, comp, uncomp, subsubobjs;
|
||||
|
||||
ASSERT(bpo->bpo_havesubobj);
|
||||
ASSERT(bpo->bpo_havecomp);
|
||||
|
||||
VERIFY3U(0, ==, bpobj_open(&subbpo, bpo->bpo_os, subobj));
|
||||
VERIFY3U(0, ==, bpobj_space(&subbpo, &used, &comp, &uncomp));
|
||||
|
||||
if (used == 0) {
|
||||
/* No point in having an empty subobj. */
|
||||
bpobj_close(&subbpo);
|
||||
bpobj_free(bpo->bpo_os, subobj, tx);
|
||||
return;
|
||||
}
|
||||
|
||||
dmu_buf_will_dirty(bpo->bpo_dbuf, tx);
|
||||
if (bpo->bpo_phys->bpo_subobjs == 0) {
|
||||
bpo->bpo_phys->bpo_subobjs = dmu_object_alloc(bpo->bpo_os,
|
||||
DMU_OT_BPOBJ_SUBOBJ, SPA_MAXBLOCKSIZE, DMU_OT_NONE, 0, tx);
|
||||
}
|
||||
|
||||
mutex_enter(&bpo->bpo_lock);
|
||||
dmu_write(bpo->bpo_os, bpo->bpo_phys->bpo_subobjs,
|
||||
bpo->bpo_phys->bpo_num_subobjs * sizeof (subobj),
|
||||
sizeof (subobj), &subobj, tx);
|
||||
bpo->bpo_phys->bpo_num_subobjs++;
|
||||
|
||||
/*
|
||||
* If subobj has only one block of subobjs, then move subobj's
|
||||
* subobjs to bpo's subobj list directly. This reduces
|
||||
* recursion in bpobj_iterate due to nested subobjs.
|
||||
*/
|
||||
subsubobjs = subbpo.bpo_phys->bpo_subobjs;
|
||||
if (subsubobjs != 0) {
|
||||
dmu_object_info_t doi;
|
||||
|
||||
VERIFY3U(0, ==, dmu_object_info(bpo->bpo_os, subsubobjs, &doi));
|
||||
if (doi.doi_max_offset == doi.doi_data_block_size) {
|
||||
dmu_buf_t *subdb;
|
||||
uint64_t numsubsub = subbpo.bpo_phys->bpo_num_subobjs;
|
||||
|
||||
VERIFY3U(0, ==, dmu_buf_hold(bpo->bpo_os, subsubobjs,
|
||||
0, FTAG, &subdb, 0));
|
||||
dmu_write(bpo->bpo_os, bpo->bpo_phys->bpo_subobjs,
|
||||
bpo->bpo_phys->bpo_num_subobjs * sizeof (subobj),
|
||||
numsubsub * sizeof (subobj), subdb->db_data, tx);
|
||||
dmu_buf_rele(subdb, FTAG);
|
||||
bpo->bpo_phys->bpo_num_subobjs += numsubsub;
|
||||
|
||||
dmu_buf_will_dirty(subbpo.bpo_dbuf, tx);
|
||||
subbpo.bpo_phys->bpo_subobjs = 0;
|
||||
VERIFY3U(0, ==, dmu_object_free(bpo->bpo_os,
|
||||
subsubobjs, tx));
|
||||
}
|
||||
}
|
||||
bpo->bpo_phys->bpo_bytes += used;
|
||||
bpo->bpo_phys->bpo_comp += comp;
|
||||
bpo->bpo_phys->bpo_uncomp += uncomp;
|
||||
mutex_exit(&bpo->bpo_lock);
|
||||
|
||||
bpobj_close(&subbpo);
|
||||
}
|
||||
|
||||
void
|
||||
bpobj_enqueue(bpobj_t *bpo, const blkptr_t *bp, dmu_tx_t *tx)
|
||||
{
|
||||
blkptr_t stored_bp = *bp;
|
||||
uint64_t offset;
|
||||
int blkoff;
|
||||
blkptr_t *bparray;
|
||||
|
||||
ASSERT(!BP_IS_HOLE(bp));
|
||||
|
||||
/* We never need the fill count. */
|
||||
stored_bp.blk_fill = 0;
|
||||
|
||||
/* The bpobj will compress better if we can leave off the checksum */
|
||||
if (!BP_GET_DEDUP(bp))
|
||||
bzero(&stored_bp.blk_cksum, sizeof (stored_bp.blk_cksum));
|
||||
|
||||
mutex_enter(&bpo->bpo_lock);
|
||||
|
||||
offset = bpo->bpo_phys->bpo_num_blkptrs * sizeof (stored_bp);
|
||||
blkoff = P2PHASE(bpo->bpo_phys->bpo_num_blkptrs, bpo->bpo_epb);
|
||||
|
||||
if (bpo->bpo_cached_dbuf == NULL ||
|
||||
offset < bpo->bpo_cached_dbuf->db_offset ||
|
||||
offset >= bpo->bpo_cached_dbuf->db_offset +
|
||||
bpo->bpo_cached_dbuf->db_size) {
|
||||
if (bpo->bpo_cached_dbuf)
|
||||
dmu_buf_rele(bpo->bpo_cached_dbuf, bpo);
|
||||
VERIFY3U(0, ==, dmu_buf_hold(bpo->bpo_os, bpo->bpo_object,
|
||||
offset, bpo, &bpo->bpo_cached_dbuf, 0));
|
||||
}
|
||||
|
||||
dmu_buf_will_dirty(bpo->bpo_cached_dbuf, tx);
|
||||
bparray = bpo->bpo_cached_dbuf->db_data;
|
||||
bparray[blkoff] = stored_bp;
|
||||
|
||||
dmu_buf_will_dirty(bpo->bpo_dbuf, tx);
|
||||
bpo->bpo_phys->bpo_num_blkptrs++;
|
||||
bpo->bpo_phys->bpo_bytes +=
|
||||
bp_get_dsize_sync(dmu_objset_spa(bpo->bpo_os), bp);
|
||||
if (bpo->bpo_havecomp) {
|
||||
bpo->bpo_phys->bpo_comp += BP_GET_PSIZE(bp);
|
||||
bpo->bpo_phys->bpo_uncomp += BP_GET_UCSIZE(bp);
|
||||
}
|
||||
mutex_exit(&bpo->bpo_lock);
|
||||
}
|
||||
|
||||
struct space_range_arg {
|
||||
spa_t *spa;
|
||||
uint64_t mintxg;
|
||||
uint64_t maxtxg;
|
||||
uint64_t used;
|
||||
uint64_t comp;
|
||||
uint64_t uncomp;
|
||||
};
|
||||
|
||||
/* ARGSUSED */
|
||||
static int
|
||||
space_range_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx)
|
||||
{
|
||||
struct space_range_arg *sra = arg;
|
||||
|
||||
if (bp->blk_birth > sra->mintxg && bp->blk_birth <= sra->maxtxg) {
|
||||
sra->used += bp_get_dsize_sync(sra->spa, bp);
|
||||
sra->comp += BP_GET_PSIZE(bp);
|
||||
sra->uncomp += BP_GET_UCSIZE(bp);
|
||||
}
|
||||
return (0);
|
||||
}
|
||||
|
||||
int
|
||||
bpobj_space(bpobj_t *bpo, uint64_t *usedp, uint64_t *compp, uint64_t *uncompp)
|
||||
{
|
||||
mutex_enter(&bpo->bpo_lock);
|
||||
|
||||
*usedp = bpo->bpo_phys->bpo_bytes;
|
||||
if (bpo->bpo_havecomp) {
|
||||
*compp = bpo->bpo_phys->bpo_comp;
|
||||
*uncompp = bpo->bpo_phys->bpo_uncomp;
|
||||
mutex_exit(&bpo->bpo_lock);
|
||||
return (0);
|
||||
} else {
|
||||
mutex_exit(&bpo->bpo_lock);
|
||||
return (bpobj_space_range(bpo, 0, UINT64_MAX,
|
||||
usedp, compp, uncompp));
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Return the amount of space in the bpobj which is:
|
||||
* mintxg < blk_birth <= maxtxg
|
||||
*/
|
||||
int
|
||||
bpobj_space_range(bpobj_t *bpo, uint64_t mintxg, uint64_t maxtxg,
|
||||
uint64_t *usedp, uint64_t *compp, uint64_t *uncompp)
|
||||
{
|
||||
struct space_range_arg sra = { 0 };
|
||||
int err;
|
||||
|
||||
/*
|
||||
* As an optimization, if they want the whole txg range, just
|
||||
* get bpo_bytes rather than iterating over the bps.
|
||||
*/
|
||||
if (mintxg < TXG_INITIAL && maxtxg == UINT64_MAX && bpo->bpo_havecomp)
|
||||
return (bpobj_space(bpo, usedp, compp, uncompp));
|
||||
|
||||
sra.spa = dmu_objset_spa(bpo->bpo_os);
|
||||
sra.mintxg = mintxg;
|
||||
sra.maxtxg = maxtxg;
|
||||
|
||||
err = bpobj_iterate_nofree(bpo, space_range_cb, &sra, NULL);
|
||||
*usedp = sra.used;
|
||||
*compp = sra.comp;
|
||||
*uncompp = sra.uncomp;
|
||||
return (err);
|
||||
}
|
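Several loops in bpobj.c above turn an entry index i into a byte offset (i * sizeof (blkptr_t)) and a slot within the held block (P2PHASE(i, epb)). The sketch below works through that arithmetic in isolation. It assumes a 128-byte entry (a shift of 7, matching how bpo_epb is derived with SPA_BLKPTRSHIFT) and a power-of-two entries-per-block count; the toy_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define	TOY_BLKPTR_SHIFT	7	/* log2(128): assumed entry size */
#define	TOY_P2PHASE(x, align)	((x) & ((align) - 1))

typedef struct toy_slot {
	uint64_t offset;	/* byte offset of entry i in the object */
	uint64_t blkid;		/* which data block holds entry i */
	uint64_t blkoff;	/* index of entry i inside that block */
} toy_slot_t;

/* Locate entry i in an object whose data blocks are blocksize bytes. */
toy_slot_t
toy_locate(uint64_t i, uint64_t blocksize)
{
	uint64_t epb = blocksize >> TOY_BLKPTR_SHIFT;	/* entries per block */
	toy_slot_t s;

	/* The mask trick in TOY_P2PHASE only works for powers of two. */
	assert(epb != 0 && (epb & (epb - 1)) == 0);

	s.offset = i << TOY_BLKPTR_SHIFT;
	s.blkid = i / epb;
	s.blkoff = TOY_P2PHASE(i, epb);
	return (s);
}
```

For a 16384-byte block this gives 128 entries per block, so entry 130 sits at byte offset 16640, in block 1, at slot 2.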
2707
uts/common/fs/zfs/dbuf.c
Normal file
File diff suppressed because it is too large
1146
uts/common/fs/zfs/ddt.c
Normal file
File diff suppressed because it is too large
157
uts/common/fs/zfs/ddt_zap.c
Normal file
@ -0,0 +1,157 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
|
||||
/*
|
||||
* Copyright (c) 2009, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/ddt.h>
|
||||
#include <sys/zap.h>
|
||||
#include <sys/dmu_tx.h>
|
||||
#include <util/sscanf.h>
|
||||
|
||||
int ddt_zap_leaf_blockshift = 12;
|
||||
int ddt_zap_indirect_blockshift = 12;
|
||||
|
||||
static int
|
||||
ddt_zap_create(objset_t *os, uint64_t *objectp, dmu_tx_t *tx, boolean_t prehash)
|
||||
{
|
||||
zap_flags_t flags = ZAP_FLAG_HASH64 | ZAP_FLAG_UINT64_KEY;
|
||||
|
||||
if (prehash)
|
||||
flags |= ZAP_FLAG_PRE_HASHED_KEY;
|
||||
|
||||
*objectp = zap_create_flags(os, 0, flags, DMU_OT_DDT_ZAP,
|
||||
ddt_zap_leaf_blockshift, ddt_zap_indirect_blockshift,
|
||||
DMU_OT_NONE, 0, tx);
|
||||
|
||||
return (*objectp == 0 ? ENOTSUP : 0);
|
||||
}
|
||||
|
||||
static int
|
||||
ddt_zap_destroy(objset_t *os, uint64_t object, dmu_tx_t *tx)
|
||||
{
|
||||
return (zap_destroy(os, object, tx));
|
||||
}
|
||||
|
||||
static int
|
||||
ddt_zap_lookup(objset_t *os, uint64_t object, ddt_entry_t *dde)
|
||||
{
|
||||
uchar_t cbuf[sizeof (dde->dde_phys) + 1];
|
||||
uint64_t one, csize;
|
||||
int error;
|
||||
|
||||
error = zap_length_uint64(os, object, (uint64_t *)&dde->dde_key,
|
||||
DDT_KEY_WORDS, &one, &csize);
|
||||
if (error)
|
||||
return (error);
|
||||
|
||||
ASSERT(one == 1);
|
||||
ASSERT(csize <= sizeof (cbuf));
|
||||
|
||||
error = zap_lookup_uint64(os, object, (uint64_t *)&dde->dde_key,
|
||||
DDT_KEY_WORDS, 1, csize, cbuf);
|
||||
if (error)
|
||||
return (error);
|
||||
|
||||
ddt_decompress(cbuf, dde->dde_phys, csize, sizeof (dde->dde_phys));
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
static void
|
||||
ddt_zap_prefetch(objset_t *os, uint64_t object, ddt_entry_t *dde)
|
||||
{
|
||||
(void) zap_prefetch_uint64(os, object, (uint64_t *)&dde->dde_key,
|
||||
DDT_KEY_WORDS);
|
||||
}
|
||||
|
||||
static int
|
||||
ddt_zap_update(objset_t *os, uint64_t object, ddt_entry_t *dde, dmu_tx_t *tx)
|
||||
{
|
||||
uchar_t cbuf[sizeof (dde->dde_phys) + 1];
|
||||
uint64_t csize;
|
||||
|
||||
csize = ddt_compress(dde->dde_phys, cbuf,
|
||||
sizeof (dde->dde_phys), sizeof (cbuf));
|
||||
|
||||
return (zap_update_uint64(os, object, (uint64_t *)&dde->dde_key,
|
||||
DDT_KEY_WORDS, 1, csize, cbuf, tx));
|
||||
}
|
||||
|
||||
static int
|
||||
ddt_zap_remove(objset_t *os, uint64_t object, ddt_entry_t *dde, dmu_tx_t *tx)
|
||||
{
|
||||
return (zap_remove_uint64(os, object, (uint64_t *)&dde->dde_key,
|
||||
DDT_KEY_WORDS, tx));
|
||||
}
|
||||
|
||||
static int
|
||||
ddt_zap_walk(objset_t *os, uint64_t object, ddt_entry_t *dde, uint64_t *walk)
|
||||
{
|
||||
zap_cursor_t zc;
|
||||
zap_attribute_t za;
|
||||
int error;
|
||||
|
||||
zap_cursor_init_serialized(&zc, os, object, *walk);
|
||||
if ((error = zap_cursor_retrieve(&zc, &za)) == 0) {
|
||||
uchar_t cbuf[sizeof (dde->dde_phys) + 1];
|
||||
uint64_t csize = za.za_num_integers;
|
||||
ASSERT(za.za_integer_length == 1);
|
||||
error = zap_lookup_uint64(os, object, (uint64_t *)za.za_name,
|
||||
DDT_KEY_WORDS, 1, csize, cbuf);
|
||||
ASSERT(error == 0);
|
||||
if (error == 0) {
|
||||
ddt_decompress(cbuf, dde->dde_phys, csize,
|
||||
sizeof (dde->dde_phys));
|
||||
dde->dde_key = *(ddt_key_t *)za.za_name;
|
||||
}
|
||||
zap_cursor_advance(&zc);
|
||||
*walk = zap_cursor_serialize(&zc);
|
||||
}
|
||||
zap_cursor_fini(&zc);
|
||||
return (error);
|
||||
}
|
||||
|
||||
static uint64_t
|
||||
ddt_zap_count(objset_t *os, uint64_t object)
|
||||
{
|
||||
uint64_t count = 0;
|
||||
|
||||
VERIFY(zap_count(os, object, &count) == 0);
|
||||
|
||||
return (count);
|
||||
}
|
||||
|
||||
const ddt_ops_t ddt_zap_ops = {
|
||||
"zap",
|
||||
ddt_zap_create,
|
||||
ddt_zap_destroy,
|
||||
ddt_zap_lookup,
|
||||
ddt_zap_prefetch,
|
||||
ddt_zap_update,
|
||||
ddt_zap_remove,
|
||||
ddt_zap_walk,
|
||||
ddt_zap_count,
|
||||
};
|
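ddt_zap.c above ends by exporting ddt_zap_ops, a positionally initialized table whose first member is a backend name and whose remaining members are function pointers. The sketch below shows the same dispatch-table idea with a trivial array-backed store. Every toy_* type, member, and function here is hypothetical and much smaller than the real ddt_ops_t; it is meant only to illustrate how callers stay independent of the backend.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define	TOY_MAX_KEYS	8

/* Hypothetical backing store: a small unsorted array of keys. */
typedef struct toy_store {
	uint64_t keys[TOY_MAX_KEYS];
	size_t nkeys;
} toy_store_t;

/* Hypothetical ops table: a name followed by function pointers. */
typedef struct toy_ops {
	const char *name;
	int (*op_update)(toy_store_t *, uint64_t);
	int (*op_remove)(toy_store_t *, uint64_t);
	uint64_t (*op_count)(const toy_store_t *);
} toy_ops_t;

/* Insert key if absent; inserting an existing key is a no-op. */
static int
toy_array_update(toy_store_t *s, uint64_t key)
{
	size_t i;

	assert(s != NULL);
	for (i = 0; i < s->nkeys; i++) {
		if (s->keys[i] == key)
			return (0);
	}
	if (s->nkeys == TOY_MAX_KEYS)
		return (ENOSPC);
	s->keys[s->nkeys++] = key;
	return (0);
}

/* Remove key by moving the last element into its slot. */
static int
toy_array_remove(toy_store_t *s, uint64_t key)
{
	size_t i;

	assert(s != NULL);
	for (i = 0; i < s->nkeys; i++) {
		if (s->keys[i] == key) {
			s->keys[i] = s->keys[--s->nkeys];
			return (0);
		}
	}
	return (ENOENT);
}

static uint64_t
toy_array_count(const toy_store_t *s)
{
	return (s->nkeys);
}

/* Positional initializer, in the same style as ddt_zap_ops. */
const toy_ops_t toy_array_ops = {
	"array",
	toy_array_update,
	toy_array_remove,
	toy_array_count,
};
```

A caller holds only a const toy_ops_t pointer and calls through it, for example ops->op_update(&store, key), so a second backend could be swapped in by supplying another table with the same layout.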
1764
uts/common/fs/zfs/dmu.c
Normal file
File diff suppressed because it is too large
221
uts/common/fs/zfs/dmu_diff.c
Normal file
@ -0,0 +1,221 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/dmu_impl.h>
|
||||
#include <sys/dmu_tx.h>
|
||||
#include <sys/dbuf.h>
|
||||
#include <sys/dnode.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/dmu_objset.h>
|
||||
#include <sys/dmu_traverse.h>
|
||||
#include <sys/dsl_dataset.h>
|
||||
#include <sys/dsl_dir.h>
|
||||
#include <sys/dsl_pool.h>
|
||||
#include <sys/dsl_synctask.h>
|
||||
#include <sys/zfs_ioctl.h>
|
||||
#include <sys/zap.h>
|
||||
#include <sys/zio_checksum.h>
|
||||
#include <sys/zfs_znode.h>
|
||||
|
||||
struct diffarg {
|
||||
struct vnode *da_vp; /* file to which we are reporting */
|
||||
offset_t *da_offp;
|
||||
int da_err; /* error that stopped diff search */
|
||||
dmu_diff_record_t da_ddr;
|
||||
};
|
||||
|
||||
static int
|
||||
write_record(struct diffarg *da)
|
||||
{
|
||||
ssize_t resid; /* have to get resid to get detailed errno */
|
||||
|
||||
if (da->da_ddr.ddr_type == DDR_NONE) {
|
||||
da->da_err = 0;
|
||||
return (0);
|
||||
}
|
||||
|
||||
da->da_err = vn_rdwr(UIO_WRITE, da->da_vp, (caddr_t)&da->da_ddr,
|
||||
sizeof (da->da_ddr), 0, UIO_SYSSPACE, FAPPEND,
|
||||
RLIM64_INFINITY, CRED(), &resid);
|
||||
*da->da_offp += sizeof (da->da_ddr);
|
||||
return (da->da_err);
|
||||
}
|
||||
|
||||
static int
|
||||
report_free_dnode_range(struct diffarg *da, uint64_t first, uint64_t last)
|
||||
{
|
||||
ASSERT(first <= last);
|
||||
if (da->da_ddr.ddr_type != DDR_FREE ||
|
||||
first != da->da_ddr.ddr_last + 1) {
|
||||
if (write_record(da) != 0)
|
||||
return (da->da_err);
|
||||
da->da_ddr.ddr_type = DDR_FREE;
|
||||
da->da_ddr.ddr_first = first;
|
||||
da->da_ddr.ddr_last = last;
|
||||
return (0);
|
||||
}
|
||||
da->da_ddr.ddr_last = last;
|
||||
return (0);
|
||||
}
|
||||
|
||||
static int
|
||||
report_dnode(struct diffarg *da, uint64_t object, dnode_phys_t *dnp)
|
||||
{
|
||||
ASSERT(dnp != NULL);
|
||||
if (dnp->dn_type == DMU_OT_NONE)
|
||||
return (report_free_dnode_range(da, object, object));
|
||||
|
||||
if (da->da_ddr.ddr_type != DDR_INUSE ||
|
||||
object != da->da_ddr.ddr_last + 1) {
|
||||
if (write_record(da) != 0)
|
||||
return (da->da_err);
|
||||
da->da_ddr.ddr_type = DDR_INUSE;
|
||||
da->da_ddr.ddr_first = da->da_ddr.ddr_last = object;
|
||||
return (0);
|
||||
}
|
||||
da->da_ddr.ddr_last = object;
|
||||
return (0);
|
||||
}
|
||||
|
||||
#define DBP_SPAN(dnp, level) \
|
||||
(((uint64_t)dnp->dn_datablkszsec) << (SPA_MINBLOCKSHIFT + \
|
||||
(level) * (dnp->dn_indblkshift - SPA_BLKPTRSHIFT)))
|
||||
|
||||
/* ARGSUSED */
|
||||
static int
|
||||
diff_cb(spa_t *spa, zilog_t *zilog, const blkptr_t *bp, arc_buf_t *pbuf,
|
||||
const zbookmark_t *zb, const dnode_phys_t *dnp, void *arg)
|
||||
{
|
||||
struct diffarg *da = arg;
|
||||
int err = 0;
|
||||
|
||||
if (issig(JUSTLOOKING) && issig(FORREAL))
|
||||
return (EINTR);
|
||||
|
||||
if (zb->zb_object != DMU_META_DNODE_OBJECT)
|
||||
return (0);
|
||||
|
||||
if (bp == NULL) {
|
||||
uint64_t span = DBP_SPAN(dnp, zb->zb_level);
|
||||
uint64_t dnobj = (zb->zb_blkid * span) >> DNODE_SHIFT;
|
||||
|
||||
err = report_free_dnode_range(da, dnobj,
|
||||
dnobj + (span >> DNODE_SHIFT) - 1);
|
||||
if (err)
|
||||
return (err);
|
||||
} else if (zb->zb_level == 0) {
|
||||
dnode_phys_t *blk;
|
||||
arc_buf_t *abuf;
|
||||
uint32_t aflags = ARC_WAIT;
|
||||
int blksz = BP_GET_LSIZE(bp);
|
||||
int i;
|
||||
|
||||
if (dsl_read(NULL, spa, bp, pbuf,
|
||||
arc_getbuf_func, &abuf, ZIO_PRIORITY_ASYNC_READ,
|
||||
ZIO_FLAG_CANFAIL, &aflags, zb) != 0)
|
||||
return (EIO);
|
||||
|
||||
blk = abuf->b_data;
|
||||
for (i = 0; i < blksz >> DNODE_SHIFT; i++) {
|
||||
uint64_t dnobj = (zb->zb_blkid <<
|
||||
(DNODE_BLOCK_SHIFT - DNODE_SHIFT)) + i;
|
||||
err = report_dnode(da, dnobj, blk+i);
|
||||
if (err)
|
||||
break;
|
||||
}
|
||||
(void) arc_buf_remove_ref(abuf, &abuf);
|
||||
if (err)
|
||||
return (err);
|
||||
/* Don't care about the data blocks */
|
||||
return (TRAVERSE_VISIT_NO_CHILDREN);
|
||||
}
|
||||
return (0);
|
||||
}
|
||||
|
||||
int
|
||||
dmu_diff(objset_t *tosnap, objset_t *fromsnap, struct vnode *vp, offset_t *offp)
|
||||
{
|
||||
struct diffarg da;
|
||||
dsl_dataset_t *ds = tosnap->os_dsl_dataset;
|
||||
dsl_dataset_t *fromds = fromsnap->os_dsl_dataset;
|
||||
dsl_dataset_t *findds;
|
||||
dsl_dataset_t *relds;
|
||||
int err = 0;
|
||||
|
||||
/* make certain we are looking at snapshots */
|
||||
if (!dsl_dataset_is_snapshot(ds) || !dsl_dataset_is_snapshot(fromds))
|
||||
return (EINVAL);
|
||||
|
||||
/* fromsnap must be earlier and from the same lineage as tosnap */
|
||||
if (fromds->ds_phys->ds_creation_txg >= ds->ds_phys->ds_creation_txg)
|
||||
return (EXDEV);
|
||||
|
||||
relds = NULL;
|
||||
findds = ds;
|
||||
|
||||
while (fromds->ds_dir != findds->ds_dir) {
|
||||
dsl_pool_t *dp = ds->ds_dir->dd_pool;
|
||||
|
||||
if (!dsl_dir_is_clone(findds->ds_dir)) {
|
||||
if (relds)
|
||||
dsl_dataset_rele(relds, FTAG);
|
||||
return (EXDEV);
|
||||
}
|
||||
|
||||
rw_enter(&dp->dp_config_rwlock, RW_READER);
|
||||
err = dsl_dataset_hold_obj(dp,
|
||||
findds->ds_dir->dd_phys->dd_origin_obj, FTAG, &findds);
|
||||
rw_exit(&dp->dp_config_rwlock);
|
||||
|
||||
if (relds)
|
||||
dsl_dataset_rele(relds, FTAG);
|
||||
|
||||
if (err)
|
||||
return (EXDEV);
|
||||
|
||||
relds = findds;
|
||||
}
|
||||
|
||||
if (relds)
|
||||
dsl_dataset_rele(relds, FTAG);
|
||||
|
||||
da.da_vp = vp;
|
||||
da.da_offp = offp;
|
||||
da.da_ddr.ddr_type = DDR_NONE;
|
||||
da.da_ddr.ddr_first = da.da_ddr.ddr_last = 0;
|
||||
da.da_err = 0;
|
||||
|
||||
err = traverse_dataset(ds, fromds->ds_phys->ds_creation_txg,
|
||||
TRAVERSE_PRE | TRAVERSE_PREFETCH_METADATA, diff_cb, &da);
|
||||
|
||||
if (err) {
|
||||
da.da_err = err;
|
||||
} else {
|
||||
/* we set the da.da_err we return as side-effect */
|
||||
(void) write_record(&da);
|
||||
}
|
||||
|
||||
return (da.da_err);
|
||||
}
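The record handling in report_free_dnode_range() and report_dnode() above merges adjacent object numbers of the same kind into a single record and only writes a record out when the run breaks. The following is a minimal user-space sketch of that coalescing idea, not the kernel code: the names `rec_type`, `rec`, `coalescer`, and the `coalescer_*` functions are hypothetical, and an in-memory array stands in for the vnode that write_record() appends to. Unlike write_record(), this sketch also clears the pending record after flushing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for DDR_NONE / DDR_INUSE / DDR_FREE. */
enum rec_type { REC_NONE, REC_INUSE, REC_FREE };

struct rec {
	enum rec_type type;
	uint64_t first;
	uint64_t last;
};

struct coalescer {
	struct rec cur;		/* record currently being accumulated */
	struct rec *out;	/* caller-supplied output array */
	size_t cap;		/* capacity of out */
	size_t n;		/* records emitted so far */
};

static void
coalescer_init(struct coalescer *c, struct rec *out, size_t cap)
{
	c->cur.type = REC_NONE;
	c->cur.first = c->cur.last = 0;
	c->out = out;
	c->cap = cap;
	c->n = 0;
}

/* Emit the pending record, if any. Returns 0 on success, -1 if out is full. */
static int
coalescer_flush(struct coalescer *c)
{
	if (c->cur.type == REC_NONE)
		return (0);
	if (c->n == c->cap)
		return (-1);
	c->out[c->n++] = c->cur;
	c->cur.type = REC_NONE;
	return (0);
}

/*
 * Report the range [first, last] of the given type. If it directly extends
 * the pending record, grow that record; otherwise flush and start a new one.
 */
static int
coalescer_report(struct coalescer *c, enum rec_type type, uint64_t first,
    uint64_t last)
{
	assert(first <= last);
	if (c->cur.type == type && first == c->cur.last + 1) {
		c->cur.last = last;
		return (0);
	}
	if (coalescer_flush(c) != 0)
		return (-1);
	c->cur.type = type;
	c->cur.first = first;
	c->cur.last = last;
	return (0);
}
```

A caller would report ranges in ascending object order and call coalescer_flush() once at the end, which mirrors the final write_record() call in dmu_diff().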
|
196
uts/common/fs/zfs/dmu_object.c
Normal file
@ -0,0 +1,196 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/dmu_objset.h>
|
||||
#include <sys/dmu_tx.h>
|
||||
#include <sys/dnode.h>
|
||||
|
||||
uint64_t
|
||||
dmu_object_alloc(objset_t *os, dmu_object_type_t ot, int blocksize,
|
||||
dmu_object_type_t bonustype, int bonuslen, dmu_tx_t *tx)
|
||||
{
|
||||
uint64_t object;
|
||||
uint64_t L2_dnode_count = DNODES_PER_BLOCK <<
|
||||
(DMU_META_DNODE(os)->dn_indblkshift - SPA_BLKPTRSHIFT);
|
||||
dnode_t *dn = NULL;
|
||||
int restarted = B_FALSE;
|
||||
|
||||
mutex_enter(&os->os_obj_lock);
|
||||
for (;;) {
|
||||
object = os->os_obj_next;
|
||||
/*
|
||||
* Each time we polish off an L2 bp worth of dnodes
|
||||
* (2^13 objects), move to another L2 bp that's still
|
||||
* reasonably sparse (at most 1/4 full). Look from the
|
||||
* beginning once, but after that keep looking from here.
|
||||
* If we can't find one, just keep going from here.
|
||||
*/
|
||||
if (P2PHASE(object, L2_dnode_count) == 0) {
|
||||
uint64_t offset = restarted ? object << DNODE_SHIFT : 0;
|
||||
int error = dnode_next_offset(DMU_META_DNODE(os),
|
||||
DNODE_FIND_HOLE,
|
||||
&offset, 2, DNODES_PER_BLOCK >> 2, 0);
|
||||
restarted = B_TRUE;
|
||||
if (error == 0)
|
||||
object = offset >> DNODE_SHIFT;
|
||||
}
|
||||
os->os_obj_next = ++object;
|
||||
|
||||
/*
|
||||
* XXX We should check for an i/o error here and return
|
||||
* up to our caller. Actually we should pre-read it in
|
||||
* dmu_tx_assign(), but there is currently no mechanism
|
||||
* to do so.
|
||||
*/
|
||||
(void) dnode_hold_impl(os, object, DNODE_MUST_BE_FREE,
|
||||
FTAG, &dn);
|
||||
if (dn)
|
||||
break;
|
||||
|
||||
if (dmu_object_next(os, &object, B_TRUE, 0) == 0)
|
||||
os->os_obj_next = object - 1;
|
||||
}
|
||||
|
||||
dnode_allocate(dn, ot, blocksize, 0, bonustype, bonuslen, tx);
|
||||
dnode_rele(dn, FTAG);
|
||||
|
||||
mutex_exit(&os->os_obj_lock);
|
||||
|
||||
dmu_tx_add_new_object(tx, os, object);
|
||||
return (object);
|
||||
}
|
||||
|
||||
int
|
||||
dmu_object_claim(objset_t *os, uint64_t object, dmu_object_type_t ot,
|
||||
int blocksize, dmu_object_type_t bonustype, int bonuslen, dmu_tx_t *tx)
|
||||
{
|
||||
dnode_t *dn;
|
||||
int err;
|
||||
|
||||
if (object == DMU_META_DNODE_OBJECT && !dmu_tx_private_ok(tx))
|
||||
return (EBADF);
|
||||
|
||||
err = dnode_hold_impl(os, object, DNODE_MUST_BE_FREE, FTAG, &dn);
|
||||
if (err)
|
||||
return (err);
|
||||
dnode_allocate(dn, ot, blocksize, 0, bonustype, bonuslen, tx);
|
||||
dnode_rele(dn, FTAG);
|
||||
|
||||
dmu_tx_add_new_object(tx, os, object);
|
||||
return (0);
|
||||
}
|
||||
|
||||
int
|
||||
dmu_object_reclaim(objset_t *os, uint64_t object, dmu_object_type_t ot,
|
||||
int blocksize, dmu_object_type_t bonustype, int bonuslen)
|
||||
{
|
||||
dnode_t *dn;
|
||||
dmu_tx_t *tx;
|
||||
int nblkptr;
|
||||
int err;
|
||||
|
||||
if (object == DMU_META_DNODE_OBJECT)
|
||||
return (EBADF);
|
||||
|
||||
err = dnode_hold_impl(os, object, DNODE_MUST_BE_ALLOCATED,
|
||||
FTAG, &dn);
|
||||
if (err)
|
||||
return (err);
|
||||
|
||||
if (dn->dn_type == ot && dn->dn_datablksz == blocksize &&
|
||||
dn->dn_bonustype == bonustype && dn->dn_bonuslen == bonuslen) {
|
||||
/* nothing is changing, this is a noop */
|
||||
dnode_rele(dn, FTAG);
|
||||
return (0);
|
||||
}
|
||||
|
||||
if (bonustype == DMU_OT_SA) {
|
||||
nblkptr = 1;
|
||||
} else {
|
||||
nblkptr = 1 + ((DN_MAX_BONUSLEN - bonuslen) >> SPA_BLKPTRSHIFT);
|
||||
}
|
||||
|
||||
/*
|
||||
* If we are losing blkptrs or changing the block size this must
|
||||
* be a new file instance. We must clear out the previous file
|
||||
* contents before we can change this type of metadata in the dnode.
|
||||
*/
|
||||
if (dn->dn_nblkptr > nblkptr || dn->dn_datablksz != blocksize) {
|
||||
err = dmu_free_long_range(os, object, 0, DMU_OBJECT_END);
|
||||
if (err)
|
||||
goto out;
|
||||
}
|
||||
|
||||
tx = dmu_tx_create(os);
|
||||
dmu_tx_hold_bonus(tx, object);
|
||||
err = dmu_tx_assign(tx, TXG_WAIT);
|
||||
if (err) {
|
||||
dmu_tx_abort(tx);
|
||||
goto out;
|
||||
}
|
||||
|
||||
dnode_reallocate(dn, ot, blocksize, bonustype, bonuslen, tx);
|
||||
|
||||
dmu_tx_commit(tx);
|
||||
out:
|
||||
dnode_rele(dn, FTAG);
|
||||
|
||||
return (err);
|
||||
}
|
||||
|
||||
int
|
||||
dmu_object_free(objset_t *os, uint64_t object, dmu_tx_t *tx)
|
||||
{
|
||||
dnode_t *dn;
|
||||
int err;
|
||||
|
||||
ASSERT(object != DMU_META_DNODE_OBJECT || dmu_tx_private_ok(tx));
|
||||
|
||||
err = dnode_hold_impl(os, object, DNODE_MUST_BE_ALLOCATED,
|
||||
FTAG, &dn);
|
||||
if (err)
|
||||
return (err);
|
||||
|
||||
ASSERT(dn->dn_type != DMU_OT_NONE);
|
||||
dnode_free_range(dn, 0, DMU_OBJECT_END, tx);
|
||||
dnode_free(dn, tx);
|
||||
dnode_rele(dn, FTAG);
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
int
|
||||
dmu_object_next(objset_t *os, uint64_t *objectp, boolean_t hole, uint64_t txg)
|
||||
{
|
||||
uint64_t offset = (*objectp + 1) << DNODE_SHIFT;
|
||||
int error;
|
||||
|
||||
error = dnode_next_offset(DMU_META_DNODE(os),
|
||||
(hole ? DNODE_FIND_HOLE : 0), &offset, 0, DNODES_PER_BLOCK, txg);
|
||||
|
||||
*objectp = offset >> DNODE_SHIFT;
|
||||
|
||||
return (error);
|
||||
}
|
1789
uts/common/fs/zfs/dmu_objset.c
Normal file
File diff suppressed because it is too large
1606
uts/common/fs/zfs/dmu_send.c
Normal file
File diff suppressed because it is too large
482
uts/common/fs/zfs/dmu_traverse.c
Normal file
@ -0,0 +1,482 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/dmu_objset.h>
|
||||
#include <sys/dmu_traverse.h>
|
||||
#include <sys/dsl_dataset.h>
|
||||
#include <sys/dsl_dir.h>
|
||||
#include <sys/dsl_pool.h>
|
||||
#include <sys/dnode.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/dmu_impl.h>
|
||||
#include <sys/sa.h>
|
||||
#include <sys/sa_impl.h>
|
||||
#include <sys/callb.h>
|
||||
|
||||
int zfs_pd_blks_max = 100;
|
||||
|
||||
typedef struct prefetch_data {
|
||||
kmutex_t pd_mtx;
|
||||
kcondvar_t pd_cv;
|
||||
int pd_blks_max;
|
||||
int pd_blks_fetched;
|
||||
int pd_flags;
|
||||
boolean_t pd_cancel;
|
||||
boolean_t pd_exited;
|
||||
} prefetch_data_t;
|
||||
|
||||
typedef struct traverse_data {
|
||||
spa_t *td_spa;
|
||||
uint64_t td_objset;
|
||||
blkptr_t *td_rootbp;
|
||||
uint64_t td_min_txg;
|
||||
int td_flags;
|
||||
prefetch_data_t *td_pfd;
|
||||
blkptr_cb_t *td_func;
|
||||
void *td_arg;
|
||||
} traverse_data_t;
|
||||
|
||||
static int traverse_dnode(traverse_data_t *td, const dnode_phys_t *dnp,
|
||||
arc_buf_t *buf, uint64_t objset, uint64_t object);
|
||||
|
||||
static int
|
||||
traverse_zil_block(zilog_t *zilog, blkptr_t *bp, void *arg, uint64_t claim_txg)
|
||||
{
|
||||
traverse_data_t *td = arg;
|
||||
zbookmark_t zb;
|
||||
|
||||
if (bp->blk_birth == 0)
|
||||
return (0);
|
||||
|
||||
if (claim_txg == 0 && bp->blk_birth >= spa_first_txg(td->td_spa))
|
||||
return (0);
|
||||
|
||||
SET_BOOKMARK(&zb, td->td_objset, ZB_ZIL_OBJECT, ZB_ZIL_LEVEL,
|
||||
bp->blk_cksum.zc_word[ZIL_ZC_SEQ]);
|
||||
|
||||
(void) td->td_func(td->td_spa, zilog, bp, NULL, &zb, NULL, td->td_arg);
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
static int
|
||||
traverse_zil_record(zilog_t *zilog, lr_t *lrc, void *arg, uint64_t claim_txg)
|
||||
{
|
||||
traverse_data_t *td = arg;
|
||||
|
||||
if (lrc->lrc_txtype == TX_WRITE) {
|
||||
lr_write_t *lr = (lr_write_t *)lrc;
|
||||
blkptr_t *bp = &lr->lr_blkptr;
|
||||
zbookmark_t zb;
|
||||
|
||||
if (bp->blk_birth == 0)
|
||||
return (0);
|
||||
|
||||
if (claim_txg == 0 || bp->blk_birth < claim_txg)
|
||||
return (0);
|
||||
|
||||
SET_BOOKMARK(&zb, td->td_objset, lr->lr_foid,
|
||||
ZB_ZIL_LEVEL, lr->lr_offset / BP_GET_LSIZE(bp));
|
||||
|
||||
(void) td->td_func(td->td_spa, zilog, bp, NULL, &zb, NULL,
|
||||
td->td_arg);
|
||||
}
|
||||
return (0);
|
||||
}
|
||||
|
||||
static void
|
||||
traverse_zil(traverse_data_t *td, zil_header_t *zh)
|
||||
{
|
||||
uint64_t claim_txg = zh->zh_claim_txg;
|
||||
zilog_t *zilog;
|
||||
|
||||
/*
|
||||
* We only want to visit blocks that have been claimed but not yet
|
||||
* replayed; plus, in read-only mode, blocks that are already stable.
|
||||
*/
|
||||
if (claim_txg == 0 && spa_writeable(td->td_spa))
|
||||
return;
|
||||
|
||||
zilog = zil_alloc(spa_get_dsl(td->td_spa)->dp_meta_objset, zh);
|
||||
|
||||
(void) zil_parse(zilog, traverse_zil_block, traverse_zil_record, td,
|
||||
claim_txg);
|
||||
|
||||
zil_free(zilog);
|
||||
}
|
||||
|
||||
static int
|
||||
traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp,
|
||||
arc_buf_t *pbuf, blkptr_t *bp, const zbookmark_t *zb)
|
||||
{
|
||||
zbookmark_t czb;
|
||||
int err = 0, lasterr = 0;
|
||||
arc_buf_t *buf = NULL;
|
||||
prefetch_data_t *pd = td->td_pfd;
|
||||
boolean_t hard = td->td_flags & TRAVERSE_HARD;
|
||||
|
||||
if (bp->blk_birth == 0) {
|
||||
err = td->td_func(td->td_spa, NULL, NULL, pbuf, zb, dnp,
|
||||
td->td_arg);
|
||||
return (err);
|
||||
}
|
||||
|
||||
if (bp->blk_birth <= td->td_min_txg)
|
||||
return (0);
|
||||
|
||||
if (pd && !pd->pd_exited &&
|
||||
((pd->pd_flags & TRAVERSE_PREFETCH_DATA) ||
|
||||
BP_GET_TYPE(bp) == DMU_OT_DNODE || BP_GET_LEVEL(bp) > 0)) {
|
||||
mutex_enter(&pd->pd_mtx);
|
||||
ASSERT(pd->pd_blks_fetched >= 0);
|
||||
while (pd->pd_blks_fetched == 0 && !pd->pd_exited)
|
||||
cv_wait(&pd->pd_cv, &pd->pd_mtx);
|
||||
pd->pd_blks_fetched--;
|
||||
cv_broadcast(&pd->pd_cv);
|
||||
mutex_exit(&pd->pd_mtx);
|
||||
}
|
||||
|
||||
if (td->td_flags & TRAVERSE_PRE) {
|
||||
err = td->td_func(td->td_spa, NULL, bp, pbuf, zb, dnp,
|
||||
td->td_arg);
|
||||
if (err == TRAVERSE_VISIT_NO_CHILDREN)
|
||||
return (0);
|
||||
if (err)
|
||||
return (err);
|
||||
}
|
||||
|
||||
if (BP_GET_LEVEL(bp) > 0) {
|
||||
uint32_t flags = ARC_WAIT;
|
||||
int i;
|
||||
blkptr_t *cbp;
|
||||
int epb = BP_GET_LSIZE(bp) >> SPA_BLKPTRSHIFT;
|
||||
|
||||
err = dsl_read(NULL, td->td_spa, bp, pbuf,
|
||||
arc_getbuf_func, &buf,
|
||||
ZIO_PRIORITY_ASYNC_READ, ZIO_FLAG_CANFAIL, &flags, zb);
|
||||
if (err)
|
||||
return (err);
|
||||
|
||||
/* recursively visitbp() blocks below this */
|
||||
cbp = buf->b_data;
|
||||
for (i = 0; i < epb; i++, cbp++) {
|
||||
SET_BOOKMARK(&czb, zb->zb_objset, zb->zb_object,
|
||||
zb->zb_level - 1,
|
||||
zb->zb_blkid * epb + i);
|
||||
err = traverse_visitbp(td, dnp, buf, cbp, &czb);
|
||||
if (err) {
|
||||
if (!hard)
|
||||
break;
|
||||
lasterr = err;
|
||||
}
|
||||
}
|
||||
} else if (BP_GET_TYPE(bp) == DMU_OT_DNODE) {
|
||||
uint32_t flags = ARC_WAIT;
|
||||
int i;
|
||||
int epb = BP_GET_LSIZE(bp) >> DNODE_SHIFT;
|
||||
|
||||
err = dsl_read(NULL, td->td_spa, bp, pbuf,
|
||||
arc_getbuf_func, &buf,
|
||||
ZIO_PRIORITY_ASYNC_READ, ZIO_FLAG_CANFAIL, &flags, zb);
|
||||
if (err)
|
||||
return (err);
|
||||
|
||||
/* recursively visitbp() blocks below this */
|
||||
dnp = buf->b_data;
|
||||
for (i = 0; i < epb; i++, dnp++) {
|
||||
err = traverse_dnode(td, dnp, buf, zb->zb_objset,
|
||||
zb->zb_blkid * epb + i);
|
||||
if (err) {
|
||||
if (!hard)
|
||||
break;
|
||||
lasterr = err;
|
||||
}
|
||||
}
|
||||
} else if (BP_GET_TYPE(bp) == DMU_OT_OBJSET) {
|
||||
uint32_t flags = ARC_WAIT;
|
||||
objset_phys_t *osp;
|
||||
dnode_phys_t *dnp;
|
||||
|
||||
err = dsl_read_nolock(NULL, td->td_spa, bp,
|
||||
arc_getbuf_func, &buf,
|
||||
ZIO_PRIORITY_ASYNC_READ, ZIO_FLAG_CANFAIL, &flags, zb);
|
||||
if (err)
|
||||
return (err);
|
||||
|
||||
osp = buf->b_data;
|
||||
dnp = &osp->os_meta_dnode;
|
||||
err = traverse_dnode(td, dnp, buf, zb->zb_objset,
|
||||
DMU_META_DNODE_OBJECT);
|
||||
if (err && hard) {
|
||||
lasterr = err;
|
||||
err = 0;
|
||||
}
|
||||
if (err == 0 && arc_buf_size(buf) >= sizeof (objset_phys_t)) {
|
||||
dnp = &osp->os_userused_dnode;
|
||||
err = traverse_dnode(td, dnp, buf, zb->zb_objset,
|
||||
DMU_USERUSED_OBJECT);
|
||||
}
|
||||
if (err && hard) {
|
||||
lasterr = err;
|
||||
err = 0;
|
||||
}
|
||||
if (err == 0 && arc_buf_size(buf) >= sizeof (objset_phys_t)) {
|
||||
dnp = &osp->os_groupused_dnode;
|
||||
err = traverse_dnode(td, dnp, buf, zb->zb_objset,
|
||||
DMU_GROUPUSED_OBJECT);
|
||||
}
|
||||
}
|
||||
|
||||
if (buf)
|
||||
(void) arc_buf_remove_ref(buf, &buf);
|
||||
|
||||
if (err == 0 && lasterr == 0 && (td->td_flags & TRAVERSE_POST)) {
|
||||
err = td->td_func(td->td_spa, NULL, bp, pbuf, zb, dnp,
|
||||
td->td_arg);
|
||||
}
|
||||
|
||||
return (err != 0 ? err : lasterr);
|
||||
}
|
||||
|
||||
static int
|
||||
traverse_dnode(traverse_data_t *td, const dnode_phys_t *dnp,
|
||||
arc_buf_t *buf, uint64_t objset, uint64_t object)
|
||||
{
|
||||
int j, err = 0, lasterr = 0;
|
||||
zbookmark_t czb;
|
||||
boolean_t hard = (td->td_flags & TRAVERSE_HARD);
|
||||
|
||||
for (j = 0; j < dnp->dn_nblkptr; j++) {
|
||||
SET_BOOKMARK(&czb, objset, object, dnp->dn_nlevels - 1, j);
|
||||
err = traverse_visitbp(td, dnp, buf,
|
||||
(blkptr_t *)&dnp->dn_blkptr[j], &czb);
|
||||
if (err) {
|
||||
if (!hard)
|
||||
break;
|
||||
lasterr = err;
|
||||
}
|
||||
}
|
||||
|
||||
if (dnp->dn_flags & DNODE_FLAG_SPILL_BLKPTR) {
|
||||
SET_BOOKMARK(&czb, objset,
|
||||
object, 0, DMU_SPILL_BLKID);
|
||||
err = traverse_visitbp(td, dnp, buf,
|
||||
(blkptr_t *)&dnp->dn_spill, &czb);
|
||||
if (err) {
|
||||
if (!hard)
|
||||
return (err);
|
||||
lasterr = err;
|
||||
}
|
||||
}
|
||||
return (err != 0 ? err : lasterr);
|
||||
}
|
||||
|
||||
/* ARGSUSED */
|
||||
static int
|
||||
traverse_prefetcher(spa_t *spa, zilog_t *zilog, const blkptr_t *bp,
|
||||
arc_buf_t *pbuf, const zbookmark_t *zb, const dnode_phys_t *dnp,
|
||||
void *arg)
|
||||
{
|
||||
prefetch_data_t *pfd = arg;
|
||||
uint32_t aflags = ARC_NOWAIT | ARC_PREFETCH;
|
||||
|
||||
ASSERT(pfd->pd_blks_fetched >= 0);
|
||||
if (pfd->pd_cancel)
|
||||
return (EINTR);
|
||||
|
||||
if (bp == NULL || !((pfd->pd_flags & TRAVERSE_PREFETCH_DATA) ||
|
||||
BP_GET_TYPE(bp) == DMU_OT_DNODE || BP_GET_LEVEL(bp) > 0) ||
|
||||
BP_GET_TYPE(bp) == DMU_OT_INTENT_LOG)
|
||||
return (0);
|
||||
|
||||
mutex_enter(&pfd->pd_mtx);
|
||||
while (!pfd->pd_cancel && pfd->pd_blks_fetched >= pfd->pd_blks_max)
|
||||
cv_wait(&pfd->pd_cv, &pfd->pd_mtx);
|
||||
pfd->pd_blks_fetched++;
|
||||
cv_broadcast(&pfd->pd_cv);
|
||||
mutex_exit(&pfd->pd_mtx);
|
||||
|
||||
(void) dsl_read(NULL, spa, bp, pbuf, NULL, NULL,
|
||||
ZIO_PRIORITY_ASYNC_READ,
|
||||
ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE,
|
||||
&aflags, zb);
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
static void
|
||||
traverse_prefetch_thread(void *arg)
|
||||
{
|
||||
traverse_data_t *td_main = arg;
|
||||
traverse_data_t td = *td_main;
|
||||
zbookmark_t czb;
|
||||
|
||||
td.td_func = traverse_prefetcher;
|
||||
td.td_arg = td_main->td_pfd;
|
||||
td.td_pfd = NULL;
|
||||
|
||||
SET_BOOKMARK(&czb, td.td_objset,
|
||||
ZB_ROOT_OBJECT, ZB_ROOT_LEVEL, ZB_ROOT_BLKID);
|
||||
(void) traverse_visitbp(&td, NULL, NULL, td.td_rootbp, &czb);
|
||||
|
||||
mutex_enter(&td_main->td_pfd->pd_mtx);
|
||||
td_main->td_pfd->pd_exited = B_TRUE;
|
||||
cv_broadcast(&td_main->td_pfd->pd_cv);
|
||||
mutex_exit(&td_main->td_pfd->pd_mtx);
|
||||
}
|
||||
|
||||
/*
|
||||
* NB: dataset must not be changing on-disk (eg, is a snapshot or we are
|
||||
* in syncing context).
|
||||
*/
|
||||
static int
|
||||
traverse_impl(spa_t *spa, dsl_dataset_t *ds, blkptr_t *rootbp,
|
||||
uint64_t txg_start, int flags, blkptr_cb_t func, void *arg)
|
||||
{
|
||||
traverse_data_t td;
|
||||
prefetch_data_t pd = { 0 };
|
||||
zbookmark_t czb;
|
||||
int err;
|
||||
|
||||
td.td_spa = spa;
|
||||
td.td_objset = ds ? ds->ds_object : 0;
|
||||
td.td_rootbp = rootbp;
|
||||
td.td_min_txg = txg_start;
|
||||
td.td_func = func;
|
||||
td.td_arg = arg;
|
||||
td.td_pfd = &pd;
|
||||
td.td_flags = flags;
|
||||
|
||||
pd.pd_blks_max = zfs_pd_blks_max;
|
||||
pd.pd_flags = flags;
|
||||
mutex_init(&pd.pd_mtx, NULL, MUTEX_DEFAULT, NULL);
|
||||
cv_init(&pd.pd_cv, NULL, CV_DEFAULT, NULL);
|
||||
|
||||
/* See comment on ZIL traversal in dsl_scan_visitds. */
|
||||
if (ds != NULL && !dsl_dataset_is_snapshot(ds)) {
|
||||
objset_t *os;
|
||||
|
||||
err = dmu_objset_from_ds(ds, &os);
|
||||
if (err)
|
||||
return (err);
|
||||
|
||||
traverse_zil(&td, &os->os_zil_header);
|
||||
}
|
||||
|
||||
if (!(flags & TRAVERSE_PREFETCH) ||
|
||||
0 == taskq_dispatch(system_taskq, traverse_prefetch_thread,
|
||||
&td, TQ_NOQUEUE))
|
||||
pd.pd_exited = B_TRUE;
|
||||
|
||||
SET_BOOKMARK(&czb, td.td_objset,
|
||||
ZB_ROOT_OBJECT, ZB_ROOT_LEVEL, ZB_ROOT_BLKID);
|
||||
err = traverse_visitbp(&td, NULL, NULL, rootbp, &czb);
|
||||
|
||||
mutex_enter(&pd.pd_mtx);
|
||||
pd.pd_cancel = B_TRUE;
|
||||
cv_broadcast(&pd.pd_cv);
|
||||
while (!pd.pd_exited)
|
||||
cv_wait(&pd.pd_cv, &pd.pd_mtx);
|
||||
mutex_exit(&pd.pd_mtx);
|
||||
|
||||
mutex_destroy(&pd.pd_mtx);
|
||||
cv_destroy(&pd.pd_cv);
|
||||
|
||||
return (err);
|
||||
}
|
||||
|
||||
/*
|
||||
* NB: dataset must not be changing on-disk (eg, is a snapshot or we are
|
||||
* in syncing context).
|
||||
*/
|
||||
int
|
||||
traverse_dataset(dsl_dataset_t *ds, uint64_t txg_start, int flags,
|
||||
blkptr_cb_t func, void *arg)
|
||||
{
|
||||
return (traverse_impl(ds->ds_dir->dd_pool->dp_spa, ds,
|
||||
&ds->ds_phys->ds_bp, txg_start, flags, func, arg));
|
||||
}
|
||||
|
||||
/*
|
||||
* NB: pool must not be changing on-disk (eg, from zdb or sync context).
|
||||
*/
|
||||
int
|
||||
traverse_pool(spa_t *spa, uint64_t txg_start, int flags,
|
||||
blkptr_cb_t func, void *arg)
|
||||
{
|
||||
int err, lasterr = 0;
|
||||
uint64_t obj;
|
||||
dsl_pool_t *dp = spa_get_dsl(spa);
|
||||
objset_t *mos = dp->dp_meta_objset;
|
||||
boolean_t hard = (flags & TRAVERSE_HARD);
|
||||
|
||||
/* visit the MOS */
|
||||
err = traverse_impl(spa, NULL, spa_get_rootblkptr(spa),
|
||||
txg_start, flags, func, arg);
|
||||
if (err)
|
||||
return (err);
|
||||
|
||||
/* visit each dataset */
|
||||
for (obj = 1; err == 0 || (err != ESRCH && hard);
|
||||
err = dmu_object_next(mos, &obj, FALSE, txg_start)) {
|
||||
dmu_object_info_t doi;
|
||||
|
||||
err = dmu_object_info(mos, obj, &doi);
|
||||
if (err) {
|
||||
if (!hard)
|
||||
return (err);
|
||||
lasterr = err;
|
||||
continue;
|
||||
}
|
||||
|
||||
if (doi.doi_type == DMU_OT_DSL_DATASET) {
|
||||
dsl_dataset_t *ds;
|
||||
uint64_t txg = txg_start;
|
||||
|
||||
rw_enter(&dp->dp_config_rwlock, RW_READER);
|
||||
err = dsl_dataset_hold_obj(dp, obj, FTAG, &ds);
|
||||
rw_exit(&dp->dp_config_rwlock);
|
||||
if (err) {
|
||||
if (!hard)
|
||||
return (err);
|
||||
lasterr = err;
|
||||
continue;
|
||||
}
|
||||
if (ds->ds_phys->ds_prev_snap_txg > txg)
|
||||
txg = ds->ds_phys->ds_prev_snap_txg;
|
||||
err = traverse_dataset(ds, txg, flags, func, arg);
|
||||
dsl_dataset_rele(ds, FTAG);
|
||||
if (err) {
|
||||
if (!hard)
|
||||
return (err);
|
||||
lasterr = err;
|
||||
}
|
||||
}
|
||||
}
|
||||
if (err == ESRCH)
|
||||
err = 0;
|
||||
return (err != 0 ? err : lasterr);
|
||||
}
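The traversal code above throttles its prefetch thread with a mutex, a condition variable, and a counter: traverse_prefetcher() waits while pd_blks_fetched has reached pd_blks_max, and traverse_visitbp() waits while the counter is zero and the prefetcher has not exited. The following is a user-space sketch of that handshake using POSIX threads in place of the kernel primitives. The `throttle` structure and the `throttle_*` function names are hypothetical. As a deliberate difference from the kernel code, the consumer here does not decrement the counter once it is already zero, and a cancelled producer does not increment it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct throttle {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	int max;	/* plays the role of pd_blks_max */
	int fetched;	/* plays the role of pd_blks_fetched */
	bool cancel;	/* consumer asks the producer to stop */
	bool exited;	/* producer has finished */
};

void
throttle_init(struct throttle *t, int max)
{
	pthread_mutex_init(&t->mtx, NULL);
	pthread_cond_init(&t->cv, NULL);
	t->max = max;
	t->fetched = 0;
	t->cancel = false;
	t->exited = false;
}

/* Producer side: block while the window is full. Returns -1 if cancelled. */
int
throttle_produce(struct throttle *t)
{
	int ret = 0;

	pthread_mutex_lock(&t->mtx);
	while (!t->cancel && t->fetched >= t->max)
		pthread_cond_wait(&t->cv, &t->mtx);
	if (t->cancel)
		ret = -1;
	else
		t->fetched++;
	pthread_cond_broadcast(&t->cv);
	pthread_mutex_unlock(&t->mtx);
	return (ret);
}

/* Consumer side: wait for a prefetched block unless the producer is gone. */
void
throttle_consume(struct throttle *t)
{
	pthread_mutex_lock(&t->mtx);
	while (t->fetched == 0 && !t->exited)
		pthread_cond_wait(&t->cv, &t->mtx);
	if (t->fetched > 0)
		t->fetched--;
	pthread_cond_broadcast(&t->cv);
	pthread_mutex_unlock(&t->mtx);
}

/* Consumer requests cancellation and wakes a blocked producer. */
void
throttle_cancel(struct throttle *t)
{
	pthread_mutex_lock(&t->mtx);
	t->cancel = true;
	pthread_cond_broadcast(&t->cv);
	pthread_mutex_unlock(&t->mtx);
}

/* Producer announces that it has finished and wakes a blocked consumer. */
void
throttle_mark_exited(struct throttle *t)
{
	pthread_mutex_lock(&t->mtx);
	t->exited = true;
	pthread_cond_broadcast(&t->cv);
	pthread_mutex_unlock(&t->mtx);
}
```

In a real program the producer would call throttle_produce() before issuing each speculative read and throttle_mark_exited() when done, while the consumer would call throttle_consume() before each demand read and throttle_cancel() once its walk is complete.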
|
1382
uts/common/fs/zfs/dmu_tx.c
Normal file
File diff suppressed because it is too large
724
uts/common/fs/zfs/dmu_zfetch.c
Normal file
@ -0,0 +1,724 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/dnode.h>
|
||||
#include <sys/dmu_objset.h>
|
||||
#include <sys/dmu_zfetch.h>
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/dbuf.h>
|
||||
#include <sys/kstat.h>
|
||||
|
||||
/*
|
||||
* I'm against tune-ables, but these should probably exist as tweakable globals
|
||||
* until we can get this working the way we want it to.
|
||||
*/
|
||||
|
||||
int zfs_prefetch_disable = 0;
|
||||
|
||||
/* max # of streams per zfetch */
|
||||
uint32_t zfetch_max_streams = 8;
|
||||
/* min time before stream reclaim */
|
||||
uint32_t zfetch_min_sec_reap = 2;
|
||||
/* max number of blocks to fetch at a time */
|
||||
uint32_t zfetch_block_cap = 256;
|
||||
/* number of bytes in an array_read at which we stop prefetching (1MB) */
|
||||
uint64_t zfetch_array_rd_sz = 1024 * 1024;
|
||||
|
||||
/* forward decls for static routines */
|
||||
static int dmu_zfetch_colinear(zfetch_t *, zstream_t *);
|
||||
static void dmu_zfetch_dofetch(zfetch_t *, zstream_t *);
|
||||
static uint64_t dmu_zfetch_fetch(dnode_t *, uint64_t, uint64_t);
|
||||
static uint64_t dmu_zfetch_fetchsz(dnode_t *, uint64_t, uint64_t);
|
||||
static int dmu_zfetch_find(zfetch_t *, zstream_t *, int);
|
||||
static int dmu_zfetch_stream_insert(zfetch_t *, zstream_t *);
|
||||
static zstream_t *dmu_zfetch_stream_reclaim(zfetch_t *);
|
||||
static void dmu_zfetch_stream_remove(zfetch_t *, zstream_t *);
|
||||
static int dmu_zfetch_streams_equal(zstream_t *, zstream_t *);
|
||||
|
||||
typedef struct zfetch_stats {
|
||||
kstat_named_t zfetchstat_hits;
|
||||
kstat_named_t zfetchstat_misses;
|
||||
kstat_named_t zfetchstat_colinear_hits;
|
||||
kstat_named_t zfetchstat_colinear_misses;
|
||||
kstat_named_t zfetchstat_stride_hits;
|
||||
kstat_named_t zfetchstat_stride_misses;
|
||||
kstat_named_t zfetchstat_reclaim_successes;
|
||||
kstat_named_t zfetchstat_reclaim_failures;
|
||||
kstat_named_t zfetchstat_stream_resets;
|
||||
kstat_named_t zfetchstat_stream_noresets;
|
||||
kstat_named_t zfetchstat_bogus_streams;
|
||||
} zfetch_stats_t;
|
||||
|
||||
static zfetch_stats_t zfetch_stats = {
|
||||
{ "hits", KSTAT_DATA_UINT64 },
|
||||
{ "misses", KSTAT_DATA_UINT64 },
|
||||
{ "colinear_hits", KSTAT_DATA_UINT64 },
|
||||
{ "colinear_misses", KSTAT_DATA_UINT64 },
|
||||
{ "stride_hits", KSTAT_DATA_UINT64 },
|
||||
{ "stride_misses", KSTAT_DATA_UINT64 },
|
||||
{ "reclaim_successes", KSTAT_DATA_UINT64 },
|
||||
{ "reclaim_failures", KSTAT_DATA_UINT64 },
|
||||
{ "streams_resets", KSTAT_DATA_UINT64 },
|
||||
{ "streams_noresets", KSTAT_DATA_UINT64 },
|
||||
{ "bogus_streams", KSTAT_DATA_UINT64 },
|
||||
};
|
||||
|
||||
#define ZFETCHSTAT_INCR(stat, val) \
|
||||
atomic_add_64(&zfetch_stats.stat.value.ui64, (val));
|
||||
|
||||
#define ZFETCHSTAT_BUMP(stat) ZFETCHSTAT_INCR(stat, 1);
|
||||
|
||||
kstat_t *zfetch_ksp;
|
||||
|
||||
/*
|
||||
* Given a zfetch structure and a zstream structure, determine whether the
|
||||
* blocks to be read are part of a co-linear pair of existing prefetch
|
||||
* streams. If a set is found, coalesce the streams, removing one, and
|
||||
* configure the prefetch so it looks for a strided access pattern.
|
||||
*
|
||||
* In other words: if we find two sequential access streams that are
|
||||
* the same length and distance N apart, and this read is N from the
|
||||
* last stream, then we are probably in a strided access pattern. So
|
||||
* combine the two sequential streams into a single strided stream.
|
||||
*
|
||||
* Returns 1 if co-linear streams were found and coalesced, 0 otherwise.
|
||||
*/
|
||||
static int
|
||||
dmu_zfetch_colinear(zfetch_t *zf, zstream_t *zh)
|
||||
{
|
||||
zstream_t *z_walk;
|
||||
zstream_t *z_comp;
|
||||
|
||||
if (! rw_tryenter(&zf->zf_rwlock, RW_WRITER))
|
||||
return (0);
|
||||
|
||||
if (zh == NULL) {
|
||||
rw_exit(&zf->zf_rwlock);
|
||||
return (0);
|
||||
}
|
||||
|
||||
for (z_walk = list_head(&zf->zf_stream); z_walk;
|
||||
z_walk = list_next(&zf->zf_stream, z_walk)) {
|
||||
for (z_comp = list_next(&zf->zf_stream, z_walk); z_comp;
|
||||
z_comp = list_next(&zf->zf_stream, z_comp)) {
|
||||
int64_t diff;
|
||||
|
||||
if (z_walk->zst_len != z_walk->zst_stride ||
|
||||
z_comp->zst_len != z_comp->zst_stride) {
|
||||
continue;
|
||||
}
|
||||
|
||||
diff = z_comp->zst_offset - z_walk->zst_offset;
|
||||
if (z_comp->zst_offset + diff == zh->zst_offset) {
|
||||
z_walk->zst_offset = zh->zst_offset;
|
||||
z_walk->zst_direction = diff < 0 ? -1 : 1;
|
||||
z_walk->zst_stride =
|
||||
diff * z_walk->zst_direction;
|
||||
z_walk->zst_ph_offset =
|
||||
zh->zst_offset + z_walk->zst_stride;
|
||||
dmu_zfetch_stream_remove(zf, z_comp);
|
||||
mutex_destroy(&z_comp->zst_lock);
|
||||
kmem_free(z_comp, sizeof (zstream_t));
|
||||
|
||||
dmu_zfetch_dofetch(zf, z_walk);
|
||||
|
||||
rw_exit(&zf->zf_rwlock);
|
||||
return (1);
|
||||
}
|
||||
|
||||
diff = z_walk->zst_offset - z_comp->zst_offset;
|
||||
if (z_walk->zst_offset + diff == zh->zst_offset) {
|
||||
z_walk->zst_offset = zh->zst_offset;
|
||||
z_walk->zst_direction = diff < 0 ? -1 : 1;
|
||||
z_walk->zst_stride =
|
||||
diff * z_walk->zst_direction;
|
||||
z_walk->zst_ph_offset =
|
||||
zh->zst_offset + z_walk->zst_stride;
|
||||
dmu_zfetch_stream_remove(zf, z_comp);
|
||||
mutex_destroy(&z_comp->zst_lock);
|
||||
kmem_free(z_comp, sizeof (zstream_t));
|
||||
|
||||
dmu_zfetch_dofetch(zf, z_walk);
|
||||
|
||||
rw_exit(&zf->zf_rwlock);
|
||||
return (1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
rw_exit(&zf->zf_rwlock);
|
||||
return (0);
|
||||
}
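
The co-linearity test in dmu_zfetch_colinear() can be sketched in isolation. This is a minimal illustration, not the vendor code: the helper name colinear_stride_sketch and its signed-offset parameters are hypothetical, and the zero-distance guard is an addition that the function above does not have.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of dmu_zfetch.c).  `a` and `b` are the
 * block offsets of two existing sequential streams and `rd` is the block
 * offset of the new read.  Returns the signed distance N if `rd` lies N
 * past one stream on the line through both, or 0 if the three offsets
 * are not co-linear.
 */
static int64_t
colinear_stride_sketch(int64_t a, int64_t b, int64_t rd)
{
	int64_t diff = b - a;

	/* Assumption of this sketch: two streams at the same offset never match. */
	if (diff == 0)
		return (0);

	/* a, b, rd in that order: the read continues past b. */
	if (b + diff == rd)
		return (diff);

	/* b, a, rd in that order: the read continues past a. */
	diff = a - b;
	if (a + diff == rd)
		return (diff);

	return (0);
}
```

In the function above, the sign of the distance becomes zst_direction and its absolute value becomes zst_stride. For example, streams at blocks 0 and 100 followed by a read at block 200 give a distance of 100, while streams at 200 and 100 followed by a read at block 0 give -100.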
|
||||
|
||||
/*
|
||||
* Given a zstream_t, determine the bounds of the prefetch. Then call the
|
||||
* routine that actually prefetches the individual blocks.
|
||||
*/
|
||||
static void
|
||||
dmu_zfetch_dofetch(zfetch_t *zf, zstream_t *zs)
|
||||
{
|
||||
uint64_t prefetch_tail;
|
||||
uint64_t prefetch_limit;
|
||||
uint64_t prefetch_ofst;
|
||||
uint64_t prefetch_len;
|
||||
uint64_t blocks_fetched;
|
||||
|
||||
zs->zst_stride = MAX((int64_t)zs->zst_stride, zs->zst_len);
|
||||
zs->zst_cap = MIN(zfetch_block_cap, 2 * zs->zst_cap);
|
||||
|
||||
prefetch_tail = MAX((int64_t)zs->zst_ph_offset,
|
||||
(int64_t)(zs->zst_offset + zs->zst_stride));
|
||||
/*
|
||||
* XXX: use a faster division method?
|
||||
*/
|
||||
prefetch_limit = zs->zst_offset + zs->zst_len +
|
||||
(zs->zst_cap * zs->zst_stride) / zs->zst_len;
|
||||
|
||||
while (prefetch_tail < prefetch_limit) {
|
||||
prefetch_ofst = zs->zst_offset + zs->zst_direction *
|
||||
(prefetch_tail - zs->zst_offset);
|
||||
|
||||
prefetch_len = zs->zst_len;
|
||||
|
||||
/*
|
||||
* Don't prefetch beyond the end of the file, if working
|
||||
* backwards.
|
||||
*/
|
||||
if ((zs->zst_direction == ZFETCH_BACKWARD) &&
|
||||
(prefetch_ofst > prefetch_tail)) {
|
||||
prefetch_len += prefetch_ofst;
|
||||
prefetch_ofst = 0;
|
||||
}
|
||||
|
||||
/* don't prefetch more than we're supposed to */
|
||||
if (prefetch_len > zs->zst_len)
|
||||
break;
|
||||
|
||||
blocks_fetched = dmu_zfetch_fetch(zf->zf_dnode,
|
||||
prefetch_ofst, zs->zst_len);
|
||||
|
||||
prefetch_tail += zs->zst_stride;
|
||||
/* stop if we've run out of stuff to prefetch */
|
||||
if (blocks_fetched < zs->zst_len)
|
||||
break;
|
||||
}
|
||||
zs->zst_ph_offset = prefetch_tail;
|
||||
zs->zst_last = ddi_get_lbolt();
|
||||
}
|
||||
|
||||
void
|
||||
zfetch_init(void)
|
||||
{
|
||||
|
||||
zfetch_ksp = kstat_create("zfs", 0, "zfetchstats", "misc",
|
||||
KSTAT_TYPE_NAMED, sizeof (zfetch_stats) / sizeof (kstat_named_t),
|
||||
KSTAT_FLAG_VIRTUAL);
|
||||
|
||||
if (zfetch_ksp != NULL) {
|
||||
zfetch_ksp->ks_data = &zfetch_stats;
|
||||
kstat_install(zfetch_ksp);
|
||||
}
|
||||
}
|
||||
|
||||
void
|
||||
zfetch_fini(void)
|
||||
{
|
||||
if (zfetch_ksp != NULL) {
|
||||
kstat_delete(zfetch_ksp);
|
||||
zfetch_ksp = NULL;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* This takes a pointer to a zfetch structure and a dnode. It performs the
|
||||
* necessary setup for the zfetch structure, grokking data from the
|
||||
* associated dnode.
|
||||
*/
|
||||
void
|
||||
dmu_zfetch_init(zfetch_t *zf, dnode_t *dno)
|
||||
{
|
||||
if (zf == NULL) {
|
||||
return;
|
||||
}
|
||||
|
||||
zf->zf_dnode = dno;
|
||||
zf->zf_stream_cnt = 0;
|
||||
zf->zf_alloc_fail = 0;
|
||||
|
||||
list_create(&zf->zf_stream, sizeof (zstream_t),
|
||||
offsetof(zstream_t, zst_node));
|
||||
|
||||
rw_init(&zf->zf_rwlock, NULL, RW_DEFAULT, NULL);
|
||||
}
|
||||
|
||||
/*
|
||||
* This function computes the actual size, in blocks, that can be prefetched,
|
||||
* and fetches it.
|
||||
*/
|
||||
static uint64_t
|
||||
dmu_zfetch_fetch(dnode_t *dn, uint64_t blkid, uint64_t nblks)
|
||||
{
|
||||
uint64_t fetchsz;
|
||||
uint64_t i;
|
||||
|
||||
fetchsz = dmu_zfetch_fetchsz(dn, blkid, nblks);
|
||||
|
||||
for (i = 0; i < fetchsz; i++) {
|
||||
dbuf_prefetch(dn, blkid + i);
|
||||
}
|
||||
|
||||
return (fetchsz);
|
||||
}
|
||||
|
||||
/*
|
||||
* this function returns the number of blocks that would be prefetched, based
|
||||
* upon the supplied dnode, blockid, and nblks. This is used so that we can
|
||||
* update streams in place, and then prefetch with their old value after the
|
||||
* fact. This way, we can delay the prefetch, but subsequent accesses to the
|
||||
* stream won't result in the same data being prefetched multiple times.
|
||||
*/
|
||||
static uint64_t
|
||||
dmu_zfetch_fetchsz(dnode_t *dn, uint64_t blkid, uint64_t nblks)
|
||||
{
|
||||
uint64_t fetchsz;
|
||||
|
||||
if (blkid > dn->dn_maxblkid) {
|
||||
return (0);
|
||||
}
|
||||
|
||||
/* compute fetch size */
|
||||
if (blkid + nblks + 1 > dn->dn_maxblkid) {
|
||||
fetchsz = (dn->dn_maxblkid - blkid) + 1;
|
||||
ASSERT(blkid + fetchsz - 1 <= dn->dn_maxblkid);
|
||||
} else {
|
||||
fetchsz = nblks;
|
||||
}
|
||||
|
||||
|
||||
return (fetchsz);
|
||||
}
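
The clamping done by dmu_zfetch_fetchsz() can be tried outside the kernel with a small stand-alone sketch. The name fetchsz_sketch is hypothetical, and a plain maxblkid argument stands in for dn->dn_maxblkid; the comparisons mirror the function above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the clamp in dmu_zfetch_fetchsz().
 * `maxblkid` plays the role of dn->dn_maxblkid.
 */
static uint64_t
fetchsz_sketch(uint64_t maxblkid, uint64_t blkid, uint64_t nblks)
{
	/* Starting past the last block: nothing to fetch. */
	if (blkid > maxblkid)
		return (0);

	/* Same comparison as the source: clamp to the end of the object. */
	if (blkid + nblks + 1 > maxblkid)
		return ((maxblkid - blkid) + 1);

	return (nblks);
}
```

With maxblkid = 9, a request for 4 blocks starting at block 8 is clamped to 2, and a request starting at block 10 yields 0. One consequence of the comparison as written is that a request ending exactly one block short of maxblkid (for example blkid = 7, nblks = 2) is extended to the end of the object and returns 3.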
|
||||
|
||||
/*
|
||||
* given a zfetch and a zstream structure, see if there is an associated zstream
|
||||
* for this block read. If so, it starts a prefetch for the stream it
|
||||
* located and returns true, otherwise it returns false
|
||||
*/
|
||||
static int
|
||||
dmu_zfetch_find(zfetch_t *zf, zstream_t *zh, int prefetched)
|
||||
{
|
||||
zstream_t *zs;
|
||||
int64_t diff;
|
||||
int reset = !prefetched;
|
||||
int rc = 0;
|
||||
|
||||
if (zh == NULL)
|
||||
return (0);
|
||||
|
||||
/*
|
||||
* XXX: This locking strategy is a bit coarse; however, its impact has
|
||||
* yet to be tested. If this turns out to be an issue, it can be
|
||||
* modified in a number of different ways.
|
||||
*/
|
||||
|
||||
rw_enter(&zf->zf_rwlock, RW_READER);
|
||||
top:
|
||||
|
||||
for (zs = list_head(&zf->zf_stream); zs;
|
||||
zs = list_next(&zf->zf_stream, zs)) {
|
||||
|
||||
/*
|
||||
* XXX - should this be an assert?
|
||||
*/
|
||||
if (zs->zst_len == 0) {
|
||||
/* bogus stream */
|
||||
ZFETCHSTAT_BUMP(zfetchstat_bogus_streams);
|
||||
continue;
|
||||
}
|
||||
|
||||
/*
|
||||
* We hit this case when we are in a strided prefetch stream:
|
||||
* we will read "len" blocks before "striding".
|
||||
*/
|
||||
if (zh->zst_offset >= zs->zst_offset &&
|
||||
zh->zst_offset < zs->zst_offset + zs->zst_len) {
|
||||
if (prefetched) {
|
||||
/* already fetched */
|
||||
ZFETCHSTAT_BUMP(zfetchstat_stride_hits);
|
||||
rc = 1;
|
||||
goto out;
|
||||
} else {
|
||||
ZFETCHSTAT_BUMP(zfetchstat_stride_misses);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* This is the forward sequential read case: we increment
|
||||
* len by one each time we hit here, so we will enter this
|
||||
* case on every read.
|
||||
*/
|
||||
if (zh->zst_offset == zs->zst_offset + zs->zst_len) {
|
||||
|
||||
reset = !prefetched && zs->zst_len > 1;
|
||||
|
||||
mutex_enter(&zs->zst_lock);
|
||||
|
||||
if (zh->zst_offset != zs->zst_offset + zs->zst_len) {
|
||||
mutex_exit(&zs->zst_lock);
|
||||
goto top;
|
||||
}
|
||||
zs->zst_len += zh->zst_len;
|
||||
diff = zs->zst_len - zfetch_block_cap;
|
||||
if (diff > 0) {
|
||||
zs->zst_offset += diff;
|
||||
zs->zst_len = zs->zst_len > diff ?
|
||||
zs->zst_len - diff : 0;
|
||||
}
|
||||
zs->zst_direction = ZFETCH_FORWARD;
|
||||
|
||||
break;
|
||||
|
||||
/*
|
||||
* Same as above, but reading backwards through the file.
|
||||
*/
|
||||
} else if (zh->zst_offset == zs->zst_offset - zh->zst_len) {
|
||||
/* backwards sequential access */
|
||||
|
||||
reset = !prefetched && zs->zst_len > 1;
|
||||
|
||||
mutex_enter(&zs->zst_lock);
|
||||
|
||||
if (zh->zst_offset != zs->zst_offset - zh->zst_len) {
|
||||
mutex_exit(&zs->zst_lock);
|
||||
goto top;
|
||||
}
|
||||
|
||||
zs->zst_offset = zs->zst_offset > zh->zst_len ?
|
||||
zs->zst_offset - zh->zst_len : 0;
|
||||
zs->zst_ph_offset = zs->zst_ph_offset > zh->zst_len ?
|
||||
zs->zst_ph_offset - zh->zst_len : 0;
|
||||
zs->zst_len += zh->zst_len;
|
||||
|
||||
diff = zs->zst_len - zfetch_block_cap;
|
||||
if (diff > 0) {
|
||||
zs->zst_ph_offset = zs->zst_ph_offset > diff ?
|
||||
zs->zst_ph_offset - diff : 0;
|
||||
zs->zst_len = zs->zst_len > diff ?
|
||||
zs->zst_len - diff : zs->zst_len;
|
||||
}
|
||||
zs->zst_direction = ZFETCH_BACKWARD;
|
||||
|
||||
break;
|
||||
|
||||
} else if ((zh->zst_offset - zs->zst_offset - zs->zst_stride <
|
||||
zs->zst_len) && (zs->zst_len != zs->zst_stride)) {
|
||||
/* strided forward access */
|
||||
|
||||
mutex_enter(&zs->zst_lock);
|
||||
|
||||
if ((zh->zst_offset - zs->zst_offset - zs->zst_stride >=
|
||||
zs->zst_len) || (zs->zst_len == zs->zst_stride)) {
|
||||
mutex_exit(&zs->zst_lock);
|
||||
goto top;
|
||||
}
|
||||
|
||||
zs->zst_offset += zs->zst_stride;
|
||||
zs->zst_direction = ZFETCH_FORWARD;
|
||||
|
||||
break;
|
||||
|
||||
} else if ((zh->zst_offset - zs->zst_offset + zs->zst_stride <
|
||||
zs->zst_len) && (zs->zst_len != zs->zst_stride)) {
|
||||
/* strided reverse access */
|
||||
|
||||
mutex_enter(&zs->zst_lock);
|
||||
|
||||
if ((zh->zst_offset - zs->zst_offset + zs->zst_stride >=
|
||||
zs->zst_len) || (zs->zst_len == zs->zst_stride)) {
|
||||
mutex_exit(&zs->zst_lock);
|
||||
goto top;
|
||||
}
|
||||
|
||||
zs->zst_offset = zs->zst_offset > zs->zst_stride ?
|
||||
zs->zst_offset - zs->zst_stride : 0;
|
||||
zs->zst_ph_offset = (zs->zst_ph_offset >
|
||||
(2 * zs->zst_stride)) ?
|
||||
(zs->zst_ph_offset - (2 * zs->zst_stride)) : 0;
|
||||
zs->zst_direction = ZFETCH_BACKWARD;
|
||||
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (zs) {
|
||||
if (reset) {
|
||||
zstream_t *remove = zs;
|
||||
|
||||
ZFETCHSTAT_BUMP(zfetchstat_stream_resets);
|
||||
rc = 0;
|
||||
mutex_exit(&zs->zst_lock);
|
||||
rw_exit(&zf->zf_rwlock);
|
||||
rw_enter(&zf->zf_rwlock, RW_WRITER);
|
||||
/*
|
||||
* Relocate the stream, in case someone removes
|
||||
* it while we were acquiring the WRITER lock.
|
||||
*/
|
||||
for (zs = list_head(&zf->zf_stream); zs;
|
||||
zs = list_next(&zf->zf_stream, zs)) {
|
||||
if (zs == remove) {
|
||||
dmu_zfetch_stream_remove(zf, zs);
|
||||
mutex_destroy(&zs->zst_lock);
|
||||
kmem_free(zs, sizeof (zstream_t));
|
||||
break;
|
||||
}
|
||||
}
|
||||
} else {
|
||||
ZFETCHSTAT_BUMP(zfetchstat_stream_noresets);
|
||||
rc = 1;
|
||||
dmu_zfetch_dofetch(zf, zs);
|
||||
mutex_exit(&zs->zst_lock);
|
||||
}
|
||||
}
|
||||
out:
|
||||
rw_exit(&zf->zf_rwlock);
|
||||
return (rc);
|
||||
}
|
||||
|
||||
/*
|
||||
* Clean-up state associated with a zfetch structure. This frees allocated
|
||||
* structure members, empties the zf_stream list, and generally makes things
|
||||
* nice. This doesn't free the zfetch_t itself, that's left to the caller.
|
||||
*/
|
||||
void
|
||||
dmu_zfetch_rele(zfetch_t *zf)
|
||||
{
|
||||
zstream_t *zs;
|
||||
zstream_t *zs_next;
|
||||
|
||||
ASSERT(!RW_LOCK_HELD(&zf->zf_rwlock));
|
||||
|
||||
for (zs = list_head(&zf->zf_stream); zs; zs = zs_next) {
|
||||
zs_next = list_next(&zf->zf_stream, zs);
|
||||
|
||||
list_remove(&zf->zf_stream, zs);
|
||||
mutex_destroy(&zs->zst_lock);
|
||||
kmem_free(zs, sizeof (zstream_t));
|
||||
}
|
||||
list_destroy(&zf->zf_stream);
|
||||
rw_destroy(&zf->zf_rwlock);
|
||||
|
||||
zf->zf_dnode = NULL;
|
||||
}
|
||||
|
||||
/*
|
||||
* Given a zfetch and zstream structure, insert the zstream structure into the
|
||||
* stream list contained within the zfetch structure. Perform the appropriate
|
||||
* book-keeping. It is possible that another thread has inserted a stream which
|
||||
* matches one that we are about to insert, so we must be sure to check for this
|
||||
* case. If one is found, return failure, and let the caller cleanup the
|
||||
* duplicates.
|
||||
*/
|
||||
static int
|
||||
dmu_zfetch_stream_insert(zfetch_t *zf, zstream_t *zs)
|
||||
{
|
||||
zstream_t *zs_walk;
|
||||
zstream_t *zs_next;
|
||||
|
||||
ASSERT(RW_WRITE_HELD(&zf->zf_rwlock));
|
||||
|
||||
for (zs_walk = list_head(&zf->zf_stream); zs_walk; zs_walk = zs_next) {
|
||||
zs_next = list_next(&zf->zf_stream, zs_walk);
|
||||
|
||||
if (dmu_zfetch_streams_equal(zs_walk, zs)) {
|
||||
return (0);
|
||||
}
|
||||
}
|
||||
|
||||
list_insert_head(&zf->zf_stream, zs);
|
||||
zf->zf_stream_cnt++;
|
||||
return (1);
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* Walk the list of zstreams in the given zfetch, find an old one (by time), and
|
||||
* reclaim it for use by the caller.
|
||||
*/
|
||||
static zstream_t *
|
||||
dmu_zfetch_stream_reclaim(zfetch_t *zf)
|
||||
{
|
||||
zstream_t *zs;
|
||||
|
||||
if (! rw_tryenter(&zf->zf_rwlock, RW_WRITER))
|
||||
return (0);
|
||||
|
||||
for (zs = list_head(&zf->zf_stream); zs;
|
||||
zs = list_next(&zf->zf_stream, zs)) {
|
||||
|
||||
if (((ddi_get_lbolt() - zs->zst_last)/hz) > zfetch_min_sec_reap)
|
||||
break;
|
||||
}
|
||||
|
||||
if (zs) {
|
||||
dmu_zfetch_stream_remove(zf, zs);
|
||||
mutex_destroy(&zs->zst_lock);
|
||||
bzero(zs, sizeof (zstream_t));
|
||||
} else {
|
||||
zf->zf_alloc_fail++;
|
||||
}
|
||||
rw_exit(&zf->zf_rwlock);
|
||||
|
||||
return (zs);
|
||||
}
|
||||
|
||||
/*
|
||||
* Given a zfetch and zstream structure, remove the zstream structure from its
|
||||
* container in the zfetch structure. Perform the appropriate book-keeping.
|
||||
*/
|
||||
static void
|
||||
dmu_zfetch_stream_remove(zfetch_t *zf, zstream_t *zs)
|
||||
{
|
||||
ASSERT(RW_WRITE_HELD(&zf->zf_rwlock));
|
||||
|
||||
list_remove(&zf->zf_stream, zs);
|
||||
zf->zf_stream_cnt--;
|
||||
}
|
||||
|
||||
static int
|
||||
dmu_zfetch_streams_equal(zstream_t *zs1, zstream_t *zs2)
|
||||
{
|
||||
if (zs1->zst_offset != zs2->zst_offset)
|
||||
return (0);
|
||||
|
||||
if (zs1->zst_len != zs2->zst_len)
|
||||
return (0);
|
||||
|
||||
if (zs1->zst_stride != zs2->zst_stride)
|
||||
return (0);
|
||||
|
||||
if (zs1->zst_ph_offset != zs2->zst_ph_offset)
|
||||
return (0);
|
||||
|
||||
if (zs1->zst_cap != zs2->zst_cap)
|
||||
return (0);
|
||||
|
||||
if (zs1->zst_direction != zs2->zst_direction)
|
||||
return (0);
|
||||
|
||||
return (1);
|
||||
}
|
||||
|
||||
/*
|
||||
* This is the prefetch entry point. It calls all of the other dmu_zfetch
|
||||
* routines to create, delete, find, or operate upon prefetch streams.
|
||||
*/
|
||||
void
|
||||
dmu_zfetch(zfetch_t *zf, uint64_t offset, uint64_t size, int prefetched)
|
||||
{
|
||||
zstream_t zst;
|
||||
zstream_t *newstream;
|
||||
int fetched;
|
||||
int inserted;
|
||||
unsigned int blkshft;
|
||||
uint64_t blksz;
|
||||
|
||||
if (zfs_prefetch_disable)
|
||||
return;
|
||||
|
||||
/* files that aren't ln2 blocksz are only one block -- nothing to do */
|
||||
if (!zf->zf_dnode->dn_datablkshift)
|
||||
return;
|
||||
|
||||
/* convert offset and size, into blockid and nblocks */
|
||||
blkshft = zf->zf_dnode->dn_datablkshift;
|
||||
blksz = (1 << blkshft);
|
||||
|
||||
bzero(&zst, sizeof (zstream_t));
|
||||
zst.zst_offset = offset >> blkshft;
|
||||
zst.zst_len = (P2ROUNDUP(offset + size, blksz) -
|
||||
P2ALIGN(offset, blksz)) >> blkshft;
|
||||
|
||||
fetched = dmu_zfetch_find(zf, &zst, prefetched);
|
||||
if (fetched) {
|
||||
ZFETCHSTAT_BUMP(zfetchstat_hits);
|
||||
} else {
|
||||
ZFETCHSTAT_BUMP(zfetchstat_misses);
|
||||
if (fetched = dmu_zfetch_colinear(zf, &zst)) {
|
||||
ZFETCHSTAT_BUMP(zfetchstat_colinear_hits);
|
||||
} else {
|
||||
ZFETCHSTAT_BUMP(zfetchstat_colinear_misses);
|
||||
}
|
||||
}
|
||||
|
||||
if (!fetched) {
|
||||
newstream = dmu_zfetch_stream_reclaim(zf);
|
||||
|
||||
/*
|
||||
* we still couldn't find a stream, drop the lock, and allocate
|
||||
* one if possible. Otherwise, give up and go home.
|
||||
*/
|
||||
if (newstream) {
|
||||
ZFETCHSTAT_BUMP(zfetchstat_reclaim_successes);
|
||||
} else {
|
||||
uint64_t maxblocks;
|
||||
uint32_t max_streams;
|
||||
uint32_t cur_streams;
|
||||
|
||||
ZFETCHSTAT_BUMP(zfetchstat_reclaim_failures);
|
||||
cur_streams = zf->zf_stream_cnt;
|
||||
maxblocks = zf->zf_dnode->dn_maxblkid;
|
||||
|
||||
max_streams = MIN(zfetch_max_streams,
|
||||
(maxblocks / zfetch_block_cap));
|
||||
if (max_streams == 0) {
|
||||
max_streams++;
|
||||
}
|
||||
|
||||
if (cur_streams >= max_streams) {
|
||||
return;
|
||||
}
|
||||
newstream = kmem_zalloc(sizeof (zstream_t), KM_SLEEP);
|
||||
}
|
||||
|
||||
newstream->zst_offset = zst.zst_offset;
|
||||
newstream->zst_len = zst.zst_len;
|
||||
newstream->zst_stride = zst.zst_len;
|
||||
newstream->zst_ph_offset = zst.zst_len + zst.zst_offset;
|
||||
newstream->zst_cap = zst.zst_len;
|
||||
newstream->zst_direction = ZFETCH_FORWARD;
|
||||
newstream->zst_last = ddi_get_lbolt();
|
||||
|
||||
mutex_init(&newstream->zst_lock, NULL, MUTEX_DEFAULT, NULL);
|
||||
|
||||
rw_enter(&zf->zf_rwlock, RW_WRITER);
|
||||
inserted = dmu_zfetch_stream_insert(zf, newstream);
|
||||
rw_exit(&zf->zf_rwlock);
|
||||
|
||||
if (!inserted) {
|
||||
mutex_destroy(&newstream->zst_lock);
|
||||
kmem_free(newstream, sizeof (zstream_t));
|
||||
}
|
||||
}
|
||||
}
|
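
The byte-range to block-range conversion at the top of dmu_zfetch() can be illustrated on its own. This is a sketch under stated assumptions: p2align_sketch and p2roundup_sketch are local stand-ins for the kernel's P2ALIGN and P2ROUNDUP macros, range_to_blocks_sketch is a hypothetical wrapper, and the block size is widened to 64 bits before shifting.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for P2ALIGN/P2ROUNDUP; `align` must be a power of two. */
static uint64_t
p2align_sketch(uint64_t x, uint64_t align)
{
	return (x & ~(align - 1));
}

static uint64_t
p2roundup_sketch(uint64_t x, uint64_t align)
{
	return ((x + align - 1) & ~(align - 1));
}

/*
 * Hypothetical wrapper: convert a byte offset and size into a starting
 * block id and a block count, following the arithmetic in dmu_zfetch().
 */
static void
range_to_blocks_sketch(uint64_t offset, uint64_t size, unsigned int blkshft,
    uint64_t *blkid, uint64_t *nblks)
{
	uint64_t blksz = (uint64_t)1 << blkshft;

	*blkid = offset >> blkshft;
	*nblks = (p2roundup_sketch(offset + size, blksz) -
	    p2align_sketch(offset, blksz)) >> blkshft;
}
```

With a block shift of 17 (128 KB blocks), a read of 200000 bytes at offset 100000 covers bytes 100000 through 299999, which maps to block id 0 and a length of 3 blocks.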
1993
uts/common/fs/zfs/dnode.c
Normal file
File diff suppressed because it is too large
693
uts/common/fs/zfs/dnode_sync.c
Normal file
@ -0,0 +1,693 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/dbuf.h>
|
||||
#include <sys/dnode.h>
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/dmu_tx.h>
|
||||
#include <sys/dmu_objset.h>
|
||||
#include <sys/dsl_dataset.h>
|
||||
#include <sys/spa.h>
|
||||
|
||||
static void
|
||||
dnode_increase_indirection(dnode_t *dn, dmu_tx_t *tx)
|
||||
{
|
||||
dmu_buf_impl_t *db;
|
||||
int txgoff = tx->tx_txg & TXG_MASK;
|
||||
int nblkptr = dn->dn_phys->dn_nblkptr;
|
||||
int old_toplvl = dn->dn_phys->dn_nlevels - 1;
|
||||
int new_level = dn->dn_next_nlevels[txgoff];
|
||||
int i;
|
||||
|
||||
rw_enter(&dn->dn_struct_rwlock, RW_WRITER);
|
||||
|
||||
/* this dnode can't be paged out because it's dirty */
|
||||
ASSERT(dn->dn_phys->dn_type != DMU_OT_NONE);
|
||||
ASSERT(RW_WRITE_HELD(&dn->dn_struct_rwlock));
|
||||
ASSERT(new_level > 1 && dn->dn_phys->dn_nlevels > 0);
|
||||
|
||||
db = dbuf_hold_level(dn, dn->dn_phys->dn_nlevels, 0, FTAG);
|
||||
ASSERT(db != NULL);
|
||||
|
||||
dn->dn_phys->dn_nlevels = new_level;
|
||||
dprintf("os=%p obj=%llu, increase to %d\n", dn->dn_objset,
|
||||
dn->dn_object, dn->dn_phys->dn_nlevels);
|
||||
|
||||
/* check for existing blkptrs in the dnode */
|
||||
for (i = 0; i < nblkptr; i++)
|
||||
if (!BP_IS_HOLE(&dn->dn_phys->dn_blkptr[i]))
|
||||
break;
|
||||
if (i != nblkptr) {
|
||||
/* transfer dnode's block pointers to new indirect block */
|
||||
(void) dbuf_read(db, NULL, DB_RF_MUST_SUCCEED|DB_RF_HAVESTRUCT);
|
||||
ASSERT(db->db.db_data);
|
||||
ASSERT(arc_released(db->db_buf));
|
||||
ASSERT3U(sizeof (blkptr_t) * nblkptr, <=, db->db.db_size);
|
||||
bcopy(dn->dn_phys->dn_blkptr, db->db.db_data,
|
||||
sizeof (blkptr_t) * nblkptr);
|
||||
arc_buf_freeze(db->db_buf);
|
||||
}
|
||||
|
||||
/* set dbuf's parent pointers to new indirect buf */
|
||||
for (i = 0; i < nblkptr; i++) {
|
||||
dmu_buf_impl_t *child = dbuf_find(dn, old_toplvl, i);
|
||||
|
||||
if (child == NULL)
|
||||
continue;
|
||||
#ifdef DEBUG
|
||||
DB_DNODE_ENTER(child);
|
||||
ASSERT3P(DB_DNODE(child), ==, dn);
|
||||
DB_DNODE_EXIT(child);
|
||||
#endif /* DEBUG */
|
||||
if (child->db_parent && child->db_parent != dn->dn_dbuf) {
|
||||
ASSERT(child->db_parent->db_level == db->db_level);
|
||||
ASSERT(child->db_blkptr !=
|
||||
&dn->dn_phys->dn_blkptr[child->db_blkid]);
|
||||
mutex_exit(&child->db_mtx);
|
||||
continue;
|
||||
}
|
||||
ASSERT(child->db_parent == NULL ||
|
||||
child->db_parent == dn->dn_dbuf);
|
||||
|
||||
child->db_parent = db;
|
||||
dbuf_add_ref(db, child);
|
||||
if (db->db.db_data)
|
||||
child->db_blkptr = (blkptr_t *)db->db.db_data + i;
|
||||
else
|
||||
child->db_blkptr = NULL;
|
||||
dprintf_dbuf_bp(child, child->db_blkptr,
|
||||
"changed db_blkptr to new indirect %s", "");
|
||||
|
||||
mutex_exit(&child->db_mtx);
|
||||
}
|
||||
|
||||
bzero(dn->dn_phys->dn_blkptr, sizeof (blkptr_t) * nblkptr);
|
||||
|
||||
dbuf_rele(db, FTAG);
|
||||
|
||||
rw_exit(&dn->dn_struct_rwlock);
|
||||
}
|
||||
|
||||
static int
|
||||
free_blocks(dnode_t *dn, blkptr_t *bp, int num, dmu_tx_t *tx)
|
||||
{
|
||||
dsl_dataset_t *ds = dn->dn_objset->os_dsl_dataset;
|
||||
uint64_t bytesfreed = 0;
|
||||
int i, blocks_freed = 0;
|
||||
|
||||
dprintf("ds=%p obj=%llx num=%d\n", ds, dn->dn_object, num);
|
||||
|
||||
for (i = 0; i < num; i++, bp++) {
|
||||
if (BP_IS_HOLE(bp))
|
||||
continue;
|
||||
|
||||
bytesfreed += dsl_dataset_block_kill(ds, bp, tx, B_FALSE);
|
||||
ASSERT3U(bytesfreed, <=, DN_USED_BYTES(dn->dn_phys));
|
||||
bzero(bp, sizeof (blkptr_t));
|
||||
blocks_freed += 1;
|
||||
}
|
||||
dnode_diduse_space(dn, -bytesfreed);
|
||||
return (blocks_freed);
|
||||
}
|
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
static void
|
||||
free_verify(dmu_buf_impl_t *db, uint64_t start, uint64_t end, dmu_tx_t *tx)
|
||||
{
|
||||
int off, num;
|
||||
int i, err, epbs;
|
||||
uint64_t txg = tx->tx_txg;
|
||||
dnode_t *dn;
|
||||
|
||||
DB_DNODE_ENTER(db);
|
||||
dn = DB_DNODE(db);
|
||||
epbs = dn->dn_phys->dn_indblkshift - SPA_BLKPTRSHIFT;
|
||||
off = start - (db->db_blkid * 1<<epbs);
|
||||
num = end - start + 1;
|
||||
|
||||
ASSERT3U(off, >=, 0);
|
||||
ASSERT3U(num, >=, 0);
|
||||
ASSERT3U(db->db_level, >, 0);
|
||||
ASSERT3U(db->db.db_size, ==, 1 << dn->dn_phys->dn_indblkshift);
|
||||
ASSERT3U(off+num, <=, db->db.db_size >> SPA_BLKPTRSHIFT);
|
||||
ASSERT(db->db_blkptr != NULL);
|
||||
|
||||
for (i = off; i < off+num; i++) {
|
||||
uint64_t *buf;
|
||||
dmu_buf_impl_t *child;
|
||||
dbuf_dirty_record_t *dr;
|
||||
int j;
|
||||
|
||||
ASSERT(db->db_level == 1);
|
||||
|
||||
rw_enter(&dn->dn_struct_rwlock, RW_READER);
|
||||
err = dbuf_hold_impl(dn, db->db_level-1,
|
||||
(db->db_blkid << epbs) + i, TRUE, FTAG, &child);
|
||||
rw_exit(&dn->dn_struct_rwlock);
|
||||
if (err == ENOENT)
|
||||
continue;
|
||||
ASSERT(err == 0);
|
||||
ASSERT(child->db_level == 0);
|
||||
dr = child->db_last_dirty;
|
||||
while (dr && dr->dr_txg > txg)
|
||||
dr = dr->dr_next;
|
||||
ASSERT(dr == NULL || dr->dr_txg == txg);
|
||||
|
||||
/* data_old better be zeroed */
|
||||
if (dr) {
|
||||
buf = dr->dt.dl.dr_data->b_data;
|
||||
for (j = 0; j < child->db.db_size >> 3; j++) {
|
||||
if (buf[j] != 0) {
|
||||
panic("freed data not zero: "
|
||||
"child=%p i=%d off=%d num=%d\n",
|
||||
(void *)child, i, off, num);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* db_data better be zeroed unless it's dirty in a
|
||||
* future txg.
|
||||
*/
|
||||
mutex_enter(&child->db_mtx);
|
||||
buf = child->db.db_data;
|
||||
if (buf != NULL && child->db_state != DB_FILL &&
|
||||
child->db_last_dirty == NULL) {
|
||||
for (j = 0; j < child->db.db_size >> 3; j++) {
|
||||
if (buf[j] != 0) {
|
||||
panic("freed data not zero: "
|
||||
"child=%p i=%d off=%d num=%d\n",
|
||||
(void *)child, i, off, num);
|
||||
}
|
||||
}
|
||||
}
|
||||
mutex_exit(&child->db_mtx);
|
||||
|
||||
dbuf_rele(child, FTAG);
|
||||
}
|
||||
DB_DNODE_EXIT(db);
|
||||
}
|
||||
#endif
|
||||
|
||||
#define ALL -1
|
||||
|
||||
static int
|
||||
free_children(dmu_buf_impl_t *db, uint64_t blkid, uint64_t nblks, int trunc,
|
||||
dmu_tx_t *tx)
|
||||
{
|
||||
dnode_t *dn;
|
||||
blkptr_t *bp;
|
||||
dmu_buf_impl_t *subdb;
|
||||
uint64_t start, end, dbstart, dbend, i;
|
||||
int epbs, shift, err;
|
||||
int all = TRUE;
|
||||
int blocks_freed = 0;
|
||||
|
||||
/*
|
||||
* There is a small possibility that this block will not be cached:
|
||||
* 1 - if level > 1 and there are no children with level <= 1
|
||||
* 2 - if we didn't get a dirty hold (because this block had just
|
||||
* finished being written -- and so had no holds), and then this
|
||||
* block got evicted before we got here.
|
||||
*/
|
||||
if (db->db_state != DB_CACHED)
|
||||
(void) dbuf_read(db, NULL, DB_RF_MUST_SUCCEED);
|
||||
|
||||
dbuf_release_bp(db);
|
||||
bp = (blkptr_t *)db->db.db_data;
|
||||
|
||||
DB_DNODE_ENTER(db);
|
||||
dn = DB_DNODE(db);
|
||||
epbs = dn->dn_phys->dn_indblkshift - SPA_BLKPTRSHIFT;
|
||||
shift = (db->db_level - 1) * epbs;
|
||||
dbstart = db->db_blkid << epbs;
|
||||
start = blkid >> shift;
|
||||
if (dbstart < start) {
|
||||
bp += start - dbstart;
|
||||
all = FALSE;
|
||||
} else {
|
||||
start = dbstart;
|
||||
}
|
||||
dbend = ((db->db_blkid + 1) << epbs) - 1;
|
||||
end = (blkid + nblks - 1) >> shift;
|
||||
if (dbend <= end)
|
||||
end = dbend;
|
||||
else if (all)
|
||||
all = trunc;
|
||||
ASSERT3U(start, <=, end);
|
||||
|
||||
if (db->db_level == 1) {
|
||||
FREE_VERIFY(db, start, end, tx);
|
||||
blocks_freed = free_blocks(dn, bp, end-start+1, tx);
|
||||
arc_buf_freeze(db->db_buf);
|
||||
ASSERT(all || blocks_freed == 0 || db->db_last_dirty);
|
||||
DB_DNODE_EXIT(db);
|
||||
return (all ? ALL : blocks_freed);
|
||||
}
|
||||
|
||||
for (i = start; i <= end; i++, bp++) {
|
||||
if (BP_IS_HOLE(bp))
|
||||
continue;
|
||||
rw_enter(&dn->dn_struct_rwlock, RW_READER);
|
||||
err = dbuf_hold_impl(dn, db->db_level-1, i, TRUE, FTAG, &subdb);
|
||||
ASSERT3U(err, ==, 0);
|
||||
rw_exit(&dn->dn_struct_rwlock);
|
||||
|
||||
if (free_children(subdb, blkid, nblks, trunc, tx) == ALL) {
|
||||
ASSERT3P(subdb->db_blkptr, ==, bp);
|
||||
blocks_freed += free_blocks(dn, bp, 1, tx);
|
||||
} else {
|
||||
all = FALSE;
|
||||
}
|
||||
dbuf_rele(subdb, FTAG);
|
||||
}
|
||||
DB_DNODE_EXIT(db);
|
||||
arc_buf_freeze(db->db_buf);
|
||||
#ifdef ZFS_DEBUG
|
||||
bp -= (end-start)+1;
|
||||
for (i = start; i <= end; i++, bp++) {
|
||||
if (i == start && blkid != 0)
|
||||
continue;
|
||||
else if (i == end && !trunc)
|
||||
continue;
|
||||
ASSERT3U(bp->blk_birth, ==, 0);
|
||||
}
|
||||
#endif
|
||||
ASSERT(all || blocks_freed == 0 || db->db_last_dirty);
|
||||
return (all ? ALL : blocks_freed);
|
||||
}
|
||||
|
||||
/*
|
||||
* free_range: Traverse the indicated range of the provided file
|
||||
* and "free" all the blocks contained there.
|
||||
*/
|
||||
static void
|
||||
dnode_sync_free_range(dnode_t *dn, uint64_t blkid, uint64_t nblks, dmu_tx_t *tx)
|
||||
{
|
||||
blkptr_t *bp = dn->dn_phys->dn_blkptr;
|
||||
dmu_buf_impl_t *db;
|
||||
int trunc, start, end, shift, i, err;
|
||||
int dnlevel = dn->dn_phys->dn_nlevels;
|
||||
|
||||
if (blkid > dn->dn_phys->dn_maxblkid)
|
||||
return;
|
||||
|
||||
ASSERT(dn->dn_phys->dn_maxblkid < UINT64_MAX);
|
||||
trunc = blkid + nblks > dn->dn_phys->dn_maxblkid;
|
||||
if (trunc)
|
||||
nblks = dn->dn_phys->dn_maxblkid - blkid + 1;
|
||||
|
||||
/* There are no indirect blocks in the object */
|
||||
if (dnlevel == 1) {
|
||||
if (blkid >= dn->dn_phys->dn_nblkptr) {
|
||||
/* this range was never made persistent */
|
||||
return;
|
||||
}
|
||||
ASSERT3U(blkid + nblks, <=, dn->dn_phys->dn_nblkptr);
|
||||
(void) free_blocks(dn, bp + blkid, nblks, tx);
|
||||
if (trunc) {
|
||||
uint64_t off = (dn->dn_phys->dn_maxblkid + 1) *
|
||||
(dn->dn_phys->dn_datablkszsec << SPA_MINBLOCKSHIFT);
|
||||
dn->dn_phys->dn_maxblkid = (blkid ? blkid - 1 : 0);
|
||||
ASSERT(off < dn->dn_phys->dn_maxblkid ||
|
||||
dn->dn_phys->dn_maxblkid == 0 ||
|
||||
dnode_next_offset(dn, 0, &off, 1, 1, 0) != 0);
|
||||
}
|
||||
return;
|
||||
}
|
||||
|
||||
shift = (dnlevel - 1) * (dn->dn_phys->dn_indblkshift - SPA_BLKPTRSHIFT);
|
||||
start = blkid >> shift;
|
||||
ASSERT(start < dn->dn_phys->dn_nblkptr);
|
||||
end = (blkid + nblks - 1) >> shift;
|
||||
bp += start;
|
||||
for (i = start; i <= end; i++, bp++) {
|
||||
if (BP_IS_HOLE(bp))
|
||||
continue;
|
||||
rw_enter(&dn->dn_struct_rwlock, RW_READER);
|
||||
err = dbuf_hold_impl(dn, dnlevel-1, i, TRUE, FTAG, &db);
|
||||
ASSERT3U(err, ==, 0);
|
||||
rw_exit(&dn->dn_struct_rwlock);
|
||||
|
||||
if (free_children(db, blkid, nblks, trunc, tx) == ALL) {
|
||||
ASSERT3P(db->db_blkptr, ==, bp);
|
||||
(void) free_blocks(dn, bp, 1, tx);
|
||||
}
|
||||
dbuf_rele(db, FTAG);
|
||||
}
|
||||
if (trunc) {
|
||||
uint64_t off = (dn->dn_phys->dn_maxblkid + 1) *
|
||||
(dn->dn_phys->dn_datablkszsec << SPA_MINBLOCKSHIFT);
|
||||
dn->dn_phys->dn_maxblkid = (blkid ? blkid - 1 : 0);
|
||||
ASSERT(off < dn->dn_phys->dn_maxblkid ||
|
||||
dn->dn_phys->dn_maxblkid == 0 ||
|
||||
dnode_next_offset(dn, 0, &off, 1, 1, 0) != 0);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Try to kick all the dnode's dbufs out of the cache...
|
||||
*/
|
||||
void
|
||||
dnode_evict_dbufs(dnode_t *dn)
|
||||
{
|
||||
int progress;
|
||||
int pass = 0;
|
||||
|
||||
do {
|
||||
dmu_buf_impl_t *db, marker;
|
||||
int evicting = FALSE;
|
||||
|
||||
progress = FALSE;
|
||||
mutex_enter(&dn->dn_dbufs_mtx);
|
||||
list_insert_tail(&dn->dn_dbufs, &marker);
|
||||
db = list_head(&dn->dn_dbufs);
|
||||
for (; db != &marker; db = list_head(&dn->dn_dbufs)) {
|
||||
list_remove(&dn->dn_dbufs, db);
|
||||
list_insert_tail(&dn->dn_dbufs, db);
|
||||
#ifdef DEBUG
|
||||
DB_DNODE_ENTER(db);
|
||||
ASSERT3P(DB_DNODE(db), ==, dn);
|
||||
DB_DNODE_EXIT(db);
|
||||
#endif /* DEBUG */
|
||||
|
||||
mutex_enter(&db->db_mtx);
|
||||
if (db->db_state == DB_EVICTING) {
|
||||
progress = TRUE;
|
||||
evicting = TRUE;
|
||||
mutex_exit(&db->db_mtx);
|
||||
} else if (refcount_is_zero(&db->db_holds)) {
|
||||
progress = TRUE;
|
||||
dbuf_clear(db); /* exits db_mtx for us */
|
||||
} else {
|
||||
mutex_exit(&db->db_mtx);
|
||||
}
|
||||
|
||||
}
|
||||
list_remove(&dn->dn_dbufs, &marker);
|
||||
/*
|
||||
* NB: we need to drop dn_dbufs_mtx between passes so
|
||||
* that any DB_EVICTING dbufs can make progress.
|
||||
* Ideally, we would have some cv we could wait on, but
|
||||
* since we don't, just wait a bit to give the other
|
||||
* thread a chance to run.
|
||||
*/
|
||||
mutex_exit(&dn->dn_dbufs_mtx);
|
||||
if (evicting)
|
||||
delay(1);
|
||||
pass++;
|
||||
ASSERT(pass < 100); /* sanity check */
|
||||
} while (progress);
|
||||
|
||||
rw_enter(&dn->dn_struct_rwlock, RW_WRITER);
|
||||
if (dn->dn_bonus && refcount_is_zero(&dn->dn_bonus->db_holds)) {
|
||||
mutex_enter(&dn->dn_bonus->db_mtx);
|
||||
dbuf_evict(dn->dn_bonus);
|
||||
dn->dn_bonus = NULL;
|
||||
}
|
||||
rw_exit(&dn->dn_struct_rwlock);
|
||||
}
|
||||
|
||||
static void
|
||||
dnode_undirty_dbufs(list_t *list)
|
||||
{
|
||||
dbuf_dirty_record_t *dr;
|
||||
|
||||
while (dr = list_head(list)) {
|
||||
dmu_buf_impl_t *db = dr->dr_dbuf;
|
||||
uint64_t txg = dr->dr_txg;
|
||||
|
||||
if (db->db_level != 0)
|
||||
dnode_undirty_dbufs(&dr->dt.di.dr_children);
|
||||
|
||||
mutex_enter(&db->db_mtx);
|
||||
/* XXX - use dbuf_undirty()? */
|
||||
list_remove(list, dr);
|
||||
ASSERT(db->db_last_dirty == dr);
|
||||
db->db_last_dirty = NULL;
|
||||
db->db_dirtycnt -= 1;
|
||||
if (db->db_level == 0) {
|
||||
ASSERT(db->db_blkid == DMU_BONUS_BLKID ||
|
||||
dr->dt.dl.dr_data == db->db_buf);
|
||||
dbuf_unoverride(dr);
|
||||
}
|
||||
kmem_free(dr, sizeof (dbuf_dirty_record_t));
|
||||
dbuf_rele_and_unlock(db, (void *)(uintptr_t)txg);
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
dnode_sync_free(dnode_t *dn, dmu_tx_t *tx)
|
||||
{
|
||||
int txgoff = tx->tx_txg & TXG_MASK;
|
||||
|
||||
ASSERT(dmu_tx_is_syncing(tx));
|
||||
|
||||
/*
|
||||
* Our contents should have been freed in dnode_sync() by the
|
||||
* free range record inserted by the caller of dnode_free().
|
||||
*/
|
||||
ASSERT3U(DN_USED_BYTES(dn->dn_phys), ==, 0);
|
||||
ASSERT(BP_IS_HOLE(dn->dn_phys->dn_blkptr));
|
||||
|
||||
dnode_undirty_dbufs(&dn->dn_dirty_records[txgoff]);
|
||||
dnode_evict_dbufs(dn);
|
||||
ASSERT3P(list_head(&dn->dn_dbufs), ==, NULL);
|
||||
|
||||
/*
|
||||
* XXX - It would be nice to assert this, but we may still
|
||||
* have residual holds from async evictions from the arc...
|
||||
*
|
||||
* zfs_obj_to_path() also depends on this being
|
||||
* commented out.
|
||||
*
|
||||
* ASSERT3U(refcount_count(&dn->dn_holds), ==, 1);
|
||||
*/
|
||||
|
||||
/* Undirty next bits */
|
||||
dn->dn_next_nlevels[txgoff] = 0;
|
||||
dn->dn_next_indblkshift[txgoff] = 0;
|
||||
dn->dn_next_blksz[txgoff] = 0;
|
||||
|
||||
/* ASSERT(blkptrs are zero); */
|
||||
ASSERT(dn->dn_phys->dn_type != DMU_OT_NONE);
|
||||
ASSERT(dn->dn_type != DMU_OT_NONE);
|
||||
|
||||
ASSERT(dn->dn_free_txg > 0);
|
||||
if (dn->dn_allocated_txg != dn->dn_free_txg)
|
||||
dbuf_will_dirty(dn->dn_dbuf, tx);
|
||||
bzero(dn->dn_phys, sizeof (dnode_phys_t));
|
||||
|
||||
mutex_enter(&dn->dn_mtx);
|
||||
dn->dn_type = DMU_OT_NONE;
|
||||
dn->dn_maxblkid = 0;
|
||||
dn->dn_allocated_txg = 0;
|
||||
dn->dn_free_txg = 0;
|
||||
dn->dn_have_spill = B_FALSE;
|
||||
mutex_exit(&dn->dn_mtx);
|
||||
|
||||
ASSERT(dn->dn_object != DMU_META_DNODE_OBJECT);
|
||||
|
||||
dnode_rele(dn, (void *)(uintptr_t)tx->tx_txg);
|
||||
/*
|
||||
* Now that we've released our hold, the dnode may
|
||||
* be evicted, so we mustn't access it.
|
||||
*/
|
||||
}
|
||||
|
||||
/*
|
||||
* Write out the dnode's dirty buffers.
|
||||
*/
|
||||
void
|
||||
dnode_sync(dnode_t *dn, dmu_tx_t *tx)
|
||||
{
|
||||
free_range_t *rp;
|
||||
dnode_phys_t *dnp = dn->dn_phys;
|
||||
int txgoff = tx->tx_txg & TXG_MASK;
|
||||
list_t *list = &dn->dn_dirty_records[txgoff];
|
||||
static const dnode_phys_t zerodn = { 0 };
|
||||
boolean_t kill_spill = B_FALSE;
|
||||
|
||||
ASSERT(dmu_tx_is_syncing(tx));
|
||||
ASSERT(dnp->dn_type != DMU_OT_NONE || dn->dn_allocated_txg);
|
||||
ASSERT(dnp->dn_type != DMU_OT_NONE ||
|
||||
bcmp(dnp, &zerodn, DNODE_SIZE) == 0);
|
||||
DNODE_VERIFY(dn);
|
||||
|
||||
ASSERT(dn->dn_dbuf == NULL || arc_released(dn->dn_dbuf->db_buf));
|
||||
|
||||
if (dmu_objset_userused_enabled(dn->dn_objset) &&
|
||||
!DMU_OBJECT_IS_SPECIAL(dn->dn_object)) {
|
||||
mutex_enter(&dn->dn_mtx);
|
||||
dn->dn_oldused = DN_USED_BYTES(dn->dn_phys);
|
||||
dn->dn_oldflags = dn->dn_phys->dn_flags;
|
||||
dn->dn_phys->dn_flags |= DNODE_FLAG_USERUSED_ACCOUNTED;
|
||||
mutex_exit(&dn->dn_mtx);
|
||||
dmu_objset_userquota_get_ids(dn, B_FALSE, tx);
|
||||
} else {
|
||||
/* Once we account for it, we should always account for it. */
|
||||
ASSERT(!(dn->dn_phys->dn_flags &
|
||||
DNODE_FLAG_USERUSED_ACCOUNTED));
|
||||
}
|
||||
|
||||
mutex_enter(&dn->dn_mtx);
|
||||
if (dn->dn_allocated_txg == tx->tx_txg) {
|
||||
/* The dnode is newly allocated or reallocated */
|
||||
if (dnp->dn_type == DMU_OT_NONE) {
|
||||
/* this is a first alloc, not a realloc */
|
||||
dnp->dn_nlevels = 1;
|
||||
dnp->dn_nblkptr = dn->dn_nblkptr;
|
||||
}
|
||||
|
||||
dnp->dn_type = dn->dn_type;
|
||||
dnp->dn_bonustype = dn->dn_bonustype;
|
||||
dnp->dn_bonuslen = dn->dn_bonuslen;
|
||||
}
|
||||
|
||||
ASSERT(dnp->dn_nlevels > 1 ||
|
||||
BP_IS_HOLE(&dnp->dn_blkptr[0]) ||
|
||||
BP_GET_LSIZE(&dnp->dn_blkptr[0]) ==
|
||||
dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT);
|
||||
|
||||
if (dn->dn_next_blksz[txgoff]) {
|
||||
ASSERT(P2PHASE(dn->dn_next_blksz[txgoff],
|
||||
SPA_MINBLOCKSIZE) == 0);
|
||||
ASSERT(BP_IS_HOLE(&dnp->dn_blkptr[0]) ||
|
||||
dn->dn_maxblkid == 0 || list_head(list) != NULL ||
|
||||
avl_last(&dn->dn_ranges[txgoff]) ||
|
||||
dn->dn_next_blksz[txgoff] >> SPA_MINBLOCKSHIFT ==
|
||||
dnp->dn_datablkszsec);
|
||||
dnp->dn_datablkszsec =
|
||||
dn->dn_next_blksz[txgoff] >> SPA_MINBLOCKSHIFT;
|
||||
dn->dn_next_blksz[txgoff] = 0;
|
||||
}
|
||||
|
||||
if (dn->dn_next_bonuslen[txgoff]) {
|
||||
if (dn->dn_next_bonuslen[txgoff] == DN_ZERO_BONUSLEN)
|
||||
dnp->dn_bonuslen = 0;
|
||||
else
|
||||
dnp->dn_bonuslen = dn->dn_next_bonuslen[txgoff];
|
||||
ASSERT(dnp->dn_bonuslen <= DN_MAX_BONUSLEN);
|
||||
dn->dn_next_bonuslen[txgoff] = 0;
|
||||
}
|
||||
|
||||
if (dn->dn_next_bonustype[txgoff]) {
|
||||
ASSERT(dn->dn_next_bonustype[txgoff] < DMU_OT_NUMTYPES);
|
||||
dnp->dn_bonustype = dn->dn_next_bonustype[txgoff];
|
||||
dn->dn_next_bonustype[txgoff] = 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* We will either remove a spill block when a file is being removed
|
||||
* or we have been asked to remove it.
|
||||
*/
|
||||
if (dn->dn_rm_spillblk[txgoff] ||
|
||||
((dnp->dn_flags & DNODE_FLAG_SPILL_BLKPTR) &&
|
||||
dn->dn_free_txg > 0 && dn->dn_free_txg <= tx->tx_txg)) {
|
||||
if ((dnp->dn_flags & DNODE_FLAG_SPILL_BLKPTR))
|
||||
kill_spill = B_TRUE;
|
||||
dn->dn_rm_spillblk[txgoff] = 0;
|
||||
}
|
||||
|
||||
if (dn->dn_next_indblkshift[txgoff]) {
|
||||
ASSERT(dnp->dn_nlevels == 1);
|
||||
dnp->dn_indblkshift = dn->dn_next_indblkshift[txgoff];
|
||||
dn->dn_next_indblkshift[txgoff] = 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Just take the live (open-context) values for checksum and compress.
|
||||
* Strictly speaking it's a future leak, but nothing bad happens if we
|
||||
* start using the new checksum or compress algorithm a little early.
|
||||
*/
|
||||
dnp->dn_checksum = dn->dn_checksum;
|
||||
dnp->dn_compress = dn->dn_compress;
|
||||
|
||||
mutex_exit(&dn->dn_mtx);
|
||||
|
||||
if (kill_spill) {
|
||||
(void) free_blocks(dn, &dn->dn_phys->dn_spill, 1, tx);
|
||||
mutex_enter(&dn->dn_mtx);
|
||||
dnp->dn_flags &= ~DNODE_FLAG_SPILL_BLKPTR;
|
||||
mutex_exit(&dn->dn_mtx);
|
||||
}
|
||||
|
||||
/* process all the "freed" ranges in the file */
|
||||
while (rp = avl_last(&dn->dn_ranges[txgoff])) {
|
||||
dnode_sync_free_range(dn, rp->fr_blkid, rp->fr_nblks, tx);
|
||||
/* grab the mutex so we don't race with dnode_block_freed() */
|
||||
mutex_enter(&dn->dn_mtx);
|
||||
avl_remove(&dn->dn_ranges[txgoff], rp);
|
||||
mutex_exit(&dn->dn_mtx);
|
||||
kmem_free(rp, sizeof (free_range_t));
|
||||
}
|
||||
|
||||
if (dn->dn_free_txg > 0 && dn->dn_free_txg <= tx->tx_txg) {
|
||||
dnode_sync_free(dn, tx);
|
||||
return;
|
||||
}
|
||||
|
||||
if (dn->dn_next_nblkptr[txgoff]) {
|
||||
/* this should only happen on a realloc */
|
||||
ASSERT(dn->dn_allocated_txg == tx->tx_txg);
|
||||
if (dn->dn_next_nblkptr[txgoff] > dnp->dn_nblkptr) {
|
||||
/* zero the new blkptrs we are gaining */
|
||||
bzero(dnp->dn_blkptr + dnp->dn_nblkptr,
|
||||
sizeof (blkptr_t) *
|
||||
(dn->dn_next_nblkptr[txgoff] - dnp->dn_nblkptr));
|
||||
#ifdef ZFS_DEBUG
|
||||
} else {
|
||||
int i;
|
||||
ASSERT(dn->dn_next_nblkptr[txgoff] < dnp->dn_nblkptr);
|
||||
/* the blkptrs we are losing better be unallocated */
|
||||
for (i = dn->dn_next_nblkptr[txgoff];
|
||||
i < dnp->dn_nblkptr; i++)
|
||||
ASSERT(BP_IS_HOLE(&dnp->dn_blkptr[i]));
|
||||
#endif
|
||||
}
|
||||
mutex_enter(&dn->dn_mtx);
|
||||
dnp->dn_nblkptr = dn->dn_next_nblkptr[txgoff];
|
||||
dn->dn_next_nblkptr[txgoff] = 0;
|
||||
mutex_exit(&dn->dn_mtx);
|
||||
}
|
||||
|
||||
if (dn->dn_next_nlevels[txgoff]) {
|
||||
dnode_increase_indirection(dn, tx);
|
||||
dn->dn_next_nlevels[txgoff] = 0;
|
||||
}
|
||||
|
||||
dbuf_sync_list(list, tx);
|
||||
|
||||
if (!DMU_OBJECT_IS_SPECIAL(dn->dn_object)) {
|
||||
ASSERT3P(list_head(list), ==, NULL);
|
||||
dnode_rele(dn, (void *)(uintptr_t)tx->tx_txg);
|
||||
}
|
||||
|
||||
/*
|
||||
* Although we have dropped our reference to the dnode, it
|
||||
* can't be evicted until it's written, and we haven't yet
|
||||
* initiated the IO for the dnode's dbuf.
|
||||
*/
|
||||
}
|
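Illustrative sketch (not part of the imported source): dnode_sync() and dnode_sync_free() above index every per-txg field with `txgoff = tx->tx_txg & TXG_MASK`, apply the pending value, and then clear that slot. The userland C fragment below shows the same pattern with a small ring of slots. The ring size of 4 is an assumption made for the sketch, and every `sketch_*` / `SKETCH_*` name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for this sketch; the real ones come from the txg headers. */
#define	SKETCH_TXG_SIZE	4
#define	SKETCH_TXG_MASK	(SKETCH_TXG_SIZE - 1)

/* Hypothetical stand-in for the per-txg "next" fields of a dnode. */
typedef struct sketch_dnode {
	uint32_t next_blksz[SKETCH_TXG_SIZE];	/* 0 means nothing pending */
	uint32_t blksz;				/* value currently "on disk" */
} sketch_dnode_t;

/* Map a transaction group number to its slot in the ring. */
static int
sketch_txgoff(uint64_t txg)
{
	return ((int)(txg & SKETCH_TXG_MASK));
}

/* Open context: record a change to be applied when 'txg' syncs. */
static void
sketch_set_blksz(sketch_dnode_t *dn, uint64_t txg, uint32_t blksz)
{
	assert(blksz != 0);
	dn->next_blksz[sketch_txgoff(txg)] = blksz;
}

/* Syncing context: apply and clear only the slot that belongs to 'txg'. */
static void
sketch_sync(sketch_dnode_t *dn, uint64_t txg)
{
	int txgoff = sketch_txgoff(txg);

	if (dn->next_blksz[txgoff] != 0) {
		dn->blksz = dn->next_blksz[txgoff];
		dn->next_blksz[txgoff] = 0;
	}
}
```

Under these assumptions, a change recorded for txg 9 lands in slot 1 and one recorded for txg 10 lands in slot 2, so syncing txg 9 leaves the txg 10 value pending.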
4030
uts/common/fs/zfs/dsl_dataset.c
Normal file
File diff suppressed because it is too large
474
uts/common/fs/zfs/dsl_deadlist.c
Normal file
@ -0,0 +1,474 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/dsl_dataset.h>
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/refcount.h>
|
||||
#include <sys/zap.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/dsl_pool.h>
|
||||
|
||||
static int
|
||||
dsl_deadlist_compare(const void *arg1, const void *arg2)
|
||||
{
|
||||
const dsl_deadlist_entry_t *dle1 = arg1;
|
||||
const dsl_deadlist_entry_t *dle2 = arg2;
|
||||
|
||||
if (dle1->dle_mintxg < dle2->dle_mintxg)
|
||||
return (-1);
|
||||
else if (dle1->dle_mintxg > dle2->dle_mintxg)
|
||||
return (+1);
|
||||
else
|
||||
return (0);
|
||||
}
|
||||
|
||||
static void
|
||||
dsl_deadlist_load_tree(dsl_deadlist_t *dl)
|
||||
{
|
||||
zap_cursor_t zc;
|
||||
zap_attribute_t za;
|
||||
|
||||
ASSERT(!dl->dl_oldfmt);
|
||||
if (dl->dl_havetree)
|
||||
return;
|
||||
|
||||
avl_create(&dl->dl_tree, dsl_deadlist_compare,
|
||||
sizeof (dsl_deadlist_entry_t),
|
||||
offsetof(dsl_deadlist_entry_t, dle_node));
|
||||
for (zap_cursor_init(&zc, dl->dl_os, dl->dl_object);
|
||||
zap_cursor_retrieve(&zc, &za) == 0;
|
||||
zap_cursor_advance(&zc)) {
|
||||
dsl_deadlist_entry_t *dle = kmem_alloc(sizeof (*dle), KM_SLEEP);
|
||||
dle->dle_mintxg = strtonum(za.za_name, NULL);
|
||||
VERIFY3U(0, ==, bpobj_open(&dle->dle_bpobj, dl->dl_os,
|
||||
za.za_first_integer));
|
||||
avl_add(&dl->dl_tree, dle);
|
||||
}
|
||||
zap_cursor_fini(&zc);
|
||||
dl->dl_havetree = B_TRUE;
|
||||
}
|
||||
|
||||
void
|
||||
dsl_deadlist_open(dsl_deadlist_t *dl, objset_t *os, uint64_t object)
|
||||
{
|
||||
dmu_object_info_t doi;
|
||||
|
||||
mutex_init(&dl->dl_lock, NULL, MUTEX_DEFAULT, NULL);
|
||||
dl->dl_os = os;
|
||||
dl->dl_object = object;
|
||||
VERIFY3U(0, ==, dmu_bonus_hold(os, object, dl, &dl->dl_dbuf));
|
||||
dmu_object_info_from_db(dl->dl_dbuf, &doi);
|
||||
if (doi.doi_type == DMU_OT_BPOBJ) {
|
||||
dmu_buf_rele(dl->dl_dbuf, dl);
|
||||
dl->dl_dbuf = NULL;
|
||||
dl->dl_oldfmt = B_TRUE;
|
||||
VERIFY3U(0, ==, bpobj_open(&dl->dl_bpobj, os, object));
|
||||
return;
|
||||
}
|
||||
|
||||
dl->dl_oldfmt = B_FALSE;
|
||||
dl->dl_phys = dl->dl_dbuf->db_data;
|
||||
dl->dl_havetree = B_FALSE;
|
||||
}
|
||||
|
||||
void
|
||||
dsl_deadlist_close(dsl_deadlist_t *dl)
|
||||
{
|
||||
void *cookie = NULL;
|
||||
dsl_deadlist_entry_t *dle;
|
||||
|
||||
if (dl->dl_oldfmt) {
|
||||
dl->dl_oldfmt = B_FALSE;
|
||||
bpobj_close(&dl->dl_bpobj);
|
||||
return;
|
||||
}
|
||||
|
||||
if (dl->dl_havetree) {
|
||||
while ((dle = avl_destroy_nodes(&dl->dl_tree, &cookie))
|
||||
!= NULL) {
|
||||
bpobj_close(&dle->dle_bpobj);
|
||||
kmem_free(dle, sizeof (*dle));
|
||||
}
|
||||
avl_destroy(&dl->dl_tree);
|
||||
}
|
||||
dmu_buf_rele(dl->dl_dbuf, dl);
|
||||
mutex_destroy(&dl->dl_lock);
|
||||
dl->dl_dbuf = NULL;
|
||||
dl->dl_phys = NULL;
|
||||
}
|
||||
|
||||
uint64_t
|
||||
dsl_deadlist_alloc(objset_t *os, dmu_tx_t *tx)
|
||||
{
|
||||
if (spa_version(dmu_objset_spa(os)) < SPA_VERSION_DEADLISTS)
|
||||
return (bpobj_alloc(os, SPA_MAXBLOCKSIZE, tx));
|
||||
return (zap_create(os, DMU_OT_DEADLIST, DMU_OT_DEADLIST_HDR,
|
||||
sizeof (dsl_deadlist_phys_t), tx));
|
||||
}
|
||||
|
||||
void
|
||||
dsl_deadlist_free(objset_t *os, uint64_t dlobj, dmu_tx_t *tx)
|
||||
{
|
||||
dmu_object_info_t doi;
|
||||
zap_cursor_t zc;
|
||||
zap_attribute_t za;
|
||||
|
||||
VERIFY3U(0, ==, dmu_object_info(os, dlobj, &doi));
|
||||
if (doi.doi_type == DMU_OT_BPOBJ) {
|
||||
bpobj_free(os, dlobj, tx);
|
||||
return;
|
||||
}
|
||||
|
||||
for (zap_cursor_init(&zc, os, dlobj);
|
||||
zap_cursor_retrieve(&zc, &za) == 0;
|
||||
zap_cursor_advance(&zc))
|
||||
bpobj_free(os, za.za_first_integer, tx);
|
||||
zap_cursor_fini(&zc);
|
||||
VERIFY3U(0, ==, dmu_object_free(os, dlobj, tx));
|
||||
}
|
||||
|
||||
void
|
||||
dsl_deadlist_insert(dsl_deadlist_t *dl, const blkptr_t *bp, dmu_tx_t *tx)
|
||||
{
|
||||
dsl_deadlist_entry_t dle_tofind;
|
||||
dsl_deadlist_entry_t *dle;
|
||||
avl_index_t where;
|
||||
|
||||
if (dl->dl_oldfmt) {
|
||||
bpobj_enqueue(&dl->dl_bpobj, bp, tx);
|
||||
return;
|
||||
}
|
||||
|
||||
dsl_deadlist_load_tree(dl);
|
||||
|
||||
dmu_buf_will_dirty(dl->dl_dbuf, tx);
|
||||
mutex_enter(&dl->dl_lock);
|
||||
dl->dl_phys->dl_used +=
|
||||
bp_get_dsize_sync(dmu_objset_spa(dl->dl_os), bp);
|
||||
dl->dl_phys->dl_comp += BP_GET_PSIZE(bp);
|
||||
dl->dl_phys->dl_uncomp += BP_GET_UCSIZE(bp);
|
||||
mutex_exit(&dl->dl_lock);
|
||||
|
||||
dle_tofind.dle_mintxg = bp->blk_birth;
|
||||
dle = avl_find(&dl->dl_tree, &dle_tofind, &where);
|
||||
if (dle == NULL)
|
||||
dle = avl_nearest(&dl->dl_tree, where, AVL_BEFORE);
|
||||
else
|
||||
dle = AVL_PREV(&dl->dl_tree, dle);
|
||||
bpobj_enqueue(&dle->dle_bpobj, bp, tx);
|
||||
}
|
||||
|
||||
/*
|
||||
* Insert new key in deadlist, which must be > all current entries.
|
||||
* mintxg is not inclusive.
|
||||
*/
|
||||
void
|
||||
dsl_deadlist_add_key(dsl_deadlist_t *dl, uint64_t mintxg, dmu_tx_t *tx)
|
||||
{
|
||||
uint64_t obj;
|
||||
dsl_deadlist_entry_t *dle;
|
||||
|
||||
if (dl->dl_oldfmt)
|
||||
return;
|
||||
|
||||
dsl_deadlist_load_tree(dl);
|
||||
|
||||
dle = kmem_alloc(sizeof (*dle), KM_SLEEP);
|
||||
dle->dle_mintxg = mintxg;
|
||||
obj = bpobj_alloc(dl->dl_os, SPA_MAXBLOCKSIZE, tx);
|
||||
VERIFY3U(0, ==, bpobj_open(&dle->dle_bpobj, dl->dl_os, obj));
|
||||
avl_add(&dl->dl_tree, dle);
|
||||
|
||||
VERIFY3U(0, ==, zap_add_int_key(dl->dl_os, dl->dl_object,
|
||||
mintxg, obj, tx));
|
||||
}
|
||||
|
||||
/*
|
||||
* Remove this key, merging its entries into the previous key.
|
||||
*/
|
||||
void
|
||||
dsl_deadlist_remove_key(dsl_deadlist_t *dl, uint64_t mintxg, dmu_tx_t *tx)
|
||||
{
|
||||
dsl_deadlist_entry_t dle_tofind;
|
||||
dsl_deadlist_entry_t *dle, *dle_prev;
|
||||
|
||||
if (dl->dl_oldfmt)
|
||||
return;
|
||||
|
||||
dsl_deadlist_load_tree(dl);
|
||||
|
||||
dle_tofind.dle_mintxg = mintxg;
|
||||
dle = avl_find(&dl->dl_tree, &dle_tofind, NULL);
|
||||
dle_prev = AVL_PREV(&dl->dl_tree, dle);
|
||||
|
||||
bpobj_enqueue_subobj(&dle_prev->dle_bpobj,
|
||||
dle->dle_bpobj.bpo_object, tx);
|
||||
|
||||
avl_remove(&dl->dl_tree, dle);
|
||||
bpobj_close(&dle->dle_bpobj);
|
||||
kmem_free(dle, sizeof (*dle));
|
||||
|
||||
VERIFY3U(0, ==, zap_remove_int(dl->dl_os, dl->dl_object, mintxg, tx));
|
||||
}
|
||||
|
||||
/*
|
||||
* Walk ds's snapshots to regenerate the ZAP & AVL.
|
||||
*/
|
||||
static void
|
||||
dsl_deadlist_regenerate(objset_t *os, uint64_t dlobj,
|
||||
uint64_t mrs_obj, dmu_tx_t *tx)
|
||||
{
|
||||
dsl_deadlist_t dl;
|
||||
dsl_pool_t *dp = dmu_objset_pool(os);
|
||||
|
||||
dsl_deadlist_open(&dl, os, dlobj);
|
||||
if (dl.dl_oldfmt) {
|
||||
dsl_deadlist_close(&dl);
|
||||
return;
|
||||
}
|
||||
|
||||
while (mrs_obj != 0) {
|
||||
dsl_dataset_t *ds;
|
||||
VERIFY3U(0, ==, dsl_dataset_hold_obj(dp, mrs_obj, FTAG, &ds));
|
||||
dsl_deadlist_add_key(&dl, ds->ds_phys->ds_prev_snap_txg, tx);
|
||||
mrs_obj = ds->ds_phys->ds_prev_snap_obj;
|
||||
dsl_dataset_rele(ds, FTAG);
|
||||
}
|
||||
dsl_deadlist_close(&dl);
|
||||
}
|
||||
|
||||
uint64_t
|
||||
dsl_deadlist_clone(dsl_deadlist_t *dl, uint64_t maxtxg,
|
||||
uint64_t mrs_obj, dmu_tx_t *tx)
|
||||
{
|
||||
dsl_deadlist_entry_t *dle;
|
||||
uint64_t newobj;
|
||||
|
||||
newobj = dsl_deadlist_alloc(dl->dl_os, tx);
|
||||
|
||||
if (dl->dl_oldfmt) {
|
||||
dsl_deadlist_regenerate(dl->dl_os, newobj, mrs_obj, tx);
|
||||
return (newobj);
|
||||
}
|
||||
|
||||
dsl_deadlist_load_tree(dl);
|
||||
|
||||
for (dle = avl_first(&dl->dl_tree); dle;
|
||||
dle = AVL_NEXT(&dl->dl_tree, dle)) {
|
||||
uint64_t obj;
|
||||
|
||||
if (dle->dle_mintxg >= maxtxg)
|
||||
break;
|
||||
|
||||
obj = bpobj_alloc(dl->dl_os, SPA_MAXBLOCKSIZE, tx);
|
||||
VERIFY3U(0, ==, zap_add_int_key(dl->dl_os, newobj,
|
||||
dle->dle_mintxg, obj, tx));
|
||||
}
|
||||
return (newobj);
|
||||
}
|
||||
|
||||
void
|
||||
dsl_deadlist_space(dsl_deadlist_t *dl,
|
||||
uint64_t *usedp, uint64_t *compp, uint64_t *uncompp)
|
||||
{
|
||||
if (dl->dl_oldfmt) {
|
||||
VERIFY3U(0, ==, bpobj_space(&dl->dl_bpobj,
|
||||
usedp, compp, uncompp));
|
||||
return;
|
||||
}
|
||||
|
||||
mutex_enter(&dl->dl_lock);
|
||||
*usedp = dl->dl_phys->dl_used;
|
||||
*compp = dl->dl_phys->dl_comp;
|
||||
*uncompp = dl->dl_phys->dl_uncomp;
|
||||
mutex_exit(&dl->dl_lock);
|
||||
}
|
||||
|
||||
/*
|
||||
* return space used in the range (mintxg, maxtxg].
|
||||
* Includes maxtxg, does not include mintxg.
|
||||
* mintxg and maxtxg must both be keys in the deadlist (unless maxtxg is
|
||||
* UINT64_MAX).
|
||||
*/
|
||||
void
|
||||
dsl_deadlist_space_range(dsl_deadlist_t *dl, uint64_t mintxg, uint64_t maxtxg,
|
||||
uint64_t *usedp, uint64_t *compp, uint64_t *uncompp)
|
||||
{
|
||||
dsl_deadlist_entry_t dle_tofind;
|
||||
dsl_deadlist_entry_t *dle;
|
||||
avl_index_t where;
|
||||
|
||||
if (dl->dl_oldfmt) {
|
||||
VERIFY3U(0, ==, bpobj_space_range(&dl->dl_bpobj,
|
||||
mintxg, maxtxg, usedp, compp, uncompp));
|
||||
return;
|
||||
}
|
||||
|
||||
dsl_deadlist_load_tree(dl);
|
||||
*usedp = *compp = *uncompp = 0;
|
||||
|
||||
dle_tofind.dle_mintxg = mintxg;
|
||||
dle = avl_find(&dl->dl_tree, &dle_tofind, &where);
|
||||
/*
|
||||
* If we don't find this mintxg, there shouldn't be anything
|
||||
* after it either.
|
||||
*/
|
||||
ASSERT(dle != NULL ||
|
||||
avl_nearest(&dl->dl_tree, where, AVL_AFTER) == NULL);
|
||||
for (; dle && dle->dle_mintxg < maxtxg;
|
||||
dle = AVL_NEXT(&dl->dl_tree, dle)) {
|
||||
uint64_t used, comp, uncomp;
|
||||
|
||||
VERIFY3U(0, ==, bpobj_space(&dle->dle_bpobj,
|
||||
&used, &comp, &uncomp));
|
||||
|
||||
*usedp += used;
|
||||
*compp += comp;
|
||||
*uncompp += uncomp;
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
dsl_deadlist_insert_bpobj(dsl_deadlist_t *dl, uint64_t obj, uint64_t birth,
|
||||
dmu_tx_t *tx)
|
||||
{
|
||||
dsl_deadlist_entry_t dle_tofind;
|
||||
dsl_deadlist_entry_t *dle;
|
||||
avl_index_t where;
|
||||
uint64_t used, comp, uncomp;
|
||||
bpobj_t bpo;
|
||||
|
||||
VERIFY3U(0, ==, bpobj_open(&bpo, dl->dl_os, obj));
|
||||
VERIFY3U(0, ==, bpobj_space(&bpo, &used, &comp, &uncomp));
|
||||
bpobj_close(&bpo);
|
||||
|
||||
dsl_deadlist_load_tree(dl);
|
||||
|
||||
dmu_buf_will_dirty(dl->dl_dbuf, tx);
|
||||
mutex_enter(&dl->dl_lock);
|
||||
dl->dl_phys->dl_used += used;
|
||||
dl->dl_phys->dl_comp += comp;
|
||||
dl->dl_phys->dl_uncomp += uncomp;
|
||||
mutex_exit(&dl->dl_lock);
|
||||
|
||||
dle_tofind.dle_mintxg = birth;
|
||||
dle = avl_find(&dl->dl_tree, &dle_tofind, &where);
|
||||
if (dle == NULL)
|
||||
dle = avl_nearest(&dl->dl_tree, where, AVL_BEFORE);
|
||||
bpobj_enqueue_subobj(&dle->dle_bpobj, obj, tx);
|
||||
}
|
||||
|
||||
static int
|
||||
dsl_deadlist_insert_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx)
|
||||
{
|
||||
dsl_deadlist_t *dl = arg;
|
||||
dsl_deadlist_insert(dl, bp, tx);
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*
|
||||
* Merge the deadlist pointed to by 'obj' into dl. obj will be left as
|
||||
* an empty deadlist.
|
||||
*/
|
||||
void
|
||||
dsl_deadlist_merge(dsl_deadlist_t *dl, uint64_t obj, dmu_tx_t *tx)
|
||||
{
|
||||
zap_cursor_t zc;
|
||||
zap_attribute_t za;
|
||||
dmu_buf_t *bonus;
|
||||
dsl_deadlist_phys_t *dlp;
|
||||
dmu_object_info_t doi;
|
||||
|
||||
VERIFY3U(0, ==, dmu_object_info(dl->dl_os, obj, &doi));
|
||||
if (doi.doi_type == DMU_OT_BPOBJ) {
|
||||
bpobj_t bpo;
|
||||
VERIFY3U(0, ==, bpobj_open(&bpo, dl->dl_os, obj));
|
||||
VERIFY3U(0, ==, bpobj_iterate(&bpo,
|
||||
dsl_deadlist_insert_cb, dl, tx));
|
||||
bpobj_close(&bpo);
|
||||
return;
|
||||
}
|
||||
|
||||
for (zap_cursor_init(&zc, dl->dl_os, obj);
|
||||
zap_cursor_retrieve(&zc, &za) == 0;
|
||||
zap_cursor_advance(&zc)) {
|
||||
uint64_t mintxg = strtonum(za.za_name, NULL);
|
||||
dsl_deadlist_insert_bpobj(dl, za.za_first_integer, mintxg, tx);
|
||||
VERIFY3U(0, ==, zap_remove_int(dl->dl_os, obj, mintxg, tx));
|
||||
}
|
||||
zap_cursor_fini(&zc);
|
||||
|
||||
VERIFY3U(0, ==, dmu_bonus_hold(dl->dl_os, obj, FTAG, &bonus));
|
||||
dlp = bonus->db_data;
|
||||
dmu_buf_will_dirty(bonus, tx);
|
||||
bzero(dlp, sizeof (*dlp));
|
||||
dmu_buf_rele(bonus, FTAG);
|
||||
}
|
||||
|
||||
/*
|
||||
* Remove entries on dl that are >= mintxg, and put them on the bpobj.
|
||||
*/
|
||||
void
|
||||
dsl_deadlist_move_bpobj(dsl_deadlist_t *dl, bpobj_t *bpo, uint64_t mintxg,
|
||||
dmu_tx_t *tx)
|
||||
{
|
||||
dsl_deadlist_entry_t dle_tofind;
|
||||
dsl_deadlist_entry_t *dle;
|
||||
avl_index_t where;
|
||||
|
||||
ASSERT(!dl->dl_oldfmt);
|
||||
dmu_buf_will_dirty(dl->dl_dbuf, tx);
|
||||
dsl_deadlist_load_tree(dl);
|
||||
|
||||
dle_tofind.dle_mintxg = mintxg;
|
||||
dle = avl_find(&dl->dl_tree, &dle_tofind, &where);
|
||||
if (dle == NULL)
|
||||
dle = avl_nearest(&dl->dl_tree, where, AVL_AFTER);
|
||||
while (dle) {
|
||||
uint64_t used, comp, uncomp;
|
||||
dsl_deadlist_entry_t *dle_next;
|
||||
|
||||
bpobj_enqueue_subobj(bpo, dle->dle_bpobj.bpo_object, tx);
|
||||
|
||||
VERIFY3U(0, ==, bpobj_space(&dle->dle_bpobj,
|
||||
&used, &comp, &uncomp));
|
||||
mutex_enter(&dl->dl_lock);
|
||||
ASSERT3U(dl->dl_phys->dl_used, >=, used);
|
||||
ASSERT3U(dl->dl_phys->dl_comp, >=, comp);
|
||||
ASSERT3U(dl->dl_phys->dl_uncomp, >=, uncomp);
|
||||
dl->dl_phys->dl_used -= used;
|
||||
dl->dl_phys->dl_comp -= comp;
|
||||
dl->dl_phys->dl_uncomp -= uncomp;
|
||||
mutex_exit(&dl->dl_lock);
|
||||
|
||||
VERIFY3U(0, ==, zap_remove_int(dl->dl_os, dl->dl_object,
|
||||
dle->dle_mintxg, tx));
|
||||
|
||||
dle_next = AVL_NEXT(&dl->dl_tree, dle);
|
||||
avl_remove(&dl->dl_tree, dle);
|
||||
bpobj_close(&dle->dle_bpobj);
|
||||
kmem_free(dle, sizeof (*dle));
|
||||
dle = dle_next;
|
||||
}
|
||||
}
|
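Illustrative sketch (not part of the imported source): in dsl_deadlist.c above, each entry's `dle_mintxg` is an exclusive lower bound. dsl_deadlist_insert() therefore files a block under the entry with the largest key strictly below the block's birth txg, and dsl_deadlist_space_range() sums the entries whose keys fall in [mintxg, maxtxg), which covers births in (mintxg, maxtxg]. The fragment below models that with a sorted array instead of an AVL tree. All `sketch_*` names are hypothetical, and the range helper assumes, as the source comment requires, that mintxg is an existing key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified deadlist entry. */
typedef struct sketch_entry {
	uint64_t mintxg;	/* exclusive lower bound of this bucket */
	uint64_t used;		/* bytes accounted to this bucket */
} sketch_entry_t;

/*
 * Return the index of the bucket for a block born in 'birth': the entry
 * with the largest mintxg strictly less than birth, or -1 if there is none.
 * The array must be sorted by ascending mintxg.
 */
static int
sketch_bucket_for(const sketch_entry_t *e, size_t n, uint64_t birth)
{
	int found = -1;

	assert(e != NULL || n == 0);
	for (size_t i = 0; i < n && e[i].mintxg < birth; i++)
		found = (int)i;
	return (found);
}

/* Sum 'used' over the buckets whose key lies in [mintxg, maxtxg). */
static uint64_t
sketch_space_range(const sketch_entry_t *e, size_t n,
    uint64_t mintxg, uint64_t maxtxg)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		if (e[i].mintxg >= mintxg && e[i].mintxg < maxtxg)
			total += e[i].used;
	}
	return (total);
}
```

With keys 0, 100, and 200, a block born in txg 100 goes to the bucket keyed 0, while a block born in txg 101 goes to the bucket keyed 100.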
746
uts/common/fs/zfs/dsl_deleg.c
Normal file
@ -0,0 +1,746 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
/*
|
||||
* DSL permissions are stored in a two level zap attribute
|
||||
* mechanism. The first level identifies the "class" of
|
||||
* entry. The class is identified by the first 2 letters of
|
||||
* the attribute. The second letter "l" or "d" identifies whether
|
||||
* it is a local or descendent permission. The first letter
|
||||
* identifies the type of entry.
|
||||
*
|
||||
* ul$<id> identifies permissions granted locally for this userid.
|
||||
* ud$<id> identifies permissions granted on descendent datasets for
|
||||
* this userid.
|
||||
* Ul$<id> identifies permission sets granted locally for this userid.
|
||||
* Ud$<id> identifies permission sets granted on descendent datasets for
|
||||
* this userid.
|
||||
* gl$<id> identifies permissions granted locally for this groupid.
|
||||
* gd$<id> identifies permissions granted on descendent datasets for
|
||||
* this groupid.
|
||||
* Gl$<id> identifies permission sets granted locally for this groupid.
|
||||
* Gd$<id> identifies permission sets granted on descendent datasets for
|
||||
* this groupid.
|
||||
* el$ identifies permissions granted locally for everyone.
|
||||
* ed$ identifies permissions granted on descendent datasets
|
||||
* for everyone.
|
||||
* El$ identifies permission sets granted locally for everyone.
|
||||
* Ed$ identifies permission sets granted to descendent datasets for
|
||||
* everyone.
|
||||
* c-$ identifies permission to create at dataset creation time.
|
||||
* C-$ identifies permission sets to grant locally at dataset creation
|
||||
* time.
|
||||
* s-$@<name> permissions defined in specified set @<name>
|
||||
* S-$@<name> Sets defined in named set @<name>
|
||||
*
|
||||
* Each of the above entities points to another zap attribute that contains one
|
||||
* attribute for each allowed permission, such as create, destroy,...
|
||||
* All of the "upper" case class types will specify permission set names
|
||||
* rather than permissions.
|
||||
*
|
||||
* Basically it looks something like this:
|
||||
* ul$12 -> ZAP OBJ -> permissions...
|
||||
*
|
||||
* The ZAP OBJ is referred to as the jump object.
|
||||
*/
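Illustrative sketch (not part of the imported source): the comment above describes who-keys such as `ul$12`, made of a type letter, a scope letter (`l`, `d`, or `-`), a `$`, and then an id or set name. The userland fragment below builds such a key for a numeric id and compares the id portion the same way dsl_deleg_can_unallow() does, by looking at the string that starts at offset 3. The `sketch_*` helpers are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<type><scope>$<id>" into buf, e.g. ('u', 'l', 12) gives "ul$12".
 * Returns the snprintf() result.
 */
static int
sketch_deleg_key(char *buf, size_t len, char type, char scope, uint64_t id)
{
	assert(buf != NULL);
	return (snprintf(buf, len, "%c%c$%llu", type, scope,
	    (unsigned long long)id));
}

/* The id portion of a who-key starts after the 3-character prefix. */
static const char *
sketch_deleg_key_id(const char *key)
{
	return (&key[3]);
}

/* Nonzero if the key's id portion matches the given id string exactly. */
static int
sketch_deleg_key_matches(const char *key, const char *idstr)
{
	return (strcmp(idstr, sketch_deleg_key_id(key)) == 0);
}
```

A key built for user id 12 with local scope reads `ul$12`, and its id portion compares equal to the string "12" but not to "123".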
|
||||
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/dmu_objset.h>
|
||||
#include <sys/dmu_tx.h>
|
||||
#include <sys/dsl_dataset.h>
|
||||
#include <sys/dsl_dir.h>
|
||||
#include <sys/dsl_prop.h>
|
||||
#include <sys/dsl_synctask.h>
|
||||
#include <sys/dsl_deleg.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/zap.h>
|
||||
#include <sys/fs/zfs.h>
|
||||
#include <sys/cred.h>
|
||||
#include <sys/sunddi.h>
|
||||
|
||||
#include "zfs_deleg.h"
|
||||
|
||||
/*
|
||||
* Validate that user is allowed to delegate specified permissions.
|
||||
*
|
||||
* In order to delegate "create" you must have "create"
|
||||
* and "allow".
|
||||
*/
|
||||
int
|
||||
dsl_deleg_can_allow(char *ddname, nvlist_t *nvp, cred_t *cr)
|
||||
{
|
||||
nvpair_t *whopair = NULL;
|
||||
int error;
|
||||
|
||||
if ((error = dsl_deleg_access(ddname, ZFS_DELEG_PERM_ALLOW, cr)) != 0)
|
||||
return (error);
|
||||
|
||||
while (whopair = nvlist_next_nvpair(nvp, whopair)) {
|
||||
nvlist_t *perms;
|
||||
nvpair_t *permpair = NULL;
|
||||
|
||||
VERIFY(nvpair_value_nvlist(whopair, &perms) == 0);
|
||||
|
||||
while (permpair = nvlist_next_nvpair(perms, permpair)) {
|
||||
const char *perm = nvpair_name(permpair);
|
||||
|
||||
if (strcmp(perm, ZFS_DELEG_PERM_ALLOW) == 0)
|
||||
return (EPERM);
|
||||
|
||||
if ((error = dsl_deleg_access(ddname, perm, cr)) != 0)
|
||||
return (error);
|
||||
}
|
||||
}
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*
|
||||
* Validate that user is allowed to unallow specified permissions. They
|
||||
* must have the 'allow' permission, and even then can only unallow
|
||||
* perms for their uid.
|
||||
*/
|
||||
int
|
||||
dsl_deleg_can_unallow(char *ddname, nvlist_t *nvp, cred_t *cr)
|
||||
{
|
||||
nvpair_t *whopair = NULL;
|
||||
int error;
|
||||
char idstr[32];
|
||||
|
||||
if ((error = dsl_deleg_access(ddname, ZFS_DELEG_PERM_ALLOW, cr)) != 0)
|
||||
return (error);
|
||||
|
||||
(void) snprintf(idstr, sizeof (idstr), "%lld",
|
||||
(longlong_t)crgetuid(cr));
|
||||
|
||||
while (whopair = nvlist_next_nvpair(nvp, whopair)) {
|
||||
zfs_deleg_who_type_t type = nvpair_name(whopair)[0];
|
||||
|
||||
if (type != ZFS_DELEG_USER &&
|
||||
type != ZFS_DELEG_USER_SETS)
|
||||
return (EPERM);
|
||||
|
||||
if (strcmp(idstr, &nvpair_name(whopair)[3]) != 0)
|
||||
return (EPERM);
|
||||
}
|
||||
return (0);
|
||||
}
|
||||
|
||||
static void
|
||||
dsl_deleg_set_sync(void *arg1, void *arg2, dmu_tx_t *tx)
|
||||
{
|
||||
dsl_dir_t *dd = arg1;
|
||||
nvlist_t *nvp = arg2;
|
||||
objset_t *mos = dd->dd_pool->dp_meta_objset;
|
||||
nvpair_t *whopair = NULL;
|
||||
uint64_t zapobj = dd->dd_phys->dd_deleg_zapobj;
|
||||
|
||||
if (zapobj == 0) {
|
||||
dmu_buf_will_dirty(dd->dd_dbuf, tx);
|
||||
zapobj = dd->dd_phys->dd_deleg_zapobj = zap_create(mos,
|
||||
DMU_OT_DSL_PERMS, DMU_OT_NONE, 0, tx);
|
||||
}
|
||||
|
||||
while (whopair = nvlist_next_nvpair(nvp, whopair)) {
|
||||
const char *whokey = nvpair_name(whopair);
|
||||
nvlist_t *perms;
|
||||
nvpair_t *permpair = NULL;
|
||||
uint64_t jumpobj;
|
||||
|
||||
VERIFY(nvpair_value_nvlist(whopair, &perms) == 0);
|
||||
|
||||
if (zap_lookup(mos, zapobj, whokey, 8, 1, &jumpobj) != 0) {
|
||||
jumpobj = zap_create(mos, DMU_OT_DSL_PERMS,
|
||||
DMU_OT_NONE, 0, tx);
|
||||
VERIFY(zap_update(mos, zapobj,
|
||||
whokey, 8, 1, &jumpobj, tx) == 0);
|
||||
}
|
||||
|
||||
while (permpair = nvlist_next_nvpair(perms, permpair)) {
|
||||
const char *perm = nvpair_name(permpair);
|
||||
uint64_t n = 0;
|
||||
|
||||
VERIFY(zap_update(mos, jumpobj,
|
||||
perm, 8, 1, &n, tx) == 0);
|
||||
spa_history_log_internal(LOG_DS_PERM_UPDATE,
|
||||
dd->dd_pool->dp_spa, tx,
|
||||
"%s %s dataset = %llu", whokey, perm,
|
||||
dd->dd_phys->dd_head_dataset_obj);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
dsl_deleg_unset_sync(void *arg1, void *arg2, dmu_tx_t *tx)
|
||||
{
|
||||
dsl_dir_t *dd = arg1;
|
||||
nvlist_t *nvp = arg2;
|
||||
objset_t *mos = dd->dd_pool->dp_meta_objset;
|
||||
nvpair_t *whopair = NULL;
|
||||
uint64_t zapobj = dd->dd_phys->dd_deleg_zapobj;
|
||||
|
||||
if (zapobj == 0)
|
||||
return;
|
||||
|
||||
while (whopair = nvlist_next_nvpair(nvp, whopair)) {
|
||||
const char *whokey = nvpair_name(whopair);
|
||||
nvlist_t *perms;
|
||||
nvpair_t *permpair = NULL;
|
||||
uint64_t jumpobj;
|
||||
|
||||
if (nvpair_value_nvlist(whopair, &perms) != 0) {
|
||||
if (zap_lookup(mos, zapobj, whokey, 8,
|
||||
1, &jumpobj) == 0) {
|
||||
(void) zap_remove(mos, zapobj, whokey, tx);
|
||||
VERIFY(0 == zap_destroy(mos, jumpobj, tx));
|
||||
}
|
||||
spa_history_log_internal(LOG_DS_PERM_WHO_REMOVE,
|
||||
dd->dd_pool->dp_spa, tx,
|
||||
"%s dataset = %llu", whokey,
|
||||
dd->dd_phys->dd_head_dataset_obj);
|
||||
continue;
|
||||
}
|
||||
|
||||
if (zap_lookup(mos, zapobj, whokey, 8, 1, &jumpobj) != 0)
|
||||
continue;
|
||||
|
||||
while (permpair = nvlist_next_nvpair(perms, permpair)) {
|
||||
const char *perm = nvpair_name(permpair);
|
||||
uint64_t n = 0;
|
||||
|
||||
(void) zap_remove(mos, jumpobj, perm, tx);
|
||||
if (zap_count(mos, jumpobj, &n) == 0 && n == 0) {
|
||||
(void) zap_remove(mos, zapobj,
|
||||
whokey, tx);
|
||||
VERIFY(0 == zap_destroy(mos,
|
||||
jumpobj, tx));
|
||||
}
|
||||
spa_history_log_internal(LOG_DS_PERM_REMOVE,
|
||||
dd->dd_pool->dp_spa, tx,
|
||||
"%s %s dataset = %llu", whokey, perm,
|
||||
dd->dd_phys->dd_head_dataset_obj);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
dsl_deleg_set(const char *ddname, nvlist_t *nvp, boolean_t unset)
|
||||
{
|
||||
dsl_dir_t *dd;
|
||||
int error;
|
||||
nvpair_t *whopair = NULL;
|
||||
int blocks_modified = 0;
|
||||
|
||||
error = dsl_dir_open(ddname, FTAG, &dd, NULL);
|
||||
if (error)
|
||||
return (error);
|
||||
|
||||
if (spa_version(dmu_objset_spa(dd->dd_pool->dp_meta_objset)) <
|
||||
SPA_VERSION_DELEGATED_PERMS) {
|
||||
dsl_dir_close(dd, FTAG);
|
||||
return (ENOTSUP);
|
||||
}
|
||||
|
||||
while (whopair = nvlist_next_nvpair(nvp, whopair))
|
||||
blocks_modified++;
|
||||
|
||||
error = dsl_sync_task_do(dd->dd_pool, NULL,
|
||||
unset ? dsl_deleg_unset_sync : dsl_deleg_set_sync,
|
||||
dd, nvp, blocks_modified);
|
||||
dsl_dir_close(dd, FTAG);
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
/*
|
||||
* Find all 'allow' permissions from a given point and then continue
|
||||
* traversing up to the root.
|
||||
*
|
||||
* This function constructs an nvlist of nvlists.
|
||||
* Each setpoint is an nvlist composed of an nvlist of an nvlist
|
||||
* of the individual users/groups/everyone/create
|
||||
* permissions.
|
||||
*
|
||||
* The nvlist will look like this.
|
||||
*
|
||||
* { source fsname -> { whokeys { permissions,...}, ...}}
|
||||
*
|
||||
* The fsname nvpairs will be arranged in a bottom up order. For example,
|
||||
* if we have the following structure a/b/c then the nvpairs for the fsnames
|
||||
* will be ordered a/b/c, a/b, a.
|
||||
*/
|
||||
int
|
||||
dsl_deleg_get(const char *ddname, nvlist_t **nvp)
|
||||
{
|
||||
dsl_dir_t *dd, *startdd;
|
||||
dsl_pool_t *dp;
|
||||
int error;
|
||||
objset_t *mos;
|
||||
|
||||
error = dsl_dir_open(ddname, FTAG, &startdd, NULL);
|
||||
if (error)
|
||||
return (error);
|
||||
|
||||
dp = startdd->dd_pool;
|
||||
mos = dp->dp_meta_objset;
|
||||
|
||||
VERIFY(nvlist_alloc(nvp, NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
||||
|
||||
rw_enter(&dp->dp_config_rwlock, RW_READER);
|
||||
for (dd = startdd; dd != NULL; dd = dd->dd_parent) {
|
||||
zap_cursor_t basezc;
|
||||
zap_attribute_t baseza;
|
||||
nvlist_t *sp_nvp;
|
||||
uint64_t n;
|
||||
char source[MAXNAMELEN];
|
||||
|
||||
if (dd->dd_phys->dd_deleg_zapobj &&
|
||||
(zap_count(mos, dd->dd_phys->dd_deleg_zapobj,
|
||||
&n) == 0) && n) {
|
||||
VERIFY(nvlist_alloc(&sp_nvp,
|
||||
NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
||||
} else {
|
||||
continue;
|
||||
}
|
||||
|
||||
for (zap_cursor_init(&basezc, mos,
|
||||
dd->dd_phys->dd_deleg_zapobj);
|
||||
zap_cursor_retrieve(&basezc, &baseza) == 0;
|
||||
zap_cursor_advance(&basezc)) {
|
||||
zap_cursor_t zc;
|
||||
zap_attribute_t za;
|
||||
nvlist_t *perms_nvp;
|
||||
|
||||
ASSERT(baseza.za_integer_length == 8);
|
||||
ASSERT(baseza.za_num_integers == 1);
|
||||
|
||||
VERIFY(nvlist_alloc(&perms_nvp,
|
||||
NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
||||
for (zap_cursor_init(&zc, mos, baseza.za_first_integer);
|
||||
zap_cursor_retrieve(&zc, &za) == 0;
|
||||
zap_cursor_advance(&zc)) {
|
||||
VERIFY(nvlist_add_boolean(perms_nvp,
|
||||
za.za_name) == 0);
|
||||
}
|
||||
zap_cursor_fini(&zc);
|
||||
VERIFY(nvlist_add_nvlist(sp_nvp, baseza.za_name,
|
||||
perms_nvp) == 0);
|
||||
nvlist_free(perms_nvp);
|
||||
}
|
||||
|
||||
zap_cursor_fini(&basezc);
|
||||
|
||||
dsl_dir_name(dd, source);
|
||||
VERIFY(nvlist_add_nvlist(*nvp, source, sp_nvp) == 0);
|
||||
nvlist_free(sp_nvp);
|
||||
}
|
||||
rw_exit(&dp->dp_config_rwlock);
|
||||
|
||||
dsl_dir_close(startdd, FTAG);
|
||||
return (0);
|
||||
}
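The comment above dsl_deleg_get() says the fsname nvpairs come out bottom up (a/b/c, a/b, a). The following is a minimal userland sketch of that ordering only. It assumes dataset names are plain '/'-separated strings; parent_name() and setpoint_order() are hypothetical helpers for illustration and are not part of ZFS, which walks dd->dd_parent instead of parsing names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of ZFS): truncate a dataset name in place
 * to its parent's name. Returns 1 if a parent existed, 0 at the root.
 */
static int
parent_name(char *name)
{
	char *slash;

	assert(name != NULL);
	slash = strrchr(name, '/');
	if (slash == NULL)
		return (0);
	*slash = '\0';
	return (1);
}

/*
 * Fill order[] with at most max names, starting at the given dataset and
 * walking up to the root. Returns the number of names written.
 */
static size_t
setpoint_order(const char *start, char order[][64], size_t max)
{
	char cur[64];
	size_t n = 0;

	(void) strncpy(cur, start, sizeof (cur) - 1);
	cur[sizeof (cur) - 1] = '\0';
	do {
		if (n == max)
			break;
		(void) strcpy(order[n++], cur);
	} while (parent_name(cur));
	return (n);
}
```

Called with "a/b/c", setpoint_order() fills the array in the same bottom-up order the comment describes.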
|
||||
|
||||
/*
|
||||
* Routines for dsl_deleg_access() -- access checking.
|
||||
*/
|
||||
typedef struct perm_set {
|
||||
avl_node_t p_node;
|
||||
boolean_t p_matched;
|
||||
char p_setname[ZFS_MAX_DELEG_NAME];
|
||||
} perm_set_t;
|
||||
|
||||
static int
|
||||
perm_set_compare(const void *arg1, const void *arg2)
|
||||
{
|
||||
const perm_set_t *node1 = arg1;
|
||||
const perm_set_t *node2 = arg2;
|
||||
int val;
|
||||
|
||||
val = strcmp(node1->p_setname, node2->p_setname);
|
||||
if (val == 0)
|
||||
return (0);
|
||||
return (val > 0 ? 1 : -1);
|
||||
}
|
||||
|
||||
/*
|
||||
* Determine whether a specified permission exists.
|
||||
*
|
||||
* First the base attribute has to be retrieved, e.g. ul$12.
|
||||
* Once the base object has been retrieved the actual permission
|
||||
* is looked up in the zap object the base object points to.
|
||||
*
|
||||
* Return 0 if permission exists, ENOENT if there is no whokey, EPERM if
|
||||
* there is no perm in that jumpobj.
|
||||
*/
|
||||
static int
|
||||
dsl_check_access(objset_t *mos, uint64_t zapobj,
|
||||
char type, char checkflag, void *valp, const char *perm)
|
||||
{
|
||||
int error;
|
||||
uint64_t jumpobj, zero;
|
||||
char whokey[ZFS_MAX_DELEG_NAME];
|
||||
|
||||
zfs_deleg_whokey(whokey, type, checkflag, valp);
|
||||
error = zap_lookup(mos, zapobj, whokey, 8, 1, &jumpobj);
|
||||
if (error == 0) {
|
||||
error = zap_lookup(mos, jumpobj, perm, 8, 1, &zero);
|
||||
if (error == ENOENT)
|
||||
error = EPERM;
|
||||
}
|
||||
return (error);
|
||||
}
|
||||
|
||||
/*
|
||||
* check a specified user/group for a requested permission
|
||||
*/
|
||||
static int
|
||||
dsl_check_user_access(objset_t *mos, uint64_t zapobj, const char *perm,
|
||||
int checkflag, cred_t *cr)
|
||||
{
|
||||
const gid_t *gids;
|
||||
int ngids;
|
||||
int i;
|
||||
uint64_t id;
|
||||
|
||||
/* check for user */
|
||||
id = crgetuid(cr);
|
||||
if (dsl_check_access(mos, zapobj,
|
||||
ZFS_DELEG_USER, checkflag, &id, perm) == 0)
|
||||
return (0);
|
||||
|
||||
/* check for user's primary group */
|
||||
id = crgetgid(cr);
|
||||
if (dsl_check_access(mos, zapobj,
|
||||
ZFS_DELEG_GROUP, checkflag, &id, perm) == 0)
|
||||
return (0);
|
||||
|
||||
/* check for everyone entry */
|
||||
id = -1;
|
||||
if (dsl_check_access(mos, zapobj,
|
||||
ZFS_DELEG_EVERYONE, checkflag, &id, perm) == 0)
|
||||
return (0);
|
||||
|
||||
/* check each supplemental group the user is a member of */
|
||||
ngids = crgetngroups(cr);
|
||||
gids = crgetgroups(cr);
|
||||
for (i = 0; i != ngids; i++) {
|
||||
id = gids[i];
|
||||
if (dsl_check_access(mos, zapobj,
|
||||
ZFS_DELEG_GROUP, checkflag, &id, perm) == 0)
|
||||
return (0);
|
||||
}
|
||||
|
||||
return (EPERM);
|
||||
}
|
||||
|
||||
/*
|
||||
* Iterate over the sets specified in the specified zapobj
|
||||
* and load them into the permsets avl tree.
|
||||
*/
|
||||
static int
|
||||
dsl_load_sets(objset_t *mos, uint64_t zapobj,
|
||||
char type, char checkflag, void *valp, avl_tree_t *avl)
|
||||
{
|
||||
zap_cursor_t zc;
|
||||
zap_attribute_t za;
|
||||
perm_set_t *permnode;
|
||||
avl_index_t idx;
|
||||
uint64_t jumpobj;
|
||||
int error;
|
||||
char whokey[ZFS_MAX_DELEG_NAME];
|
||||
|
||||
zfs_deleg_whokey(whokey, type, checkflag, valp);
|
||||
|
||||
error = zap_lookup(mos, zapobj, whokey, 8, 1, &jumpobj);
|
||||
if (error != 0)
|
||||
return (error);
|
||||
|
||||
for (zap_cursor_init(&zc, mos, jumpobj);
|
||||
zap_cursor_retrieve(&zc, &za) == 0;
|
||||
zap_cursor_advance(&zc)) {
|
||||
permnode = kmem_alloc(sizeof (perm_set_t), KM_SLEEP);
|
||||
(void) strlcpy(permnode->p_setname, za.za_name,
|
||||
sizeof (permnode->p_setname));
|
||||
permnode->p_matched = B_FALSE;
|
||||
|
||||
if (avl_find(avl, permnode, &idx) == NULL) {
|
||||
avl_insert(avl, permnode, idx);
|
||||
} else {
|
||||
kmem_free(permnode, sizeof (perm_set_t));
|
||||
}
|
||||
}
|
||||
zap_cursor_fini(&zc);
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*
|
||||
* Load all permission sets that apply to the user identified by cred.
|
||||
*/
|
||||
static void
|
||||
dsl_load_user_sets(objset_t *mos, uint64_t zapobj, avl_tree_t *avl,
|
||||
char checkflag, cred_t *cr)
|
||||
{
|
||||
const gid_t *gids;
|
||||
int ngids, i;
|
||||
uint64_t id;
|
||||
|
||||
id = crgetuid(cr);
|
||||
(void) dsl_load_sets(mos, zapobj,
|
||||
ZFS_DELEG_USER_SETS, checkflag, &id, avl);
|
||||
|
||||
id = crgetgid(cr);
|
||||
(void) dsl_load_sets(mos, zapobj,
|
||||
ZFS_DELEG_GROUP_SETS, checkflag, &id, avl);
|
||||
|
||||
(void) dsl_load_sets(mos, zapobj,
|
||||
ZFS_DELEG_EVERYONE_SETS, checkflag, NULL, avl);
|
||||
|
||||
ngids = crgetngroups(cr);
|
||||
gids = crgetgroups(cr);
|
||||
for (i = 0; i != ngids; i++) {
|
||||
id = gids[i];
|
||||
(void) dsl_load_sets(mos, zapobj,
|
||||
ZFS_DELEG_GROUP_SETS, checkflag, &id, avl);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Check if user has requested permission.
|
||||
*/
|
||||
int
|
||||
dsl_deleg_access_impl(dsl_dataset_t *ds, const char *perm, cred_t *cr)
|
||||
{
|
||||
dsl_dir_t *dd;
|
||||
dsl_pool_t *dp;
|
||||
void *cookie;
|
||||
int error;
|
||||
char checkflag;
|
||||
objset_t *mos;
|
||||
avl_tree_t permsets;
|
||||
perm_set_t *setnode;
|
||||
|
||||
dp = ds->ds_dir->dd_pool;
|
||||
mos = dp->dp_meta_objset;
|
||||
|
||||
if (dsl_delegation_on(mos) == B_FALSE)
|
||||
return (ECANCELED);
|
||||
|
||||
if (spa_version(dmu_objset_spa(dp->dp_meta_objset)) <
|
||||
SPA_VERSION_DELEGATED_PERMS)
|
||||
return (EPERM);
|
||||
|
||||
if (dsl_dataset_is_snapshot(ds)) {
|
||||
/*
|
||||
* Snapshots are treated as descendents only,
|
||||
* local permissions do not apply.
|
||||
*/
|
||||
checkflag = ZFS_DELEG_DESCENDENT;
|
||||
} else {
|
||||
checkflag = ZFS_DELEG_LOCAL;
|
||||
}
|
||||
|
||||
avl_create(&permsets, perm_set_compare, sizeof (perm_set_t),
|
||||
offsetof(perm_set_t, p_node));
|
||||
|
||||
rw_enter(&dp->dp_config_rwlock, RW_READER);
|
||||
for (dd = ds->ds_dir; dd != NULL; dd = dd->dd_parent,
|
||||
checkflag = ZFS_DELEG_DESCENDENT) {
|
||||
uint64_t zapobj;
|
||||
boolean_t expanded;
|
||||
|
||||
/*
|
||||
* If not in global zone then make sure
|
||||
* the zoned property is set
|
||||
*/
|
||||
if (!INGLOBALZONE(curproc)) {
|
||||
uint64_t zoned;
|
||||
|
||||
if (dsl_prop_get_dd(dd,
|
||||
zfs_prop_to_name(ZFS_PROP_ZONED),
|
||||
8, 1, &zoned, NULL, B_FALSE) != 0)
|
||||
break;
|
||||
if (!zoned)
|
||||
break;
|
||||
}
|
||||
zapobj = dd->dd_phys->dd_deleg_zapobj;
|
||||
|
||||
if (zapobj == 0)
|
||||
continue;
|
||||
|
||||
dsl_load_user_sets(mos, zapobj, &permsets, checkflag, cr);
|
||||
again:
|
||||
expanded = B_FALSE;
|
||||
for (setnode = avl_first(&permsets); setnode;
|
||||
setnode = AVL_NEXT(&permsets, setnode)) {
|
||||
if (setnode->p_matched == B_TRUE)
|
||||
continue;
|
||||
|
||||
/* See if this set directly grants this permission */
|
||||
error = dsl_check_access(mos, zapobj,
|
||||
ZFS_DELEG_NAMED_SET, 0, setnode->p_setname, perm);
|
||||
if (error == 0)
|
||||
goto success;
|
||||
if (error == EPERM)
|
||||
setnode->p_matched = B_TRUE;
|
||||
|
||||
/* See if this set includes other sets */
|
||||
error = dsl_load_sets(mos, zapobj,
|
||||
ZFS_DELEG_NAMED_SET_SETS, 0,
|
||||
setnode->p_setname, &permsets);
|
||||
if (error == 0)
|
||||
setnode->p_matched = expanded = B_TRUE;
|
||||
}
|
||||
/*
|
||||
* If we expanded any sets, that will define more sets,
|
||||
* which we need to check.
|
||||
*/
|
||||
if (expanded)
|
||||
goto again;
|
||||
|
||||
error = dsl_check_user_access(mos, zapobj, perm, checkflag, cr);
|
||||
if (error == 0)
|
||||
goto success;
|
||||
}
|
||||
error = EPERM;
|
||||
success:
|
||||
rw_exit(&dp->dp_config_rwlock);
|
||||
|
||||
cookie = NULL;
|
||||
while ((setnode = avl_destroy_nodes(&permsets, &cookie)) != NULL)
|
||||
kmem_free(setnode, sizeof (perm_set_t));
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
int
|
||||
dsl_deleg_access(const char *dsname, const char *perm, cred_t *cr)
|
||||
{
|
||||
dsl_dataset_t *ds;
|
||||
int error;
|
||||
|
||||
error = dsl_dataset_hold(dsname, FTAG, &ds);
|
||||
if (error)
|
||||
return (error);
|
||||
|
||||
error = dsl_deleg_access_impl(ds, perm, cr);
|
||||
dsl_dataset_rele(ds, FTAG);
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
/*
|
||||
* Other routines.
|
||||
*/
|
||||
|
||||
static void
|
||||
copy_create_perms(dsl_dir_t *dd, uint64_t pzapobj,
|
||||
boolean_t dosets, uint64_t uid, dmu_tx_t *tx)
|
||||
{
|
||||
objset_t *mos = dd->dd_pool->dp_meta_objset;
|
||||
uint64_t jumpobj, pjumpobj;
|
||||
uint64_t zapobj = dd->dd_phys->dd_deleg_zapobj;
|
||||
zap_cursor_t zc;
|
||||
zap_attribute_t za;
|
||||
char whokey[ZFS_MAX_DELEG_NAME];
|
||||
|
||||
zfs_deleg_whokey(whokey,
|
||||
dosets ? ZFS_DELEG_CREATE_SETS : ZFS_DELEG_CREATE,
|
||||
ZFS_DELEG_LOCAL, NULL);
|
||||
if (zap_lookup(mos, pzapobj, whokey, 8, 1, &pjumpobj) != 0)
|
||||
return;
|
||||
|
||||
if (zapobj == 0) {
|
||||
dmu_buf_will_dirty(dd->dd_dbuf, tx);
|
||||
zapobj = dd->dd_phys->dd_deleg_zapobj = zap_create(mos,
|
||||
DMU_OT_DSL_PERMS, DMU_OT_NONE, 0, tx);
|
||||
}
|
||||
|
||||
zfs_deleg_whokey(whokey,
|
||||
dosets ? ZFS_DELEG_USER_SETS : ZFS_DELEG_USER,
|
||||
ZFS_DELEG_LOCAL, &uid);
|
||||
if (zap_lookup(mos, zapobj, whokey, 8, 1, &jumpobj) == ENOENT) {
|
||||
jumpobj = zap_create(mos, DMU_OT_DSL_PERMS, DMU_OT_NONE, 0, tx);
|
||||
VERIFY(zap_add(mos, zapobj, whokey, 8, 1, &jumpobj, tx) == 0);
|
||||
}
|
||||
|
||||
for (zap_cursor_init(&zc, mos, pjumpobj);
|
||||
zap_cursor_retrieve(&zc, &za) == 0;
|
||||
zap_cursor_advance(&zc)) {
|
||||
uint64_t zero = 0;
|
||||
ASSERT(za.za_integer_length == 8 && za.za_num_integers == 1);
|
||||
|
||||
VERIFY(zap_update(mos, jumpobj, za.za_name,
|
||||
8, 1, &zero, tx) == 0);
|
||||
}
|
||||
zap_cursor_fini(&zc);
|
||||
}
|
||||
|
||||
/*
|
||||
* Set all create-time permissions on the new dataset.
|
||||
*/
|
||||
void
|
||||
dsl_deleg_set_create_perms(dsl_dir_t *sdd, dmu_tx_t *tx, cred_t *cr)
|
||||
{
|
||||
dsl_dir_t *dd;
|
||||
uint64_t uid = crgetuid(cr);
|
||||
|
||||
if (spa_version(dmu_objset_spa(sdd->dd_pool->dp_meta_objset)) <
|
||||
SPA_VERSION_DELEGATED_PERMS)
|
||||
return;
|
||||
|
||||
for (dd = sdd->dd_parent; dd != NULL; dd = dd->dd_parent) {
|
||||
uint64_t pzapobj = dd->dd_phys->dd_deleg_zapobj;
|
||||
|
||||
if (pzapobj == 0)
|
||||
continue;
|
||||
|
||||
copy_create_perms(sdd, pzapobj, B_FALSE, uid, tx);
|
||||
copy_create_perms(sdd, pzapobj, B_TRUE, uid, tx);
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
dsl_deleg_destroy(objset_t *mos, uint64_t zapobj, dmu_tx_t *tx)
|
||||
{
|
||||
zap_cursor_t zc;
|
||||
zap_attribute_t za;
|
||||
|
||||
if (zapobj == 0)
|
||||
return (0);
|
||||
|
||||
for (zap_cursor_init(&zc, mos, zapobj);
|
||||
zap_cursor_retrieve(&zc, &za) == 0;
|
||||
zap_cursor_advance(&zc)) {
|
||||
ASSERT(za.za_integer_length == 8 && za.za_num_integers == 1);
|
||||
VERIFY(0 == zap_destroy(mos, za.za_first_integer, tx));
|
||||
}
|
||||
zap_cursor_fini(&zc);
|
||||
VERIFY(0 == zap_destroy(mos, zapobj, tx));
|
||||
return (0);
|
||||
}
|
||||
|
||||
boolean_t
|
||||
dsl_delegation_on(objset_t *os)
|
||||
{
|
||||
return (!!spa_delegation(os->os_spa));
|
||||
}
|
1416
uts/common/fs/zfs/dsl_dir.c
Normal file
File diff suppressed because it is too large
848
uts/common/fs/zfs/dsl_pool.c
Normal file
@ -0,0 +1,848 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/dsl_pool.h>
|
||||
#include <sys/dsl_dataset.h>
|
||||
#include <sys/dsl_prop.h>
|
||||
#include <sys/dsl_dir.h>
|
||||
#include <sys/dsl_synctask.h>
|
||||
#include <sys/dsl_scan.h>
|
||||
#include <sys/dnode.h>
|
||||
#include <sys/dmu_tx.h>
|
||||
#include <sys/dmu_objset.h>
|
||||
#include <sys/arc.h>
|
||||
#include <sys/zap.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/fs/zfs.h>
|
||||
#include <sys/zfs_znode.h>
|
||||
#include <sys/spa_impl.h>
|
||||
#include <sys/dsl_deadlist.h>
|
||||
|
||||
int zfs_no_write_throttle = 0;
|
||||
int zfs_write_limit_shift = 3; /* 1/8th of physical memory */
|
||||
int zfs_txg_synctime_ms = 1000; /* target millisecs to sync a txg */
|
||||
|
||||
uint64_t zfs_write_limit_min = 32 << 20; /* min write limit is 32MB */
|
||||
uint64_t zfs_write_limit_max = 0; /* max data payload per txg */
|
||||
uint64_t zfs_write_limit_inflated = 0;
|
||||
uint64_t zfs_write_limit_override = 0;
|
||||
|
||||
kmutex_t zfs_write_limit_lock;
|
||||
|
||||
static pgcnt_t old_physmem = 0;
|
||||
|
||||
int
|
||||
dsl_pool_open_special_dir(dsl_pool_t *dp, const char *name, dsl_dir_t **ddp)
|
||||
{
|
||||
uint64_t obj;
|
||||
int err;
|
||||
|
||||
err = zap_lookup(dp->dp_meta_objset,
|
||||
dp->dp_root_dir->dd_phys->dd_child_dir_zapobj,
|
||||
name, sizeof (obj), 1, &obj);
|
||||
if (err)
|
||||
return (err);
|
||||
|
||||
return (dsl_dir_open_obj(dp, obj, name, dp, ddp));
|
||||
}
|
||||
|
||||
static dsl_pool_t *
|
||||
dsl_pool_open_impl(spa_t *spa, uint64_t txg)
|
||||
{
|
||||
dsl_pool_t *dp;
|
||||
blkptr_t *bp = spa_get_rootblkptr(spa);
|
||||
|
||||
dp = kmem_zalloc(sizeof (dsl_pool_t), KM_SLEEP);
|
||||
dp->dp_spa = spa;
|
||||
dp->dp_meta_rootbp = *bp;
|
||||
rw_init(&dp->dp_config_rwlock, NULL, RW_DEFAULT, NULL);
|
||||
dp->dp_write_limit = zfs_write_limit_min;
|
||||
txg_init(dp, txg);
|
||||
|
||||
txg_list_create(&dp->dp_dirty_datasets,
|
||||
offsetof(dsl_dataset_t, ds_dirty_link));
|
||||
txg_list_create(&dp->dp_dirty_dirs,
|
||||
offsetof(dsl_dir_t, dd_dirty_link));
|
||||
txg_list_create(&dp->dp_sync_tasks,
|
||||
offsetof(dsl_sync_task_group_t, dstg_node));
|
||||
list_create(&dp->dp_synced_datasets, sizeof (dsl_dataset_t),
|
||||
offsetof(dsl_dataset_t, ds_synced_link));
|
||||
|
||||
mutex_init(&dp->dp_lock, NULL, MUTEX_DEFAULT, NULL);
|
||||
|
||||
dp->dp_vnrele_taskq = taskq_create("zfs_vn_rele_taskq", 1, minclsyspri,
|
||||
1, 4, 0);
|
||||
|
||||
return (dp);
|
||||
}
|
||||
|
||||
int
|
||||
dsl_pool_open(spa_t *spa, uint64_t txg, dsl_pool_t **dpp)
|
||||
{
|
||||
int err;
|
||||
dsl_pool_t *dp = dsl_pool_open_impl(spa, txg);
|
||||
dsl_dir_t *dd;
|
||||
dsl_dataset_t *ds;
|
||||
uint64_t obj;
|
||||
|
||||
rw_enter(&dp->dp_config_rwlock, RW_WRITER);
|
||||
err = dmu_objset_open_impl(spa, NULL, &dp->dp_meta_rootbp,
|
||||
&dp->dp_meta_objset);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
|
||||
DMU_POOL_ROOT_DATASET, sizeof (uint64_t), 1,
|
||||
&dp->dp_root_dir_obj);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
err = dsl_dir_open_obj(dp, dp->dp_root_dir_obj,
|
||||
NULL, dp, &dp->dp_root_dir);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
err = dsl_pool_open_special_dir(dp, MOS_DIR_NAME, &dp->dp_mos_dir);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
if (spa_version(spa) >= SPA_VERSION_ORIGIN) {
|
||||
err = dsl_pool_open_special_dir(dp, ORIGIN_DIR_NAME, &dd);
|
||||
if (err)
|
||||
goto out;
|
||||
err = dsl_dataset_hold_obj(dp, dd->dd_phys->dd_head_dataset_obj,
|
||||
FTAG, &ds);
|
||||
if (err == 0) {
|
||||
err = dsl_dataset_hold_obj(dp,
|
||||
ds->ds_phys->ds_prev_snap_obj, dp,
|
||||
&dp->dp_origin_snap);
|
||||
dsl_dataset_rele(ds, FTAG);
|
||||
}
|
||||
dsl_dir_close(dd, dp);
|
||||
if (err)
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (spa_version(spa) >= SPA_VERSION_DEADLISTS) {
|
||||
err = dsl_pool_open_special_dir(dp, FREE_DIR_NAME,
|
||||
&dp->dp_free_dir);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
|
||||
DMU_POOL_FREE_BPOBJ, sizeof (uint64_t), 1, &obj);
|
||||
if (err)
|
||||
goto out;
|
||||
VERIFY3U(0, ==, bpobj_open(&dp->dp_free_bpobj,
|
||||
dp->dp_meta_objset, obj));
|
||||
}
|
||||
|
||||
err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
|
||||
DMU_POOL_TMP_USERREFS, sizeof (uint64_t), 1,
|
||||
&dp->dp_tmp_userrefs_obj);
|
||||
if (err == ENOENT)
|
||||
err = 0;
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
err = dsl_scan_init(dp, txg);
|
||||
|
||||
out:
|
||||
rw_exit(&dp->dp_config_rwlock);
|
||||
if (err)
|
||||
dsl_pool_close(dp);
|
||||
else
|
||||
*dpp = dp;
|
||||
|
||||
return (err);
|
||||
}
|
||||
|
||||
void
|
||||
dsl_pool_close(dsl_pool_t *dp)
|
||||
{
|
||||
/* drop our references from dsl_pool_open() */
|
||||
|
||||
/*
|
||||
* Since we held the origin_snap from "syncing" context (which
|
||||
* includes pool-opening context), it actually only got a "ref"
|
||||
* and not a hold, so just drop that here.
|
||||
*/
|
||||
if (dp->dp_origin_snap)
|
||||
dsl_dataset_drop_ref(dp->dp_origin_snap, dp);
|
||||
if (dp->dp_mos_dir)
|
||||
dsl_dir_close(dp->dp_mos_dir, dp);
|
||||
if (dp->dp_free_dir)
|
||||
dsl_dir_close(dp->dp_free_dir, dp);
|
||||
if (dp->dp_root_dir)
|
||||
dsl_dir_close(dp->dp_root_dir, dp);
|
||||
|
||||
bpobj_close(&dp->dp_free_bpobj);
|
||||
|
||||
/* undo the dmu_objset_open_impl(mos) from dsl_pool_open() */
|
||||
if (dp->dp_meta_objset)
|
||||
dmu_objset_evict(dp->dp_meta_objset);
|
||||
|
||||
txg_list_destroy(&dp->dp_dirty_datasets);
|
||||
txg_list_destroy(&dp->dp_sync_tasks);
|
||||
txg_list_destroy(&dp->dp_dirty_dirs);
|
||||
list_destroy(&dp->dp_synced_datasets);
|
||||
|
||||
arc_flush(dp->dp_spa);
|
||||
txg_fini(dp);
|
||||
dsl_scan_fini(dp);
|
||||
rw_destroy(&dp->dp_config_rwlock);
|
||||
mutex_destroy(&dp->dp_lock);
|
||||
taskq_destroy(dp->dp_vnrele_taskq);
|
||||
if (dp->dp_blkstats)
|
||||
kmem_free(dp->dp_blkstats, sizeof (zfs_all_blkstats_t));
|
||||
kmem_free(dp, sizeof (dsl_pool_t));
|
||||
}
|
||||
|
||||
dsl_pool_t *
|
||||
dsl_pool_create(spa_t *spa, nvlist_t *zplprops, uint64_t txg)
|
||||
{
|
||||
int err;
|
||||
dsl_pool_t *dp = dsl_pool_open_impl(spa, txg);
|
||||
dmu_tx_t *tx = dmu_tx_create_assigned(dp, txg);
|
||||
objset_t *os;
|
||||
dsl_dataset_t *ds;
|
||||
uint64_t obj;
|
||||
|
||||
/* create and open the MOS (meta-objset) */
|
||||
dp->dp_meta_objset = dmu_objset_create_impl(spa,
|
||||
NULL, &dp->dp_meta_rootbp, DMU_OST_META, tx);
|
||||
|
||||
/* create the pool directory */
|
||||
err = zap_create_claim(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
|
||||
DMU_OT_OBJECT_DIRECTORY, DMU_OT_NONE, 0, tx);
|
||||
ASSERT3U(err, ==, 0);
|
||||
|
||||
/* Initialize scan structures */
|
||||
VERIFY3U(0, ==, dsl_scan_init(dp, txg));
|
||||
|
||||
/* create and open the root dir */
|
||||
dp->dp_root_dir_obj = dsl_dir_create_sync(dp, NULL, NULL, tx);
|
||||
VERIFY(0 == dsl_dir_open_obj(dp, dp->dp_root_dir_obj,
|
||||
NULL, dp, &dp->dp_root_dir));
|
||||
|
||||
/* create and open the meta-objset dir */
|
||||
(void) dsl_dir_create_sync(dp, dp->dp_root_dir, MOS_DIR_NAME, tx);
|
||||
VERIFY(0 == dsl_pool_open_special_dir(dp,
|
||||
MOS_DIR_NAME, &dp->dp_mos_dir));
|
||||
|
||||
if (spa_version(spa) >= SPA_VERSION_DEADLISTS) {
|
||||
/* create and open the free dir */
|
||||
(void) dsl_dir_create_sync(dp, dp->dp_root_dir,
|
||||
FREE_DIR_NAME, tx);
|
||||
VERIFY(0 == dsl_pool_open_special_dir(dp,
|
||||
FREE_DIR_NAME, &dp->dp_free_dir));
|
||||
|
||||
/* create and open the free_bplist */
|
||||
obj = bpobj_alloc(dp->dp_meta_objset, SPA_MAXBLOCKSIZE, tx);
|
||||
VERIFY(zap_add(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
|
||||
DMU_POOL_FREE_BPOBJ, sizeof (uint64_t), 1, &obj, tx) == 0);
|
||||
VERIFY3U(0, ==, bpobj_open(&dp->dp_free_bpobj,
|
||||
dp->dp_meta_objset, obj));
|
||||
}
|
||||
|
||||
if (spa_version(spa) >= SPA_VERSION_DSL_SCRUB)
|
||||
dsl_pool_create_origin(dp, tx);
|
||||
|
||||
/* create the root dataset */
|
||||
obj = dsl_dataset_create_sync_dd(dp->dp_root_dir, NULL, 0, tx);
|
||||
|
||||
/* create the root objset */
|
||||
VERIFY(0 == dsl_dataset_hold_obj(dp, obj, FTAG, &ds));
|
||||
os = dmu_objset_create_impl(dp->dp_spa, ds,
|
||||
dsl_dataset_get_blkptr(ds), DMU_OST_ZFS, tx);
|
||||
#ifdef _KERNEL
|
||||
zfs_create_fs(os, kcred, zplprops, tx);
|
||||
#endif
|
||||
dsl_dataset_rele(ds, FTAG);
|
||||
|
||||
dmu_tx_commit(tx);
|
||||
|
||||
return (dp);
|
||||
}
|
||||
|
||||
static int
|
||||
deadlist_enqueue_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx)
|
||||
{
|
||||
dsl_deadlist_t *dl = arg;
|
||||
dsl_deadlist_insert(dl, bp, tx);
|
||||
return (0);
|
||||
}
|
||||
|
||||
void
|
||||
dsl_pool_sync(dsl_pool_t *dp, uint64_t txg)
|
||||
{
|
||||
zio_t *zio;
|
||||
dmu_tx_t *tx;
|
||||
dsl_dir_t *dd;
|
||||
dsl_dataset_t *ds;
|
||||
dsl_sync_task_group_t *dstg;
|
||||
objset_t *mos = dp->dp_meta_objset;
|
||||
hrtime_t start, write_time;
|
||||
uint64_t data_written;
|
||||
int err;
|
||||
|
||||
/*
|
||||
* We need to copy dp_space_towrite[] before doing
|
||||
* dsl_sync_task_group_sync(), because
|
||||
* dsl_dataset_snapshot_reserve_space() will increase
|
||||
* dp_space_towrite but not actually write anything.
|
||||
*/
|
||||
data_written = dp->dp_space_towrite[txg & TXG_MASK];
|
||||
|
||||
tx = dmu_tx_create_assigned(dp, txg);
|
||||
|
||||
dp->dp_read_overhead = 0;
|
||||
start = gethrtime();
|
||||
|
||||
zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
|
||||
while (ds = txg_list_remove(&dp->dp_dirty_datasets, txg)) {
|
||||
/*
|
||||
* We must not sync any non-MOS datasets twice, because
|
||||
* we may have taken a snapshot of them. However, we
|
||||
* may sync newly-created datasets on pass 2.
|
||||
*/
|
||||
ASSERT(!list_link_active(&ds->ds_synced_link));
|
||||
list_insert_tail(&dp->dp_synced_datasets, ds);
|
||||
dsl_dataset_sync(ds, zio, tx);
|
||||
}
|
||||
DTRACE_PROBE(pool_sync__1setup);
|
||||
err = zio_wait(zio);
|
||||
|
||||
write_time = gethrtime() - start;
|
||||
ASSERT(err == 0);
|
||||
DTRACE_PROBE(pool_sync__2rootzio);
|
||||
|
||||
for (ds = list_head(&dp->dp_synced_datasets); ds;
|
||||
ds = list_next(&dp->dp_synced_datasets, ds))
|
||||
dmu_objset_do_userquota_updates(ds->ds_objset, tx);
|
||||
|
||||
/*
|
||||
* Sync the datasets again to push out the changes due to
|
||||
* userspace updates. This must be done before we process the
|
||||
* sync tasks, because that could cause a snapshot of a dataset
|
||||
* whose ds_bp will be rewritten when we do this 2nd sync.
|
||||
*/
|
||||
zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
|
||||
while (ds = txg_list_remove(&dp->dp_dirty_datasets, txg)) {
|
||||
ASSERT(list_link_active(&ds->ds_synced_link));
|
||||
dmu_buf_rele(ds->ds_dbuf, ds);
|
||||
dsl_dataset_sync(ds, zio, tx);
|
||||
}
|
||||
err = zio_wait(zio);
|
||||
|
||||
/*
|
||||
* Move dead blocks from the pending deadlist to the on-disk
|
||||
* deadlist.
|
||||
*/
|
||||
for (ds = list_head(&dp->dp_synced_datasets); ds;
|
||||
ds = list_next(&dp->dp_synced_datasets, ds)) {
|
||||
bplist_iterate(&ds->ds_pending_deadlist,
|
||||
deadlist_enqueue_cb, &ds->ds_deadlist, tx);
|
||||
}
|
||||
|
||||
while (dstg = txg_list_remove(&dp->dp_sync_tasks, txg)) {
|
||||
/*
|
||||
* No more sync tasks should have been added while we
|
||||
* were syncing.
|
||||
*/
|
||||
ASSERT(spa_sync_pass(dp->dp_spa) == 1);
|
||||
dsl_sync_task_group_sync(dstg, tx);
|
||||
}
|
||||
DTRACE_PROBE(pool_sync__3task);
|
||||
|
||||
start = gethrtime();
|
||||
while (dd = txg_list_remove(&dp->dp_dirty_dirs, txg))
|
||||
dsl_dir_sync(dd, tx);
|
||||
write_time += gethrtime() - start;
|
||||
|
||||
start = gethrtime();
|
||||
if (list_head(&mos->os_dirty_dnodes[txg & TXG_MASK]) != NULL ||
|
||||
list_head(&mos->os_free_dnodes[txg & TXG_MASK]) != NULL) {
|
||||
zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
|
||||
dmu_objset_sync(mos, zio, tx);
|
||||
err = zio_wait(zio);
|
||||
ASSERT(err == 0);
|
||||
dprintf_bp(&dp->dp_meta_rootbp, "meta objset rootbp is %s", "");
|
||||
spa_set_rootblkptr(dp->dp_spa, &dp->dp_meta_rootbp);
|
||||
}
|
||||
write_time += gethrtime() - start;
|
||||
DTRACE_PROBE2(pool_sync__4io, hrtime_t, write_time,
|
||||
hrtime_t, dp->dp_read_overhead);
|
||||
write_time -= dp->dp_read_overhead;
|
||||
|
||||
dmu_tx_commit(tx);
|
||||
|
||||
dp->dp_space_towrite[txg & TXG_MASK] = 0;
|
||||
ASSERT(dp->dp_tempreserved[txg & TXG_MASK] == 0);
|
||||
|
||||
/*
|
||||
* If the write limit max has not been explicitly set, set it
|
||||
* to a fraction of available physical memory (default 1/8th).
|
||||
* Note that we must inflate the limit because the spa
|
||||
* inflates write sizes to account for data replication.
|
||||
* Check this each sync phase to catch changing memory size.
|
||||
*/
|
||||
if (physmem != old_physmem && zfs_write_limit_shift) {
|
||||
mutex_enter(&zfs_write_limit_lock);
|
||||
old_physmem = physmem;
|
||||
zfs_write_limit_max = ptob(physmem) >> zfs_write_limit_shift;
|
||||
zfs_write_limit_inflated = MAX(zfs_write_limit_min,
|
||||
spa_get_asize(dp->dp_spa, zfs_write_limit_max));
|
||||
mutex_exit(&zfs_write_limit_lock);
|
||||
}
|
||||
|
||||
/*
|
||||
* Attempt to keep the sync time consistent by adjusting the
|
||||
* amount of write traffic allowed into each transaction group.
|
||||
* Weight the throughput calculation towards the current value:
|
||||
* thru = 3/4 old_thru + 1/4 new_thru
|
||||
*
|
||||
* Note: write_time is in nanosecs, so write_time/MICROSEC
|
||||
* yields millisecs
|
||||
*/
|
||||
ASSERT(zfs_write_limit_min > 0);
|
||||
if (data_written > zfs_write_limit_min / 8 && write_time > MICROSEC) {
|
||||
uint64_t throughput = data_written / (write_time / MICROSEC);
|
||||
|
||||
if (dp->dp_throughput)
|
||||
dp->dp_throughput = throughput / 4 +
|
||||
3 * dp->dp_throughput / 4;
|
||||
else
|
||||
dp->dp_throughput = throughput;
|
||||
dp->dp_write_limit = MIN(zfs_write_limit_inflated,
|
||||
MAX(zfs_write_limit_min,
|
||||
dp->dp_throughput * zfs_txg_synctime_ms));
|
||||
}
|
||||
}
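The tail of dsl_pool_sync() smooths the measured throughput (3/4 old, 1/4 new) and derives the next per-txg write limit from it. Below is a hedged userland sketch of just that arithmetic. smooth_throughput() and next_write_limit() are hypothetical names, the tunables are passed in as plain parameters, and throughput is assumed to be in bytes per millisecond, as the write_time/MICROSEC division implies.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Weighted average from the comment: thru = 3/4 old_thru + 1/4 new_thru.
 * The first sample is taken as is, mirroring the dp_throughput == 0 case.
 */
static uint64_t
smooth_throughput(uint64_t old_thru, uint64_t new_thru)
{
	if (old_thru == 0)
		return (new_thru);
	return (new_thru / 4 + 3 * old_thru / 4);
}

/*
 * Equivalent of MIN(limit_inflated, MAX(limit_min, thru * synctime_ms)):
 * the bytes we expect to sync in one target interval, raised to the
 * floor and then capped at the inflated ceiling.
 */
static uint64_t
next_write_limit(uint64_t thru, uint64_t synctime_ms,
    uint64_t limit_min, uint64_t limit_inflated)
{
	uint64_t target = thru * synctime_ms;

	assert(limit_min > 0);
	if (target < limit_min)
		target = limit_min;
	if (target > limit_inflated)
		target = limit_inflated;
	return (target);
}
```

With a 32MB floor and a 1000 ms target, a pool that sustains 100000 bytes/ms ends up with a limit of 100000000 bytes, as long as that stays under the inflated ceiling.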
|
||||
|
||||
void
|
||||
dsl_pool_sync_done(dsl_pool_t *dp, uint64_t txg)
|
||||
{
|
||||
dsl_dataset_t *ds;
|
||||
objset_t *os;
|
||||
|
||||
while (ds = list_head(&dp->dp_synced_datasets)) {
|
||||
list_remove(&dp->dp_synced_datasets, ds);
|
||||
os = ds->ds_objset;
|
||||
zil_clean(os->os_zil, txg);
|
||||
ASSERT(!dmu_objset_is_dirty(os, txg));
|
||||
dmu_buf_rele(ds->ds_dbuf, ds);
|
||||
}
|
||||
ASSERT(!dmu_objset_is_dirty(dp->dp_meta_objset, txg));
|
||||
}
|
||||
|
||||
/*
|
||||
* TRUE if the current thread is the tx_sync_thread or if we
|
||||
* are being called from SPA context during pool initialization.
|
||||
*/
|
||||
int
|
||||
dsl_pool_sync_context(dsl_pool_t *dp)
|
||||
{
|
||||
return (curthread == dp->dp_tx.tx_sync_thread ||
|
||||
spa_get_dsl(dp->dp_spa) == NULL);
|
||||
}
|
||||
|
||||
uint64_t
|
||||
dsl_pool_adjustedsize(dsl_pool_t *dp, boolean_t netfree)
|
||||
{
|
||||
uint64_t space, resv;
|
||||
|
||||
/*
|
||||
* Reserve about 1.6% (1/64), or at least 32MB, for allocation
|
||||
* efficiency.
|
||||
* XXX The intent log is not accounted for, so it must fit
|
||||
* within this slop.
|
||||
*
|
||||
* If we're trying to assess whether it's OK to do a free,
|
||||
* cut the reservation in half to allow forward progress
|
||||
* (e.g. make it possible to rm(1) files from a full pool).
|
||||
*/
|
||||
space = spa_get_dspace(dp->dp_spa);
|
||||
resv = MAX(space >> 6, SPA_MINDEVSIZE >> 1);
|
||||
if (netfree)
|
||||
resv >>= 1;
|
||||
|
||||
return (space - resv);
|
||||
}
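The reservation arithmetic in dsl_pool_adjustedsize() can be checked in isolation. This is a sketch under stated assumptions: adjusted_size() is a hypothetical standalone function, and the minimum reservation is passed as a parameter (32MB in the examples, which is what the comment says SPA_MINDEVSIZE >> 1 amounts to). The assert on resv is an addition for the sketch; the kernel function has no such guard.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reserve 1/64 of the pool (about 1.6%), or min_resv if that is larger.
 * When netfree is set, halve the reservation so that frees can still
 * make progress on a nearly full pool.
 */
static uint64_t
adjusted_size(uint64_t space, int netfree, uint64_t min_resv)
{
	uint64_t resv = space >> 6;

	if (resv < min_resv)
		resv = min_resv;
	if (netfree)
		resv >>= 1;
	/* Sketch-only guard: the subtraction below must not wrap. */
	assert(resv <= space);
	return (space - resv);
}
```

For a 64GB pool the reservation is 1GB (512MB when netfree is set); for a 1GB pool the 32MB floor applies instead of 1/64.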
|
||||
|
||||
int
|
||||
dsl_pool_tempreserve_space(dsl_pool_t *dp, uint64_t space, dmu_tx_t *tx)
|
||||
{
|
||||
uint64_t reserved = 0;
|
||||
uint64_t write_limit = (zfs_write_limit_override ?
|
||||
zfs_write_limit_override : dp->dp_write_limit);
|
||||
|
||||
if (zfs_no_write_throttle) {
|
||||
atomic_add_64(&dp->dp_tempreserved[tx->tx_txg & TXG_MASK],
|
||||
space);
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*
|
||||
* Check to see if we have exceeded the maximum allowed IO for
|
||||
* this transaction group. We can do this without locks since
|
||||
* a little slop here is ok. Note that we do the reserved check
|
||||
* with only half the requested reserve: this is because the
|
||||
* reserve requests are worst-case, and we really don't want to
|
||||
* throttle based off of worst-case estimates.
|
||||
*/
|
||||
if (write_limit > 0) {
|
||||
reserved = dp->dp_space_towrite[tx->tx_txg & TXG_MASK]
|
||||
+ dp->dp_tempreserved[tx->tx_txg & TXG_MASK] / 2;
|
||||
|
||||
if (reserved && reserved > write_limit)
|
||||
return (ERESTART);
|
||||
}
|
||||
|
||||
atomic_add_64(&dp->dp_tempreserved[tx->tx_txg & TXG_MASK], space);
|
||||
|
||||
/*
|
||||
* If this transaction group is over 7/8ths capacity, delay
|
||||
* the caller 1 clock tick. This will slow down the "fill"
|
||||
* rate until the sync process can catch up with us.
|
||||
*/
|
||||
if (reserved && reserved > (write_limit - (write_limit >> 3)))
|
||||
txg_delay(dp, tx->tx_txg, 1);
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
void
|
||||
dsl_pool_tempreserve_clear(dsl_pool_t *dp, int64_t space, dmu_tx_t *tx)
|
||||
{
|
||||
ASSERT(dp->dp_tempreserved[tx->tx_txg & TXG_MASK] >= space);
|
||||
atomic_add_64(&dp->dp_tempreserved[tx->tx_txg & TXG_MASK], -space);
|
||||
}
|
||||
|
||||
void
|
||||
dsl_pool_memory_pressure(dsl_pool_t *dp)
|
||||
{
|
||||
uint64_t space_inuse = 0;
|
||||
int i;
|
||||
|
||||
if (dp->dp_write_limit == zfs_write_limit_min)
|
||||
return;
|
||||
|
||||
for (i = 0; i < TXG_SIZE; i++) {
|
||||
space_inuse += dp->dp_space_towrite[i];
|
||||
space_inuse += dp->dp_tempreserved[i];
|
||||
}
|
||||
dp->dp_write_limit = MAX(zfs_write_limit_min,
|
||||
MIN(dp->dp_write_limit, space_inuse / 4));
|
||||
}
|
||||
|
||||
void
|
||||
dsl_pool_willuse_space(dsl_pool_t *dp, int64_t space, dmu_tx_t *tx)
|
||||
{
|
||||
if (space > 0) {
|
||||
mutex_enter(&dp->dp_lock);
|
||||
dp->dp_space_towrite[tx->tx_txg & TXG_MASK] += space;
|
||||
mutex_exit(&dp->dp_lock);
|
||||
}
|
||||
}
|
||||
|
||||
/* ARGSUSED */
|
||||
static int
|
||||
upgrade_clones_cb(spa_t *spa, uint64_t dsobj, const char *dsname, void *arg)
|
||||
{
|
||||
dmu_tx_t *tx = arg;
|
||||
dsl_dataset_t *ds, *prev = NULL;
|
||||
int err;
|
||||
dsl_pool_t *dp = spa_get_dsl(spa);
|
||||
|
||||
err = dsl_dataset_hold_obj(dp, dsobj, FTAG, &ds);
|
||||
if (err)
|
||||
return (err);
|
||||
|
||||
while (ds->ds_phys->ds_prev_snap_obj != 0) {
|
||||
err = dsl_dataset_hold_obj(dp, ds->ds_phys->ds_prev_snap_obj,
|
||||
FTAG, &prev);
|
||||
if (err) {
|
||||
dsl_dataset_rele(ds, FTAG);
|
||||
return (err);
|
||||
}
|
||||
|
||||
if (prev->ds_phys->ds_next_snap_obj != ds->ds_object)
|
||||
break;
|
||||
dsl_dataset_rele(ds, FTAG);
|
||||
ds = prev;
|
||||
prev = NULL;
|
||||
}
|
||||
|
||||
if (prev == NULL) {
|
||||
prev = dp->dp_origin_snap;
|
||||
|
||||
/*
|
||||
* The $ORIGIN can't have any data, or the accounting
|
||||
* will be wrong.
|
||||
*/
|
||||
ASSERT(prev->ds_phys->ds_bp.blk_birth == 0);
|
||||
|
||||
/* The origin doesn't get attached to itself */
|
||||
if (ds->ds_object == prev->ds_object) {
|
||||
dsl_dataset_rele(ds, FTAG);
|
||||
return (0);
|
||||
}
|
||||
|
||||
dmu_buf_will_dirty(ds->ds_dbuf, tx);
|
||||
ds->ds_phys->ds_prev_snap_obj = prev->ds_object;
|
||||
ds->ds_phys->ds_prev_snap_txg = prev->ds_phys->ds_creation_txg;
|
||||
|
||||
dmu_buf_will_dirty(ds->ds_dir->dd_dbuf, tx);
|
||||
ds->ds_dir->dd_phys->dd_origin_obj = prev->ds_object;
|
||||
|
||||
dmu_buf_will_dirty(prev->ds_dbuf, tx);
|
||||
prev->ds_phys->ds_num_children++;
|
||||
|
||||
if (ds->ds_phys->ds_next_snap_obj == 0) {
|
||||
ASSERT(ds->ds_prev == NULL);
|
||||
VERIFY(0 == dsl_dataset_hold_obj(dp,
|
||||
ds->ds_phys->ds_prev_snap_obj, ds, &ds->ds_prev));
|
||||
}
|
||||
}
|
||||
|
||||
ASSERT(ds->ds_dir->dd_phys->dd_origin_obj == prev->ds_object);
|
||||
ASSERT(ds->ds_phys->ds_prev_snap_obj == prev->ds_object);
|
||||
|
||||
if (prev->ds_phys->ds_next_clones_obj == 0) {
|
||||
dmu_buf_will_dirty(prev->ds_dbuf, tx);
|
||||
prev->ds_phys->ds_next_clones_obj =
|
||||
zap_create(dp->dp_meta_objset,
|
||||
DMU_OT_NEXT_CLONES, DMU_OT_NONE, 0, tx);
|
||||
}
|
||||
VERIFY(0 == zap_add_int(dp->dp_meta_objset,
|
||||
prev->ds_phys->ds_next_clones_obj, ds->ds_object, tx));
|
||||
|
||||
dsl_dataset_rele(ds, FTAG);
|
||||
if (prev != dp->dp_origin_snap)
|
||||
dsl_dataset_rele(prev, FTAG);
|
||||
return (0);
|
||||
}
|
||||
|
||||
void
|
||||
dsl_pool_upgrade_clones(dsl_pool_t *dp, dmu_tx_t *tx)
|
||||
{
|
||||
ASSERT(dmu_tx_is_syncing(tx));
|
||||
ASSERT(dp->dp_origin_snap != NULL);
|
||||
|
||||
VERIFY3U(0, ==, dmu_objset_find_spa(dp->dp_spa, NULL, upgrade_clones_cb,
|
||||
tx, DS_FIND_CHILDREN));
|
||||
}
|
||||
|
||||
/* ARGSUSED */
|
||||
static int
|
||||
upgrade_dir_clones_cb(spa_t *spa, uint64_t dsobj, const char *dsname, void *arg)
|
||||
{
|
||||
dmu_tx_t *tx = arg;
|
||||
dsl_dataset_t *ds;
|
||||
dsl_pool_t *dp = spa_get_dsl(spa);
|
||||
objset_t *mos = dp->dp_meta_objset;
|
||||
|
||||
VERIFY3U(0, ==, dsl_dataset_hold_obj(dp, dsobj, FTAG, &ds));
|
||||
|
||||
if (ds->ds_dir->dd_phys->dd_origin_obj) {
|
||||
dsl_dataset_t *origin;
|
||||
|
||||
VERIFY3U(0, ==, dsl_dataset_hold_obj(dp,
|
||||
ds->ds_dir->dd_phys->dd_origin_obj, FTAG, &origin));
|
||||
|
||||
if (origin->ds_dir->dd_phys->dd_clones == 0) {
|
||||
dmu_buf_will_dirty(origin->ds_dir->dd_dbuf, tx);
|
||||
origin->ds_dir->dd_phys->dd_clones = zap_create(mos,
|
||||
DMU_OT_DSL_CLONES, DMU_OT_NONE, 0, tx);
|
||||
}
|
||||
|
||||
VERIFY3U(0, ==, zap_add_int(dp->dp_meta_objset,
|
||||
origin->ds_dir->dd_phys->dd_clones, dsobj, tx));
|
||||
|
||||
dsl_dataset_rele(origin, FTAG);
|
||||
}
|
||||
|
||||
dsl_dataset_rele(ds, FTAG);
|
||||
return (0);
|
||||
}
|
||||
|
||||
void
|
||||
dsl_pool_upgrade_dir_clones(dsl_pool_t *dp, dmu_tx_t *tx)
|
||||
{
|
||||
ASSERT(dmu_tx_is_syncing(tx));
|
||||
uint64_t obj;
|
||||
|
||||
(void) dsl_dir_create_sync(dp, dp->dp_root_dir, FREE_DIR_NAME, tx);
|
||||
VERIFY(0 == dsl_pool_open_special_dir(dp,
|
||||
FREE_DIR_NAME, &dp->dp_free_dir));
|
||||
|
||||
/*
|
||||
* We can't use bpobj_alloc(), because spa_version() still
|
||||
* returns the old version, and we need a new-version bpobj with
|
||||
* subobj support. So call dmu_object_alloc() directly.
|
||||
*/
|
||||
obj = dmu_object_alloc(dp->dp_meta_objset, DMU_OT_BPOBJ,
|
||||
SPA_MAXBLOCKSIZE, DMU_OT_BPOBJ_HDR, sizeof (bpobj_phys_t), tx);
|
||||
VERIFY3U(0, ==, zap_add(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
|
||||
DMU_POOL_FREE_BPOBJ, sizeof (uint64_t), 1, &obj, tx));
|
||||
VERIFY3U(0, ==, bpobj_open(&dp->dp_free_bpobj,
|
||||
dp->dp_meta_objset, obj));
|
||||
|
||||
VERIFY3U(0, ==, dmu_objset_find_spa(dp->dp_spa, NULL,
|
||||
upgrade_dir_clones_cb, tx, DS_FIND_CHILDREN));
|
||||
}
|
||||
|
||||
void
|
||||
dsl_pool_create_origin(dsl_pool_t *dp, dmu_tx_t *tx)
|
||||
{
|
||||
uint64_t dsobj;
|
||||
dsl_dataset_t *ds;
|
||||
|
||||
ASSERT(dmu_tx_is_syncing(tx));
|
||||
ASSERT(dp->dp_origin_snap == NULL);
|
||||
|
||||
/* create the origin dir, ds, & snap-ds */
|
||||
rw_enter(&dp->dp_config_rwlock, RW_WRITER);
|
||||
dsobj = dsl_dataset_create_sync(dp->dp_root_dir, ORIGIN_DIR_NAME,
|
||||
NULL, 0, kcred, tx);
|
||||
VERIFY(0 == dsl_dataset_hold_obj(dp, dsobj, FTAG, &ds));
|
||||
dsl_dataset_snapshot_sync(ds, ORIGIN_DIR_NAME, tx);
|
||||
VERIFY(0 == dsl_dataset_hold_obj(dp, ds->ds_phys->ds_prev_snap_obj,
|
||||
dp, &dp->dp_origin_snap));
|
||||
dsl_dataset_rele(ds, FTAG);
|
||||
rw_exit(&dp->dp_config_rwlock);
|
||||
}
|
||||
|
||||
taskq_t *
|
||||
dsl_pool_vnrele_taskq(dsl_pool_t *dp)
|
||||
{
|
||||
return (dp->dp_vnrele_taskq);
|
||||
}
|
||||
|
||||
/*
|
||||
* Walk through the pool-wide zap object of temporary snapshot user holds
|
||||
* and release them.
|
||||
*/
|
||||
void
|
||||
dsl_pool_clean_tmp_userrefs(dsl_pool_t *dp)
|
||||
{
|
||||
zap_attribute_t za;
|
||||
zap_cursor_t zc;
|
||||
objset_t *mos = dp->dp_meta_objset;
|
||||
uint64_t zapobj = dp->dp_tmp_userrefs_obj;
|
||||
|
||||
if (zapobj == 0)
|
||||
return;
|
||||
ASSERT(spa_version(dp->dp_spa) >= SPA_VERSION_USERREFS);
|
||||
|
||||
for (zap_cursor_init(&zc, mos, zapobj);
|
||||
zap_cursor_retrieve(&zc, &za) == 0;
|
||||
zap_cursor_advance(&zc)) {
|
||||
char *htag;
|
||||
uint64_t dsobj;
|
||||
|
||||
htag = strchr(za.za_name, '-');
|
||||
*htag = '\0';
|
||||
++htag;
|
||||
dsobj = strtonum(za.za_name, NULL);
|
||||
(void) dsl_dataset_user_release_tmp(dp, dsobj, htag, B_FALSE);
|
||||
}
|
||||
zap_cursor_fini(&zc);
|
||||
}
|
||||
|
||||
/*
|
||||
* Create the pool-wide zap object for storing temporary snapshot holds.
|
||||
*/
|
||||
void
|
||||
dsl_pool_user_hold_create_obj(dsl_pool_t *dp, dmu_tx_t *tx)
|
||||
{
|
||||
objset_t *mos = dp->dp_meta_objset;
|
||||
|
||||
ASSERT(dp->dp_tmp_userrefs_obj == 0);
|
||||
ASSERT(dmu_tx_is_syncing(tx));
|
||||
|
||||
dp->dp_tmp_userrefs_obj = zap_create(mos, DMU_OT_USERREFS,
|
||||
DMU_OT_NONE, 0, tx);
|
||||
|
||||
VERIFY(zap_add(mos, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TMP_USERREFS,
|
||||
sizeof (uint64_t), 1, &dp->dp_tmp_userrefs_obj, tx) == 0);
|
||||
}
|
||||
|
||||
static int
|
||||
dsl_pool_user_hold_rele_impl(dsl_pool_t *dp, uint64_t dsobj,
|
||||
const char *tag, uint64_t *now, dmu_tx_t *tx, boolean_t holding)
|
||||
{
|
||||
objset_t *mos = dp->dp_meta_objset;
|
||||
uint64_t zapobj = dp->dp_tmp_userrefs_obj;
|
||||
char *name;
|
||||
int error;
|
||||
|
||||
ASSERT(spa_version(dp->dp_spa) >= SPA_VERSION_USERREFS);
|
||||
ASSERT(dmu_tx_is_syncing(tx));
|
||||
|
||||
/*
|
||||
* If the pool was created prior to SPA_VERSION_USERREFS, the
|
||||
* zap object for temporary holds might not exist yet.
|
||||
*/
|
||||
if (zapobj == 0) {
|
||||
if (holding) {
|
||||
dsl_pool_user_hold_create_obj(dp, tx);
|
||||
zapobj = dp->dp_tmp_userrefs_obj;
|
||||
} else {
|
||||
return (ENOENT);
|
||||
}
|
||||
}
|
||||
|
||||
name = kmem_asprintf("%llx-%s", (u_longlong_t)dsobj, tag);
|
||||
if (holding)
|
||||
error = zap_add(mos, zapobj, name, 8, 1, now, tx);
|
||||
else
|
||||
error = zap_remove(mos, zapobj, name, tx);
|
||||
strfree(name);
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
/*
|
||||
* Add a temporary hold for the given dataset object and tag.
|
||||
*/
|
||||
int
|
||||
dsl_pool_user_hold(dsl_pool_t *dp, uint64_t dsobj, const char *tag,
|
||||
uint64_t *now, dmu_tx_t *tx)
|
||||
{
|
||||
return (dsl_pool_user_hold_rele_impl(dp, dsobj, tag, now, tx, B_TRUE));
|
||||
}
|
||||
|
||||
/*
|
||||
* Release a temporary hold for the given dataset object and tag.
|
||||
*/
|
||||
int
|
||||
dsl_pool_user_release(dsl_pool_t *dp, uint64_t dsobj, const char *tag,
|
||||
dmu_tx_t *tx)
|
||||
{
|
||||
return (dsl_pool_user_hold_rele_impl(dp, dsobj, tag, NULL,
|
||||
tx, B_FALSE));
|
||||
}
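
The slop reservation in dsl_pool_adjustedsize() and the write-throttle decision in dsl_pool_tempreserve_space() come down to a few lines of arithmetic. The following is a minimal userland sketch of that arithmetic, not the kernel code. All `sketch_*` names are hypothetical, and SKETCH_MINDEVSIZE assumes SPA_MINDEVSIZE is 64MB, which is consistent with the "at least 32MB" comment above. Like the original, it assumes the pool is larger than the reservation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for SPA_MINDEVSIZE, taken here to be 64MB. */
#define	SKETCH_MINDEVSIZE	(64ULL << 20)

/* Usable space after the slop reservation (1/64 of the pool, min 32MB). */
uint64_t
sketch_adjustedsize(uint64_t space, bool netfree)
{
	uint64_t resv = space >> 6;

	if (resv < (SKETCH_MINDEVSIZE >> 1))
		resv = SKETCH_MINDEVSIZE >> 1;
	/* A free may dip into half of the slop so a full pool can recover. */
	if (netfree)
		resv >>= 1;

	return (space - resv);
}

enum sketch_throttle { SKETCH_OK, SKETCH_DELAY, SKETCH_RESTART };

/*
 * Throttle decision for one transaction group. Only half of the
 * temporary reservation is counted, because reservations are worst-case.
 */
enum sketch_throttle
sketch_throttle(uint64_t towrite, uint64_t tempreserved, uint64_t write_limit)
{
	uint64_t reserved;

	if (write_limit == 0)
		return (SKETCH_OK);

	reserved = towrite + tempreserved / 2;
	if (reserved > write_limit)
		return (SKETCH_RESTART);	/* caller would see ERESTART */
	if (reserved > write_limit - (write_limit >> 3))
		return (SKETCH_DELAY);		/* over 7/8ths: delay one tick */

	return (SKETCH_OK);
}
```

For a 64GB pool the reservation is 1GB. For a 1GB pool the 32MB floor applies, and it is halved to 16MB when `netfree` is set.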
|
1153
uts/common/fs/zfs/dsl_prop.c
Normal file
File diff suppressed because it is too large
1766
uts/common/fs/zfs/dsl_scan.c
Normal file
File diff suppressed because it is too large
240
uts/common/fs/zfs/dsl_synctask.c
Normal file
@ -0,0 +1,240 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/dmu_tx.h>
|
||||
#include <sys/dsl_pool.h>
|
||||
#include <sys/dsl_dir.h>
|
||||
#include <sys/dsl_synctask.h>
|
||||
#include <sys/metaslab.h>
|
||||
|
||||
#define DST_AVG_BLKSHIFT 14
|
||||
|
||||
/* ARGSUSED */
|
||||
static int
|
||||
dsl_null_checkfunc(void *arg1, void *arg2, dmu_tx_t *tx)
|
||||
{
|
||||
return (0);
|
||||
}
|
||||
|
||||
dsl_sync_task_group_t *
|
||||
dsl_sync_task_group_create(dsl_pool_t *dp)
|
||||
{
|
||||
dsl_sync_task_group_t *dstg;
|
||||
|
||||
dstg = kmem_zalloc(sizeof (dsl_sync_task_group_t), KM_SLEEP);
|
||||
list_create(&dstg->dstg_tasks, sizeof (dsl_sync_task_t),
|
||||
offsetof(dsl_sync_task_t, dst_node));
|
||||
dstg->dstg_pool = dp;
|
||||
|
||||
return (dstg);
|
||||
}
|
||||
|
||||
void
|
||||
dsl_sync_task_create(dsl_sync_task_group_t *dstg,
|
||||
dsl_checkfunc_t *checkfunc, dsl_syncfunc_t *syncfunc,
|
||||
void *arg1, void *arg2, int blocks_modified)
|
||||
{
|
||||
dsl_sync_task_t *dst;
|
||||
|
||||
if (checkfunc == NULL)
|
||||
checkfunc = dsl_null_checkfunc;
|
||||
dst = kmem_zalloc(sizeof (dsl_sync_task_t), KM_SLEEP);
|
||||
dst->dst_checkfunc = checkfunc;
|
||||
dst->dst_syncfunc = syncfunc;
|
||||
dst->dst_arg1 = arg1;
|
||||
dst->dst_arg2 = arg2;
|
||||
list_insert_tail(&dstg->dstg_tasks, dst);
|
||||
|
||||
dstg->dstg_space += blocks_modified << DST_AVG_BLKSHIFT;
|
||||
}
|
||||
|
||||
int
|
||||
dsl_sync_task_group_wait(dsl_sync_task_group_t *dstg)
|
||||
{
|
||||
dmu_tx_t *tx;
|
||||
uint64_t txg;
|
||||
dsl_sync_task_t *dst;
|
||||
|
||||
top:
|
||||
tx = dmu_tx_create_dd(dstg->dstg_pool->dp_mos_dir);
|
||||
VERIFY(0 == dmu_tx_assign(tx, TXG_WAIT));
|
||||
|
||||
txg = dmu_tx_get_txg(tx);
|
||||
|
||||
/* Do a preliminary error check. */
|
||||
dstg->dstg_err = 0;
|
||||
rw_enter(&dstg->dstg_pool->dp_config_rwlock, RW_READER);
|
||||
for (dst = list_head(&dstg->dstg_tasks); dst;
|
||||
dst = list_next(&dstg->dstg_tasks, dst)) {
|
||||
#ifdef ZFS_DEBUG
|
||||
/*
|
||||
* Only check half the time, otherwise, the sync-context
|
||||
* check will almost never fail.
|
||||
*/
|
||||
if (spa_get_random(2) == 0)
|
||||
continue;
|
||||
#endif
|
||||
dst->dst_err =
|
||||
dst->dst_checkfunc(dst->dst_arg1, dst->dst_arg2, tx);
|
||||
if (dst->dst_err)
|
||||
dstg->dstg_err = dst->dst_err;
|
||||
}
|
||||
rw_exit(&dstg->dstg_pool->dp_config_rwlock);
|
||||
|
||||
if (dstg->dstg_err) {
|
||||
dmu_tx_commit(tx);
|
||||
return (dstg->dstg_err);
|
||||
}
|
||||
|
||||
/*
|
||||
* We don't generally have many sync tasks, so pay the price of
|
||||
* add_tail to get the tasks executed in the right order.
|
||||
*/
|
||||
VERIFY(0 == txg_list_add_tail(&dstg->dstg_pool->dp_sync_tasks,
|
||||
dstg, txg));
|
||||
|
||||
dmu_tx_commit(tx);
|
||||
|
||||
txg_wait_synced(dstg->dstg_pool, txg);
|
||||
|
||||
if (dstg->dstg_err == EAGAIN) {
|
||||
txg_wait_synced(dstg->dstg_pool, txg + TXG_DEFER_SIZE);
|
||||
goto top;
|
||||
}
|
||||
|
||||
return (dstg->dstg_err);
|
||||
}
|
||||
|
||||
void
|
||||
dsl_sync_task_group_nowait(dsl_sync_task_group_t *dstg, dmu_tx_t *tx)
|
||||
{
|
||||
uint64_t txg;
|
||||
|
||||
dstg->dstg_nowaiter = B_TRUE;
|
||||
txg = dmu_tx_get_txg(tx);
|
||||
/*
|
||||
* We don't generally have many sync tasks, so pay the price of
|
||||
* add_tail to get the tasks executed in the right order.
|
||||
*/
|
||||
VERIFY(0 == txg_list_add_tail(&dstg->dstg_pool->dp_sync_tasks,
|
||||
dstg, txg));
|
||||
}
|
||||
|
||||
void
|
||||
dsl_sync_task_group_destroy(dsl_sync_task_group_t *dstg)
|
||||
{
|
||||
dsl_sync_task_t *dst;
|
||||
|
||||
while (dst = list_head(&dstg->dstg_tasks)) {
|
||||
list_remove(&dstg->dstg_tasks, dst);
|
||||
kmem_free(dst, sizeof (dsl_sync_task_t));
|
||||
}
|
||||
kmem_free(dstg, sizeof (dsl_sync_task_group_t));
|
||||
}
|
||||
|
||||
void
|
||||
dsl_sync_task_group_sync(dsl_sync_task_group_t *dstg, dmu_tx_t *tx)
|
||||
{
|
||||
dsl_sync_task_t *dst;
|
||||
dsl_pool_t *dp = dstg->dstg_pool;
|
||||
uint64_t quota, used;
|
||||
|
||||
ASSERT3U(dstg->dstg_err, ==, 0);
|
||||
|
||||
/*
|
||||
* Check for sufficient space. We just check against what's
|
||||
* on-disk; we don't want any in-flight accounting to get in our
|
||||
* way, because open context may have already used up various
|
||||
* in-core limits (arc_tempreserve, dsl_pool_tempreserve).
|
||||
*/
|
||||
quota = dsl_pool_adjustedsize(dp, B_FALSE) -
|
||||
metaslab_class_get_deferred(spa_normal_class(dp->dp_spa));
|
||||
used = dp->dp_root_dir->dd_phys->dd_used_bytes;
|
||||
/* MOS space is triple-dittoed, so we multiply by 3. */
|
||||
if (dstg->dstg_space > 0 && used + dstg->dstg_space * 3 > quota) {
|
||||
dstg->dstg_err = ENOSPC;
|
||||
return;
|
||||
}
|
||||
|
||||
/*
|
||||
* Check for errors by calling checkfuncs.
|
||||
*/
|
||||
rw_enter(&dp->dp_config_rwlock, RW_WRITER);
|
||||
for (dst = list_head(&dstg->dstg_tasks); dst;
|
||||
dst = list_next(&dstg->dstg_tasks, dst)) {
|
||||
dst->dst_err =
|
||||
dst->dst_checkfunc(dst->dst_arg1, dst->dst_arg2, tx);
|
||||
if (dst->dst_err)
|
||||
dstg->dstg_err = dst->dst_err;
|
||||
}
|
||||
|
||||
if (dstg->dstg_err == 0) {
|
||||
/*
|
||||
* Execute sync tasks.
|
||||
*/
|
||||
for (dst = list_head(&dstg->dstg_tasks); dst;
|
||||
dst = list_next(&dstg->dstg_tasks, dst)) {
|
||||
dst->dst_syncfunc(dst->dst_arg1, dst->dst_arg2, tx);
|
||||
}
|
||||
}
|
||||
rw_exit(&dp->dp_config_rwlock);
|
||||
|
||||
if (dstg->dstg_nowaiter)
|
||||
dsl_sync_task_group_destroy(dstg);
|
||||
}
|
||||
|
||||
int
|
||||
dsl_sync_task_do(dsl_pool_t *dp,
|
||||
dsl_checkfunc_t *checkfunc, dsl_syncfunc_t *syncfunc,
|
||||
void *arg1, void *arg2, int blocks_modified)
|
||||
{
|
||||
dsl_sync_task_group_t *dstg;
|
||||
int err;
|
||||
|
||||
ASSERT(spa_writeable(dp->dp_spa));
|
||||
|
||||
dstg = dsl_sync_task_group_create(dp);
|
||||
dsl_sync_task_create(dstg, checkfunc, syncfunc,
|
||||
arg1, arg2, blocks_modified);
|
||||
err = dsl_sync_task_group_wait(dstg);
|
||||
dsl_sync_task_group_destroy(dstg);
|
||||
return (err);
|
||||
}
|
||||
|
||||
void
|
||||
dsl_sync_task_do_nowait(dsl_pool_t *dp,
|
||||
dsl_checkfunc_t *checkfunc, dsl_syncfunc_t *syncfunc,
|
||||
void *arg1, void *arg2, int blocks_modified, dmu_tx_t *tx)
|
||||
{
|
||||
dsl_sync_task_group_t *dstg;
|
||||
|
||||
if (!spa_writeable(dp->dp_spa))
|
||||
return;
|
||||
|
||||
dstg = dsl_sync_task_group_create(dp);
|
||||
dsl_sync_task_create(dstg, checkfunc, syncfunc,
|
||||
arg1, arg2, blocks_modified);
|
||||
dsl_sync_task_group_nowait(dstg, tx);
|
||||
}
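
dsl_sync_task_group_sync() works in two phases: every task's check function runs first, and the sync functions run only if no check reported an error. The following is a minimal userland sketch of that pattern, leaving out transactions, locking and space accounting. All `sketch_*` names, the fixed-size task array and the two example callbacks are hypothetical and are not part of the ZFS API.

```c
#include <errno.h>
#include <stddef.h>

#define	SKETCH_MAX_TASKS	8

typedef int (*sketch_checkfunc_t)(void *arg);
typedef void (*sketch_syncfunc_t)(void *arg);

typedef struct sketch_task {
	sketch_checkfunc_t st_check;
	sketch_syncfunc_t st_sync;
	void *st_arg;
	int st_err;
} sketch_task_t;

typedef struct sketch_group {
	sketch_task_t sg_tasks[SKETCH_MAX_TASKS];
	size_t sg_ntasks;
	int sg_err;
} sketch_group_t;

/* Default check used when the caller passes NULL, as dsl_null_checkfunc is. */
int
sketch_null_check(void *arg)
{
	(void) arg;
	return (0);
}

/* Append a task; tasks later run in insertion order. */
int
sketch_group_add(sketch_group_t *g, sketch_checkfunc_t check,
    sketch_syncfunc_t sync, void *arg)
{
	sketch_task_t *t;

	if (g->sg_ntasks == SKETCH_MAX_TASKS)
		return (ENOSPC);
	t = &g->sg_tasks[g->sg_ntasks++];
	t->st_check = (check != NULL) ? check : sketch_null_check;
	t->st_sync = sync;
	t->st_arg = arg;
	t->st_err = 0;
	return (0);
}

/* Phase 1: run every check. Phase 2: run every sync only if all passed. */
int
sketch_group_sync(sketch_group_t *g)
{
	size_t i;

	g->sg_err = 0;
	for (i = 0; i < g->sg_ntasks; i++) {
		sketch_task_t *t = &g->sg_tasks[i];

		t->st_err = t->st_check(t->st_arg);
		if (t->st_err != 0)
			g->sg_err = t->st_err;
	}
	if (g->sg_err == 0) {
		for (i = 0; i < g->sg_ntasks; i++)
			g->sg_tasks[i].st_sync(g->sg_tasks[i].st_arg);
	}
	return (g->sg_err);
}

/* Example callbacks: the check rejects negative values, the sync increments. */
int
sketch_check_nonneg(void *arg)
{
	return (*(int *)arg >= 0 ? 0 : EINVAL);
}

void
sketch_sync_incr(void *arg)
{
	(*(int *)arg)++;
}
```

Because the sync phase is all-or-nothing, one failing check leaves every task's state untouched. This is why the sync functions themselves return void.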
|
69
uts/common/fs/zfs/gzip.c
Normal file
@ -0,0 +1,69 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
|
||||
/*
|
||||
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#pragma ident "%Z%%M% %I% %E% SMI"
|
||||
|
||||
#include <sys/debug.h>
|
||||
#include <sys/types.h>
|
||||
#include <sys/zmod.h>
|
||||
|
||||
#ifdef _KERNEL
|
||||
#include <sys/systm.h>
|
||||
#else
|
||||
#include <strings.h>
|
||||
#endif
|
||||
|
||||
size_t
|
||||
gzip_compress(void *s_start, void *d_start, size_t s_len, size_t d_len, int n)
|
||||
{
|
||||
size_t dstlen = d_len;
|
||||
|
||||
ASSERT(d_len <= s_len);
|
||||
|
||||
if (z_compress_level(d_start, &dstlen, s_start, s_len, n) != Z_OK) {
|
||||
if (d_len != s_len)
|
||||
return (s_len);
|
||||
|
||||
bcopy(s_start, d_start, s_len);
|
||||
return (s_len);
|
||||
}
|
||||
|
||||
return (dstlen);
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
int
|
||||
gzip_decompress(void *s_start, void *d_start, size_t s_len, size_t d_len, int n)
|
||||
{
|
||||
size_t dstlen = d_len;
|
||||
|
||||
ASSERT(d_len >= s_len);
|
||||
|
||||
if (z_uncompress(d_start, &dstlen, s_start, s_len) != Z_OK)
|
||||
return (-1);
|
||||
|
||||
return (0);
|
||||
}
|
123
uts/common/fs/zfs/lzjb.c
Normal file
@ -0,0 +1,123 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
/*
 * We keep our own copy of this algorithm for 3 main reasons:
 *	1. If we didn't, anyone modifying common/os/compress.c would
 *	   directly break our on-disk format
 *	2. Our version of lzjb does not have a number of checks that the
 *	   common/os version needs and uses
 *	3. We initialize the lempel to ensure deterministic results,
 *	   so that identical blocks can always be deduplicated.
 * In particular, we are adding the "feature" that compress() can
 * take a destination buffer size and return the compressed length, or the
 * source length if compression would overflow the destination buffer.
 */
|
||||
|
||||
#include <sys/types.h>
|
||||
|
||||
#define MATCH_BITS 6
|
||||
#define MATCH_MIN 3
|
||||
#define MATCH_MAX ((1 << MATCH_BITS) + (MATCH_MIN - 1))
|
||||
#define OFFSET_MASK ((1 << (16 - MATCH_BITS)) - 1)
|
||||
#define LEMPEL_SIZE 1024
|
||||
|
||||
/*ARGSUSED*/
|
||||
size_t
|
||||
lzjb_compress(void *s_start, void *d_start, size_t s_len, size_t d_len, int n)
|
||||
{
|
||||
uchar_t *src = s_start;
|
||||
uchar_t *dst = d_start;
|
||||
uchar_t *cpy, *copymap;
|
||||
int copymask = 1 << (NBBY - 1);
|
||||
int mlen, offset, hash;
|
||||
uint16_t *hp;
|
||||
uint16_t lempel[LEMPEL_SIZE] = { 0 };
|
||||
|
||||
while (src < (uchar_t *)s_start + s_len) {
|
||||
if ((copymask <<= 1) == (1 << NBBY)) {
|
||||
if (dst >= (uchar_t *)d_start + d_len - 1 - 2 * NBBY)
|
||||
return (s_len);
|
||||
copymask = 1;
|
||||
copymap = dst;
|
||||
*dst++ = 0;
|
||||
}
|
||||
if (src > (uchar_t *)s_start + s_len - MATCH_MAX) {
|
||||
*dst++ = *src++;
|
||||
continue;
|
||||
}
|
||||
hash = (src[0] << 16) + (src[1] << 8) + src[2];
|
||||
hash += hash >> 9;
|
||||
hash += hash >> 5;
|
||||
hp = &lempel[hash & (LEMPEL_SIZE - 1)];
|
||||
offset = (intptr_t)(src - *hp) & OFFSET_MASK;
|
||||
*hp = (uint16_t)(uintptr_t)src;
|
||||
cpy = src - offset;
|
||||
if (cpy >= (uchar_t *)s_start && cpy != src &&
|
||||
src[0] == cpy[0] && src[1] == cpy[1] && src[2] == cpy[2]) {
|
||||
*copymap |= copymask;
|
||||
for (mlen = MATCH_MIN; mlen < MATCH_MAX; mlen++)
|
||||
if (src[mlen] != cpy[mlen])
|
||||
break;
|
||||
*dst++ = ((mlen - MATCH_MIN) << (NBBY - MATCH_BITS)) |
|
||||
(offset >> NBBY);
|
||||
*dst++ = (uchar_t)offset;
|
||||
src += mlen;
|
||||
} else {
|
||||
*dst++ = *src++;
|
||||
}
|
||||
}
|
||||
return (dst - (uchar_t *)d_start);
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
int
|
||||
lzjb_decompress(void *s_start, void *d_start, size_t s_len, size_t d_len, int n)
|
||||
{
|
||||
uchar_t *src = s_start;
|
||||
uchar_t *dst = d_start;
|
||||
uchar_t *d_end = (uchar_t *)d_start + d_len;
|
||||
uchar_t *cpy, copymap;
|
||||
int copymask = 1 << (NBBY - 1);
|
||||
|
||||
while (dst < d_end) {
|
||||
if ((copymask <<= 1) == (1 << NBBY)) {
|
||||
copymask = 1;
|
||||
copymap = *src++;
|
||||
}
|
||||
if (copymap & copymask) {
|
||||
int mlen = (src[0] >> (NBBY - MATCH_BITS)) + MATCH_MIN;
|
||||
int offset = ((src[0] << NBBY) | src[1]) & OFFSET_MASK;
|
||||
src += 2;
|
||||
if ((cpy = dst - offset) < (uchar_t *)d_start)
|
||||
return (-1);
|
||||
while (--mlen >= 0 && dst < d_end)
|
||||
*dst++ = *cpy++;
|
||||
} else {
|
||||
*dst++ = *src++;
|
||||
}
|
||||
}
|
||||
return (0);
|
||||
}
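
Each match emitted by lzjb_compress() is a two-byte token: the upper 6 bits of the first byte hold the match length minus MATCH_MIN, and the remaining 10 bits hold the back-reference offset. The sketch below isolates that packing and its inverse, using the same constants as the file above. The two `sketch_*` helpers are hypothetical names introduced for illustration, and the NBBY fallback assumes 8-bit bytes.

```c
#include <stdint.h>

#ifndef NBBY
#define	NBBY		8	/* assumption: bits per byte */
#endif
#define	MATCH_BITS	6
#define	MATCH_MIN	3
#define	MATCH_MAX	((1 << MATCH_BITS) + (MATCH_MIN - 1))
#define	OFFSET_MASK	((1 << (16 - MATCH_BITS)) - 1)

/* Pack a match of mlen bytes at distance offset into two bytes. */
void
sketch_encode_match(int mlen, int offset, uint8_t out[2])
{
	out[0] = (uint8_t)(((mlen - MATCH_MIN) << (NBBY - MATCH_BITS)) |
	    (offset >> NBBY));
	out[1] = (uint8_t)offset;
}

/* Recover the match length and offset from a two-byte token. */
void
sketch_decode_match(const uint8_t in[2], int *mlen, int *offset)
{
	*mlen = (in[0] >> (NBBY - MATCH_BITS)) + MATCH_MIN;
	*offset = ((in[0] << NBBY) | in[1]) & OFFSET_MASK;
}
```

With these constants a token can describe matches of 3 to 66 bytes at offsets up to 1023, which is why the compressor masks the offset with OFFSET_MASK and caps the match length at MATCH_MAX.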
|
1604
uts/common/fs/zfs/metaslab.c
Normal file
File diff suppressed because it is too large
223
uts/common/fs/zfs/refcount.c
Normal file
@ -0,0 +1,223 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/refcount.h>
|
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
|
||||
#ifdef _KERNEL
|
||||
int reference_tracking_enable = FALSE; /* runs out of memory too easily */
|
||||
#else
|
||||
int reference_tracking_enable = TRUE;
|
||||
#endif
|
||||
int reference_history = 4; /* tunable */
|
||||
|
||||
static kmem_cache_t *reference_cache;
|
||||
static kmem_cache_t *reference_history_cache;
|
||||
|
||||
void
|
||||
refcount_init(void)
|
||||
{
|
||||
reference_cache = kmem_cache_create("reference_cache",
|
||||
sizeof (reference_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
|
||||
|
||||
reference_history_cache = kmem_cache_create("reference_history_cache",
|
||||
sizeof (uint64_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
|
||||
}
|
||||
|
||||
void
|
||||
refcount_fini(void)
|
||||
{
|
||||
kmem_cache_destroy(reference_cache);
|
||||
kmem_cache_destroy(reference_history_cache);
|
||||
}
|
||||
|
||||
void
|
||||
refcount_create(refcount_t *rc)
|
||||
{
|
||||
mutex_init(&rc->rc_mtx, NULL, MUTEX_DEFAULT, NULL);
|
||||
list_create(&rc->rc_list, sizeof (reference_t),
|
||||
offsetof(reference_t, ref_link));
|
||||
list_create(&rc->rc_removed, sizeof (reference_t),
|
||||
offsetof(reference_t, ref_link));
|
||||
rc->rc_count = 0;
|
||||
rc->rc_removed_count = 0;
|
||||
}
|
||||
|
||||
void
|
||||
refcount_destroy_many(refcount_t *rc, uint64_t number)
|
||||
{
|
||||
reference_t *ref;
|
||||
|
||||
ASSERT(rc->rc_count == number);
|
||||
while (ref = list_head(&rc->rc_list)) {
|
||||
list_remove(&rc->rc_list, ref);
|
||||
kmem_cache_free(reference_cache, ref);
|
||||
}
|
||||
list_destroy(&rc->rc_list);
|
||||
|
||||
while (ref = list_head(&rc->rc_removed)) {
|
||||
list_remove(&rc->rc_removed, ref);
|
||||
kmem_cache_free(reference_history_cache, ref->ref_removed);
|
||||
kmem_cache_free(reference_cache, ref);
|
||||
}
|
||||
list_destroy(&rc->rc_removed);
|
||||
mutex_destroy(&rc->rc_mtx);
|
||||
}
|
||||
|
||||
void
|
||||
refcount_destroy(refcount_t *rc)
|
||||
{
|
||||
refcount_destroy_many(rc, 0);
|
||||
}
|
||||
|
||||
int
|
||||
refcount_is_zero(refcount_t *rc)
|
||||
{
|
||||
ASSERT(rc->rc_count >= 0);
|
||||
return (rc->rc_count == 0);
|
||||
}
|
||||
|
||||
int64_t
|
||||
refcount_count(refcount_t *rc)
|
||||
{
|
||||
ASSERT(rc->rc_count >= 0);
|
||||
return (rc->rc_count);
|
||||
}
|
||||
|
||||
int64_t
|
||||
refcount_add_many(refcount_t *rc, uint64_t number, void *holder)
|
||||
{
|
||||
reference_t *ref;
|
||||
int64_t count;
|
||||
|
||||
if (reference_tracking_enable) {
|
||||
ref = kmem_cache_alloc(reference_cache, KM_SLEEP);
|
||||
ref->ref_holder = holder;
|
||||
ref->ref_number = number;
|
||||
}
|
||||
mutex_enter(&rc->rc_mtx);
|
||||
ASSERT(rc->rc_count >= 0);
|
||||
if (reference_tracking_enable)
|
||||
list_insert_head(&rc->rc_list, ref);
|
||||
rc->rc_count += number;
|
||||
count = rc->rc_count;
|
||||
mutex_exit(&rc->rc_mtx);
|
||||
|
||||
return (count);
|
||||
}
|
||||
|
||||
int64_t
|
||||
refcount_add(refcount_t *rc, void *holder)
|
||||
{
|
||||
return (refcount_add_many(rc, 1, holder));
|
||||
}
|
||||
|
||||
int64_t
|
||||
refcount_remove_many(refcount_t *rc, uint64_t number, void *holder)
|
||||
{
|
||||
reference_t *ref;
|
||||
int64_t count;
|
||||
|
||||
mutex_enter(&rc->rc_mtx);
|
||||
ASSERT(rc->rc_count >= number);
|
||||
|
||||
if (!reference_tracking_enable) {
|
||||
rc->rc_count -= number;
|
||||
count = rc->rc_count;
|
||||
mutex_exit(&rc->rc_mtx);
|
||||
return (count);
|
||||
}
|
||||
|
||||
for (ref = list_head(&rc->rc_list); ref;
|
||||
ref = list_next(&rc->rc_list, ref)) {
|
||||
if (ref->ref_holder == holder && ref->ref_number == number) {
|
||||
list_remove(&rc->rc_list, ref);
|
||||
if (reference_history > 0) {
|
||||
ref->ref_removed =
|
||||
kmem_cache_alloc(reference_history_cache,
|
||||
KM_SLEEP);
|
||||
list_insert_head(&rc->rc_removed, ref);
|
||||
rc->rc_removed_count++;
|
||||
if (rc->rc_removed_count >= reference_history) {
|
||||
ref = list_tail(&rc->rc_removed);
|
||||
list_remove(&rc->rc_removed, ref);
|
||||
kmem_cache_free(reference_history_cache,
|
||||
ref->ref_removed);
|
||||
kmem_cache_free(reference_cache, ref);
|
||||
rc->rc_removed_count--;
|
||||
}
|
||||
} else {
|
||||
kmem_cache_free(reference_cache, ref);
|
||||
}
|
||||
rc->rc_count -= number;
|
||||
count = rc->rc_count;
|
||||
mutex_exit(&rc->rc_mtx);
|
||||
return (count);
|
||||
}
|
||||
}
|
||||
panic("No such hold %p on refcount %llx", holder,
|
||||
(u_longlong_t)(uintptr_t)rc);
|
||||
return (-1);
|
||||
}
|
||||
|
||||
int64_t
|
||||
refcount_remove(refcount_t *rc, void *holder)
|
||||
{
|
||||
return (refcount_remove_many(rc, 1, holder));
|
||||
}
|
||||
|
||||
void
|
||||
refcount_transfer(refcount_t *dst, refcount_t *src)
|
||||
{
|
||||
int64_t count, removed_count;
|
||||
list_t list, removed;
|
||||
|
||||
list_create(&list, sizeof (reference_t),
|
||||
offsetof(reference_t, ref_link));
|
||||
list_create(&removed, sizeof (reference_t),
|
||||
offsetof(reference_t, ref_link));
|
||||
|
||||
mutex_enter(&src->rc_mtx);
|
||||
count = src->rc_count;
|
||||
removed_count = src->rc_removed_count;
|
||||
src->rc_count = 0;
|
||||
src->rc_removed_count = 0;
|
||||
list_move_tail(&list, &src->rc_list);
|
||||
list_move_tail(&removed, &src->rc_removed);
|
||||
mutex_exit(&src->rc_mtx);
|
||||
|
||||
mutex_enter(&dst->rc_mtx);
|
||||
dst->rc_count += count;
|
||||
dst->rc_removed_count += removed_count;
|
||||
list_move_tail(&dst->rc_list, &list);
|
||||
list_move_tail(&dst->rc_removed, &removed);
|
||||
mutex_exit(&dst->rc_mtx);
|
||||
|
||||
list_destroy(&list);
|
||||
list_destroy(&removed);
|
||||
}
|
||||
|
||||
#endif /* ZFS_DEBUG */
|
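The tracked-hold removal in refcount_remove_many() above can be sketched in userland as follows. This is a minimal illustration, not the kernel code: the names tracked_refcount_t, hold_t, tr_add_many and tr_remove_many are hypothetical, malloc/free stand in for the kmem caches, there is no locking or removal history, and a missing hold returns -1 where the original panics.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userland stand-ins for reference_t and refcount_t. */
typedef struct hold {
	struct hold *next;
	const void *holder;	/* who took the hold */
	int64_t number;		/* how many references it covers */
} hold_t;

typedef struct tracked_refcount {
	int64_t count;
	hold_t *holds;
} tracked_refcount_t;

/* Record a hold of 'number' references for 'holder'; returns new count. */
int64_t
tr_add_many(tracked_refcount_t *rc, int64_t number, const void *holder)
{
	hold_t *h = malloc(sizeof (*h));

	if (h == NULL)
		return (-1);
	h->holder = holder;
	h->number = number;
	h->next = rc->holds;
	rc->holds = h;
	rc->count += number;
	return (rc->count);
}

/*
 * Drop the hold matching both 'holder' and 'number'. Returns the new
 * count, or -1 if no such hold exists (the kernel code panics instead).
 */
int64_t
tr_remove_many(tracked_refcount_t *rc, int64_t number, const void *holder)
{
	for (hold_t **pp = &rc->holds; *pp != NULL; pp = &(*pp)->next) {
		hold_t *h = *pp;

		if (h->holder == holder && h->number == number) {
			*pp = h->next;	/* unlink before freeing */
			free(h);
			rc->count -= number;
			return (rc->count);
		}
	}
	return (-1);
}
```

A hold is matched on the (holder, number) pair, so releasing three references with the wrong holder tag is rejected even when the total count would allow it.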
264
uts/common/fs/zfs/rrwlock.c
Normal file
@ -0,0 +1,264 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#include <sys/refcount.h>
|
||||
#include <sys/rrwlock.h>
|
||||
|
||||
/*
|
||||
* This file contains the implementation of a re-entrant read
|
||||
* reader/writer lock (aka "rrwlock").
|
||||
*
|
||||
* This is a normal reader/writer lock with the additional feature
|
||||
* of allowing threads who have already obtained a read lock to
|
||||
* re-enter another read lock (re-entrant read) - even if there are
|
||||
* waiting writers.
|
||||
*
|
||||
* Callers who have not obtained a read lock give waiting writers priority.
|
||||
*
|
||||
* The rrwlock_t lock does not allow re-entrant writers, nor does it
|
||||
* allow a re-entrant mix of reads and writes (that is, it does not
|
||||
* allow a caller who has already obtained a read lock to be able to
|
||||
* then grab a write lock without first dropping all read locks, and
|
||||
* vice versa).
|
||||
*
|
||||
* The rrwlock_t uses tsd (thread specific data) to keep a list of
|
||||
* nodes (rrw_node_t), where each node keeps track of which specific
|
||||
* lock (rrw_node_t::rn_rrl) the thread has grabbed. Since re-entering
|
||||
* should be rare, a thread that grabs multiple reads on the same rrwlock_t
|
||||
* will store multiple rrw_node_ts of the same 'rn_rrl'. Nodes on the
|
||||
* tsd list can represent a different rrwlock_t. This allows a thread
|
||||
* to enter multiple and unique rrwlock_ts for read locks at the same time.
|
||||
*
|
||||
* Since using tsd exposes some overhead, the rrwlock_t only needs to
|
||||
* keep tsd data when writers are waiting. If no writers are waiting, then
|
||||
* a reader just bumps the anonymous read count (rr_anon_rcount) - no tsd
|
||||
* is needed. Once a writer attempts to grab the lock, readers then
|
||||
* keep tsd data and bump the linked readers count (rr_linked_rcount).
|
||||
*
|
||||
* If there are waiting writers and there are anonymous readers, then a
|
||||
* reader doesn't know if it is a re-entrant lock. But since it may be one,
|
||||
* we allow the read to proceed (otherwise it could deadlock). Since once
|
||||
* waiting writers are active, readers no longer bump the anonymous count,
|
||||
* the anonymous readers will eventually flush themselves out. At this point,
|
||||
* readers will be able to tell if they are a re-entrant lock (have a
|
||||
* rrw_node_t entry for the lock) or not. If they are a re-entrant lock, then
|
||||
* we must let them proceed. If they are not, then the reader blocks for the
|
||||
* waiting writers. Hence, we do not starve writers.
|
||||
*/
|
||||
|
||||
/* global key for TSD */
|
||||
uint_t rrw_tsd_key;
|
||||
|
||||
typedef struct rrw_node {
|
||||
struct rrw_node *rn_next;
|
||||
rrwlock_t *rn_rrl;
|
||||
} rrw_node_t;
|
||||
|
||||
static rrw_node_t *
|
||||
rrn_find(rrwlock_t *rrl)
|
||||
{
|
||||
rrw_node_t *rn;
|
||||
|
||||
if (refcount_count(&rrl->rr_linked_rcount) == 0)
|
||||
return (NULL);
|
||||
|
||||
for (rn = tsd_get(rrw_tsd_key); rn != NULL; rn = rn->rn_next) {
|
||||
if (rn->rn_rrl == rrl)
|
||||
return (rn);
|
||||
}
|
||||
return (NULL);
|
||||
}
|
||||
|
||||
/*
|
||||
* Add a node to the head of the singly linked list.
|
||||
*/
|
||||
static void
|
||||
rrn_add(rrwlock_t *rrl)
|
||||
{
|
||||
rrw_node_t *rn;
|
||||
|
||||
rn = kmem_alloc(sizeof (*rn), KM_SLEEP);
|
||||
rn->rn_rrl = rrl;
|
||||
rn->rn_next = tsd_get(rrw_tsd_key);
|
||||
VERIFY(tsd_set(rrw_tsd_key, rn) == 0);
|
||||
}
|
||||
|
||||
/*
|
||||
* If a node is found for 'rrl', then remove the node from this
|
||||
* thread's list and return TRUE; otherwise return FALSE.
|
||||
*/
|
||||
static boolean_t
|
||||
rrn_find_and_remove(rrwlock_t *rrl)
|
||||
{
|
||||
rrw_node_t *rn;
|
||||
rrw_node_t *prev = NULL;
|
||||
|
||||
if (refcount_count(&rrl->rr_linked_rcount) == 0)
|
||||
return (B_FALSE);
|
||||
|
||||
for (rn = tsd_get(rrw_tsd_key); rn != NULL; rn = rn->rn_next) {
|
||||
if (rn->rn_rrl == rrl) {
|
||||
if (prev)
|
||||
prev->rn_next = rn->rn_next;
|
||||
else
|
||||
VERIFY(tsd_set(rrw_tsd_key, rn->rn_next) == 0);
|
||||
kmem_free(rn, sizeof (*rn));
|
||||
return (B_TRUE);
|
||||
}
|
||||
prev = rn;
|
||||
}
|
||||
return (B_FALSE);
|
||||
}
|
||||
|
||||
void
|
||||
rrw_init(rrwlock_t *rrl)
|
||||
{
|
||||
mutex_init(&rrl->rr_lock, NULL, MUTEX_DEFAULT, NULL);
|
||||
cv_init(&rrl->rr_cv, NULL, CV_DEFAULT, NULL);
|
||||
rrl->rr_writer = NULL;
|
||||
refcount_create(&rrl->rr_anon_rcount);
|
||||
refcount_create(&rrl->rr_linked_rcount);
|
||||
rrl->rr_writer_wanted = B_FALSE;
|
||||
}
|
||||
|
||||
void
|
||||
rrw_destroy(rrwlock_t *rrl)
|
||||
{
|
||||
mutex_destroy(&rrl->rr_lock);
|
||||
cv_destroy(&rrl->rr_cv);
|
||||
ASSERT(rrl->rr_writer == NULL);
|
||||
refcount_destroy(&rrl->rr_anon_rcount);
|
||||
refcount_destroy(&rrl->rr_linked_rcount);
|
||||
}
|
||||
|
||||
static void
|
||||
rrw_enter_read(rrwlock_t *rrl, void *tag)
|
||||
{
|
||||
mutex_enter(&rrl->rr_lock);
|
||||
#if !defined(DEBUG) && defined(_KERNEL)
|
||||
if (!rrl->rr_writer && !rrl->rr_writer_wanted) {
|
||||
rrl->rr_anon_rcount.rc_count++;
|
||||
mutex_exit(&rrl->rr_lock);
|
||||
return;
|
||||
}
|
||||
DTRACE_PROBE(zfs__rrwfastpath__rdmiss);
|
||||
#endif
|
||||
ASSERT(rrl->rr_writer != curthread);
|
||||
ASSERT(refcount_count(&rrl->rr_anon_rcount) >= 0);
|
||||
|
||||
while (rrl->rr_writer || (rrl->rr_writer_wanted &&
|
||||
refcount_is_zero(&rrl->rr_anon_rcount) &&
|
||||
rrn_find(rrl) == NULL))
|
||||
cv_wait(&rrl->rr_cv, &rrl->rr_lock);
|
||||
|
||||
if (rrl->rr_writer_wanted) {
|
||||
/* may or may not be a re-entrant enter */
|
||||
rrn_add(rrl);
|
||||
(void) refcount_add(&rrl->rr_linked_rcount, tag);
|
||||
} else {
|
||||
(void) refcount_add(&rrl->rr_anon_rcount, tag);
|
||||
}
|
||||
ASSERT(rrl->rr_writer == NULL);
|
||||
mutex_exit(&rrl->rr_lock);
|
||||
}
|
||||
|
||||
static void
|
||||
rrw_enter_write(rrwlock_t *rrl)
|
||||
{
|
||||
mutex_enter(&rrl->rr_lock);
|
||||
ASSERT(rrl->rr_writer != curthread);
|
||||
|
||||
while (refcount_count(&rrl->rr_anon_rcount) > 0 ||
|
||||
refcount_count(&rrl->rr_linked_rcount) > 0 ||
|
||||
rrl->rr_writer != NULL) {
|
||||
rrl->rr_writer_wanted = B_TRUE;
|
||||
cv_wait(&rrl->rr_cv, &rrl->rr_lock);
|
||||
}
|
||||
rrl->rr_writer_wanted = B_FALSE;
|
||||
rrl->rr_writer = curthread;
|
||||
mutex_exit(&rrl->rr_lock);
|
||||
}
|
||||
|
||||
void
|
||||
rrw_enter(rrwlock_t *rrl, krw_t rw, void *tag)
|
||||
{
|
||||
if (rw == RW_READER)
|
||||
rrw_enter_read(rrl, tag);
|
||||
else
|
||||
rrw_enter_write(rrl);
|
||||
}
|
||||
|
||||
void
|
||||
rrw_exit(rrwlock_t *rrl, void *tag)
|
||||
{
|
||||
mutex_enter(&rrl->rr_lock);
|
||||
#if !defined(DEBUG) && defined(_KERNEL)
|
||||
if (!rrl->rr_writer && rrl->rr_linked_rcount.rc_count == 0) {
|
||||
rrl->rr_anon_rcount.rc_count--;
|
||||
if (rrl->rr_anon_rcount.rc_count == 0)
|
||||
cv_broadcast(&rrl->rr_cv);
|
||||
mutex_exit(&rrl->rr_lock);
|
||||
return;
|
||||
}
|
||||
DTRACE_PROBE(zfs__rrwfastpath__exitmiss);
|
||||
#endif
|
||||
ASSERT(!refcount_is_zero(&rrl->rr_anon_rcount) ||
|
||||
!refcount_is_zero(&rrl->rr_linked_rcount) ||
|
||||
rrl->rr_writer != NULL);
|
||||
|
||||
if (rrl->rr_writer == NULL) {
|
||||
int64_t count;
|
||||
if (rrn_find_and_remove(rrl))
|
||||
count = refcount_remove(&rrl->rr_linked_rcount, tag);
|
||||
else
|
||||
count = refcount_remove(&rrl->rr_anon_rcount, tag);
|
||||
if (count == 0)
|
||||
cv_broadcast(&rrl->rr_cv);
|
||||
} else {
|
||||
ASSERT(rrl->rr_writer == curthread);
|
||||
ASSERT(refcount_is_zero(&rrl->rr_anon_rcount) &&
|
||||
refcount_is_zero(&rrl->rr_linked_rcount));
|
||||
rrl->rr_writer = NULL;
|
||||
cv_broadcast(&rrl->rr_cv);
|
||||
}
|
||||
mutex_exit(&rrl->rr_lock);
|
||||
}
|
||||
|
||||
boolean_t
|
||||
rrw_held(rrwlock_t *rrl, krw_t rw)
|
||||
{
|
||||
boolean_t held;
|
||||
|
||||
mutex_enter(&rrl->rr_lock);
|
||||
if (rw == RW_WRITER) {
|
||||
held = (rrl->rr_writer == curthread);
|
||||
} else {
|
||||
held = (!refcount_is_zero(&rrl->rr_anon_rcount) ||
|
||||
!refcount_is_zero(&rrl->rr_linked_rcount));
|
||||
}
|
||||
mutex_exit(&rrl->rr_lock);
|
||||
|
||||
return (held);
|
||||
}
|
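The reader admission rule used by rrw_enter_read() in rrwlock.c above can be modelled as a pure function over the lock state. This is a sketch under stated assumptions: rr_state_t, rr_reader_must_wait and rr_reader_admit are hypothetical names, the mutex, condition variable and thread-specific data are left out, and the has_node argument stands in for the result of rrn_find().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the fields the admission test looks at. */
typedef struct rr_state {
	bool writer_held;	/* rr_writer != NULL */
	bool writer_wanted;	/* rr_writer_wanted */
	int64_t anon_readers;	/* rr_anon_rcount */
	int64_t linked_readers;	/* rr_linked_rcount */
} rr_state_t;

/*
 * A reader waits while a writer holds the lock. It also waits when a
 * writer is waiting, no anonymous readers remain, and this thread has
 * no node for the lock (so the enter cannot be re-entrant).
 */
bool
rr_reader_must_wait(const rr_state_t *s, bool has_node)
{
	if (s->writer_held)
		return (true);
	return (s->writer_wanted && s->anon_readers == 0 && !has_node);
}

/*
 * Account for an admitted reader. Returns true when the hold is a
 * linked one, which in the real lock also needs a per-thread node.
 */
bool
rr_reader_admit(rr_state_t *s)
{
	if (s->writer_wanted) {
		s->linked_readers++;
		return (true);
	}
	s->anon_readers++;
	return (false);
}
```

While anonymous readers remain, a new reader cannot tell whether it is re-entering, so the model admits it. Once they drain, only threads that already have a node get past a waiting writer.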
1970
uts/common/fs/zfs/sa.c
Normal file
File diff suppressed because it is too large
50
uts/common/fs/zfs/sha256.c
Normal file
@ -0,0 +1,50 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/sha2.h>
|
||||
|
||||
void
|
||||
zio_checksum_SHA256(const void *buf, uint64_t size, zio_cksum_t *zcp)
|
||||
{
|
||||
SHA2_CTX ctx;
|
||||
zio_cksum_t tmp;
|
||||
|
||||
SHA2Init(SHA256, &ctx);
|
||||
SHA2Update(&ctx, buf, size);
|
||||
SHA2Final(&tmp, &ctx);
|
||||
|
||||
/*
|
||||
* A prior implementation of this function had a
|
||||
* private SHA256 implementation that always wrote things out in
|
||||
* Big Endian and there wasn't a byteswap variant of it.
|
||||
* To preserve on-disk compatibility we need to force that
|
||||
* behaviour.
|
||||
*/
|
||||
zcp->zc_word[0] = BE_64(tmp.zc_word[0]);
|
||||
zcp->zc_word[1] = BE_64(tmp.zc_word[1]);
|
||||
zcp->zc_word[2] = BE_64(tmp.zc_word[2]);
|
||||
zcp->zc_word[3] = BE_64(tmp.zc_word[3]);
|
||||
}
|
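The big-endian word layout that zio_checksum_SHA256() above forces can be shown with a portable byte-wise load. This is an illustrative sketch only: cksum256_t, be64_load and digest_to_words are hypothetical names, and the code does not call the SHA2 routines or use the BE_64 macro; it assumes only that the digest is available as 32 raw bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 256-bit checksum held as four words. */
typedef struct cksum256 {
	uint64_t word[4];
} cksum256_t;

/* Read 8 bytes as a big-endian 64-bit value, whatever the host order. */
uint64_t
be64_load(const uint8_t *p)
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return (v);
}

/* Split a 32-byte digest into four big-endian 64-bit words. */
void
digest_to_words(const uint8_t digest[32], cksum256_t *out)
{
	for (size_t w = 0; w < 4; w++)
		out->word[w] = be64_load(digest + 8 * w);
}
```

Because the load is done byte by byte, the resulting word values are the same on little-endian and big-endian hosts, which is the property the comment in the function asks for.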
5882
uts/common/fs/zfs/spa.c
Normal file
File diff suppressed because it is too large
487
uts/common/fs/zfs/spa_config.c
Normal file
@ -0,0 +1,487 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#include <sys/spa.h>
|
||||
#include <sys/spa_impl.h>
|
||||
#include <sys/nvpair.h>
|
||||
#include <sys/uio.h>
|
||||
#include <sys/fs/zfs.h>
|
||||
#include <sys/vdev_impl.h>
|
||||
#include <sys/zfs_ioctl.h>
|
||||
#include <sys/utsname.h>
|
||||
#include <sys/systeminfo.h>
|
||||
#include <sys/sunddi.h>
|
||||
#ifdef _KERNEL
|
||||
#include <sys/kobj.h>
|
||||
#include <sys/zone.h>
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Pool configuration repository.
|
||||
*
|
||||
* Pool configuration is stored as a packed nvlist on the filesystem. By
|
||||
* default, all pools are stored in /etc/zfs/zpool.cache and loaded on boot
|
||||
* (when the ZFS module is loaded). Pools can also have the 'cachefile'
|
||||
* property set that allows them to be stored in an alternate location under
|
||||
* the control of external software.
|
||||
*
|
||||
* For each cache file, we have a single nvlist which holds all the
|
||||
* configuration information. When the module loads, we read this information
|
||||
* from /etc/zfs/zpool.cache and populate the SPA namespace. This namespace is
|
||||
* maintained independently in spa.c. Whenever the namespace is modified, or
|
||||
* the configuration of a pool is changed, we call spa_config_sync(), which
|
||||
* walks through all the active pools and writes the configuration to disk.
|
||||
*/
|
||||
|
||||
static uint64_t spa_config_generation = 1;
|
||||
|
||||
/*
|
||||
* This can be overridden in userland to preserve an alternate namespace for
|
||||
* userland pools when doing testing.
|
||||
*/
|
||||
const char *spa_config_path = ZPOOL_CACHE;
|
||||
|
||||
/*
|
||||
* Called when the module is first loaded, this routine loads the configuration
|
||||
* file into the SPA namespace. It does not actually open or load the pools; it
|
||||
* only populates the namespace.
|
||||
*/
|
||||
void
|
||||
spa_config_load(void)
|
||||
{
|
||||
void *buf = NULL;
|
||||
nvlist_t *nvlist, *child;
|
||||
nvpair_t *nvpair;
|
||||
char *pathname;
|
||||
struct _buf *file;
|
||||
uint64_t fsize;
|
||||
|
||||
/*
|
||||
* Open the configuration file.
|
||||
*/
|
||||
pathname = kmem_alloc(MAXPATHLEN, KM_SLEEP);
|
||||
|
||||
(void) snprintf(pathname, MAXPATHLEN, "%s%s",
|
||||
(rootdir != NULL) ? "./" : "", spa_config_path);
|
||||
|
||||
file = kobj_open_file(pathname);
|
||||
|
||||
kmem_free(pathname, MAXPATHLEN);
|
||||
|
||||
if (file == (struct _buf *)-1)
|
||||
return;
|
||||
|
||||
if (kobj_get_filesize(file, &fsize) != 0)
|
||||
goto out;
|
||||
|
||||
buf = kmem_alloc(fsize, KM_SLEEP);
|
||||
|
||||
/*
|
||||
* Read the nvlist from the file.
|
||||
*/
|
||||
if (kobj_read_file(file, buf, fsize, 0) < 0)
|
||||
goto out;
|
||||
|
||||
/*
|
||||
* Unpack the nvlist.
|
||||
*/
|
||||
if (nvlist_unpack(buf, fsize, &nvlist, KM_SLEEP) != 0)
|
||||
goto out;
|
||||
|
||||
/*
|
||||
* Iterate over all elements in the nvlist, creating a new spa_t for
|
||||
* each one with the specified configuration.
|
||||
*/
|
||||
mutex_enter(&spa_namespace_lock);
|
||||
nvpair = NULL;
|
||||
while ((nvpair = nvlist_next_nvpair(nvlist, nvpair)) != NULL) {
|
||||
if (nvpair_type(nvpair) != DATA_TYPE_NVLIST)
|
||||
continue;
|
||||
|
||||
VERIFY(nvpair_value_nvlist(nvpair, &child) == 0);
|
||||
|
||||
if (spa_lookup(nvpair_name(nvpair)) != NULL)
|
||||
continue;
|
||||
(void) spa_add(nvpair_name(nvpair), child, NULL);
|
||||
}
|
||||
mutex_exit(&spa_namespace_lock);
|
||||
|
||||
nvlist_free(nvlist);
|
||||
|
||||
out:
|
||||
if (buf != NULL)
|
||||
kmem_free(buf, fsize);
|
||||
|
||||
kobj_close_file(file);
|
||||
}
|
||||
|
||||
static void
|
||||
spa_config_write(spa_config_dirent_t *dp, nvlist_t *nvl)
|
||||
{
|
||||
size_t buflen;
|
||||
char *buf;
|
||||
vnode_t *vp;
|
||||
int oflags = FWRITE | FTRUNC | FCREAT | FOFFMAX;
|
||||
char *temp;
|
||||
|
||||
/*
|
||||
* If the nvlist is empty (NULL), then remove the old cachefile.
|
||||
*/
|
||||
if (nvl == NULL) {
|
||||
(void) vn_remove(dp->scd_path, UIO_SYSSPACE, RMFILE);
|
||||
return;
|
||||
}
|
||||
|
||||
/*
|
||||
* Pack the configuration into a buffer.
|
||||
*/
|
||||
VERIFY(nvlist_size(nvl, &buflen, NV_ENCODE_XDR) == 0);
|
||||
|
||||
buf = kmem_alloc(buflen, KM_SLEEP);
|
||||
temp = kmem_zalloc(MAXPATHLEN, KM_SLEEP);
|
||||
|
||||
VERIFY(nvlist_pack(nvl, &buf, &buflen, NV_ENCODE_XDR,
|
||||
KM_SLEEP) == 0);
|
||||
|
||||
/*
|
||||
* Write the configuration to disk. We need to do the traditional
|
||||
* 'write to temporary file, sync, move over original' to make sure we
|
||||
* always have a consistent view of the data.
|
||||
*/
|
||||
(void) snprintf(temp, MAXPATHLEN, "%s.tmp", dp->scd_path);
|
||||
|
||||
if (vn_open(temp, UIO_SYSSPACE, oflags, 0644, &vp, CRCREAT, 0) == 0) {
|
||||
if (vn_rdwr(UIO_WRITE, vp, buf, buflen, 0, UIO_SYSSPACE,
|
||||
0, RLIM64_INFINITY, kcred, NULL) == 0 &&
|
||||
VOP_FSYNC(vp, FSYNC, kcred, NULL) == 0) {
|
||||
(void) vn_rename(temp, dp->scd_path, UIO_SYSSPACE);
|
||||
}
|
||||
(void) VOP_CLOSE(vp, oflags, 1, 0, kcred, NULL);
|
||||
VN_RELE(vp);
|
||||
}
|
||||
|
||||
(void) vn_remove(temp, UIO_SYSSPACE, RMFILE);
|
||||
|
||||
kmem_free(buf, buflen);
|
||||
kmem_free(temp, MAXPATHLEN);
|
||||
}
|
||||
|
||||
/*
|
||||
* Synchronize pool configuration to disk. This must be called with the
|
||||
* namespace lock held.
|
||||
*/
|
||||
void
|
||||
spa_config_sync(spa_t *target, boolean_t removing, boolean_t postsysevent)
|
||||
{
|
||||
spa_config_dirent_t *dp, *tdp;
|
||||
nvlist_t *nvl;
|
||||
|
||||
ASSERT(MUTEX_HELD(&spa_namespace_lock));
|
||||
|
||||
if (rootdir == NULL || !(spa_mode_global & FWRITE))
|
||||
return;
|
||||
|
||||
/*
|
||||
* Iterate over all cachefiles for the pool, past or present. When the
|
||||
* cachefile is changed, the new one is pushed onto this list, allowing
|
||||
* us to update previous cachefiles that no longer contain this pool.
|
||||
*/
|
||||
for (dp = list_head(&target->spa_config_list); dp != NULL;
|
||||
dp = list_next(&target->spa_config_list, dp)) {
|
||||
spa_t *spa = NULL;
|
||||
if (dp->scd_path == NULL)
|
||||
continue;
|
||||
|
||||
/*
|
||||
* Iterate over all pools, adding any matching pools to 'nvl'.
|
||||
*/
|
||||
nvl = NULL;
|
||||
while ((spa = spa_next(spa)) != NULL) {
|
||||
if (spa == target && removing)
|
||||
continue;
|
||||
|
||||
mutex_enter(&spa->spa_props_lock);
|
||||
tdp = list_head(&spa->spa_config_list);
|
||||
if (spa->spa_config == NULL ||
|
||||
tdp->scd_path == NULL ||
|
||||
strcmp(tdp->scd_path, dp->scd_path) != 0) {
|
||||
mutex_exit(&spa->spa_props_lock);
|
||||
continue;
|
||||
}
|
||||
|
||||
if (nvl == NULL)
|
||||
VERIFY(nvlist_alloc(&nvl, NV_UNIQUE_NAME,
|
||||
KM_SLEEP) == 0);
|
||||
|
||||
VERIFY(nvlist_add_nvlist(nvl, spa->spa_name,
|
||||
spa->spa_config) == 0);
|
||||
mutex_exit(&spa->spa_props_lock);
|
||||
}
|
||||
|
||||
spa_config_write(dp, nvl);
|
||||
nvlist_free(nvl);
|
||||
}
|
||||
|
||||
/*
|
||||
* Remove any config entries older than the current one.
|
||||
*/
|
||||
dp = list_head(&target->spa_config_list);
|
||||
while ((tdp = list_next(&target->spa_config_list, dp)) != NULL) {
|
||||
list_remove(&target->spa_config_list, tdp);
|
||||
if (tdp->scd_path != NULL)
|
||||
spa_strfree(tdp->scd_path);
|
||||
kmem_free(tdp, sizeof (spa_config_dirent_t));
|
||||
}
|
||||
|
||||
spa_config_generation++;
|
||||
|
||||
if (postsysevent)
|
||||
spa_event_notify(target, NULL, ESC_ZFS_CONFIG_SYNC);
|
||||
}
|
||||
|
||||
/*
|
||||
* Sigh. Inside a local zone, we don't have access to /etc/zfs/zpool.cache,
|
||||
* and we don't want to allow the local zone to see all the pools anyway.
|
||||
* So we have to invent the ZFS_IOC_CONFIG ioctl to grab the configuration
|
||||
* information for all pools visible within the zone.
|
||||
*/
|
||||
nvlist_t *
|
||||
spa_all_configs(uint64_t *generation)
|
||||
{
|
||||
nvlist_t *pools;
|
||||
spa_t *spa = NULL;
|
||||
|
||||
if (*generation == spa_config_generation)
|
||||
return (NULL);
|
||||
|
||||
VERIFY(nvlist_alloc(&pools, NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
||||
|
||||
mutex_enter(&spa_namespace_lock);
|
||||
while ((spa = spa_next(spa)) != NULL) {
|
||||
if (INGLOBALZONE(curproc) ||
|
||||
zone_dataset_visible(spa_name(spa), NULL)) {
|
||||
mutex_enter(&spa->spa_props_lock);
|
||||
VERIFY(nvlist_add_nvlist(pools, spa_name(spa),
|
||||
spa->spa_config) == 0);
|
||||
mutex_exit(&spa->spa_props_lock);
|
||||
}
|
||||
}
|
||||
*generation = spa_config_generation;
|
||||
mutex_exit(&spa_namespace_lock);
|
||||
|
||||
return (pools);
|
||||
}
|
||||
|
||||
void
|
||||
spa_config_set(spa_t *spa, nvlist_t *config)
|
||||
{
|
||||
mutex_enter(&spa->spa_props_lock);
|
||||
if (spa->spa_config != NULL)
|
||||
nvlist_free(spa->spa_config);
|
||||
spa->spa_config = config;
|
||||
mutex_exit(&spa->spa_props_lock);
|
||||
}
|
||||
|
||||
/*
|
||||
* Generate the pool's configuration based on the current in-core state.
|
||||
* We infer whether to generate a complete config or just one top-level config
|
||||
* based on whether vd is the root vdev.
|
||||
*/
|
||||
nvlist_t *
|
||||
spa_config_generate(spa_t *spa, vdev_t *vd, uint64_t txg, int getstats)
|
||||
{
|
||||
nvlist_t *config, *nvroot;
|
||||
vdev_t *rvd = spa->spa_root_vdev;
|
||||
unsigned long hostid = 0;
|
||||
boolean_t locked = B_FALSE;
|
||||
uint64_t split_guid;
|
||||
|
||||
if (vd == NULL) {
|
||||
vd = rvd;
|
||||
locked = B_TRUE;
|
||||
spa_config_enter(spa, SCL_CONFIG | SCL_STATE, FTAG, RW_READER);
|
||||
}
|
||||
|
||||
ASSERT(spa_config_held(spa, SCL_CONFIG | SCL_STATE, RW_READER) ==
|
||||
(SCL_CONFIG | SCL_STATE));
|
||||
|
||||
/*
|
||||
* If txg is -1, report the current value of spa->spa_config_txg.
|
||||
*/
|
||||
if (txg == -1ULL)
|
||||
txg = spa->spa_config_txg;
|
||||
|
||||
VERIFY(nvlist_alloc(&config, NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
||||
|
||||
VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_VERSION,
|
||||
spa_version(spa)) == 0);
|
||||
VERIFY(nvlist_add_string(config, ZPOOL_CONFIG_POOL_NAME,
|
||||
spa_name(spa)) == 0);
|
||||
VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_POOL_STATE,
|
||||
spa_state(spa)) == 0);
|
||||
VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_POOL_TXG,
|
||||
txg) == 0);
|
||||
VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_POOL_GUID,
|
||||
spa_guid(spa)) == 0);
|
||||
#ifdef _KERNEL
|
||||
hostid = zone_get_hostid(NULL);
|
||||
#else /* _KERNEL */
|
||||
/*
|
||||
* We're emulating the system's hostid in userland, so we can't use
|
||||
* zone_get_hostid().
|
||||
*/
|
||||
(void) ddi_strtoul(hw_serial, NULL, 10, &hostid);
|
||||
#endif /* _KERNEL */
|
||||
if (hostid != 0) {
|
||||
VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_HOSTID,
|
||||
hostid) == 0);
|
||||
}
|
||||
VERIFY(nvlist_add_string(config, ZPOOL_CONFIG_HOSTNAME,
|
||||
utsname.nodename) == 0);
|
||||
|
||||
if (vd != rvd) {
|
||||
VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_TOP_GUID,
|
||||
vd->vdev_top->vdev_guid) == 0);
|
||||
VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_GUID,
|
||||
vd->vdev_guid) == 0);
|
||||
if (vd->vdev_isspare)
|
||||
VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_IS_SPARE,
|
||||
1ULL) == 0);
|
||||
if (vd->vdev_islog)
|
||||
VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_IS_LOG,
|
||||
1ULL) == 0);
|
||||
vd = vd->vdev_top; /* label contains top config */
|
||||
} else {
|
||||
/*
|
||||
* Only add the (potentially large) split information
|
||||
* in the mos config, and not in the vdev labels
|
||||
*/
|
||||
if (spa->spa_config_splitting != NULL)
|
||||
VERIFY(nvlist_add_nvlist(config, ZPOOL_CONFIG_SPLIT,
|
||||
spa->spa_config_splitting) == 0);
|
||||
}
|
||||
|
||||
/*
|
||||
* Add the top-level config. We even add this on pools which
|
||||
* don't support holes in the namespace.
|
||||
*/
|
||||
vdev_top_config_generate(spa, config);
|
||||
|
||||
/*
|
||||
* If we're splitting, record the original pool's guid.
|
||||
*/
|
||||
if (spa->spa_config_splitting != NULL &&
|
||||
nvlist_lookup_uint64(spa->spa_config_splitting,
|
||||
ZPOOL_CONFIG_SPLIT_GUID, &split_guid) == 0) {
|
||||
VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_SPLIT_GUID,
|
||||
split_guid) == 0);
|
||||
}
|
||||
|
||||
nvroot = vdev_config_generate(spa, vd, getstats, 0);
|
||||
VERIFY(nvlist_add_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, nvroot) == 0);
|
||||
nvlist_free(nvroot);
|
||||
|
||||
if (getstats && spa_load_state(spa) == SPA_LOAD_NONE) {
|
||||
ddt_histogram_t *ddh;
|
||||
ddt_stat_t *dds;
|
||||
ddt_object_t *ddo;
|
||||
|
||||
ddh = kmem_zalloc(sizeof (ddt_histogram_t), KM_SLEEP);
|
||||
ddt_get_dedup_histogram(spa, ddh);
|
||||
VERIFY(nvlist_add_uint64_array(config,
|
||||
ZPOOL_CONFIG_DDT_HISTOGRAM,
|
||||
(uint64_t *)ddh, sizeof (*ddh) / sizeof (uint64_t)) == 0);
|
||||
kmem_free(ddh, sizeof (ddt_histogram_t));
|
||||
|
||||
ddo = kmem_zalloc(sizeof (ddt_object_t), KM_SLEEP);
|
||||
ddt_get_dedup_object_stats(spa, ddo);
|
||||
VERIFY(nvlist_add_uint64_array(config,
|
||||
ZPOOL_CONFIG_DDT_OBJ_STATS,
|
||||
(uint64_t *)ddo, sizeof (*ddo) / sizeof (uint64_t)) == 0);
|
||||
kmem_free(ddo, sizeof (ddt_object_t));
|
||||
|
||||
dds = kmem_zalloc(sizeof (ddt_stat_t), KM_SLEEP);
|
||||
ddt_get_dedup_stats(spa, dds);
|
||||
VERIFY(nvlist_add_uint64_array(config,
|
||||
ZPOOL_CONFIG_DDT_STATS,
|
||||
(uint64_t *)dds, sizeof (*dds) / sizeof (uint64_t)) == 0);
|
||||
kmem_free(dds, sizeof (ddt_stat_t));
|
||||
}
|
||||
|
||||
if (locked)
|
||||
spa_config_exit(spa, SCL_CONFIG | SCL_STATE, FTAG);
|
||||
|
||||
return (config);
|
||||
}
|
||||
|
||||
/*
|
||||
* Update all disk labels, generate a fresh config based on the current
|
||||
* in-core state, and sync the global config cache (do not sync the config
|
||||
* cache if this is a booting rootpool).
|
||||
*/
|
||||
void
|
||||
spa_config_update(spa_t *spa, int what)
|
||||
{
|
||||
vdev_t *rvd = spa->spa_root_vdev;
|
||||
uint64_t txg;
|
||||
int c;
|
||||
|
||||
ASSERT(MUTEX_HELD(&spa_namespace_lock));
|
||||
|
||||
spa_config_enter(spa, SCL_ALL, FTAG, RW_WRITER);
|
||||
txg = spa_last_synced_txg(spa) + 1;
|
||||
if (what == SPA_CONFIG_UPDATE_POOL) {
|
||||
vdev_config_dirty(rvd);
|
||||
} else {
|
||||
/*
|
||||
* If we have top-level vdevs that were added but have
|
||||
* not yet been prepared for allocation, do that now.
|
||||
* (It's safe now because the config cache is up to date,
|
||||
* so it will be able to translate the new DVAs.)
|
||||
* See comments in spa_vdev_add() for full details.
|
||||
*/
|
||||
for (c = 0; c < rvd->vdev_children; c++) {
|
||||
vdev_t *tvd = rvd->vdev_child[c];
|
||||
if (tvd->vdev_ms_array == 0)
|
||||
vdev_metaslab_set_size(tvd);
|
||||
vdev_expand(tvd, txg);
|
||||
}
|
||||
}
|
||||
spa_config_exit(spa, SCL_ALL, FTAG);
|
||||
|
||||
/*
|
||||
* Wait for the mosconfig to be regenerated and synced.
|
||||
*/
|
||||
txg_wait_synced(spa->spa_dsl_pool, txg);
|
||||
|
||||
/*
|
||||
* Update the global config cache to reflect the new mosconfig.
|
||||
*/
|
||||
if (!spa->spa_is_root)
|
||||
spa_config_sync(spa, B_FALSE, what != SPA_CONFIG_UPDATE_POOL);
|
||||
|
||||
if (what == SPA_CONFIG_UPDATE_POOL)
|
||||
spa_config_update(spa, SPA_CONFIG_UPDATE_VDEVS);
|
||||
}
|
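The 'write to temporary file, sync, move over original' sequence in spa_config_write() above has a direct POSIX analogue. The sketch below is a userland illustration under assumptions, not the kernel path: write_file_atomically is a hypothetical name, and open/write/fsync/rename stand in for vn_open, vn_rdwr, VOP_FSYNC and vn_rename.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write 'len' bytes to "<path>.tmp", flush them to stable storage, then
 * rename the temporary file over 'path'. Returns 0 on success, -1 on
 * failure; the temporary file is removed on failure.
 */
int
write_file_atomically(const char *path, const void *buf, size_t len)
{
	char tmp[4096];
	const char *p = buf;
	size_t left = len;
	int ok = 1;
	int n, fd;

	n = snprintf(tmp, sizeof (tmp), "%s.tmp", path);
	if (n < 0 || (size_t)n >= sizeof (tmp))
		return (-1);

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);

	while (left > 0) {
		ssize_t w = write(fd, p, left);

		if (w <= 0) {		/* treat short-circuit as failure */
			ok = 0;
			break;
		}
		p += w;
		left -= (size_t)w;
	}
	if (ok && fsync(fd) != 0)	/* data must be durable before rename */
		ok = 0;
	if (close(fd) != 0)
		ok = 0;
	if (ok && rename(tmp, path) != 0)
		ok = 0;
	if (!ok)
		(void) unlink(tmp);
	return (ok ? 0 : -1);
}
```

Readers of the target path see either the old contents or the complete new contents, since rename() replaces the name in one step and is only attempted after the data has been synced.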
403
uts/common/fs/zfs/spa_errlog.c
Normal file
@ -0,0 +1,403 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2006, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
/*
|
||||
* Routines to manage the on-disk persistent error log.
|
||||
*
|
||||
* Each pool stores a log of all logical data errors seen during normal
|
||||
* operation. This is actually the union of two distinct logs: the last log,
|
||||
* and the current log. All errors seen are logged to the current log. When a
|
||||
* scrub completes, the current log becomes the last log, the last log is thrown
|
||||
* out, and the current log is reinitialized. This way, if an error is somehow
|
||||
* corrected, a new scrub will show that it no longer exists, and will be
|
||||
* deleted from the log when the scrub completes.
|
||||
*
|
||||
* The log is stored using a ZAP object whose key is a string form of the
|
||||
* zbookmark tuple (objset, object, level, blkid), and whose contents is an
|
||||
* optional 'objset:object' human-readable string describing the data. When an
|
||||
* error is first logged, this string will be empty, indicating that no name is
|
||||
* known. This prevents us from having to issue a potentially large amount of
|
||||
* I/O to discover the object name during an error path. Instead, we do the
|
||||
* calculation when the data is requested, storing the result so future queries
|
||||
* will be faster.
|
||||
*
|
||||
* This log is then shipped into an nvlist where the key is the dataset name and
|
||||
* the value is the object name. Userland is then responsible for uniquifying
|
||||
* this list and displaying it to the user.
|
||||
*/
|
||||
|
||||
#include <sys/dmu_tx.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/spa_impl.h>
|
||||
#include <sys/zap.h>
|
||||
#include <sys/zio.h>
|
||||
|
||||
|
||||
/*
|
||||
* Convert a bookmark to a string.
|
||||
*/
|
||||
static void
|
||||
bookmark_to_name(zbookmark_t *zb, char *buf, size_t len)
|
||||
{
|
||||
(void) snprintf(buf, len, "%llx:%llx:%llx:%llx",
|
||||
(u_longlong_t)zb->zb_objset, (u_longlong_t)zb->zb_object,
|
||||
(u_longlong_t)zb->zb_level, (u_longlong_t)zb->zb_blkid);
|
||||
}
|
||||
|
||||
/*
|
||||
* Convert a string to a bookmark
|
||||
*/
|
||||
#ifdef _KERNEL
|
||||
static void
|
||||
name_to_bookmark(char *buf, zbookmark_t *zb)
|
||||
{
|
||||
zb->zb_objset = strtonum(buf, &buf);
|
||||
ASSERT(*buf == ':');
|
||||
zb->zb_object = strtonum(buf + 1, &buf);
|
||||
ASSERT(*buf == ':');
|
||||
zb->zb_level = (int)strtonum(buf + 1, &buf);
|
||||
ASSERT(*buf == ':');
|
||||
zb->zb_blkid = strtonum(buf + 1, &buf);
|
||||
ASSERT(*buf == '\0');
|
||||
}
|
||||
#endif
|
||||
|
||||
/*
 * Log an uncorrectable error to the persistent error log. We add it to the
 * spa's list of pending errors. The changes are actually synced out to disk
 * during spa_errlog_sync().
 */
void
spa_log_error(spa_t *spa, zio_t *zio)
{
	zbookmark_t *zb = &zio->io_logical->io_bookmark;
	spa_error_entry_t search;
	spa_error_entry_t *new;
	avl_tree_t *tree;
	avl_index_t where;

	/*
	 * If we are trying to import a pool, ignore any errors, as we won't be
	 * writing to the pool any time soon.
	 */
	if (spa_load_state(spa) == SPA_LOAD_TRYIMPORT)
		return;

	mutex_enter(&spa->spa_errlist_lock);

	/*
	 * If we have had a request to rotate the log, log it to the next list
	 * instead of the current one.
	 */
	if (spa->spa_scrub_active || spa->spa_scrub_finished)
		tree = &spa->spa_errlist_scrub;
	else
		tree = &spa->spa_errlist_last;

	search.se_bookmark = *zb;
	if (avl_find(tree, &search, &where) != NULL) {
		mutex_exit(&spa->spa_errlist_lock);
		return;
	}

	new = kmem_zalloc(sizeof (spa_error_entry_t), KM_SLEEP);
	new->se_bookmark = *zb;
	avl_insert(tree, new, where);

	mutex_exit(&spa->spa_errlist_lock);
}

/*
 * Return the number of errors currently in the error log. This is actually the
 * sum of both the last log and the current log, since we don't know the union
 * of these logs until we reach userland.
 */
uint64_t
spa_get_errlog_size(spa_t *spa)
{
	uint64_t total = 0, count;

	mutex_enter(&spa->spa_errlog_lock);
	if (spa->spa_errlog_scrub != 0 &&
	    zap_count(spa->spa_meta_objset, spa->spa_errlog_scrub,
	    &count) == 0)
		total += count;

	if (spa->spa_errlog_last != 0 && !spa->spa_scrub_finished &&
	    zap_count(spa->spa_meta_objset, spa->spa_errlog_last,
	    &count) == 0)
		total += count;
	mutex_exit(&spa->spa_errlog_lock);

	mutex_enter(&spa->spa_errlist_lock);
	total += avl_numnodes(&spa->spa_errlist_last);
	total += avl_numnodes(&spa->spa_errlist_scrub);
	mutex_exit(&spa->spa_errlist_lock);

	return (total);
}

#ifdef _KERNEL
static int
process_error_log(spa_t *spa, uint64_t obj, void *addr, size_t *count)
{
	zap_cursor_t zc;
	zap_attribute_t za;
	zbookmark_t zb;

	if (obj == 0)
		return (0);

	for (zap_cursor_init(&zc, spa->spa_meta_objset, obj);
	    zap_cursor_retrieve(&zc, &za) == 0;
	    zap_cursor_advance(&zc)) {

		if (*count == 0) {
			zap_cursor_fini(&zc);
			return (ENOMEM);
		}

		name_to_bookmark(za.za_name, &zb);

		if (copyout(&zb, (char *)addr +
		    (*count - 1) * sizeof (zbookmark_t),
		    sizeof (zbookmark_t)) != 0) {
			/* release the cursor before bailing out */
			zap_cursor_fini(&zc);
			return (EFAULT);
		}

		*count -= 1;
	}

	zap_cursor_fini(&zc);

	return (0);
}

static int
process_error_list(avl_tree_t *list, void *addr, size_t *count)
{
	spa_error_entry_t *se;

	for (se = avl_first(list); se != NULL; se = AVL_NEXT(list, se)) {

		if (*count == 0)
			return (ENOMEM);

		if (copyout(&se->se_bookmark, (char *)addr +
		    (*count - 1) * sizeof (zbookmark_t),
		    sizeof (zbookmark_t)) != 0)
			return (EFAULT);

		*count -= 1;
	}

	return (0);
}
#endif

/*
 * Copy all known errors to userland as an array of bookmarks. This is
 * actually a union of the on-disk last log and current log, as well as any
 * pending error requests.
 *
 * Because the act of reading the on-disk log could cause errors to be
 * generated, we have two separate locks: one for the error log and one for the
 * in-core error lists. We only need the error list lock to log an error, so
 * we grab the error log lock while we read the on-disk logs, and only pick up
 * the error list lock when we are finished.
 */
int
spa_get_errlog(spa_t *spa, void *uaddr, size_t *count)
{
	int ret = 0;

#ifdef _KERNEL
	mutex_enter(&spa->spa_errlog_lock);

	ret = process_error_log(spa, spa->spa_errlog_scrub, uaddr, count);

	if (!ret && !spa->spa_scrub_finished)
		ret = process_error_log(spa, spa->spa_errlog_last, uaddr,
		    count);

	mutex_enter(&spa->spa_errlist_lock);
	if (!ret)
		ret = process_error_list(&spa->spa_errlist_scrub, uaddr,
		    count);
	if (!ret)
		ret = process_error_list(&spa->spa_errlist_last, uaddr,
		    count);
	mutex_exit(&spa->spa_errlist_lock);

	mutex_exit(&spa->spa_errlog_lock);
#endif

	return (ret);
}

/*
 * Called when a scrub completes. This simply sets a bit which tells which AVL
 * tree to add new errors to. spa_errlog_sync() is responsible for actually
 * syncing the changes to the underlying objects.
 */
void
spa_errlog_rotate(spa_t *spa)
{
	mutex_enter(&spa->spa_errlist_lock);
	spa->spa_scrub_finished = B_TRUE;
	mutex_exit(&spa->spa_errlist_lock);
}

/*
 * Discard any pending errors from the spa_t. Called when unloading a faulted
 * pool, as the errors encountered during the open cannot be synced to disk.
 */
void
spa_errlog_drain(spa_t *spa)
{
	spa_error_entry_t *se;
	void *cookie;

	mutex_enter(&spa->spa_errlist_lock);

	cookie = NULL;
	while ((se = avl_destroy_nodes(&spa->spa_errlist_last,
	    &cookie)) != NULL)
		kmem_free(se, sizeof (spa_error_entry_t));
	cookie = NULL;
	while ((se = avl_destroy_nodes(&spa->spa_errlist_scrub,
	    &cookie)) != NULL)
		kmem_free(se, sizeof (spa_error_entry_t));

	mutex_exit(&spa->spa_errlist_lock);
}

/*
 * Process a list of errors into the current on-disk log.
 */
static void
sync_error_list(spa_t *spa, avl_tree_t *t, uint64_t *obj, dmu_tx_t *tx)
{
	spa_error_entry_t *se;
	char buf[64];
	void *cookie;

	if (avl_numnodes(t) != 0) {
		/* create log if necessary */
		if (*obj == 0)
			*obj = zap_create(spa->spa_meta_objset,
			    DMU_OT_ERROR_LOG, DMU_OT_NONE,
			    0, tx);

		/* add errors to the current log */
		for (se = avl_first(t); se != NULL; se = AVL_NEXT(t, se)) {
			char *name = se->se_name ? se->se_name : "";

			bookmark_to_name(&se->se_bookmark, buf, sizeof (buf));

			(void) zap_update(spa->spa_meta_objset,
			    *obj, buf, 1, strlen(name) + 1, name, tx);
		}

		/* purge the error list */
		cookie = NULL;
		while ((se = avl_destroy_nodes(t, &cookie)) != NULL)
			kmem_free(se, sizeof (spa_error_entry_t));
	}
}

/*
 * Sync the error log out to disk. This is a little tricky because the act of
 * writing the error log requires the spa_errlist_lock. So, we need to lock the
 * error lists, take a copy of the lists, and then reinitialize them. Then, we
 * drop the error list lock and take the error log lock, at which point we
 * do the errlog processing. Then, if we encounter an I/O error during this
 * process, we can successfully add the error to the list. Note that this will
 * result in the perpetual recycling of errors, but it is an unlikely situation
 * and not a performance critical operation.
 */
void
spa_errlog_sync(spa_t *spa, uint64_t txg)
{
	dmu_tx_t *tx;
	avl_tree_t scrub, last;
	int scrub_finished;

	mutex_enter(&spa->spa_errlist_lock);

	/*
	 * Bail out early under normal circumstances.
	 */
	if (avl_numnodes(&spa->spa_errlist_scrub) == 0 &&
	    avl_numnodes(&spa->spa_errlist_last) == 0 &&
	    !spa->spa_scrub_finished) {
		mutex_exit(&spa->spa_errlist_lock);
		return;
	}

	spa_get_errlists(spa, &last, &scrub);
	scrub_finished = spa->spa_scrub_finished;
	spa->spa_scrub_finished = B_FALSE;

	mutex_exit(&spa->spa_errlist_lock);
	mutex_enter(&spa->spa_errlog_lock);

	tx = dmu_tx_create_assigned(spa->spa_dsl_pool, txg);

	/*
	 * Sync out the current list of errors.
	 */
	sync_error_list(spa, &last, &spa->spa_errlog_last, tx);

	/*
	 * Rotate the log if necessary.
	 */
	if (scrub_finished) {
		if (spa->spa_errlog_last != 0)
			VERIFY(dmu_object_free(spa->spa_meta_objset,
			    spa->spa_errlog_last, tx) == 0);
		spa->spa_errlog_last = spa->spa_errlog_scrub;
		spa->spa_errlog_scrub = 0;

		sync_error_list(spa, &scrub, &spa->spa_errlog_last, tx);
	}

	/*
	 * Sync out any pending scrub errors.
	 */
	sync_error_list(spa, &scrub, &spa->spa_errlog_scrub, tx);

	/*
	 * Update the MOS to reflect the new values.
	 */
	(void) zap_update(spa->spa_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
	    DMU_POOL_ERRLOG_LAST, sizeof (uint64_t), 1,
	    &spa->spa_errlog_last, tx);
	(void) zap_update(spa->spa_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
	    DMU_POOL_ERRLOG_SCRUB, sizeof (uint64_t), 1,
	    &spa->spa_errlog_scrub, tx);

	dmu_tx_commit(tx);

	mutex_exit(&spa->spa_errlog_lock);
}
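The error log file above keys each on-disk entry by a bookmark rendered as four colon-separated hexadecimal fields (see bookmark_to_name() and name_to_bookmark()). A minimal userland sketch of that round trip follows. It is an illustration under stated assumptions, not the in-kernel code: `bookmark_t`, `bookmark_format()`, and `bookmark_parse()` are hypothetical names, and the standard `strtoull()` with base 16 stands in for the kernel's `strtonum()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for zbookmark_t; field names follow the source. */
typedef struct bookmark {
	uint64_t zb_objset;
	uint64_t zb_object;
	int64_t zb_level;
	uint64_t zb_blkid;
} bookmark_t;

/*
 * Render the bookmark as "objset:object:level:blkid" in hex.
 * Returns 0 on success, -1 if the buffer was too small.
 */
static int
bookmark_format(const bookmark_t *zb, char *buf, size_t len)
{
	int n;

	assert(zb != NULL && buf != NULL);
	n = snprintf(buf, len, "%llx:%llx:%llx:%llx",
	    (unsigned long long)zb->zb_objset,
	    (unsigned long long)zb->zb_object,
	    (unsigned long long)zb->zb_level,
	    (unsigned long long)zb->zb_blkid);
	return ((n >= 0 && (size_t)n < len) ? 0 : -1);
}

/*
 * Parse the string form back into a bookmark.
 * Returns 0 on success, -1 on malformed input (assumption: the kernel
 * version asserts instead of returning an error).
 */
static int
bookmark_parse(const char *buf, bookmark_t *zb)
{
	uint64_t v[4];
	char *end;
	int i;

	assert(buf != NULL && zb != NULL);
	for (i = 0; i < 4; i++) {
		v[i] = strtoull(buf, &end, 16);
		/* reject an empty field or an unexpected separator */
		if (end == buf || *end != (i < 3 ? ':' : '\0'))
			return (-1);
		if (i < 3)
			buf = end + 1;
	}
	zb->zb_objset = v[0];
	zb->zb_object = v[1];
	zb->zb_level = (int64_t)v[2];
	zb->zb_blkid = v[3];
	return (0);
}
```

Returning an error code rather than asserting is a design choice made for the sketch so that malformed input can be exercised outside the kernel.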
502
uts/common/fs/zfs/spa_history.c
Normal file
@ -0,0 +1,502 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright (c) 2006, 2010, Oracle and/or its affiliates. All rights reserved.
 */

#include <sys/spa.h>
#include <sys/spa_impl.h>
#include <sys/zap.h>
#include <sys/dsl_synctask.h>
#include <sys/dmu_tx.h>
#include <sys/dmu_objset.h>
#include <sys/utsname.h>
#include <sys/cmn_err.h>
#include <sys/sunddi.h>
#include "zfs_comutil.h"
#ifdef _KERNEL
#include <sys/zone.h>
#endif

/*
 * Routines to manage the on-disk history log.
 *
 * The history log is stored as a dmu object containing
 * <packed record length, record nvlist> tuples.
 *
 * Where "record nvlist" is a nvlist containing uint64_ts and strings, and
 * "packed record length" is the packed length of the "record nvlist" stored
 * as a little endian uint64_t.
 *
 * The log is implemented as a ring buffer, though the original creation
 * of the pool ('zpool create') is never overwritten.
 *
 * The history log is tracked as object 'spa_t::spa_history'. The bonus buffer
 * of 'spa_history' stores the offsets for logging/retrieving history as
 * 'spa_history_phys_t'. 'sh_pool_create_len' is the ending offset in bytes of
 * where the 'zpool create' record is stored. This allows us to never
 * overwrite the original creation of the pool. 'sh_phys_max_off' is the
 * physical ending offset in bytes of the log. This tells you the length of
 * the buffer. 'sh_eof' is the logical EOF (in bytes). Whenever a record
 * is added, 'sh_eof' is incremented by the size of the record.
 * 'sh_eof' is never decremented. 'sh_bof' is the logical BOF (in bytes).
 * This is where the consumer should start reading from after reading in
 * the 'zpool create' portion of the log.
 *
 * 'sh_records_lost' keeps track of how many records have been overwritten
 * and permanently lost.
 */

/* convert a logical offset to physical */
static uint64_t
spa_history_log_to_phys(uint64_t log_off, spa_history_phys_t *shpp)
{
	uint64_t phys_len;

	phys_len = shpp->sh_phys_max_off - shpp->sh_pool_create_len;
	return ((log_off - shpp->sh_pool_create_len) % phys_len
	    + shpp->sh_pool_create_len);
}

void
spa_history_create_obj(spa_t *spa, dmu_tx_t *tx)
{
	dmu_buf_t *dbp;
	spa_history_phys_t *shpp;
	objset_t *mos = spa->spa_meta_objset;

	ASSERT(spa->spa_history == 0);
	spa->spa_history = dmu_object_alloc(mos, DMU_OT_SPA_HISTORY,
	    SPA_MAXBLOCKSIZE, DMU_OT_SPA_HISTORY_OFFSETS,
	    sizeof (spa_history_phys_t), tx);

	VERIFY(zap_add(mos, DMU_POOL_DIRECTORY_OBJECT,
	    DMU_POOL_HISTORY, sizeof (uint64_t), 1,
	    &spa->spa_history, tx) == 0);

	VERIFY(0 == dmu_bonus_hold(mos, spa->spa_history, FTAG, &dbp));
	ASSERT(dbp->db_size >= sizeof (spa_history_phys_t));

	shpp = dbp->db_data;
	dmu_buf_will_dirty(dbp, tx);

	/*
	 * Figure out maximum size of history log. We set it at
	 * 1% of pool size, with a max of 32MB and min of 128KB.
	 */
	shpp->sh_phys_max_off =
	    metaslab_class_get_dspace(spa_normal_class(spa)) / 100;
	shpp->sh_phys_max_off = MIN(shpp->sh_phys_max_off, 32<<20);
	shpp->sh_phys_max_off = MAX(shpp->sh_phys_max_off, 128<<10);

	dmu_buf_rele(dbp, FTAG);
}

/*
 * Change 'sh_bof' to the beginning of the next record.
 */
static int
spa_history_advance_bof(spa_t *spa, spa_history_phys_t *shpp)
{
	objset_t *mos = spa->spa_meta_objset;
	uint64_t firstread, reclen, phys_bof;
	char buf[sizeof (reclen)];
	int err;

	phys_bof = spa_history_log_to_phys(shpp->sh_bof, shpp);
	firstread = MIN(sizeof (reclen), shpp->sh_phys_max_off - phys_bof);

	if ((err = dmu_read(mos, spa->spa_history, phys_bof, firstread,
	    buf, DMU_READ_PREFETCH)) != 0)
		return (err);
	if (firstread != sizeof (reclen)) {
		if ((err = dmu_read(mos, spa->spa_history,
		    shpp->sh_pool_create_len, sizeof (reclen) - firstread,
		    buf + firstread, DMU_READ_PREFETCH)) != 0)
			return (err);
	}

	reclen = LE_64(*((uint64_t *)buf));
	shpp->sh_bof += reclen + sizeof (reclen);
	shpp->sh_records_lost++;
	return (0);
}

static int
spa_history_write(spa_t *spa, void *buf, uint64_t len, spa_history_phys_t *shpp,
    dmu_tx_t *tx)
{
	uint64_t firstwrite, phys_eof;
	objset_t *mos = spa->spa_meta_objset;
	int err;

	ASSERT(MUTEX_HELD(&spa->spa_history_lock));

	/* see if we need to reset logical BOF */
	while (shpp->sh_phys_max_off - shpp->sh_pool_create_len -
	    (shpp->sh_eof - shpp->sh_bof) <= len) {
		if ((err = spa_history_advance_bof(spa, shpp)) != 0) {
			return (err);
		}
	}

	phys_eof = spa_history_log_to_phys(shpp->sh_eof, shpp);
	firstwrite = MIN(len, shpp->sh_phys_max_off - phys_eof);
	shpp->sh_eof += len;
	dmu_write(mos, spa->spa_history, phys_eof, firstwrite, buf, tx);

	len -= firstwrite;
	if (len > 0) {
		/* write out the rest at the beginning of physical file */
		dmu_write(mos, spa->spa_history, shpp->sh_pool_create_len,
		    len, (char *)buf + firstwrite, tx);
	}

	return (0);
}

static char *
spa_history_zone(void)
{
#ifdef _KERNEL
	return (curproc->p_zone->zone_name);
#else
	return ("global");
#endif
}

/*
 * Write out a history event.
 */
/*ARGSUSED*/
static void
spa_history_log_sync(void *arg1, void *arg2, dmu_tx_t *tx)
{
	spa_t *spa = arg1;
	history_arg_t *hap = arg2;
	const char *history_str = hap->ha_history_str;
	objset_t *mos = spa->spa_meta_objset;
	dmu_buf_t *dbp;
	spa_history_phys_t *shpp;
	size_t reclen;
	uint64_t le_len;
	nvlist_t *nvrecord;
	char *record_packed = NULL;
	int ret;

	/*
	 * If we have an older pool that doesn't have a command
	 * history object, create it now.
	 */
	mutex_enter(&spa->spa_history_lock);
	if (!spa->spa_history)
		spa_history_create_obj(spa, tx);
	mutex_exit(&spa->spa_history_lock);

	/*
	 * Get the offset of where we need to write via the bonus buffer.
	 * Update the offset when the write completes.
	 */
	VERIFY(0 == dmu_bonus_hold(mos, spa->spa_history, FTAG, &dbp));
	shpp = dbp->db_data;

	dmu_buf_will_dirty(dbp, tx);

#ifdef ZFS_DEBUG
	{
		dmu_object_info_t doi;
		dmu_object_info_from_db(dbp, &doi);
		ASSERT3U(doi.doi_bonus_type, ==, DMU_OT_SPA_HISTORY_OFFSETS);
	}
#endif

	VERIFY(nvlist_alloc(&nvrecord, NV_UNIQUE_NAME, KM_SLEEP) == 0);
	VERIFY(nvlist_add_uint64(nvrecord, ZPOOL_HIST_TIME,
	    gethrestime_sec()) == 0);
	VERIFY(nvlist_add_uint64(nvrecord, ZPOOL_HIST_WHO, hap->ha_uid) == 0);
	if (hap->ha_zone != NULL)
		VERIFY(nvlist_add_string(nvrecord, ZPOOL_HIST_ZONE,
		    hap->ha_zone) == 0);
#ifdef _KERNEL
	VERIFY(nvlist_add_string(nvrecord, ZPOOL_HIST_HOST,
	    utsname.nodename) == 0);
#endif
	if (hap->ha_log_type == LOG_CMD_POOL_CREATE ||
	    hap->ha_log_type == LOG_CMD_NORMAL) {
		VERIFY(nvlist_add_string(nvrecord, ZPOOL_HIST_CMD,
		    history_str) == 0);

		zfs_dbgmsg("command: %s", history_str);
	} else {
		VERIFY(nvlist_add_uint64(nvrecord, ZPOOL_HIST_INT_EVENT,
		    hap->ha_event) == 0);
		VERIFY(nvlist_add_uint64(nvrecord, ZPOOL_HIST_TXG,
		    tx->tx_txg) == 0);
		VERIFY(nvlist_add_string(nvrecord, ZPOOL_HIST_INT_STR,
		    history_str) == 0);

		zfs_dbgmsg("internal %s pool:%s txg:%llu %s",
		    zfs_history_event_names[hap->ha_event], spa_name(spa),
		    (longlong_t)tx->tx_txg, history_str);

	}

	VERIFY(nvlist_size(nvrecord, &reclen, NV_ENCODE_XDR) == 0);
	record_packed = kmem_alloc(reclen, KM_SLEEP);

	VERIFY(nvlist_pack(nvrecord, &record_packed, &reclen,
	    NV_ENCODE_XDR, KM_SLEEP) == 0);

	mutex_enter(&spa->spa_history_lock);
	if (hap->ha_log_type == LOG_CMD_POOL_CREATE)
		VERIFY(shpp->sh_eof == shpp->sh_pool_create_len);

	/* write out the packed length as little endian */
	le_len = LE_64((uint64_t)reclen);
	ret = spa_history_write(spa, &le_len, sizeof (le_len), shpp, tx);
	if (!ret)
		ret = spa_history_write(spa, record_packed, reclen, shpp, tx);

	if (!ret && hap->ha_log_type == LOG_CMD_POOL_CREATE) {
		shpp->sh_pool_create_len += sizeof (le_len) + reclen;
		shpp->sh_bof = shpp->sh_pool_create_len;
	}

	mutex_exit(&spa->spa_history_lock);
	nvlist_free(nvrecord);
	kmem_free(record_packed, reclen);
	dmu_buf_rele(dbp, FTAG);

	strfree(hap->ha_history_str);
	if (hap->ha_zone != NULL)
		strfree(hap->ha_zone);
	kmem_free(hap, sizeof (history_arg_t));
}

/*
 * Write out a history event.
 */
int
spa_history_log(spa_t *spa, const char *history_str, history_log_type_t what)
{
	history_arg_t *ha;
	int err = 0;
	dmu_tx_t *tx;

	ASSERT(what != LOG_INTERNAL);

	tx = dmu_tx_create_dd(spa_get_dsl(spa)->dp_mos_dir);
	err = dmu_tx_assign(tx, TXG_WAIT);
	if (err) {
		dmu_tx_abort(tx);
		return (err);
	}

	ha = kmem_alloc(sizeof (history_arg_t), KM_SLEEP);
	ha->ha_history_str = strdup(history_str);
	ha->ha_zone = strdup(spa_history_zone());
	ha->ha_log_type = what;
	ha->ha_uid = crgetuid(CRED());

	/* Kick this off asynchronously; errors are ignored. */
	dsl_sync_task_do_nowait(spa_get_dsl(spa), NULL,
	    spa_history_log_sync, spa, ha, 0, tx);
	dmu_tx_commit(tx);

	/* spa_history_log_sync will free ha and strings */
	return (err);
}

/*
 * Read out the command history.
 */
int
spa_history_get(spa_t *spa, uint64_t *offp, uint64_t *len, char *buf)
{
	objset_t *mos = spa->spa_meta_objset;
	dmu_buf_t *dbp;
	uint64_t read_len, phys_read_off, phys_eof;
	uint64_t leftover = 0;
	spa_history_phys_t *shpp;
	int err;

	/*
	 * If the command history doesn't exist (older pool),
	 * that's ok, just return ENOENT.
	 */
	if (!spa->spa_history)
		return (ENOENT);

	/*
	 * The history is logged asynchronously, so when they request
	 * the first chunk of history, make sure everything has been
	 * synced to disk so that we get it.
	 */
	if (*offp == 0 && spa_writeable(spa))
		txg_wait_synced(spa_get_dsl(spa), 0);

	if ((err = dmu_bonus_hold(mos, spa->spa_history, FTAG, &dbp)) != 0)
		return (err);
	shpp = dbp->db_data;

#ifdef ZFS_DEBUG
	{
		dmu_object_info_t doi;
		dmu_object_info_from_db(dbp, &doi);
		ASSERT3U(doi.doi_bonus_type, ==, DMU_OT_SPA_HISTORY_OFFSETS);
	}
#endif

	mutex_enter(&spa->spa_history_lock);
	phys_eof = spa_history_log_to_phys(shpp->sh_eof, shpp);

	if (*offp < shpp->sh_pool_create_len) {
		/* read in just the zpool create history */
		phys_read_off = *offp;
		read_len = MIN(*len, shpp->sh_pool_create_len -
		    phys_read_off);
	} else {
		/*
		 * Need to reset passed in offset to BOF if the passed in
		 * offset has since been overwritten.
		 */
		*offp = MAX(*offp, shpp->sh_bof);
		phys_read_off = spa_history_log_to_phys(*offp, shpp);

		/*
		 * Read up to the minimum of what the user passed down or
		 * the EOF (physical or logical). If we hit physical EOF,
		 * use 'leftover' to read from the physical BOF.
		 */
		if (phys_read_off <= phys_eof) {
			read_len = MIN(*len, phys_eof - phys_read_off);
		} else {
			read_len = MIN(*len,
			    shpp->sh_phys_max_off - phys_read_off);
			if (phys_read_off + *len > shpp->sh_phys_max_off) {
				leftover = MIN(*len - read_len,
				    phys_eof - shpp->sh_pool_create_len);
			}
		}
	}

	/* offset for consumer to use next */
	*offp += read_len + leftover;

	/* tell the consumer how much you actually read */
	*len = read_len + leftover;

	if (read_len == 0) {
		mutex_exit(&spa->spa_history_lock);
		dmu_buf_rele(dbp, FTAG);
		return (0);
	}

	err = dmu_read(mos, spa->spa_history, phys_read_off, read_len, buf,
	    DMU_READ_PREFETCH);
	if (leftover && err == 0) {
		err = dmu_read(mos, spa->spa_history, shpp->sh_pool_create_len,
		    leftover, buf + read_len, DMU_READ_PREFETCH);
	}
	mutex_exit(&spa->spa_history_lock);

	dmu_buf_rele(dbp, FTAG);
	return (err);
}

static void
log_internal(history_internal_events_t event, spa_t *spa,
    dmu_tx_t *tx, const char *fmt, va_list adx)
{
	history_arg_t *ha;
	va_list adx_copy;

	/*
	 * If this is part of creating a pool, not everything is
	 * initialized yet, so don't bother logging the internal events.
	 */
	if (tx->tx_txg == TXG_INITIAL)
		return;

	/*
	 * Size the buffer using a copy of the argument list; a va_list must
	 * not be reused after another function has consumed it.
	 */
	va_copy(adx_copy, adx);
	ha = kmem_alloc(sizeof (history_arg_t), KM_SLEEP);
	ha->ha_history_str = kmem_alloc(vsnprintf(NULL, 0, fmt, adx_copy) + 1,
	    KM_SLEEP);
	va_end(adx_copy);

	(void) vsprintf(ha->ha_history_str, fmt, adx);

	ha->ha_log_type = LOG_INTERNAL;
	ha->ha_event = event;
	ha->ha_zone = NULL;
	ha->ha_uid = 0;

	if (dmu_tx_is_syncing(tx)) {
		spa_history_log_sync(spa, ha, tx);
	} else {
		dsl_sync_task_do_nowait(spa_get_dsl(spa), NULL,
		    spa_history_log_sync, spa, ha, 0, tx);
	}
	/* spa_history_log_sync() will free ha and strings */
}

void
spa_history_log_internal(history_internal_events_t event, spa_t *spa,
    dmu_tx_t *tx, const char *fmt, ...)
{
	dmu_tx_t *htx = tx;
	va_list adx;

	/* create a tx if we didn't get one */
	if (tx == NULL) {
		htx = dmu_tx_create_dd(spa_get_dsl(spa)->dp_mos_dir);
		if (dmu_tx_assign(htx, TXG_WAIT) != 0) {
			dmu_tx_abort(htx);
			return;
		}
	}

	va_start(adx, fmt);
	log_internal(event, spa, htx, fmt, adx);
	va_end(adx);

	/* if we didn't get a tx from the caller, commit the one we made */
	if (tx == NULL)
		dmu_tx_commit(htx);
}

void
spa_history_log_version(spa_t *spa, history_internal_events_t event)
{
#ifdef _KERNEL
	uint64_t current_vers = spa_version(spa);

	if (current_vers >= SPA_VERSION_ZPOOL_HISTORY) {
		spa_history_log_internal(event, spa, NULL,
		    "pool spa %llu; zfs spa %llu; zpl %d; uts %s %s %s %s",
		    (u_longlong_t)current_vers, SPA_VERSION, ZPL_VERSION,
		    utsname.nodename, utsname.release, utsname.version,
		    utsname.machine);
	}
	cmn_err(CE_CONT, "!%s version %llu pool %s using %llu",
	    event == LOG_POOL_IMPORT ? "imported" :
	    event == LOG_POOL_CREATE ? "created" : "accessed",
	    (u_longlong_t)current_vers, spa_name(spa), SPA_VERSION);
#endif
}
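The history log file above describes a ring buffer whose first 'sh_pool_create_len' bytes are pinned, with logical offsets that grow without bound and wrap only inside the remaining region. The sketch below restates that arithmetic in standalone form, mirroring spa_history_log_to_phys() and the free-space test in spa_history_write(). It is a sketch under assumptions: `hist_phys_t`, `hist_log_to_phys()`, and `hist_free_bytes()` are hypothetical names, and only the four fields needed for the arithmetic are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of spa_history_phys_t; field names follow the source. */
typedef struct hist_phys {
	uint64_t sh_pool_create_len;	/* end of pinned 'zpool create' record */
	uint64_t sh_phys_max_off;	/* physical end of the log object */
	uint64_t sh_bof;		/* logical beginning of file */
	uint64_t sh_eof;		/* logical end of file */
} hist_phys_t;

/*
 * Map a logical offset to a physical one.  Offsets wrap within
 * [sh_pool_create_len, sh_phys_max_off), so the pinned prefix is
 * never overwritten.
 */
static uint64_t
hist_log_to_phys(uint64_t log_off, const hist_phys_t *hp)
{
	uint64_t phys_len = hp->sh_phys_max_off - hp->sh_pool_create_len;

	assert(phys_len > 0 && log_off >= hp->sh_pool_create_len);
	return ((log_off - hp->sh_pool_create_len) % phys_len +
	    hp->sh_pool_create_len);
}

/*
 * Bytes not occupied by live records.  The writer advances sh_bof
 * (dropping the oldest records) while this value is <= the write length.
 */
static uint64_t
hist_free_bytes(const hist_phys_t *hp)
{
	return (hp->sh_phys_max_off - hp->sh_pool_create_len -
	    (hp->sh_eof - hp->sh_bof));
}
```

For example, with a 100-byte pinned prefix and a physical end of 1100, the wrapping region is 1000 bytes long, so logical offset 1100 maps back to physical offset 100.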
1672
uts/common/fs/zfs/spa_misc.c
Normal file
File diff suppressed because it is too large
616
uts/common/fs/zfs/space_map.c
Normal file
@ -0,0 +1,616 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

#include <sys/zfs_context.h>
#include <sys/spa.h>
#include <sys/dmu.h>
#include <sys/zio.h>
#include <sys/space_map.h>

/*
 * Space map routines.
 * NOTE: caller is responsible for all locking.
 */
static int
space_map_seg_compare(const void *x1, const void *x2)
{
	const space_seg_t *s1 = x1;
	const space_seg_t *s2 = x2;

	if (s1->ss_start < s2->ss_start) {
		if (s1->ss_end > s2->ss_start)
			return (0);
		return (-1);
	}
	if (s1->ss_start > s2->ss_start) {
		if (s1->ss_start < s2->ss_end)
			return (0);
		return (1);
	}
	return (0);
}

void
space_map_create(space_map_t *sm, uint64_t start, uint64_t size, uint8_t shift,
    kmutex_t *lp)
{
	bzero(sm, sizeof (*sm));

	cv_init(&sm->sm_load_cv, NULL, CV_DEFAULT, NULL);

	avl_create(&sm->sm_root, space_map_seg_compare,
	    sizeof (space_seg_t), offsetof(struct space_seg, ss_node));

	sm->sm_start = start;
	sm->sm_size = size;
	sm->sm_shift = shift;
	sm->sm_lock = lp;
}

void
space_map_destroy(space_map_t *sm)
{
	ASSERT(!sm->sm_loaded && !sm->sm_loading);
	VERIFY3U(sm->sm_space, ==, 0);
	avl_destroy(&sm->sm_root);
	cv_destroy(&sm->sm_load_cv);
}

void
space_map_add(space_map_t *sm, uint64_t start, uint64_t size)
{
	avl_index_t where;
	space_seg_t ssearch, *ss_before, *ss_after, *ss;
	uint64_t end = start + size;
	int merge_before, merge_after;

	ASSERT(MUTEX_HELD(sm->sm_lock));
	VERIFY(size != 0);
	VERIFY3U(start, >=, sm->sm_start);
	VERIFY3U(end, <=, sm->sm_start + sm->sm_size);
	VERIFY(sm->sm_space + size <= sm->sm_size);
	VERIFY(P2PHASE(start, 1ULL << sm->sm_shift) == 0);
	VERIFY(P2PHASE(size, 1ULL << sm->sm_shift) == 0);

	ssearch.ss_start = start;
	ssearch.ss_end = end;
	ss = avl_find(&sm->sm_root, &ssearch, &where);

	if (ss != NULL && ss->ss_start <= start && ss->ss_end >= end) {
		zfs_panic_recover("zfs: allocating allocated segment "
		    "(offset=%llu size=%llu)\n",
		    (longlong_t)start, (longlong_t)size);
		return;
	}

	/* Make sure we don't overlap with either of our neighbors */
	VERIFY(ss == NULL);

	ss_before = avl_nearest(&sm->sm_root, where, AVL_BEFORE);
	ss_after = avl_nearest(&sm->sm_root, where, AVL_AFTER);

	merge_before = (ss_before != NULL && ss_before->ss_end == start);
	merge_after = (ss_after != NULL && ss_after->ss_start == end);

	if (merge_before && merge_after) {
		avl_remove(&sm->sm_root, ss_before);
		if (sm->sm_pp_root) {
			avl_remove(sm->sm_pp_root, ss_before);
			avl_remove(sm->sm_pp_root, ss_after);
		}
		ss_after->ss_start = ss_before->ss_start;
		kmem_free(ss_before, sizeof (*ss_before));
		ss = ss_after;
	} else if (merge_before) {
		ss_before->ss_end = end;
		if (sm->sm_pp_root)
			avl_remove(sm->sm_pp_root, ss_before);
		ss = ss_before;
	} else if (merge_after) {
		ss_after->ss_start = start;
		if (sm->sm_pp_root)
			avl_remove(sm->sm_pp_root, ss_after);
		ss = ss_after;
	} else {
		ss = kmem_alloc(sizeof (*ss), KM_SLEEP);
		ss->ss_start = start;
		ss->ss_end = end;
		avl_insert(&sm->sm_root, ss, where);
	}

	if (sm->sm_pp_root)
		avl_add(sm->sm_pp_root, ss);

	sm->sm_space += size;
}

void
space_map_remove(space_map_t *sm, uint64_t start, uint64_t size)
{
	avl_index_t where;
	space_seg_t ssearch, *ss, *newseg;
	uint64_t end = start + size;
	int left_over, right_over;

	ASSERT(MUTEX_HELD(sm->sm_lock));
	VERIFY(size != 0);
	VERIFY(P2PHASE(start, 1ULL << sm->sm_shift) == 0);
	VERIFY(P2PHASE(size, 1ULL << sm->sm_shift) == 0);

	ssearch.ss_start = start;
	ssearch.ss_end = end;
	ss = avl_find(&sm->sm_root, &ssearch, &where);

	/* Make sure we completely overlap with someone */
	if (ss == NULL) {
		zfs_panic_recover("zfs: freeing free segment "
		    "(offset=%llu size=%llu)",
		    (longlong_t)start, (longlong_t)size);
		return;
	}
	VERIFY3U(ss->ss_start, <=, start);
	VERIFY3U(ss->ss_end, >=, end);
	VERIFY(sm->sm_space - size <= sm->sm_size);

	left_over = (ss->ss_start != start);
	right_over = (ss->ss_end != end);

	if (sm->sm_pp_root)
		avl_remove(sm->sm_pp_root, ss);

	if (left_over && right_over) {
		newseg = kmem_alloc(sizeof (*newseg), KM_SLEEP);
		newseg->ss_start = end;
		newseg->ss_end = ss->ss_end;
		ss->ss_end = start;
		avl_insert_here(&sm->sm_root, newseg, ss, AVL_AFTER);
		if (sm->sm_pp_root)
			avl_add(sm->sm_pp_root, newseg);
	} else if (left_over) {
		ss->ss_end = start;
	} else if (right_over) {
		ss->ss_start = end;
	} else {
		avl_remove(&sm->sm_root, ss);
		kmem_free(ss, sizeof (*ss));
		ss = NULL;
	}

	if (sm->sm_pp_root && ss != NULL)
		avl_add(sm->sm_pp_root, ss);

	sm->sm_space -= size;
}

boolean_t
|
||||
space_map_contains(space_map_t *sm, uint64_t start, uint64_t size)
|
||||
{
|
||||
avl_index_t where;
|
||||
space_seg_t ssearch, *ss;
|
||||
uint64_t end = start + size;
|
||||
|
||||
ASSERT(MUTEX_HELD(sm->sm_lock));
|
||||
VERIFY(size != 0);
|
||||
VERIFY(P2PHASE(start, 1ULL << sm->sm_shift) == 0);
|
||||
VERIFY(P2PHASE(size, 1ULL << sm->sm_shift) == 0);
|
||||
|
||||
ssearch.ss_start = start;
|
||||
ssearch.ss_end = end;
|
||||
ss = avl_find(&sm->sm_root, &ssearch, &where);
|
||||
|
||||
return (ss != NULL && ss->ss_start <= start && ss->ss_end >= end);
|
||||
}
|
||||
|
||||
void
|
||||
space_map_vacate(space_map_t *sm, space_map_func_t *func, space_map_t *mdest)
|
||||
{
|
||||
space_seg_t *ss;
|
||||
void *cookie = NULL;
|
||||
|
||||
ASSERT(MUTEX_HELD(sm->sm_lock));
|
||||
|
||||
while ((ss = avl_destroy_nodes(&sm->sm_root, &cookie)) != NULL) {
|
||||
if (func != NULL)
|
||||
func(mdest, ss->ss_start, ss->ss_end - ss->ss_start);
|
||||
kmem_free(ss, sizeof (*ss));
|
||||
}
|
||||
sm->sm_space = 0;
|
||||
}
|
||||
|
||||
void
|
||||
space_map_walk(space_map_t *sm, space_map_func_t *func, space_map_t *mdest)
|
||||
{
|
||||
space_seg_t *ss;
|
||||
|
||||
ASSERT(MUTEX_HELD(sm->sm_lock));
|
||||
|
||||
for (ss = avl_first(&sm->sm_root); ss; ss = AVL_NEXT(&sm->sm_root, ss))
|
||||
func(mdest, ss->ss_start, ss->ss_end - ss->ss_start);
|
||||
}
|
||||
|
||||
/*
|
||||
* Wait for any in-progress space_map_load() to complete.
|
||||
*/
|
||||
void
|
||||
space_map_load_wait(space_map_t *sm)
|
||||
{
|
||||
ASSERT(MUTEX_HELD(sm->sm_lock));
|
||||
|
||||
while (sm->sm_loading) {
|
||||
ASSERT(!sm->sm_loaded);
|
||||
cv_wait(&sm->sm_load_cv, sm->sm_lock);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Note: space_map_load() will drop sm_lock across dmu_read() calls.
|
||||
* The caller must be OK with this.
|
||||
*/
|
||||
int
|
||||
space_map_load(space_map_t *sm, space_map_ops_t *ops, uint8_t maptype,
|
||||
space_map_obj_t *smo, objset_t *os)
|
||||
{
|
||||
uint64_t *entry, *entry_map, *entry_map_end;
|
||||
uint64_t bufsize, size, offset, end, space;
|
||||
uint64_t mapstart = sm->sm_start;
|
||||
int error = 0;
|
||||
|
||||
ASSERT(MUTEX_HELD(sm->sm_lock));
|
||||
ASSERT(!sm->sm_loaded);
|
||||
ASSERT(!sm->sm_loading);
|
||||
|
||||
sm->sm_loading = B_TRUE;
|
||||
end = smo->smo_objsize;
|
||||
space = smo->smo_alloc;
|
||||
|
||||
ASSERT(sm->sm_ops == NULL);
|
||||
VERIFY3U(sm->sm_space, ==, 0);
|
||||
|
||||
if (maptype == SM_FREE) {
|
||||
space_map_add(sm, sm->sm_start, sm->sm_size);
|
||||
space = sm->sm_size - space;
|
||||
}
|
||||
|
||||
bufsize = 1ULL << SPACE_MAP_BLOCKSHIFT;
|
||||
entry_map = zio_buf_alloc(bufsize);
|
||||
|
||||
mutex_exit(sm->sm_lock);
|
||||
if (end > bufsize)
|
||||
dmu_prefetch(os, smo->smo_object, bufsize, end - bufsize);
|
||||
mutex_enter(sm->sm_lock);
|
||||
|
||||
for (offset = 0; offset < end; offset += bufsize) {
|
||||
size = MIN(end - offset, bufsize);
|
||||
VERIFY(P2PHASE(size, sizeof (uint64_t)) == 0);
|
||||
VERIFY(size != 0);
|
||||
|
||||
dprintf("object=%llu offset=%llx size=%llx\n",
|
||||
smo->smo_object, offset, size);
|
||||
|
||||
mutex_exit(sm->sm_lock);
|
||||
error = dmu_read(os, smo->smo_object, offset, size, entry_map,
|
||||
DMU_READ_PREFETCH);
|
||||
mutex_enter(sm->sm_lock);
|
||||
if (error != 0)
|
||||
break;
|
||||
|
||||
entry_map_end = entry_map + (size / sizeof (uint64_t));
|
||||
for (entry = entry_map; entry < entry_map_end; entry++) {
|
||||
uint64_t e = *entry;
|
||||
|
||||
if (SM_DEBUG_DECODE(e)) /* Skip debug entries */
|
||||
continue;
|
||||
|
||||
(SM_TYPE_DECODE(e) == maptype ?
|
||||
space_map_add : space_map_remove)(sm,
|
||||
(SM_OFFSET_DECODE(e) << sm->sm_shift) + mapstart,
|
||||
SM_RUN_DECODE(e) << sm->sm_shift);
|
||||
}
|
||||
}
|
||||
|
||||
if (error == 0) {
|
||||
VERIFY3U(sm->sm_space, ==, space);
|
||||
|
||||
sm->sm_loaded = B_TRUE;
|
||||
sm->sm_ops = ops;
|
||||
if (ops != NULL)
|
||||
ops->smop_load(sm);
|
||||
} else {
|
||||
space_map_vacate(sm, NULL, NULL);
|
||||
}
|
||||
|
||||
zio_buf_free(entry_map, bufsize);
|
||||
|
||||
sm->sm_loading = B_FALSE;
|
||||
|
||||
cv_broadcast(&sm->sm_load_cv);
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
void
|
||||
space_map_unload(space_map_t *sm)
|
||||
{
|
||||
ASSERT(MUTEX_HELD(sm->sm_lock));
|
||||
|
||||
if (sm->sm_loaded && sm->sm_ops != NULL)
|
||||
sm->sm_ops->smop_unload(sm);
|
||||
|
||||
sm->sm_loaded = B_FALSE;
|
||||
sm->sm_ops = NULL;
|
||||
|
||||
space_map_vacate(sm, NULL, NULL);
|
||||
}
|
||||
|
||||
uint64_t
|
||||
space_map_maxsize(space_map_t *sm)
|
||||
{
|
||||
ASSERT(sm->sm_ops != NULL);
|
||||
return (sm->sm_ops->smop_max(sm));
|
||||
}
|
||||
|
||||
uint64_t
|
||||
space_map_alloc(space_map_t *sm, uint64_t size)
|
||||
{
|
||||
uint64_t start;
|
||||
|
||||
start = sm->sm_ops->smop_alloc(sm, size);
|
||||
if (start != -1ULL)
|
||||
space_map_remove(sm, start, size);
|
||||
return (start);
|
||||
}
|
||||
|
||||
void
|
||||
space_map_claim(space_map_t *sm, uint64_t start, uint64_t size)
|
||||
{
|
||||
sm->sm_ops->smop_claim(sm, start, size);
|
||||
space_map_remove(sm, start, size);
|
||||
}
|
||||
|
||||
void
|
||||
space_map_free(space_map_t *sm, uint64_t start, uint64_t size)
|
||||
{
|
||||
space_map_add(sm, start, size);
|
||||
sm->sm_ops->smop_free(sm, start, size);
|
||||
}
|
||||
|
||||
/*
|
||||
* Note: space_map_sync() will drop sm_lock across dmu_write() calls.
|
||||
*/
|
||||
void
|
||||
space_map_sync(space_map_t *sm, uint8_t maptype,
|
||||
space_map_obj_t *smo, objset_t *os, dmu_tx_t *tx)
|
||||
{
|
||||
spa_t *spa = dmu_objset_spa(os);
|
||||
void *cookie = NULL;
|
||||
space_seg_t *ss;
|
||||
uint64_t bufsize, start, size, run_len;
|
||||
uint64_t *entry, *entry_map, *entry_map_end;
|
||||
|
||||
ASSERT(MUTEX_HELD(sm->sm_lock));
|
||||
|
||||
if (sm->sm_space == 0)
|
||||
return;
|
||||
|
||||
dprintf("object %4llu, txg %llu, pass %d, %c, count %lu, space %llx\n",
|
||||
smo->smo_object, dmu_tx_get_txg(tx), spa_sync_pass(spa),
|
||||
maptype == SM_ALLOC ? 'A' : 'F', avl_numnodes(&sm->sm_root),
|
||||
sm->sm_space);
|
||||
|
||||
if (maptype == SM_ALLOC)
|
||||
smo->smo_alloc += sm->sm_space;
|
||||
else
|
||||
smo->smo_alloc -= sm->sm_space;
|
||||
|
||||
bufsize = (8 + avl_numnodes(&sm->sm_root)) * sizeof (uint64_t);
|
||||
bufsize = MIN(bufsize, 1ULL << SPACE_MAP_BLOCKSHIFT);
|
||||
entry_map = zio_buf_alloc(bufsize);
|
||||
entry_map_end = entry_map + (bufsize / sizeof (uint64_t));
|
||||
entry = entry_map;
|
||||
|
||||
*entry++ = SM_DEBUG_ENCODE(1) |
|
||||
SM_DEBUG_ACTION_ENCODE(maptype) |
|
||||
SM_DEBUG_SYNCPASS_ENCODE(spa_sync_pass(spa)) |
|
||||
SM_DEBUG_TXG_ENCODE(dmu_tx_get_txg(tx));
|
||||
|
||||
while ((ss = avl_destroy_nodes(&sm->sm_root, &cookie)) != NULL) {
|
||||
size = ss->ss_end - ss->ss_start;
|
||||
start = (ss->ss_start - sm->sm_start) >> sm->sm_shift;
|
||||
|
||||
sm->sm_space -= size;
|
||||
size >>= sm->sm_shift;
|
||||
|
||||
while (size) {
|
||||
run_len = MIN(size, SM_RUN_MAX);
|
||||
|
||||
if (entry == entry_map_end) {
|
||||
mutex_exit(sm->sm_lock);
|
||||
dmu_write(os, smo->smo_object, smo->smo_objsize,
|
||||
bufsize, entry_map, tx);
|
||||
mutex_enter(sm->sm_lock);
|
||||
smo->smo_objsize += bufsize;
|
||||
entry = entry_map;
|
||||
}
|
||||
|
||||
*entry++ = SM_OFFSET_ENCODE(start) |
|
||||
SM_TYPE_ENCODE(maptype) |
|
||||
SM_RUN_ENCODE(run_len);
|
||||
|
||||
start += run_len;
|
||||
size -= run_len;
|
||||
}
|
||||
kmem_free(ss, sizeof (*ss));
|
||||
}
|
||||
|
||||
if (entry != entry_map) {
|
||||
size = (entry - entry_map) * sizeof (uint64_t);
|
||||
mutex_exit(sm->sm_lock);
|
||||
dmu_write(os, smo->smo_object, smo->smo_objsize,
|
||||
size, entry_map, tx);
|
||||
mutex_enter(sm->sm_lock);
|
||||
smo->smo_objsize += size;
|
||||
}
|
||||
|
||||
zio_buf_free(entry_map, bufsize);
|
||||
|
||||
VERIFY3U(sm->sm_space, ==, 0);
|
||||
}
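The inner `while (size)` loop in space_map_sync() splits each segment into runs no longer than SM_RUN_MAX, so that every run fits in one on-disk entry. A minimal sketch of just that splitting step follows. It assumes a hypothetical `RUN_MAX` of 4 as a stand-in for SM_RUN_MAX (whose real value is not shown in this section), and `run_t` and `split_runs` are illustrative names, not ZFS API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	RUN_MAX	4	/* hypothetical stand-in for SM_RUN_MAX */

typedef struct run {
	uint64_t start;		/* first unit of the run */
	uint64_t len;		/* number of units, 1..RUN_MAX */
} run_t;

/*
 * Split [start, start + size) into runs of at most RUN_MAX units.
 * Writes at most cap runs to out and returns how many were written.
 */
static size_t
split_runs(uint64_t start, uint64_t size, run_t *out, size_t cap)
{
	size_t n = 0;

	assert(out != NULL);

	while (size != 0 && n < cap) {
		uint64_t run_len = (size < RUN_MAX) ? size : RUN_MAX;

		out[n].start = start;
		out[n].len = run_len;
		n++;

		/* Advance past the run just emitted. */
		start += run_len;
		size -= run_len;
	}
	return (n);
}
```

With these assumptions, a 9-unit segment starting at unit 10 becomes three runs of lengths 4, 4, and 1.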
|
||||
|
||||
void
|
||||
space_map_truncate(space_map_obj_t *smo, objset_t *os, dmu_tx_t *tx)
|
||||
{
|
||||
VERIFY(dmu_free_range(os, smo->smo_object, 0, -1ULL, tx) == 0);
|
||||
|
||||
smo->smo_objsize = 0;
|
||||
smo->smo_alloc = 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Space map reference trees.
|
||||
*
|
||||
* A space map is a collection of integers. Every integer is either
|
||||
* in the map, or it's not. A space map reference tree generalizes
|
||||
* the idea: it allows its members to have arbitrary reference counts,
|
||||
* as opposed to the implicit reference count of 0 or 1 in a space map.
|
||||
* This representation comes in handy when computing the union or
|
||||
* intersection of multiple space maps. For example, the union of
|
||||
* N space maps is the subset of the reference tree with refcnt >= 1.
|
||||
* The intersection of N space maps is the subset with refcnt >= N.
|
||||
*
|
||||
* [It's very much like a Fourier transform. Unions and intersections
|
||||
* are hard to perform in the 'space map domain', so we convert the maps
|
||||
* into the 'reference count domain', where it's trivial, then invert.]
|
||||
*
|
||||
* vdev_dtl_reassess() uses computations of this form to determine
|
||||
* DTL_MISSING and DTL_OUTAGE for interior vdevs -- e.g. a RAID-Z vdev
|
||||
* has an outage wherever refcnt >= vdev_nparity + 1, and a mirror vdev
|
||||
* has an outage wherever refcnt >= vdev_children.
|
||||
*/
|
||||
static int
|
||||
space_map_ref_compare(const void *x1, const void *x2)
|
||||
{
|
||||
const space_ref_t *sr1 = x1;
|
||||
const space_ref_t *sr2 = x2;
|
||||
|
||||
if (sr1->sr_offset < sr2->sr_offset)
|
||||
return (-1);
|
||||
if (sr1->sr_offset > sr2->sr_offset)
|
||||
return (1);
|
||||
|
||||
if (sr1 < sr2)
|
||||
return (-1);
|
||||
if (sr1 > sr2)
|
||||
return (1);
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
void
|
||||
space_map_ref_create(avl_tree_t *t)
|
||||
{
|
||||
avl_create(t, space_map_ref_compare,
|
||||
sizeof (space_ref_t), offsetof(space_ref_t, sr_node));
|
||||
}
|
||||
|
||||
void
|
||||
space_map_ref_destroy(avl_tree_t *t)
|
||||
{
|
||||
space_ref_t *sr;
|
||||
void *cookie = NULL;
|
||||
|
||||
while ((sr = avl_destroy_nodes(t, &cookie)) != NULL)
|
||||
kmem_free(sr, sizeof (*sr));
|
||||
|
||||
avl_destroy(t);
|
||||
}
|
||||
|
||||
static void
|
||||
space_map_ref_add_node(avl_tree_t *t, uint64_t offset, int64_t refcnt)
|
||||
{
|
||||
space_ref_t *sr;
|
||||
|
||||
sr = kmem_alloc(sizeof (*sr), KM_SLEEP);
|
||||
sr->sr_offset = offset;
|
||||
sr->sr_refcnt = refcnt;
|
||||
|
||||
avl_add(t, sr);
|
||||
}
|
||||
|
||||
void
|
||||
space_map_ref_add_seg(avl_tree_t *t, uint64_t start, uint64_t end,
|
||||
int64_t refcnt)
|
||||
{
|
||||
space_map_ref_add_node(t, start, refcnt);
|
||||
space_map_ref_add_node(t, end, -refcnt);
|
||||
}
|
||||
|
||||
/*
|
||||
* Convert (or add) a space map into a reference tree.
|
||||
*/
|
||||
void
|
||||
space_map_ref_add_map(avl_tree_t *t, space_map_t *sm, int64_t refcnt)
|
||||
{
|
||||
space_seg_t *ss;
|
||||
|
||||
ASSERT(MUTEX_HELD(sm->sm_lock));
|
||||
|
||||
for (ss = avl_first(&sm->sm_root); ss; ss = AVL_NEXT(&sm->sm_root, ss))
|
||||
space_map_ref_add_seg(t, ss->ss_start, ss->ss_end, refcnt);
|
||||
}
|
||||
|
||||
/*
|
||||
* Convert a reference tree into a space map. The space map will contain
|
||||
* all members of the reference tree for which refcnt >= minref.
|
||||
*/
|
||||
void
|
||||
space_map_ref_generate_map(avl_tree_t *t, space_map_t *sm, int64_t minref)
|
||||
{
|
||||
uint64_t start = -1ULL;
|
||||
int64_t refcnt = 0;
|
||||
space_ref_t *sr;
|
||||
|
||||
ASSERT(MUTEX_HELD(sm->sm_lock));
|
||||
|
||||
space_map_vacate(sm, NULL, NULL);
|
||||
|
||||
for (sr = avl_first(t); sr != NULL; sr = AVL_NEXT(t, sr)) {
|
||||
refcnt += sr->sr_refcnt;
|
||||
if (refcnt >= minref) {
|
||||
if (start == -1ULL) {
|
||||
start = sr->sr_offset;
|
||||
}
|
||||
} else {
|
||||
if (start != -1ULL) {
|
||||
uint64_t end = sr->sr_offset;
|
||||
ASSERT(start <= end);
|
||||
if (end > start)
|
||||
space_map_add(sm, start, end - start);
|
||||
start = -1ULL;
|
||||
}
|
||||
}
|
||||
}
|
||||
ASSERT(refcnt == 0);
|
||||
ASSERT(start == -1ULL);
|
||||
}
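The reference-tree technique described above (record +refcnt at each segment start and -refcnt at each end, then sweep in offset order and keep the ranges where the running count is at least minref) can be sketched in user space. This is an illustration under stated assumptions, not the ZFS code: it uses a plain array sorted with `qsort` instead of an AVL tree, and `ref_t`, `range_t`, `ref_add_seg`, and `ref_generate` are hypothetical names. At equal offsets it sorts increments before decrements so that touching ranges come out as one; the code above breaks ties by node address and appears to rely on space_map_add() to merge adjacent segments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct ref {
	uint64_t offset;
	int64_t refcnt;		/* delta applied at this offset */
} ref_t;

typedef struct range {
	uint64_t start;		/* inclusive */
	uint64_t end;		/* exclusive */
} range_t;

static int
ref_compare(const void *x1, const void *x2)
{
	const ref_t *r1 = x1;
	const ref_t *r2 = x2;

	if (r1->offset < r2->offset)
		return (-1);
	if (r1->offset > r2->offset)
		return (1);
	/* Tie-break (sketch-specific): larger deltas first. */
	if (r1->refcnt > r2->refcnt)
		return (-1);
	if (r1->refcnt < r2->refcnt)
		return (1);
	return (0);
}

/* Append the two events for [start, end); returns the new event count. */
static size_t
ref_add_seg(ref_t *refs, size_t n, uint64_t start, uint64_t end,
    int64_t refcnt)
{
	refs[n].offset = start;
	refs[n].refcnt = refcnt;
	refs[n + 1].offset = end;
	refs[n + 1].refcnt = -refcnt;
	return (n + 2);
}

/* Emit every range whose running refcnt is >= minref. */
static size_t
ref_generate(ref_t *refs, size_t n, int64_t minref, range_t *out, size_t cap)
{
	uint64_t start = UINT64_MAX;	/* UINT64_MAX means "no open range" */
	int64_t refcnt = 0;
	size_t i, nout = 0;

	qsort(refs, n, sizeof (ref_t), ref_compare);

	for (i = 0; i < n; i++) {
		refcnt += refs[i].refcnt;
		if (refcnt >= minref) {
			if (start == UINT64_MAX)
				start = refs[i].offset;
		} else if (start != UINT64_MAX) {
			uint64_t end = refs[i].offset;

			/* Skip empty ranges that open and close in place. */
			if (end > start && nout < cap) {
				out[nout].start = start;
				out[nout].end = end;
				nout++;
			}
			start = UINT64_MAX;
		}
	}
	assert(refcnt == 0);
	return (nout);
}
```

For two maps holding [0, 10) and [5, 15), minref = 1 should yield the union [0, 15) and minref = 2 the intersection [5, 10).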
|
142
uts/common/fs/zfs/sys/arc.h
Normal file
@ -0,0 +1,142 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_ARC_H
|
||||
#define _SYS_ARC_H
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#include <sys/zio.h>
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/spa.h>
|
||||
|
||||
typedef struct arc_buf_hdr arc_buf_hdr_t;
|
||||
typedef struct arc_buf arc_buf_t;
|
||||
typedef void arc_done_func_t(zio_t *zio, arc_buf_t *buf, void *private);
|
||||
typedef int arc_evict_func_t(void *private);
|
||||
|
||||
/* generic arc_done_func_t's which you can use */
|
||||
arc_done_func_t arc_bcopy_func;
|
||||
arc_done_func_t arc_getbuf_func;
|
||||
|
||||
struct arc_buf {
|
||||
arc_buf_hdr_t *b_hdr;
|
||||
arc_buf_t *b_next;
|
||||
kmutex_t b_evict_lock;
|
||||
krwlock_t b_data_lock;
|
||||
void *b_data;
|
||||
arc_evict_func_t *b_efunc;
|
||||
void *b_private;
|
||||
};
|
||||
|
||||
typedef enum arc_buf_contents {
|
||||
ARC_BUFC_DATA, /* buffer contains data */
|
||||
ARC_BUFC_METADATA, /* buffer contains metadata */
|
||||
ARC_BUFC_NUMTYPES
|
||||
} arc_buf_contents_t;
|
||||
/*
|
||||
* These are the flags we pass into calls to the arc
|
||||
*/
|
||||
#define ARC_WAIT (1 << 1) /* perform I/O synchronously */
|
||||
#define ARC_NOWAIT (1 << 2) /* perform I/O asynchronously */
|
||||
#define ARC_PREFETCH (1 << 3) /* I/O is a prefetch */
|
||||
#define ARC_CACHED (1 << 4) /* I/O was already in cache */
|
||||
#define ARC_L2CACHE (1 << 5) /* cache in L2ARC */
|
||||
|
||||
/*
|
||||
* The following breakdowns of arc_size exist for kstat only.
|
||||
*/
|
||||
typedef enum arc_space_type {
|
||||
ARC_SPACE_DATA,
|
||||
ARC_SPACE_HDRS,
|
||||
ARC_SPACE_L2HDRS,
|
||||
ARC_SPACE_OTHER,
|
||||
ARC_SPACE_NUMTYPES
|
||||
} arc_space_type_t;
|
||||
|
||||
void arc_space_consume(uint64_t space, arc_space_type_t type);
|
||||
void arc_space_return(uint64_t space, arc_space_type_t type);
|
||||
void *arc_data_buf_alloc(uint64_t space);
|
||||
void arc_data_buf_free(void *buf, uint64_t space);
|
||||
arc_buf_t *arc_buf_alloc(spa_t *spa, int size, void *tag,
|
||||
arc_buf_contents_t type);
|
||||
arc_buf_t *arc_loan_buf(spa_t *spa, int size);
|
||||
void arc_return_buf(arc_buf_t *buf, void *tag);
|
||||
void arc_loan_inuse_buf(arc_buf_t *buf, void *tag);
|
||||
void arc_buf_add_ref(arc_buf_t *buf, void *tag);
|
||||
int arc_buf_remove_ref(arc_buf_t *buf, void *tag);
|
||||
int arc_buf_size(arc_buf_t *buf);
|
||||
void arc_release(arc_buf_t *buf, void *tag);
|
||||
int arc_release_bp(arc_buf_t *buf, void *tag, blkptr_t *bp, spa_t *spa,
|
||||
zbookmark_t *zb);
|
||||
int arc_released(arc_buf_t *buf);
|
||||
int arc_has_callback(arc_buf_t *buf);
|
||||
void arc_buf_freeze(arc_buf_t *buf);
|
||||
void arc_buf_thaw(arc_buf_t *buf);
|
||||
#ifdef ZFS_DEBUG
|
||||
int arc_referenced(arc_buf_t *buf);
|
||||
#endif
|
||||
|
||||
int arc_read(zio_t *pio, spa_t *spa, const blkptr_t *bp, arc_buf_t *pbuf,
|
||||
arc_done_func_t *done, void *private, int priority, int zio_flags,
|
||||
uint32_t *arc_flags, const zbookmark_t *zb);
|
||||
int arc_read_nolock(zio_t *pio, spa_t *spa, const blkptr_t *bp,
|
||||
arc_done_func_t *done, void *private, int priority, int flags,
|
||||
uint32_t *arc_flags, const zbookmark_t *zb);
|
||||
zio_t *arc_write(zio_t *pio, spa_t *spa, uint64_t txg,
|
||||
blkptr_t *bp, arc_buf_t *buf, boolean_t l2arc, const zio_prop_t *zp,
|
||||
arc_done_func_t *ready, arc_done_func_t *done, void *private,
|
||||
int priority, int zio_flags, const zbookmark_t *zb);
|
||||
|
||||
void arc_set_callback(arc_buf_t *buf, arc_evict_func_t *func, void *private);
|
||||
int arc_buf_evict(arc_buf_t *buf);
|
||||
|
||||
void arc_flush(spa_t *spa);
|
||||
void arc_tempreserve_clear(uint64_t reserve);
|
||||
int arc_tempreserve_space(uint64_t reserve, uint64_t txg);
|
||||
|
||||
void arc_init(void);
|
||||
void arc_fini(void);
|
||||
|
||||
/*
|
||||
* Level 2 ARC
|
||||
*/
|
||||
|
||||
void l2arc_add_vdev(spa_t *spa, vdev_t *vd);
|
||||
void l2arc_remove_vdev(vdev_t *vd);
|
||||
boolean_t l2arc_vdev_present(vdev_t *vd);
|
||||
void l2arc_init(void);
|
||||
void l2arc_fini(void);
|
||||
void l2arc_start(void);
|
||||
void l2arc_stop(void);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_ARC_H */
|
57
uts/common/fs/zfs/sys/bplist.h
Normal file
@ -0,0 +1,57 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_BPLIST_H
|
||||
#define _SYS_BPLIST_H
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/spa.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
typedef struct bplist_entry {
|
||||
blkptr_t bpe_blk;
|
||||
list_node_t bpe_node;
|
||||
} bplist_entry_t;
|
||||
|
||||
typedef struct bplist {
|
||||
kmutex_t bpl_lock;
|
||||
list_t bpl_list;
|
||||
} bplist_t;
|
||||
|
||||
typedef int bplist_itor_t(void *arg, const blkptr_t *bp, dmu_tx_t *tx);
|
||||
|
||||
void bplist_create(bplist_t *bpl);
|
||||
void bplist_destroy(bplist_t *bpl);
|
||||
void bplist_append(bplist_t *bpl, const blkptr_t *bp);
|
||||
void bplist_iterate(bplist_t *bpl, bplist_itor_t *func,
|
||||
void *arg, dmu_tx_t *tx);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_BPLIST_H */
|
91
uts/common/fs/zfs/sys/bpobj.h
Normal file
@ -0,0 +1,91 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_BPOBJ_H
|
||||
#define _SYS_BPOBJ_H
|
||||
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/txg.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/zfs_context.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
typedef struct bpobj_phys {
|
||||
/*
|
||||
* This is the bonus buffer for the dead lists. The object's
|
||||
* contents are an array of bpo_num_blkptrs blkptr_t's, representing
|
||||
* a total of bpo_bytes physical space.
|
||||
*/
|
||||
uint64_t bpo_num_blkptrs;
|
||||
uint64_t bpo_bytes;
|
||||
uint64_t bpo_comp;
|
||||
uint64_t bpo_uncomp;
|
||||
uint64_t bpo_subobjs;
|
||||
uint64_t bpo_num_subobjs;
|
||||
} bpobj_phys_t;
|
||||
|
||||
#define BPOBJ_SIZE_V0 (2 * sizeof (uint64_t))
|
||||
#define BPOBJ_SIZE_V1 (4 * sizeof (uint64_t))
|
||||
|
||||
typedef struct bpobj {
|
||||
kmutex_t bpo_lock;
|
||||
objset_t *bpo_os;
|
||||
uint64_t bpo_object;
|
||||
int bpo_epb;
|
||||
uint8_t bpo_havecomp;
|
||||
uint8_t bpo_havesubobj;
|
||||
bpobj_phys_t *bpo_phys;
|
||||
dmu_buf_t *bpo_dbuf;
|
||||
dmu_buf_t *bpo_cached_dbuf;
|
||||
} bpobj_t;
|
||||
|
||||
typedef int bpobj_itor_t(void *arg, const blkptr_t *bp, dmu_tx_t *tx);
|
||||
|
||||
uint64_t bpobj_alloc(objset_t *mos, int blocksize, dmu_tx_t *tx);
|
||||
void bpobj_free(objset_t *os, uint64_t obj, dmu_tx_t *tx);
|
||||
|
||||
int bpobj_open(bpobj_t *bpo, objset_t *mos, uint64_t object);
|
||||
void bpobj_close(bpobj_t *bpo);
|
||||
|
||||
int bpobj_iterate(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx);
|
||||
int bpobj_iterate_nofree(bpobj_t *bpo, bpobj_itor_t func, void *, dmu_tx_t *);
|
||||
int bpobj_iterate_dbg(bpobj_t *bpo, uint64_t *itorp, blkptr_t *bp);
|
||||
|
||||
void bpobj_enqueue_subobj(bpobj_t *bpo, uint64_t subobj, dmu_tx_t *tx);
|
||||
void bpobj_enqueue(bpobj_t *bpo, const blkptr_t *bp, dmu_tx_t *tx);
|
||||
|
||||
int bpobj_space(bpobj_t *bpo,
|
||||
uint64_t *usedp, uint64_t *compp, uint64_t *uncompp);
|
||||
int bpobj_space_range(bpobj_t *bpo, uint64_t mintxg, uint64_t maxtxg,
|
||||
uint64_t *usedp, uint64_t *compp, uint64_t *uncompp);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_BPOBJ_H */
|
375
uts/common/fs/zfs/sys/dbuf.h
Normal file
@ -0,0 +1,375 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DBUF_H
|
||||
#define _SYS_DBUF_H
|
||||
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/txg.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/arc.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/refcount.h>
|
||||
#include <sys/zrlock.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#define IN_DMU_SYNC 2
|
||||
|
||||
/*
|
||||
* define flags for dbuf_read
|
||||
*/
|
||||
|
||||
#define DB_RF_MUST_SUCCEED (1 << 0)
|
||||
#define DB_RF_CANFAIL (1 << 1)
|
||||
#define DB_RF_HAVESTRUCT (1 << 2)
|
||||
#define DB_RF_NOPREFETCH (1 << 3)
|
||||
#define DB_RF_NEVERWAIT (1 << 4)
|
||||
#define DB_RF_CACHED (1 << 5)
|
||||
|
||||
/*
|
||||
* The simplified state transition diagram for dbufs looks like:
|
||||
*
|
||||
*              +----> READ ----+
*              |               |
*              |               V
*  (alloc)-->UNCACHED        CACHED-->EVICTING-->(free)
*              |               ^        ^
*              |               |        |
*              +----> FILL ----+        |
*              |                        |
*              |                        |
*              +--------> NOFILL -------+
|
||||
*/
|
||||
typedef enum dbuf_states {
|
||||
DB_UNCACHED,
|
||||
DB_FILL,
|
||||
DB_NOFILL,
|
||||
DB_READ,
|
||||
DB_CACHED,
|
||||
DB_EVICTING
|
||||
} dbuf_states_t;
|
||||
|
||||
struct dnode;
|
||||
struct dmu_tx;
|
||||
|
||||
/*
|
||||
* level = 0 means the user data
|
||||
* level = 1 means the single indirect block
|
||||
* etc.
|
||||
*/
|
||||
|
||||
struct dmu_buf_impl;
|
||||
|
||||
typedef enum override_states {
|
||||
DR_NOT_OVERRIDDEN,
|
||||
DR_IN_DMU_SYNC,
|
||||
DR_OVERRIDDEN
|
||||
} override_states_t;
|
||||
|
||||
typedef struct dbuf_dirty_record {
|
||||
/* link on our parent's dirty list */
|
||||
list_node_t dr_dirty_node;
|
||||
|
||||
/* transaction group this data will sync in */
|
||||
uint64_t dr_txg;
|
||||
|
||||
/* zio of outstanding write IO */
|
||||
zio_t *dr_zio;
|
||||
|
||||
/* pointer back to our dbuf */
|
||||
struct dmu_buf_impl *dr_dbuf;
|
||||
|
||||
/* pointer to next dirty record */
|
||||
struct dbuf_dirty_record *dr_next;
|
||||
|
||||
/* pointer to parent dirty record */
|
||||
struct dbuf_dirty_record *dr_parent;
|
||||
|
||||
union dirty_types {
|
||||
struct dirty_indirect {
|
||||
|
||||
/* protect access to list */
|
||||
kmutex_t dr_mtx;
|
||||
|
||||
/* Our list of dirty children */
|
||||
list_t dr_children;
|
||||
} di;
|
||||
struct dirty_leaf {
|
||||
|
||||
/*
|
||||
* dr_data is set when we dirty the buffer
|
||||
* so that we can retain the pointer even if it
|
||||
* gets COW'd in a subsequent transaction group.
|
||||
*/
|
||||
arc_buf_t *dr_data;
|
||||
blkptr_t dr_overridden_by;
|
||||
override_states_t dr_override_state;
|
||||
uint8_t dr_copies;
|
||||
} dl;
|
||||
} dt;
|
||||
} dbuf_dirty_record_t;
|
||||
|
||||
typedef struct dmu_buf_impl {
|
||||
/*
|
||||
* The following members are immutable, with the exception of
|
||||
* db.db_data, which is protected by db_mtx.
|
||||
*/
|
||||
|
||||
/* the publicly visible structure */
|
||||
dmu_buf_t db;
|
||||
|
||||
/* the objset we belong to */
|
||||
struct objset *db_objset;
|
||||
|
||||
/*
|
||||
* handle to safely access the dnode we belong to (NULL when evicted)
|
||||
*/
|
||||
struct dnode_handle *db_dnode_handle;
|
||||
|
||||
/*
|
||||
* our parent buffer; if the dnode points to us directly,
|
||||
* db_parent == db_dnode_handle->dnh_dnode->dn_dbuf
|
||||
* only accessed by sync thread ???
|
||||
* (NULL when evicted)
|
||||
* May change from NULL to non-NULL under the protection of db_mtx
|
||||
* (see dbuf_check_blkptr())
|
||||
*/
|
||||
struct dmu_buf_impl *db_parent;
|
||||
|
||||
/*
|
||||
* link for hash table of all dmu_buf_impl_t's
|
||||
*/
|
||||
struct dmu_buf_impl *db_hash_next;
|
||||
|
||||
/* our block number */
|
||||
uint64_t db_blkid;
|
||||
|
||||
/*
|
||||
* Pointer to the blkptr_t which points to us. May be NULL if we
|
||||
* don't have one yet. (NULL when evicted)
|
||||
*/
|
||||
blkptr_t *db_blkptr;
|
||||
|
||||
/*
|
||||
* Our indirection level. Data buffers have db_level==0.
|
||||
* Indirect buffers which point to data buffers have
|
||||
* db_level==1. etc. Buffers which contain dnodes have
|
||||
* db_level==0, since the dnodes are stored in a file.
|
||||
*/
|
||||
uint8_t db_level;
|
||||
|
||||
/* db_mtx protects the members below */
|
||||
kmutex_t db_mtx;
|
||||
|
||||
/*
|
||||
* Current state of the buffer
|
||||
*/
|
||||
dbuf_states_t db_state;
|
||||
|
||||
/*
|
||||
* Refcount accessed by dmu_buf_{hold,rele}.
|
||||
* If nonzero, the buffer can't be destroyed.
|
||||
* Protected by db_mtx.
|
||||
*/
|
||||
refcount_t db_holds;
|
||||
|
||||
/* buffer holding our data */
|
||||
arc_buf_t *db_buf;
|
||||
|
||||
kcondvar_t db_changed;
|
||||
dbuf_dirty_record_t *db_data_pending;
|
||||
|
||||
/* pointer to most recent dirty record for this buffer */
|
||||
dbuf_dirty_record_t *db_last_dirty;
|
||||
|
||||
/*
|
||||
* Our link on the owner dnode's dn_dbufs list.
|
||||
* Protected by its dn_dbufs_mtx.
|
||||
*/
|
||||
list_node_t db_link;
|
||||
|
||||
/* Data which is unique to data (leaf) blocks: */
|
||||
|
||||
/* stuff we store for the user (see dmu_buf_set_user) */
|
||||
void *db_user_ptr;
|
||||
void **db_user_data_ptr_ptr;
|
||||
dmu_buf_evict_func_t *db_evict_func;
|
||||
|
||||
uint8_t db_immediate_evict;
|
||||
uint8_t db_freed_in_flight;
|
||||
|
||||
uint8_t db_dirtycnt;
|
||||
} dmu_buf_impl_t;
|
||||
|
||||
/* Note: the dbuf hash table is exposed only for the mdb module */
|
||||
#define DBUF_MUTEXES 256
|
||||
#define DBUF_HASH_MUTEX(h, idx) (&(h)->hash_mutexes[(idx) & (DBUF_MUTEXES-1)])
|
||||
typedef struct dbuf_hash_table {
|
||||
uint64_t hash_table_mask;
|
||||
dmu_buf_impl_t **hash_table;
|
||||
kmutex_t hash_mutexes[DBUF_MUTEXES];
|
||||
} dbuf_hash_table_t;
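DBUF_HASH_MUTEX selects one of DBUF_MUTEXES locks by masking the hash index with `DBUF_MUTEXES-1`, which only distributes indices evenly when the lock count is a power of two. A minimal sketch of that index mapping follows; `NSTRIPES` and `stripe_index` are hypothetical names used for illustration, and no actual locking is shown.

```c
#include <assert.h>
#include <stdint.h>

#define	NSTRIPES	256	/* must be a power of two, like DBUF_MUTEXES */

/* Reject a stripe count for which masking would skip some stripes. */
_Static_assert((NSTRIPES & (NSTRIPES - 1)) == 0,
    "NSTRIPES must be a power of two");

/* Map an arbitrary hash index to a stripe in [0, NSTRIPES). */
static unsigned
stripe_index(uint64_t idx)
{
	/* For a power-of-two count, masking equals idx % NSTRIPES. */
	return ((unsigned)(idx & (NSTRIPES - 1)));
}
```

Many hash buckets share each lock this way, which keeps the lock array small while still spreading contention across stripes.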
|
||||
|
||||
|
||||
uint64_t dbuf_whichblock(struct dnode *di, uint64_t offset);
|
||||
|
||||
dmu_buf_impl_t *dbuf_create_tlib(struct dnode *dn, char *data);
|
||||
void dbuf_create_bonus(struct dnode *dn);
|
||||
int dbuf_spill_set_blksz(dmu_buf_t *db, uint64_t blksz, dmu_tx_t *tx);
|
||||
void dbuf_spill_hold(struct dnode *dn, dmu_buf_impl_t **dbp, void *tag);
|
||||
|
||||
void dbuf_rm_spill(struct dnode *dn, dmu_tx_t *tx);
|
||||
|
||||
dmu_buf_impl_t *dbuf_hold(struct dnode *dn, uint64_t blkid, void *tag);
|
||||
dmu_buf_impl_t *dbuf_hold_level(struct dnode *dn, int level, uint64_t blkid,
|
||||
void *tag);
|
||||
int dbuf_hold_impl(struct dnode *dn, uint8_t level, uint64_t blkid, int create,
|
||||
void *tag, dmu_buf_impl_t **dbp);
|
||||
|
||||
void dbuf_prefetch(struct dnode *dn, uint64_t blkid);
|
||||
|
||||
void dbuf_add_ref(dmu_buf_impl_t *db, void *tag);
|
||||
uint64_t dbuf_refcount(dmu_buf_impl_t *db);
|
||||
|
||||
void dbuf_rele(dmu_buf_impl_t *db, void *tag);
|
||||
void dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag);
|
||||
|
||||
dmu_buf_impl_t *dbuf_find(struct dnode *dn, uint8_t level, uint64_t blkid);
|
||||
|
||||
int dbuf_read(dmu_buf_impl_t *db, zio_t *zio, uint32_t flags);
|
||||
void dbuf_will_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx);
|
||||
void dbuf_fill_done(dmu_buf_impl_t *db, dmu_tx_t *tx);
|
||||
void dmu_buf_will_not_fill(dmu_buf_t *db, dmu_tx_t *tx);
|
||||
void dmu_buf_will_fill(dmu_buf_t *db, dmu_tx_t *tx);
|
||||
void dmu_buf_fill_done(dmu_buf_t *db, dmu_tx_t *tx);
|
||||
void dbuf_assign_arcbuf(dmu_buf_impl_t *db, arc_buf_t *buf, dmu_tx_t *tx);
|
||||
dbuf_dirty_record_t *dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx);
|
||||
arc_buf_t *dbuf_loan_arcbuf(dmu_buf_impl_t *db);
|
||||
|
||||
void dbuf_clear(dmu_buf_impl_t *db);
|
||||
void dbuf_evict(dmu_buf_impl_t *db);
|
||||
|
||||
void dbuf_setdirty(dmu_buf_impl_t *db, dmu_tx_t *tx);
|
||||
void dbuf_unoverride(dbuf_dirty_record_t *dr);
|
||||
void dbuf_sync_list(list_t *list, dmu_tx_t *tx);
|
||||
void dbuf_release_bp(dmu_buf_impl_t *db);
|
||||
|
||||
void dbuf_free_range(struct dnode *dn, uint64_t start, uint64_t end,
|
||||
struct dmu_tx *);
|
||||
|
||||
void dbuf_new_size(dmu_buf_impl_t *db, int size, dmu_tx_t *tx);
|
||||
|
||||
#define DB_DNODE(_db) ((_db)->db_dnode_handle->dnh_dnode)
|
||||
#define DB_DNODE_LOCK(_db) ((_db)->db_dnode_handle->dnh_zrlock)
|
||||
#define DB_DNODE_ENTER(_db) (zrl_add(&DB_DNODE_LOCK(_db)))
|
||||
#define DB_DNODE_EXIT(_db) (zrl_remove(&DB_DNODE_LOCK(_db)))
|
||||
#define DB_DNODE_HELD(_db) (!zrl_is_zero(&DB_DNODE_LOCK(_db)))
|
||||
#define DB_GET_SPA(_spa_p, _db) { \
|
||||
dnode_t *__dn; \
|
||||
DB_DNODE_ENTER(_db); \
|
||||
__dn = DB_DNODE(_db); \
|
||||
*(_spa_p) = __dn->dn_objset->os_spa; \
|
||||
DB_DNODE_EXIT(_db); \
|
||||
}
|
||||
#define DB_GET_OBJSET(_os_p, _db) { \
|
||||
dnode_t *__dn; \
|
||||
DB_DNODE_ENTER(_db); \
|
||||
__dn = DB_DNODE(_db); \
|
||||
*(_os_p) = __dn->dn_objset; \
|
||||
DB_DNODE_EXIT(_db); \
|
||||
}
|
||||
|
||||
void dbuf_init(void);
|
||||
void dbuf_fini(void);
|
||||
|
||||
boolean_t dbuf_is_metadata(dmu_buf_impl_t *db);
|
||||
|
||||
#define DBUF_IS_METADATA(_db) \
|
||||
(dbuf_is_metadata(_db))
|
||||
|
||||
#define DBUF_GET_BUFC_TYPE(_db) \
|
||||
(DBUF_IS_METADATA(_db) ? ARC_BUFC_METADATA : ARC_BUFC_DATA)
|
||||
|
||||
#define DBUF_IS_CACHEABLE(_db) \
|
||||
((_db)->db_objset->os_primary_cache == ZFS_CACHE_ALL || \
|
||||
(DBUF_IS_METADATA(_db) && \
|
||||
((_db)->db_objset->os_primary_cache == ZFS_CACHE_METADATA)))
|
||||
|
||||
#define DBUF_IS_L2CACHEABLE(_db) \
|
||||
((_db)->db_objset->os_secondary_cache == ZFS_CACHE_ALL || \
|
||||
(DBUF_IS_METADATA(_db) && \
|
||||
((_db)->db_objset->os_secondary_cache == ZFS_CACHE_METADATA)))
|
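The two cacheability macros above share one predicate: a buffer is cacheable if the policy is "all", or if the policy is "metadata" and the buffer holds metadata. A small sketch of that truth table follows; `enum cache_policy` and `is_cacheable()` are hypothetical stand-ins for the `ZFS_CACHE_*` values and the macros, not part of the header.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ZFS_CACHE_* policy values. */
enum cache_policy { CACHE_NONE, CACHE_METADATA, CACHE_ALL };

/*
 * Same shape as DBUF_IS_CACHEABLE / DBUF_IS_L2CACHEABLE: the policy
 * either admits everything, or admits metadata buffers only.
 */
static bool
is_cacheable(enum cache_policy policy, bool is_metadata)
{
	return (policy == CACHE_ALL ||
	    (is_metadata && policy == CACHE_METADATA));
}
```

Under these assumptions, a policy of `CACHE_NONE` rejects every buffer, since neither arm of the expression can be true.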
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
|
||||
/*
|
||||
* There should be a ## between the string literal and fmt, to make it
|
||||
* clear that we're joining two strings together, but gcc does not
|
||||
* support that preprocessor token.
|
||||
*/
|
||||
#define dprintf_dbuf(dbuf, fmt, ...) do { \
|
||||
if (zfs_flags & ZFS_DEBUG_DPRINTF) { \
|
||||
char __db_buf[32]; \
|
||||
uint64_t __db_obj = (dbuf)->db.db_object; \
|
||||
if (__db_obj == DMU_META_DNODE_OBJECT) \
|
||||
(void) strcpy(__db_buf, "mdn"); \
|
||||
else \
|
||||
(void) snprintf(__db_buf, sizeof (__db_buf), "%lld", \
|
||||
(u_longlong_t)__db_obj); \
|
||||
dprintf_ds((dbuf)->db_objset->os_dsl_dataset, \
|
||||
"obj=%s lvl=%u blkid=%lld " fmt, \
|
||||
__db_buf, (dbuf)->db_level, \
|
||||
(u_longlong_t)(dbuf)->db_blkid, __VA_ARGS__); \
|
||||
} \
|
||||
_NOTE(CONSTCOND) } while (0)
|
||||
|
||||
#define dprintf_dbuf_bp(db, bp, fmt, ...) do { \
|
||||
if (zfs_flags & ZFS_DEBUG_DPRINTF) { \
|
||||
char *__blkbuf = kmem_alloc(BP_SPRINTF_LEN, KM_SLEEP); \
|
||||
sprintf_blkptr(__blkbuf, bp); \
|
||||
dprintf_dbuf(db, fmt " %s\n", __VA_ARGS__, __blkbuf); \
|
||||
kmem_free(__blkbuf, BP_SPRINTF_LEN); \
|
||||
} \
|
||||
_NOTE(CONSTCOND) } while (0)
|
||||
|
||||
#define DBUF_VERIFY(db) dbuf_verify(db)
|
||||
|
||||
#else
|
||||
|
||||
#define dprintf_dbuf(db, fmt, ...)
|
||||
#define dprintf_dbuf_bp(db, bp, fmt, ...)
|
||||
#define DBUF_VERIFY(db)
|
||||
|
||||
#endif
|
||||
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DBUF_H */
|
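The `dprintf_dbuf()` macro in this header builds its format string by placing a fixed literal next to the caller's `fmt`, relying on adjacent string literal concatenation. A minimal user-space sketch of the same pattern follows; `FORMAT_OBJ` is a hypothetical macro that writes into a caller buffer with `snprintf()` rather than calling the ZFS debug facility.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical analogue of dprintf_dbuf(). The fixed prefix and the
 * caller's fmt are adjacent string literals, so the compiler joins
 * them into a single format string. fmt must therefore be a literal,
 * and strict C99 requires at least one variadic argument.
 */
#define	FORMAT_OBJ(buf, len, obj, lvl, fmt, ...)			\
	snprintf((buf), (len), "obj=%llu lvl=%u " fmt,			\
	    (unsigned long long)(obj), (unsigned)(lvl), __VA_ARGS__)
```

For example, `FORMAT_OBJ(out, sizeof (out), 7, 1, "size=%d", 512)` expands to a single `snprintf()` call whose format string is `"obj=%llu lvl=%u size=%d"`.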
246
uts/common/fs/zfs/sys/ddt.h
Normal file
@ -0,0 +1,246 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2009, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DDT_H
|
||||
#define _SYS_DDT_H
|
||||
|
||||
#include <sys/sysmacros.h>
|
||||
#include <sys/types.h>
|
||||
#include <sys/fs/zfs.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/dmu.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/*
|
||||
* On-disk DDT formats, in the desired search order (newest version first).
|
||||
*/
|
||||
enum ddt_type {
|
||||
DDT_TYPE_ZAP = 0,
|
||||
DDT_TYPES
|
||||
};
|
||||
|
||||
/*
|
||||
* DDT classes, in the desired search order (highest replication level first).
|
||||
*/
|
||||
enum ddt_class {
|
||||
DDT_CLASS_DITTO = 0,
|
||||
DDT_CLASS_DUPLICATE,
|
||||
DDT_CLASS_UNIQUE,
|
||||
DDT_CLASSES
|
||||
};
|
||||
|
||||
#define DDT_TYPE_CURRENT 0
|
||||
|
||||
#define DDT_COMPRESS_BYTEORDER_MASK 0x80
|
||||
#define DDT_COMPRESS_FUNCTION_MASK 0x7f
|
||||
|
||||
/*
|
||||
* On-disk ddt entry: key (name) and physical storage (value).
|
||||
*/
|
||||
typedef struct ddt_key {
|
||||
zio_cksum_t ddk_cksum; /* 256-bit block checksum */
|
||||
uint64_t ddk_prop; /* LSIZE, PSIZE, compression */
|
||||
} ddt_key_t;
|
||||
|
||||
/*
|
||||
* ddk_prop layout:
|
||||
*
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* | 0 | 0 | 0 | comp | PSIZE | LSIZE |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
*/
|
||||
#define DDK_GET_LSIZE(ddk) \
|
||||
BF64_GET_SB((ddk)->ddk_prop, 0, 16, SPA_MINBLOCKSHIFT, 1)
|
||||
#define DDK_SET_LSIZE(ddk, x) \
|
||||
BF64_SET_SB((ddk)->ddk_prop, 0, 16, SPA_MINBLOCKSHIFT, 1, x)
|
||||
|
||||
#define DDK_GET_PSIZE(ddk) \
|
||||
BF64_GET_SB((ddk)->ddk_prop, 16, 16, SPA_MINBLOCKSHIFT, 1)
|
||||
#define DDK_SET_PSIZE(ddk, x) \
|
||||
BF64_SET_SB((ddk)->ddk_prop, 16, 16, SPA_MINBLOCKSHIFT, 1, x)
|
||||
|
||||
#define DDK_GET_COMPRESS(ddk) BF64_GET((ddk)->ddk_prop, 32, 8)
|
||||
#define DDK_SET_COMPRESS(ddk, x) BF64_SET((ddk)->ddk_prop, 32, 8, x)
|
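The `DDK_*` macros above pack LSIZE, PSIZE, and the compression function into `ddk_prop` as shown in the layout diagram. The sketch below reproduces that packing in plain C under two assumptions that are not stated in this header: that `SPA_MINBLOCKSHIFT` is 9, and that the `_SB` variants store `(value >> shift) - bias`. The helper names (`bf64_get`, `bf64_set`, `prop_pack`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of SPA_MINBLOCKSHIFT (512-byte minimum block). */
#define	MINBLOCKSHIFT	9

/* Extract the len-bit field of x that starts at bit low. */
static inline uint64_t
bf64_get(uint64_t x, int low, int len)
{
	return ((x >> low) & ((1ULL << len) - 1));
}

/* Return x with the len-bit field at bit low replaced by val. */
static inline uint64_t
bf64_set(uint64_t x, int low, int len, uint64_t val)
{
	uint64_t mask = ((1ULL << len) - 1) << low;
	return ((x & ~mask) | ((val << low) & mask));
}

/* Sizes are stored in 512-byte units, biased by one. */
static inline uint64_t
prop_pack(uint64_t lsize, uint64_t psize, uint8_t comp)
{
	uint64_t prop = 0;

	/* Both sizes must be nonzero multiples of the minimum block. */
	assert(lsize >= 512 && (lsize & 511) == 0);
	assert(psize >= 512 && (psize & 511) == 0);
	prop = bf64_set(prop, 0, 16, (lsize >> MINBLOCKSHIFT) - 1);
	prop = bf64_set(prop, 16, 16, (psize >> MINBLOCKSHIFT) - 1);
	prop = bf64_set(prop, 32, 8, comp);
	return (prop);
}

static inline uint64_t
prop_lsize(uint64_t prop)
{
	return ((bf64_get(prop, 0, 16) + 1) << MINBLOCKSHIFT);
}

static inline uint64_t
prop_psize(uint64_t prop)
{
	return ((bf64_get(prop, 16, 16) + 1) << MINBLOCKSHIFT);
}

static inline uint8_t
prop_compress(uint64_t prop)
{
	return ((uint8_t)bf64_get(prop, 32, 8));
}
```

With these assumptions, a 128K logical block compressed to 4K with compression function 3 stores 255 in the LSIZE field and 7 in the PSIZE field; the bias of one lets a 16-bit field describe sizes from 512 bytes upward without wasting the zero encoding.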
||||
|
||||
#define DDT_KEY_WORDS (sizeof (ddt_key_t) / sizeof (uint64_t))
|
||||
|
||||
typedef struct ddt_phys {
|
||||
dva_t ddp_dva[SPA_DVAS_PER_BP];
|
||||
uint64_t ddp_refcnt;
|
||||
uint64_t ddp_phys_birth;
|
||||
} ddt_phys_t;
|
||||
|
||||
enum ddt_phys_type {
|
||||
DDT_PHYS_DITTO = 0,
|
||||
DDT_PHYS_SINGLE = 1,
|
||||
DDT_PHYS_DOUBLE = 2,
|
||||
DDT_PHYS_TRIPLE = 3,
|
||||
DDT_PHYS_TYPES
|
||||
};
|
||||
|
||||
/*
|
||||
* In-core ddt entry
|
||||
*/
|
||||
struct ddt_entry {
|
||||
ddt_key_t dde_key;
|
||||
ddt_phys_t dde_phys[DDT_PHYS_TYPES];
|
||||
zio_t *dde_lead_zio[DDT_PHYS_TYPES];
|
||||
void *dde_repair_data;
|
||||
enum ddt_type dde_type;
|
||||
enum ddt_class dde_class;
|
||||
uint8_t dde_loading;
|
||||
uint8_t dde_loaded;
|
||||
kcondvar_t dde_cv;
|
||||
avl_node_t dde_node;
|
||||
};
|
||||
|
||||
/*
|
||||
* In-core ddt
|
||||
*/
|
||||
struct ddt {
|
||||
kmutex_t ddt_lock;
|
||||
avl_tree_t ddt_tree;
|
||||
avl_tree_t ddt_repair_tree;
|
||||
enum zio_checksum ddt_checksum;
|
||||
spa_t *ddt_spa;
|
||||
objset_t *ddt_os;
|
||||
uint64_t ddt_stat_object;
|
||||
uint64_t ddt_object[DDT_TYPES][DDT_CLASSES];
|
||||
ddt_histogram_t ddt_histogram[DDT_TYPES][DDT_CLASSES];
|
||||
ddt_histogram_t ddt_histogram_cache[DDT_TYPES][DDT_CLASSES];
|
||||
ddt_object_t ddt_object_stats[DDT_TYPES][DDT_CLASSES];
|
||||
avl_node_t ddt_node;
|
||||
};
|
||||
|
||||
/*
|
||||
* In-core and on-disk bookmark for DDT walks
|
||||
*/
|
||||
typedef struct ddt_bookmark {
|
||||
uint64_t ddb_class;
|
||||
uint64_t ddb_type;
|
||||
uint64_t ddb_checksum;
|
||||
uint64_t ddb_cursor;
|
||||
} ddt_bookmark_t;
|
||||
|
||||
/*
|
||||
* Ops vector to access a specific DDT object type.
|
||||
*/
|
||||
typedef struct ddt_ops {
|
||||
char ddt_op_name[32];
|
||||
int (*ddt_op_create)(objset_t *os, uint64_t *object, dmu_tx_t *tx,
|
||||
boolean_t prehash);
|
||||
int (*ddt_op_destroy)(objset_t *os, uint64_t object, dmu_tx_t *tx);
|
||||
int (*ddt_op_lookup)(objset_t *os, uint64_t object, ddt_entry_t *dde);
|
||||
void (*ddt_op_prefetch)(objset_t *os, uint64_t object,
|
||||
ddt_entry_t *dde);
|
||||
int (*ddt_op_update)(objset_t *os, uint64_t object, ddt_entry_t *dde,
|
||||
dmu_tx_t *tx);
|
||||
int (*ddt_op_remove)(objset_t *os, uint64_t object, ddt_entry_t *dde,
|
||||
dmu_tx_t *tx);
|
||||
int (*ddt_op_walk)(objset_t *os, uint64_t object, ddt_entry_t *dde,
|
||||
uint64_t *walk);
|
||||
uint64_t (*ddt_op_count)(objset_t *os, uint64_t object);
|
||||
} ddt_ops_t;
|
||||
|
||||
#define DDT_NAMELEN 80
|
||||
|
||||
extern void ddt_object_name(ddt_t *ddt, enum ddt_type type,
|
||||
enum ddt_class class, char *name);
|
||||
extern int ddt_object_walk(ddt_t *ddt, enum ddt_type type,
|
||||
enum ddt_class class, uint64_t *walk, ddt_entry_t *dde);
|
||||
extern uint64_t ddt_object_count(ddt_t *ddt, enum ddt_type type,
|
||||
enum ddt_class class);
|
||||
extern int ddt_object_info(ddt_t *ddt, enum ddt_type type,
|
||||
enum ddt_class class, dmu_object_info_t *);
|
||||
extern boolean_t ddt_object_exists(ddt_t *ddt, enum ddt_type type,
|
||||
enum ddt_class class);
|
||||
|
||||
extern void ddt_bp_fill(const ddt_phys_t *ddp, blkptr_t *bp,
|
||||
uint64_t txg);
|
||||
extern void ddt_bp_create(enum zio_checksum checksum, const ddt_key_t *ddk,
|
||||
const ddt_phys_t *ddp, blkptr_t *bp);
|
||||
|
||||
extern void ddt_key_fill(ddt_key_t *ddk, const blkptr_t *bp);
|
||||
|
||||
extern void ddt_phys_fill(ddt_phys_t *ddp, const blkptr_t *bp);
|
||||
extern void ddt_phys_clear(ddt_phys_t *ddp);
|
||||
extern void ddt_phys_addref(ddt_phys_t *ddp);
|
||||
extern void ddt_phys_decref(ddt_phys_t *ddp);
|
||||
extern void ddt_phys_free(ddt_t *ddt, ddt_key_t *ddk, ddt_phys_t *ddp,
|
||||
uint64_t txg);
|
||||
extern ddt_phys_t *ddt_phys_select(const ddt_entry_t *dde, const blkptr_t *bp);
|
||||
extern uint64_t ddt_phys_total_refcnt(const ddt_entry_t *dde);
|
||||
|
||||
extern void ddt_stat_add(ddt_stat_t *dst, const ddt_stat_t *src, uint64_t neg);
|
||||
|
||||
extern void ddt_histogram_add(ddt_histogram_t *dst, const ddt_histogram_t *src);
|
||||
extern void ddt_histogram_stat(ddt_stat_t *dds, const ddt_histogram_t *ddh);
|
||||
extern boolean_t ddt_histogram_empty(const ddt_histogram_t *ddh);
|
||||
extern void ddt_get_dedup_object_stats(spa_t *spa, ddt_object_t *ddo);
|
||||
extern void ddt_get_dedup_histogram(spa_t *spa, ddt_histogram_t *ddh);
|
||||
extern void ddt_get_dedup_stats(spa_t *spa, ddt_stat_t *dds_total);
|
||||
|
||||
extern uint64_t ddt_get_dedup_dspace(spa_t *spa);
|
||||
extern uint64_t ddt_get_pool_dedup_ratio(spa_t *spa);
|
||||
|
||||
extern int ddt_ditto_copies_needed(ddt_t *ddt, ddt_entry_t *dde,
|
||||
ddt_phys_t *ddp_willref);
|
||||
extern int ddt_ditto_copies_present(ddt_entry_t *dde);
|
||||
|
||||
extern size_t ddt_compress(void *src, uchar_t *dst, size_t s_len, size_t d_len);
|
||||
extern void ddt_decompress(uchar_t *src, void *dst, size_t s_len, size_t d_len);
|
||||
|
||||
extern ddt_t *ddt_select(spa_t *spa, const blkptr_t *bp);
|
||||
extern void ddt_enter(ddt_t *ddt);
|
||||
extern void ddt_exit(ddt_t *ddt);
|
||||
extern ddt_entry_t *ddt_lookup(ddt_t *ddt, const blkptr_t *bp, boolean_t add);
|
||||
extern void ddt_prefetch(spa_t *spa, const blkptr_t *bp);
|
||||
extern void ddt_remove(ddt_t *ddt, ddt_entry_t *dde);
|
||||
|
||||
extern boolean_t ddt_class_contains(spa_t *spa, enum ddt_class max_class,
|
||||
const blkptr_t *bp);
|
||||
|
||||
extern ddt_entry_t *ddt_repair_start(ddt_t *ddt, const blkptr_t *bp);
|
||||
extern void ddt_repair_done(ddt_t *ddt, ddt_entry_t *dde);
|
||||
|
||||
extern int ddt_entry_compare(const void *x1, const void *x2);
|
||||
|
||||
extern void ddt_create(spa_t *spa);
|
||||
extern int ddt_load(spa_t *spa);
|
||||
extern void ddt_unload(spa_t *spa);
|
||||
extern void ddt_sync(spa_t *spa, uint64_t txg);
|
||||
extern int ddt_walk(spa_t *spa, ddt_bookmark_t *ddb, ddt_entry_t *dde);
|
||||
extern int ddt_object_update(ddt_t *ddt, enum ddt_type type,
|
||||
enum ddt_class class, ddt_entry_t *dde, dmu_tx_t *tx);
|
||||
|
||||
extern const ddt_ops_t ddt_zap_ops;
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DDT_H */
|
740
uts/common/fs/zfs/sys/dmu.h
Normal file
@ -0,0 +1,740 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
/* Portions Copyright 2010 Robert Milkowski */
|
||||
|
||||
#ifndef _SYS_DMU_H
|
||||
#define _SYS_DMU_H
|
||||
|
||||
/*
|
||||
* This file describes the interface that the DMU provides for its
|
||||
* consumers.
|
||||
*
|
||||
* The DMU also interacts with the SPA. That interface is described in
|
||||
* dmu_spa.h.
|
||||
*/
|
||||
|
||||
#include <sys/inttypes.h>
|
||||
#include <sys/types.h>
|
||||
#include <sys/param.h>
|
||||
#include <sys/cred.h>
|
||||
#include <sys/time.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct uio;
|
||||
struct xuio;
|
||||
struct page;
|
||||
struct vnode;
|
||||
struct spa;
|
||||
struct zilog;
|
||||
struct zio;
|
||||
struct blkptr;
|
||||
struct zap_cursor;
|
||||
struct dsl_dataset;
|
||||
struct dsl_pool;
|
||||
struct dnode;
|
||||
struct drr_begin;
|
||||
struct drr_end;
|
||||
struct zbookmark;
|
||||
struct spa;
|
||||
struct nvlist;
|
||||
struct arc_buf;
|
||||
struct zio_prop;
|
||||
struct sa_handle;
|
||||
|
||||
typedef struct objset objset_t;
|
||||
typedef struct dmu_tx dmu_tx_t;
|
||||
typedef struct dsl_dir dsl_dir_t;
|
||||
|
||||
typedef enum dmu_object_type {
|
||||
DMU_OT_NONE,
|
||||
/* general: */
|
||||
DMU_OT_OBJECT_DIRECTORY, /* ZAP */
|
||||
DMU_OT_OBJECT_ARRAY, /* UINT64 */
|
||||
DMU_OT_PACKED_NVLIST, /* UINT8 (XDR by nvlist_pack/unpack) */
|
||||
DMU_OT_PACKED_NVLIST_SIZE, /* UINT64 */
|
||||
DMU_OT_BPOBJ, /* UINT64 */
|
||||
DMU_OT_BPOBJ_HDR, /* UINT64 */
|
||||
/* spa: */
|
||||
DMU_OT_SPACE_MAP_HEADER, /* UINT64 */
|
||||
DMU_OT_SPACE_MAP, /* UINT64 */
|
||||
/* zil: */
|
||||
DMU_OT_INTENT_LOG, /* UINT64 */
|
||||
/* dmu: */
|
||||
DMU_OT_DNODE, /* DNODE */
|
||||
DMU_OT_OBJSET, /* OBJSET */
|
||||
/* dsl: */
|
||||
DMU_OT_DSL_DIR, /* UINT64 */
|
||||
DMU_OT_DSL_DIR_CHILD_MAP, /* ZAP */
|
||||
DMU_OT_DSL_DS_SNAP_MAP, /* ZAP */
|
||||
DMU_OT_DSL_PROPS, /* ZAP */
|
||||
DMU_OT_DSL_DATASET, /* UINT64 */
|
||||
/* zpl: */
|
||||
DMU_OT_ZNODE, /* ZNODE */
|
||||
DMU_OT_OLDACL, /* Old ACL */
|
||||
DMU_OT_PLAIN_FILE_CONTENTS, /* UINT8 */
|
||||
DMU_OT_DIRECTORY_CONTENTS, /* ZAP */
|
||||
DMU_OT_MASTER_NODE, /* ZAP */
|
||||
DMU_OT_UNLINKED_SET, /* ZAP */
|
||||
/* zvol: */
|
||||
DMU_OT_ZVOL, /* UINT8 */
|
||||
DMU_OT_ZVOL_PROP, /* ZAP */
|
||||
/* other; for testing only! */
|
||||
DMU_OT_PLAIN_OTHER, /* UINT8 */
|
||||
DMU_OT_UINT64_OTHER, /* UINT64 */
|
||||
DMU_OT_ZAP_OTHER, /* ZAP */
|
||||
/* new object types: */
|
||||
DMU_OT_ERROR_LOG, /* ZAP */
|
||||
DMU_OT_SPA_HISTORY, /* UINT8 */
|
||||
DMU_OT_SPA_HISTORY_OFFSETS, /* spa_his_phys_t */
|
||||
DMU_OT_POOL_PROPS, /* ZAP */
|
||||
DMU_OT_DSL_PERMS, /* ZAP */
|
||||
DMU_OT_ACL, /* ACL */
|
||||
DMU_OT_SYSACL, /* SYSACL */
|
||||
DMU_OT_FUID, /* FUID table (Packed NVLIST UINT8) */
|
||||
DMU_OT_FUID_SIZE, /* FUID table size UINT64 */
|
||||
DMU_OT_NEXT_CLONES, /* ZAP */
|
||||
DMU_OT_SCAN_QUEUE, /* ZAP */
|
||||
DMU_OT_USERGROUP_USED, /* ZAP */
|
||||
DMU_OT_USERGROUP_QUOTA, /* ZAP */
|
||||
DMU_OT_USERREFS, /* ZAP */
|
||||
DMU_OT_DDT_ZAP, /* ZAP */
|
||||
DMU_OT_DDT_STATS, /* ZAP */
|
||||
DMU_OT_SA, /* System attr */
|
||||
DMU_OT_SA_MASTER_NODE, /* ZAP */
|
||||
DMU_OT_SA_ATTR_REGISTRATION, /* ZAP */
|
||||
DMU_OT_SA_ATTR_LAYOUTS, /* ZAP */
|
||||
DMU_OT_SCAN_XLATE, /* ZAP */
|
||||
DMU_OT_DEDUP, /* fake dedup BP from ddt_bp_create() */
|
||||
DMU_OT_DEADLIST, /* ZAP */
|
||||
DMU_OT_DEADLIST_HDR, /* UINT64 */
|
||||
DMU_OT_DSL_CLONES, /* ZAP */
|
||||
DMU_OT_BPOBJ_SUBOBJ, /* UINT64 */
|
||||
DMU_OT_NUMTYPES
|
||||
} dmu_object_type_t;
|
||||
|
||||
typedef enum dmu_objset_type {
|
||||
DMU_OST_NONE,
|
||||
DMU_OST_META,
|
||||
DMU_OST_ZFS,
|
||||
DMU_OST_ZVOL,
|
||||
DMU_OST_OTHER, /* For testing only! */
|
||||
DMU_OST_ANY, /* Be careful! */
|
||||
DMU_OST_NUMTYPES
|
||||
} dmu_objset_type_t;
|
||||
|
||||
void byteswap_uint64_array(void *buf, size_t size);
|
||||
void byteswap_uint32_array(void *buf, size_t size);
|
||||
void byteswap_uint16_array(void *buf, size_t size);
|
||||
void byteswap_uint8_array(void *buf, size_t size);
|
||||
void zap_byteswap(void *buf, size_t size);
|
||||
void zfs_oldacl_byteswap(void *buf, size_t size);
|
||||
void zfs_acl_byteswap(void *buf, size_t size);
|
||||
void zfs_znode_byteswap(void *buf, size_t size);
|
||||
|
||||
#define DS_FIND_SNAPSHOTS (1<<0)
|
||||
#define DS_FIND_CHILDREN (1<<1)
|
||||
|
||||
/*
|
||||
* The maximum number of bytes that can be accessed as part of one
|
||||
* operation, including metadata.
|
||||
*/
|
||||
#define DMU_MAX_ACCESS (10<<20) /* 10MB */
|
||||
#define DMU_MAX_DELETEBLKCNT (20480) /* ~5MB of indirect blocks */
|
||||
|
||||
#define DMU_USERUSED_OBJECT (-1ULL)
|
||||
#define DMU_GROUPUSED_OBJECT (-2ULL)
|
||||
#define DMU_DEADLIST_OBJECT (-3ULL)
|
||||
|
||||
/*
|
||||
* artificial blkids for bonus buffer and spill blocks
|
||||
*/
|
||||
#define DMU_BONUS_BLKID (-1ULL)
|
||||
#define DMU_SPILL_BLKID (-2ULL)
|
||||
/*
|
||||
* Public routines to create, destroy, open, and close objsets.
|
||||
*/
|
||||
int dmu_objset_hold(const char *name, void *tag, objset_t **osp);
|
||||
int dmu_objset_own(const char *name, dmu_objset_type_t type,
|
||||
boolean_t readonly, void *tag, objset_t **osp);
|
||||
void dmu_objset_rele(objset_t *os, void *tag);
|
||||
void dmu_objset_disown(objset_t *os, void *tag);
|
||||
int dmu_objset_open_ds(struct dsl_dataset *ds, objset_t **osp);
|
||||
|
||||
int dmu_objset_evict_dbufs(objset_t *os);
|
||||
int dmu_objset_create(const char *name, dmu_objset_type_t type, uint64_t flags,
|
||||
void (*func)(objset_t *os, void *arg, cred_t *cr, dmu_tx_t *tx), void *arg);
|
||||
int dmu_objset_clone(const char *name, struct dsl_dataset *clone_origin,
|
||||
uint64_t flags);
|
||||
int dmu_objset_destroy(const char *name, boolean_t defer);
|
||||
int dmu_snapshots_destroy(char *fsname, char *snapname, boolean_t defer);
|
||||
int dmu_objset_snapshot(char *fsname, char *snapname, char *tag,
|
||||
struct nvlist *props, boolean_t recursive, boolean_t temporary, int fd);
|
||||
int dmu_objset_rename(const char *name, const char *newname,
|
||||
boolean_t recursive);
|
||||
int dmu_objset_find(char *name, int func(const char *, void *), void *arg,
|
||||
int flags);
|
||||
void dmu_objset_byteswap(void *buf, size_t size);
|
||||
|
||||
typedef struct dmu_buf {
|
||||
uint64_t db_object; /* object that this buffer is part of */
|
||||
uint64_t db_offset; /* byte offset in this object */
|
||||
uint64_t db_size; /* size of buffer in bytes */
|
||||
void *db_data; /* data in buffer */
|
||||
} dmu_buf_t;
|
||||
|
||||
typedef void dmu_buf_evict_func_t(struct dmu_buf *db, void *user_ptr);
|
||||
|
||||
/*
|
||||
* The names of zap entries in the DIRECTORY_OBJECT of the MOS.
|
||||
*/
|
||||
#define DMU_POOL_DIRECTORY_OBJECT 1
|
||||
#define DMU_POOL_CONFIG "config"
|
||||
#define DMU_POOL_ROOT_DATASET "root_dataset"
|
||||
#define DMU_POOL_SYNC_BPOBJ "sync_bplist"
|
||||
#define DMU_POOL_ERRLOG_SCRUB "errlog_scrub"
|
||||
#define DMU_POOL_ERRLOG_LAST "errlog_last"
|
||||
#define DMU_POOL_SPARES "spares"
|
||||
#define DMU_POOL_DEFLATE "deflate"
|
||||
#define DMU_POOL_HISTORY "history"
|
||||
#define DMU_POOL_PROPS "pool_props"
|
||||
#define DMU_POOL_L2CACHE "l2cache"
|
||||
#define DMU_POOL_TMP_USERREFS "tmp_userrefs"
|
||||
#define DMU_POOL_DDT "DDT-%s-%s-%s"
|
||||
#define DMU_POOL_DDT_STATS "DDT-statistics"
|
||||
#define DMU_POOL_CREATION_VERSION "creation_version"
|
||||
#define DMU_POOL_SCAN "scan"
|
||||
#define DMU_POOL_FREE_BPOBJ "free_bpobj"
|
||||
|
||||
/*
|
||||
* Allocate an object from this objset. The range of object numbers
|
||||
* available is (0, DN_MAX_OBJECT). Object 0 is the meta-dnode.
|
||||
*
|
||||
* The transaction must be assigned to a txg. The newly allocated
|
||||
* object will be "held" in the transaction (ie. you can modify the
|
||||
* newly allocated object in this transaction).
|
||||
*
|
||||
* dmu_object_alloc() chooses an object and returns it in *objectp.
|
||||
*
|
||||
* dmu_object_claim() allocates a specific object number. If that
|
||||
* number is already allocated, it fails and returns EEXIST.
|
||||
*
|
||||
* Return 0 on success, or ENOSPC or EEXIST as specified above.
|
||||
*/
|
||||
uint64_t dmu_object_alloc(objset_t *os, dmu_object_type_t ot,
|
||||
int blocksize, dmu_object_type_t bonus_type, int bonus_len, dmu_tx_t *tx);
|
||||
int dmu_object_claim(objset_t *os, uint64_t object, dmu_object_type_t ot,
|
||||
int blocksize, dmu_object_type_t bonus_type, int bonus_len, dmu_tx_t *tx);
|
||||
int dmu_object_reclaim(objset_t *os, uint64_t object, dmu_object_type_t ot,
|
||||
int blocksize, dmu_object_type_t bonustype, int bonuslen);
|
||||
|
||||
/*
|
||||
* Free an object from this objset.
|
||||
*
|
||||
* The object's data will be freed as well (ie. you don't need to call
|
||||
* dmu_free(object, 0, -1, tx)).
|
||||
*
|
||||
* The object need not be held in the transaction.
|
||||
*
|
||||
* If there are any holds on this object's buffers (via dmu_buf_hold()),
|
||||
* or tx holds on the object (via dmu_tx_hold_object()), you can not
|
||||
* free it; it fails and returns EBUSY.
|
||||
*
|
||||
* If the object is not allocated, it fails and returns ENOENT.
|
||||
*
|
||||
* Return 0 on success, or EBUSY or ENOENT as specified above.
|
||||
*/
|
||||
int dmu_object_free(objset_t *os, uint64_t object, dmu_tx_t *tx);
|
||||
|
||||
/*
|
||||
* Find the next allocated or free object.
|
||||
*
|
||||
* The objectp parameter is in-out. It will be updated to be the next
|
||||
* object which is allocated. Ignore objects which have not been
|
||||
* modified since txg.
|
||||
*
|
||||
* XXX Can only be called on an objset with no dirty data.
|
||||
*
|
||||
* Returns 0 on success, or ENOENT if there are no more objects.
|
||||
*/
|
||||
int dmu_object_next(objset_t *os, uint64_t *objectp,
|
||||
boolean_t hole, uint64_t txg);
|
||||
|
||||
/*
|
||||
* Set the data blocksize for an object.
|
||||
*
|
||||
* The object cannot have any blocks allocated beyond the first. If
|
||||
* the first block is allocated already, the new size must be greater
|
||||
* than the current block size. If these conditions are not met,
|
||||
* ENOTSUP will be returned.
|
||||
*
|
||||
* Returns 0 on success, or EBUSY if there are any holds on the object
|
||||
* contents, or ENOTSUP as described above.
|
||||
*/
|
||||
int dmu_object_set_blocksize(objset_t *os, uint64_t object, uint64_t size,
|
||||
int ibs, dmu_tx_t *tx);
|
||||
|
||||
/*
|
||||
* Set the checksum property on a dnode. The new checksum algorithm will
|
||||
* apply to all newly written blocks; existing blocks will not be affected.
|
||||
*/
|
||||
void dmu_object_set_checksum(objset_t *os, uint64_t object, uint8_t checksum,
|
||||
dmu_tx_t *tx);
|
||||
|
||||
/*
|
||||
* Set the compress property on a dnode. The new compression algorithm will
|
||||
* apply to all newly written blocks; existing blocks will not be affected.
|
||||
*/
|
||||
void dmu_object_set_compress(objset_t *os, uint64_t object, uint8_t compress,
|
||||
dmu_tx_t *tx);
|
||||
|
||||
/*
|
||||
* Decide how to write a block: checksum, compression, number of copies, etc.
|
||||
*/
|
||||
#define WP_NOFILL 0x1
|
||||
#define WP_DMU_SYNC 0x2
|
||||
#define WP_SPILL 0x4
|
||||
|
||||
void dmu_write_policy(objset_t *os, struct dnode *dn, int level, int wp,
|
||||
struct zio_prop *zp);
|
||||
/*
|
||||
* The bonus data is accessed more or less like a regular buffer.
|
||||
* You must dmu_bonus_hold() to get the buffer, which will give you a
|
||||
* dmu_buf_t with db_offset==-1ULL, and db_size = the size of the bonus
|
||||
* data. As with any normal buffer, you must call dmu_buf_read() to
|
||||
* read db_data, dmu_buf_will_dirty() before modifying it, and the
|
||||
* object must be held in an assigned transaction before calling
|
||||
* dmu_buf_will_dirty. You may use dmu_buf_set_user() on the bonus
|
||||
* buffer as well. You must release your hold with dmu_buf_rele().
|
||||
*/
|
||||
int dmu_bonus_hold(objset_t *os, uint64_t object, void *tag, dmu_buf_t **);
|
||||
int dmu_bonus_max(void);
|
||||
int dmu_set_bonus(dmu_buf_t *, int, dmu_tx_t *);
|
||||
int dmu_set_bonustype(dmu_buf_t *, dmu_object_type_t, dmu_tx_t *);
|
||||
dmu_object_type_t dmu_get_bonustype(dmu_buf_t *);
|
||||
int dmu_rm_spill(objset_t *, uint64_t, dmu_tx_t *);
|
||||
|
||||
/*
|
||||
* Special spill buffer support used by "SA" framework
|
||||
*/
|
||||
|
||||
int dmu_spill_hold_by_bonus(dmu_buf_t *bonus, void *tag, dmu_buf_t **dbp);
|
||||
int dmu_spill_hold_by_dnode(struct dnode *dn, uint32_t flags,
|
||||
void *tag, dmu_buf_t **dbp);
|
||||
int dmu_spill_hold_existing(dmu_buf_t *bonus, void *tag, dmu_buf_t **dbp);
|
||||
|
||||
/*
|
||||
* Obtain the DMU buffer from the specified object which contains the
|
||||
* specified offset. dmu_buf_hold() puts a "hold" on the buffer, so
|
||||
* that it will remain in memory. You must release the hold with
|
||||
* dmu_buf_rele(). You mustn't access the dmu_buf_t after releasing your
|
||||
* hold. You must have a hold on any dmu_buf_t* you pass to the DMU.
|
||||
*
|
||||
* You must call dmu_buf_read, dmu_buf_will_dirty, or dmu_buf_will_fill
|
||||
* on the returned buffer before reading or writing the buffer's
|
||||
* db_data. The comments for those routines describe what particular
|
||||
* operations are valid after calling them.
|
||||
*
|
||||
* The object number must be a valid, allocated object number.
|
||||
*/
|
||||
int dmu_buf_hold(objset_t *os, uint64_t object, uint64_t offset,
|
||||
void *tag, dmu_buf_t **, int flags);
|
||||
void dmu_buf_add_ref(dmu_buf_t *db, void* tag);
|
||||
void dmu_buf_rele(dmu_buf_t *db, void *tag);
|
||||
uint64_t dmu_buf_refcount(dmu_buf_t *db);
|
||||
|
||||
/*
|
||||
* dmu_buf_hold_array holds the DMU buffers which contain all bytes in a
|
||||
* range of an object. A pointer to an array of dmu_buf_t*'s is
|
||||
* returned (in *dbpp).
|
||||
*
|
||||
* dmu_buf_rele_array releases the hold on an array of dmu_buf_t*'s, and
|
||||
* frees the array. The hold on the array of buffers MUST be released
|
||||
* with dmu_buf_rele_array. You can NOT release the hold on each buffer
|
||||
* individually with dmu_buf_rele.
|
||||
*/
|
||||
int dmu_buf_hold_array_by_bonus(dmu_buf_t *db, uint64_t offset,
|
||||
uint64_t length, int read, void *tag, int *numbufsp, dmu_buf_t ***dbpp);
|
||||
void dmu_buf_rele_array(dmu_buf_t **, int numbufs, void *tag);
|
||||
|
||||
/*
|
||||
* Returns NULL on success, or the existing user ptr if it's already
|
||||
* been set.
|
||||
*
|
||||
* user_ptr is for use by the user and can be obtained via dmu_buf_get_user().
|
||||
*
|
||||
* user_data_ptr_ptr should be NULL, or a pointer to a pointer which
|
||||
* will be set to db->db_data when you are allowed to access it. Note
|
||||
* that db->db_data (the pointer) can change when you do dmu_buf_read(),
|
||||
* dmu_buf_tryupgrade(), dmu_buf_will_dirty(), or dmu_buf_will_fill().
|
||||
* *user_data_ptr_ptr will be set to the new value when it changes.
|
||||
*
|
||||
* If non-NULL, pageout func will be called when this buffer is being
|
||||
* excised from the cache, so that you can clean up the data structure
|
||||
* pointed to by user_ptr.
|
||||
*
|
||||
* dmu_evict_user() will call the pageout func for all buffers in an
|
||||
* objset with a given pageout func.
|
||||
*/
|
||||
void *dmu_buf_set_user(dmu_buf_t *db, void *user_ptr, void *user_data_ptr_ptr,
|
||||
dmu_buf_evict_func_t *pageout_func);
|
||||
/*
|
||||
* set_user_ie is the same as set_user, but requests immediate eviction
|
||||
* when hold count goes to zero.
|
||||
*/
|
||||
void *dmu_buf_set_user_ie(dmu_buf_t *db, void *user_ptr,
|
||||
void *user_data_ptr_ptr, dmu_buf_evict_func_t *pageout_func);
|
||||
void *dmu_buf_update_user(dmu_buf_t *db_fake, void *old_user_ptr,
|
||||
void *user_ptr, void *user_data_ptr_ptr,
|
||||
dmu_buf_evict_func_t *pageout_func);
|
||||
void dmu_evict_user(objset_t *os, dmu_buf_evict_func_t *func);
|
||||
|
||||
/*
|
||||
* Returns the user_ptr set with dmu_buf_set_user(), or NULL if not set.
|
||||
*/
|
||||
void *dmu_buf_get_user(dmu_buf_t *db);
|
||||
|
||||
/*
|
||||
* Indicate that you are going to modify the buffer's data (db_data).
|
||||
*
|
||||
* The transaction (tx) must be assigned to a txg (ie. you've called
|
||||
* dmu_tx_assign()). The buffer's object must be held in the tx
|
||||
* (ie. you've called dmu_tx_hold_object(tx, db->db_object)).
|
||||
*/
|
||||
void dmu_buf_will_dirty(dmu_buf_t *db, dmu_tx_t *tx);
|
||||
|
||||
/*
|
||||
* Tells if the given dbuf is freeable.
|
||||
*/
|
||||
boolean_t dmu_buf_freeable(dmu_buf_t *);
|
||||
|
||||
/*
|
||||
* You must create a transaction, then hold the objects which you will
|
||||
* (or might) modify as part of this transaction. Then you must assign
|
||||
* the transaction to a transaction group. Once the transaction has
|
||||
* been assigned, you can modify buffers which belong to held objects as
|
||||
* part of this transaction. You can't modify buffers before the
|
||||
* transaction has been assigned; you can't modify buffers which don't
|
||||
* belong to objects which this transaction holds; you can't hold
|
||||
* objects once the transaction has been assigned. You may hold an
|
||||
* object which you are going to free (with dmu_object_free()), but you
|
||||
* don't have to.
|
||||
*
|
||||
* You can abort the transaction before it has been assigned.
|
||||
*
|
||||
* Note that you may hold buffers (with dmu_buf_hold) at any time,
|
||||
* regardless of transaction state.
|
||||
*/
|
||||
|
||||
#define DMU_NEW_OBJECT (-1ULL)
|
||||
#define DMU_OBJECT_END (-1ULL)
|
||||
|
||||
dmu_tx_t *dmu_tx_create(objset_t *os);
|
||||
void dmu_tx_hold_write(dmu_tx_t *tx, uint64_t object, uint64_t off, int len);
|
||||
void dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off,
|
||||
uint64_t len);
|
||||
void dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name);
|
||||
void dmu_tx_hold_bonus(dmu_tx_t *tx, uint64_t object);
|
||||
void dmu_tx_hold_spill(dmu_tx_t *tx, uint64_t object);
|
||||
void dmu_tx_hold_sa(dmu_tx_t *tx, struct sa_handle *hdl, boolean_t may_grow);
|
||||
void dmu_tx_hold_sa_create(dmu_tx_t *tx, int total_size);
|
||||
void dmu_tx_abort(dmu_tx_t *tx);
|
||||
int dmu_tx_assign(dmu_tx_t *tx, uint64_t txg_how);
|
||||
void dmu_tx_wait(dmu_tx_t *tx);
|
||||
void dmu_tx_commit(dmu_tx_t *tx);
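The ordering rules in the comment above (hold before assign, modify only after assign, abort only before assign) can be pictured as a small state machine. The sketch below is a standalone toy model, not DMU code: `toy_tx_t` and every `toy_tx_*` function are hypothetical names invented for illustration, and the real DMU tracks holds in more detail than a list of object numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of the transaction rules; none of this is DMU code. */
#define	TOY_MAX_HOLDS	8

typedef enum {
	TOY_TX_OPEN,		/* created, holds may still be added */
	TOY_TX_ASSIGNED,	/* assigned, buffers may be modified */
	TOY_TX_ABORTED
} toy_tx_state_t;

typedef struct toy_tx {
	toy_tx_state_t state;
	uint64_t held[TOY_MAX_HOLDS];
	size_t nheld;
} toy_tx_t;

void
toy_tx_create(toy_tx_t *tx)
{
	assert(tx != NULL);
	tx->state = TOY_TX_OPEN;
	tx->nheld = 0;
}

/* Holds are accepted only before the transaction has been assigned. */
bool
toy_tx_hold(toy_tx_t *tx, uint64_t object)
{
	if (tx->state != TOY_TX_OPEN || tx->nheld == TOY_MAX_HOLDS)
		return (false);
	tx->held[tx->nheld++] = object;
	return (true);
}

/* Assignment moves an open transaction into the "may modify" state. */
bool
toy_tx_assign(toy_tx_t *tx)
{
	if (tx->state != TOY_TX_OPEN)
		return (false);
	tx->state = TOY_TX_ASSIGNED;
	return (true);
}

/* Abort is permitted only before assignment. */
bool
toy_tx_abort(toy_tx_t *tx)
{
	if (tx->state != TOY_TX_OPEN)
		return (false);
	tx->state = TOY_TX_ABORTED;
	return (true);
}

/* A buffer may be modified only if the tx is assigned and holds its object. */
bool
toy_tx_can_modify(const toy_tx_t *tx, uint64_t object)
{
	if (tx->state != TOY_TX_ASSIGNED)
		return (false);
	for (size_t i = 0; i < tx->nheld; i++) {
		if (tx->held[i] == object)
			return (true);
	}
	return (false);
}
```

In this model, a caller would run `toy_tx_create()`, one or more `toy_tx_hold()` calls, then `toy_tx_assign()`; only after that does `toy_tx_can_modify()` return true, and only for held objects.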
|
||||
|
||||
/*
|
||||
* To register a commit callback, dmu_tx_callback_register() must be called.
|
||||
*
|
||||
* dcb_data is a pointer to caller private data that is passed on as a
|
||||
* callback parameter. The caller is responsible for properly allocating and
|
||||
* freeing it.
|
||||
*
|
||||
* When registering a callback, the transaction must be already created, but
|
||||
* it cannot be committed or aborted. It can be assigned to a txg or not.
|
||||
*
|
||||
* The callback will be called after the transaction has been safely written
|
||||
* to stable storage and will also be called if the dmu_tx is aborted.
|
||||
* If there is any error which prevents the transaction from being committed to
|
||||
* disk, the callback will be called with a value of error != 0.
|
||||
*/
|
||||
typedef void dmu_tx_callback_func_t(void *dcb_data, int error);
|
||||
|
||||
void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func,
|
||||
void *dcb_data);
|
||||
|
||||
/*
|
||||
* Free up the data blocks for a defined range of a file. If size is
|
||||
* zero, the range from offset to end-of-file is freed.
|
||||
*/
|
||||
int dmu_free_range(objset_t *os, uint64_t object, uint64_t offset,
|
||||
uint64_t size, dmu_tx_t *tx);
|
||||
int dmu_free_long_range(objset_t *os, uint64_t object, uint64_t offset,
|
||||
uint64_t size);
|
||||
int dmu_free_object(objset_t *os, uint64_t object);
|
||||
|
||||
/*
|
||||
* Convenience functions.
|
||||
*
|
||||
* Canfail routines will return 0 on success, or an errno if there is a
|
||||
* nonrecoverable I/O error.
|
||||
*/
|
||||
#define DMU_READ_PREFETCH 0 /* prefetch */
|
||||
#define DMU_READ_NO_PREFETCH 1 /* don't prefetch */
|
||||
int dmu_read(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
|
||||
void *buf, uint32_t flags);
|
||||
void dmu_write(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
|
||||
const void *buf, dmu_tx_t *tx);
|
||||
void dmu_prealloc(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
|
||||
dmu_tx_t *tx);
|
||||
int dmu_read_uio(objset_t *os, uint64_t object, struct uio *uio, uint64_t size);
|
||||
int dmu_write_uio(objset_t *os, uint64_t object, struct uio *uio, uint64_t size,
|
||||
dmu_tx_t *tx);
|
||||
int dmu_write_uio_dbuf(dmu_buf_t *zdb, struct uio *uio, uint64_t size,
|
||||
dmu_tx_t *tx);
|
||||
int dmu_write_pages(objset_t *os, uint64_t object, uint64_t offset,
|
||||
uint64_t size, struct page *pp, dmu_tx_t *tx);
|
||||
struct arc_buf *dmu_request_arcbuf(dmu_buf_t *handle, int size);
|
||||
void dmu_return_arcbuf(struct arc_buf *buf);
|
||||
void dmu_assign_arcbuf(dmu_buf_t *handle, uint64_t offset, struct arc_buf *buf,
|
||||
dmu_tx_t *tx);
|
||||
int dmu_xuio_init(struct xuio *uio, int niov);
|
||||
void dmu_xuio_fini(struct xuio *uio);
|
||||
int dmu_xuio_add(struct xuio *uio, struct arc_buf *abuf, offset_t off,
|
||||
size_t n);
|
||||
int dmu_xuio_cnt(struct xuio *uio);
|
||||
struct arc_buf *dmu_xuio_arcbuf(struct xuio *uio, int i);
|
||||
void dmu_xuio_clear(struct xuio *uio, int i);
|
||||
void xuio_stat_wbuf_copied();
|
||||
void xuio_stat_wbuf_nocopy();
|
||||
|
||||
extern int zfs_prefetch_disable;
|
||||
|
||||
/*
|
||||
* Asynchronously try to read in the data.
|
||||
*/
|
||||
void dmu_prefetch(objset_t *os, uint64_t object, uint64_t offset,
|
||||
uint64_t len);
|
||||
|
||||
typedef struct dmu_object_info {
|
||||
/* All sizes are in bytes unless otherwise indicated. */
|
||||
uint32_t doi_data_block_size;
|
||||
uint32_t doi_metadata_block_size;
|
||||
dmu_object_type_t doi_type;
|
||||
dmu_object_type_t doi_bonus_type;
|
||||
uint64_t doi_bonus_size;
|
||||
uint8_t doi_indirection; /* 2 = dnode->indirect->data */
|
||||
uint8_t doi_checksum;
|
||||
uint8_t doi_compress;
|
||||
uint8_t doi_pad[5];
|
||||
uint64_t doi_physical_blocks_512; /* data + metadata, 512b blks */
|
||||
uint64_t doi_max_offset;
|
||||
uint64_t doi_fill_count; /* number of non-empty blocks */
|
||||
} dmu_object_info_t;
|
||||
|
||||
typedef void arc_byteswap_func_t(void *buf, size_t size);
|
||||
|
||||
typedef struct dmu_object_type_info {
|
||||
arc_byteswap_func_t *ot_byteswap;
|
||||
boolean_t ot_metadata;
|
||||
char *ot_name;
|
||||
} dmu_object_type_info_t;
|
||||
|
||||
extern const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES];
|
||||
|
||||
/*
|
||||
* Get information on a DMU object.
|
||||
*
|
||||
* Return 0 on success or ENOENT if object is not allocated.
|
||||
*
|
||||
* If doi is NULL, just indicates whether the object exists.
|
||||
*/
|
||||
int dmu_object_info(objset_t *os, uint64_t object, dmu_object_info_t *doi);
|
||||
void dmu_object_info_from_dnode(struct dnode *dn, dmu_object_info_t *doi);
|
||||
void dmu_object_info_from_db(dmu_buf_t *db, dmu_object_info_t *doi);
|
||||
void dmu_object_size_from_db(dmu_buf_t *db, uint32_t *blksize,
|
||||
u_longlong_t *nblk512);
|
||||
|
||||
typedef struct dmu_objset_stats {
|
||||
uint64_t dds_num_clones; /* number of clones of this */
|
||||
uint64_t dds_creation_txg;
|
||||
uint64_t dds_guid;
|
||||
dmu_objset_type_t dds_type;
|
||||
uint8_t dds_is_snapshot;
|
||||
uint8_t dds_inconsistent;
|
||||
char dds_origin[MAXNAMELEN];
|
||||
} dmu_objset_stats_t;
|
||||
|
||||
/*
|
||||
* Get stats on a dataset.
|
||||
*/
|
||||
void dmu_objset_fast_stat(objset_t *os, dmu_objset_stats_t *stat);
|
||||
|
||||
/*
|
||||
* Add entries to the nvlist for all the objset's properties. See
|
||||
* zfs_prop_table[] and zfs(1m) for details on the properties.
|
||||
*/
|
||||
void dmu_objset_stats(objset_t *os, struct nvlist *nv);
|
||||
|
||||
/*
|
||||
* Get the space usage statistics for statvfs().
|
||||
*
|
||||
* refdbytes is the amount of space "referenced" by this objset.
|
||||
* availbytes is the amount of space available to this objset, taking
|
||||
* into account quotas & reservations, assuming that no other objsets
|
||||
* use the space first. These values correspond to the 'referenced' and
|
||||
* 'available' properties, described in the zfs(1m) manpage.
|
||||
*
|
||||
* usedobjs and availobjs are the number of objects currently allocated,
|
||||
* and available.
|
||||
*/
|
||||
void dmu_objset_space(objset_t *os, uint64_t *refdbytesp, uint64_t *availbytesp,
|
||||
uint64_t *usedobjsp, uint64_t *availobjsp);
|
||||
|
||||
/*
|
||||
* The fsid_guid is a 56-bit ID that can change to avoid collisions.
|
||||
* (Contrast with the ds_guid which is a 64-bit ID that will never
|
||||
* change, so there is a small probability that it will collide.)
|
||||
*/
|
||||
uint64_t dmu_objset_fsid_guid(objset_t *os);
|
||||
|
||||
/*
|
||||
* Get the [cm]time for an objset's snapshot dir
|
||||
*/
|
||||
timestruc_t dmu_objset_snap_cmtime(objset_t *os);
|
||||
|
||||
int dmu_objset_is_snapshot(objset_t *os);
|
||||
|
||||
extern struct spa *dmu_objset_spa(objset_t *os);
|
||||
extern struct zilog *dmu_objset_zil(objset_t *os);
|
||||
extern struct dsl_pool *dmu_objset_pool(objset_t *os);
|
||||
extern struct dsl_dataset *dmu_objset_ds(objset_t *os);
|
||||
extern void dmu_objset_name(objset_t *os, char *buf);
|
||||
extern dmu_objset_type_t dmu_objset_type(objset_t *os);
|
||||
extern uint64_t dmu_objset_id(objset_t *os);
|
||||
extern uint64_t dmu_objset_syncprop(objset_t *os);
|
||||
extern uint64_t dmu_objset_logbias(objset_t *os);
|
||||
extern int dmu_snapshot_list_next(objset_t *os, int namelen, char *name,
|
||||
uint64_t *id, uint64_t *offp, boolean_t *case_conflict);
|
||||
extern int dmu_snapshot_realname(objset_t *os, char *name, char *real,
|
||||
int maxlen, boolean_t *conflict);
|
||||
extern int dmu_dir_list_next(objset_t *os, int namelen, char *name,
|
||||
uint64_t *idp, uint64_t *offp);
|
||||
|
||||
typedef int objset_used_cb_t(dmu_object_type_t bonustype,
|
||||
void *bonus, uint64_t *userp, uint64_t *groupp);
|
||||
extern void dmu_objset_register_type(dmu_objset_type_t ost,
|
||||
objset_used_cb_t *cb);
|
||||
extern void dmu_objset_set_user(objset_t *os, void *user_ptr);
|
||||
extern void *dmu_objset_get_user(objset_t *os);
|
||||
|
||||
/*
|
||||
* Return the txg number for the given assigned transaction.
|
||||
*/
|
||||
uint64_t dmu_tx_get_txg(dmu_tx_t *tx);
|
||||
|
||||
/*
|
||||
* Synchronous write.
|
||||
* If a parent zio is provided this function initiates a write on the
|
||||
* provided buffer as a child of the parent zio.
|
||||
* In the absence of a parent zio, the write is completed synchronously.
|
||||
* At write completion, blk is filled with the bp of the written block.
|
||||
* Note that while the data covered by this function will be on stable
|
||||
* storage when the write completes this new data does not become a
|
||||
* permanent part of the file until the associated transaction commits.
|
||||
*/
|
||||
|
||||
/*
|
||||
* {zfs,zvol,ztest}_get_done() args
|
||||
*/
|
||||
typedef struct zgd {
|
||||
struct zilog *zgd_zilog;
|
||||
struct blkptr *zgd_bp;
|
||||
dmu_buf_t *zgd_db;
|
||||
struct rl *zgd_rl;
|
||||
void *zgd_private;
|
||||
} zgd_t;
|
||||
|
||||
typedef void dmu_sync_cb_t(zgd_t *arg, int error);
|
||||
int dmu_sync(struct zio *zio, uint64_t txg, dmu_sync_cb_t *done, zgd_t *zgd);
|
||||
|
||||
/*
|
||||
* Find the next hole or data block in file starting at *off.
|
||||
* Return found offset in *off. Return ESRCH for end of file.
|
||||
*/
|
||||
int dmu_offset_next(objset_t *os, uint64_t object, boolean_t hole,
|
||||
uint64_t *off);
|
||||
|
||||
/*
|
||||
* Initial setup and final teardown.
|
||||
*/
|
||||
extern void dmu_init(void);
|
||||
extern void dmu_fini(void);
|
||||
|
||||
typedef void (*dmu_traverse_cb_t)(objset_t *os, void *arg, struct blkptr *bp,
|
||||
uint64_t object, uint64_t offset, int len);
|
||||
void dmu_traverse_objset(objset_t *os, uint64_t txg_start,
|
||||
dmu_traverse_cb_t cb, void *arg);
|
||||
|
||||
int dmu_sendbackup(objset_t *tosnap, objset_t *fromsnap, boolean_t fromorigin,
|
||||
struct vnode *vp, offset_t *off);
|
||||
|
||||
typedef struct dmu_recv_cookie {
|
||||
/*
|
||||
* This structure is opaque!
|
||||
*
|
||||
* If logical and real are different, we are recving the stream
|
||||
* into the "real" temporary clone, and then switching it with
|
||||
* the "logical" target.
|
||||
*/
|
||||
struct dsl_dataset *drc_logical_ds;
|
||||
struct dsl_dataset *drc_real_ds;
|
||||
struct drr_begin *drc_drrb;
|
||||
char *drc_tosnap;
|
||||
char *drc_top_ds;
|
||||
boolean_t drc_newfs;
|
||||
boolean_t drc_force;
|
||||
} dmu_recv_cookie_t;
|
||||
|
||||
int dmu_recv_begin(char *tofs, char *tosnap, char *topds, struct drr_begin *,
|
||||
boolean_t force, objset_t *origin, dmu_recv_cookie_t *);
|
||||
int dmu_recv_stream(dmu_recv_cookie_t *drc, struct vnode *vp, offset_t *voffp,
|
||||
int cleanup_fd, uint64_t *action_handlep);
|
||||
int dmu_recv_end(dmu_recv_cookie_t *drc);
|
||||
|
||||
int dmu_diff(objset_t *tosnap, objset_t *fromsnap, struct vnode *vp,
|
||||
offset_t *off);
|
||||
|
||||
/* CRC64 table */
|
||||
#define ZFS_CRC64_POLY 0xC96C5795D7870F42ULL /* ECMA-182, reflected form */
|
||||
extern uint64_t zfs_crc64_table[256];
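The header only declares the table; how it is filled is not shown here. As a minimal sketch, the conventional bit-at-a-time construction of a 256-entry table for a reflected CRC-64 with this polynomial looks like the following. `toy_crc64_table` and `toy_crc64_init()` are hypothetical names, and this is the textbook construction rather than a copy of the in-tree initialization.

```c
#include <assert.h>
#include <stdint.h>

#define	ZFS_CRC64_POLY	0xC96C5795D7870F42ULL	/* value from the header */

/* Hypothetical stand-in for zfs_crc64_table. */
uint64_t toy_crc64_table[256];

void
toy_crc64_init(void)
{
	for (int i = 0; i < 256; i++) {
		uint64_t crc = (uint64_t)i;

		/* Reflected form: shift right, fold in the polynomial on a 1 bit. */
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 1)
				crc = (crc >> 1) ^ ZFS_CRC64_POLY;
			else
				crc >>= 1;
		}
		toy_crc64_table[i] = crc;
	}

	/* 0x80 reaches the low bit exactly once, on the last shift. */
	assert(toy_crc64_table[0x80] == ZFS_CRC64_POLY);
}
```

A quick sanity check for any reflected table is that entry 0 is zero and entry 0x80 equals the polynomial itself.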
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DMU_H */
|
272
uts/common/fs/zfs/sys/dmu_impl.h
Normal file
@ -0,0 +1,272 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2010 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DMU_IMPL_H
|
||||
#define _SYS_DMU_IMPL_H
|
||||
|
||||
#include <sys/txg_impl.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/dnode.h>
|
||||
#include <sys/zfs_context.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/*
|
||||
* This is the locking strategy for the DMU. Numbers in parentheses are
|
||||
* cases that use that lock order, referenced below:
|
||||
*
|
||||
* ARC is self-contained
|
||||
* bplist is self-contained
|
||||
* refcount is self-contained
|
||||
* txg is self-contained (hopefully!)
|
||||
* zst_lock
|
||||
* zf_rwlock
|
||||
*
|
||||
* XXX try to improve evicting path?
|
||||
*
|
||||
* dp_config_rwlock > os_obj_lock > dn_struct_rwlock >
|
||||
* dn_dbufs_mtx > hash_mutexes > db_mtx > dd_lock > leafs
|
||||
*
|
||||
* dp_config_rwlock
|
||||
* must be held before: everything
|
||||
* protects dd namespace changes
|
||||
* protects property changes globally
|
||||
* held from:
|
||||
* dsl_dir_open/r:
|
||||
* dsl_dir_create_sync/w:
|
||||
* dsl_dir_sync_destroy/w:
|
||||
* dsl_dir_rename_sync/w:
|
||||
* dsl_prop_changed_notify/r:
|
||||
*
|
||||
* os_obj_lock
|
||||
* must be held before:
|
||||
* everything except dp_config_rwlock
|
||||
* protects os_obj_next
|
||||
* held from:
|
||||
* dmu_object_alloc: dn_dbufs_mtx, db_mtx, hash_mutexes, dn_struct_rwlock
|
||||
*
|
||||
* dn_struct_rwlock
|
||||
* must be held before:
|
||||
* everything except dp_config_rwlock and os_obj_lock
|
||||
* protects structure of dnode (eg. nlevels)
|
||||
* db_blkptr can change when syncing out change to nlevels
|
||||
* dn_maxblkid
|
||||
* dn_nlevels
|
||||
* dn_*blksz*
|
||||
* phys nlevels, maxblkid, physical blkptr_t's (?)
|
||||
* held from:
|
||||
* callers of dbuf_read_impl, dbuf_hold[_impl], dbuf_prefetch
|
||||
* dmu_object_info_from_dnode: dn_dirty_mtx (dn_datablksz)
|
||||
* dmu_tx_count_free:
|
||||
* dbuf_read_impl: db_mtx, dmu_zfetch()
|
||||
* dmu_zfetch: zf_rwlock/r, zst_lock, dbuf_prefetch()
|
||||
* dbuf_new_size: db_mtx
|
||||
* dbuf_dirty: db_mtx
|
||||
* dbuf_findbp: (callers, phys? - the real need)
|
||||
* dbuf_create: dn_dbufs_mtx, hash_mutexes, db_mtx (phys?)
|
||||
* dbuf_prefetch: dn_dirty_mtx, hash_mutexes, db_mtx, dn_dbufs_mtx
|
||||
* dbuf_hold_impl: hash_mutexes, db_mtx, dn_dbufs_mtx, dbuf_findbp()
|
||||
* dnode_sync/w (increase_indirection): db_mtx (phys)
|
||||
* dnode_set_blksz/w: dn_dbufs_mtx (dn_*blksz*)
|
||||
* dnode_new_blkid/w: (dn_maxblkid)
|
||||
* dnode_free_range/w: dn_dirty_mtx (dn_maxblkid)
|
||||
* dnode_next_offset: (phys)
|
||||
*
|
||||
* dn_dbufs_mtx
|
||||
* must be held before:
|
||||
* db_mtx, hash_mutexes
|
||||
* protects:
|
||||
* dn_dbufs
|
||||
* dn_evicted
|
||||
* held from:
|
||||
* dmu_evict_user: db_mtx (dn_dbufs)
|
||||
* dbuf_free_range: db_mtx (dn_dbufs)
|
||||
* dbuf_remove_ref: db_mtx, callees:
|
||||
* dbuf_hash_remove: hash_mutexes, db_mtx
|
||||
* dbuf_create: hash_mutexes, db_mtx (dn_dbufs)
|
||||
* dnode_set_blksz: (dn_dbufs)
|
||||
*
|
||||
* hash_mutexes (global)
|
||||
* must be held before:
|
||||
* db_mtx
|
||||
* protects dbuf_hash_table (global) and db_hash_next
|
||||
* held from:
|
||||
* dbuf_find: db_mtx
|
||||
* dbuf_hash_insert: db_mtx
|
||||
* dbuf_hash_remove: db_mtx
|
||||
*
|
||||
* db_mtx (meta-leaf)
|
||||
* must be held before:
|
||||
* dn_mtx, dn_dirty_mtx, dd_lock (leaf mutexes)
|
||||
* protects:
|
||||
* db_state
|
||||
* db_holds
|
||||
* db_buf
|
||||
* db_changed
|
||||
* db_data_pending
|
||||
* db_dirtied
|
||||
* db_link
|
||||
* db_dirty_node (??)
|
||||
* db_dirtycnt
|
||||
* db_d.*
|
||||
* db.*
|
||||
* held from:
|
||||
* dbuf_dirty: dn_mtx, dn_dirty_mtx
|
||||
* dbuf_dirty->dsl_dir_willuse_space: dd_lock
|
||||
* dbuf_dirty->dbuf_new_block->dsl_dataset_block_freeable: dd_lock
|
||||
* dbuf_undirty: dn_dirty_mtx (db_d)
|
||||
* dbuf_write_done: dn_dirty_mtx (db_state)
|
||||
* dbuf_*
|
||||
* dmu_buf_update_user: none (db_d)
|
||||
* dmu_evict_user: none (db_d) (maybe can eliminate)
|
||||
* dbuf_find: none (db_holds)
|
||||
* dbuf_hash_insert: none (db_holds)
|
||||
* dmu_buf_read_array_impl: none (db_state, db_changed)
|
||||
* dmu_sync: none (db_dirty_node, db_d)
|
||||
* dnode_reallocate: none (db)
|
||||
*
|
||||
* dn_mtx (leaf)
|
||||
* protects:
|
||||
* dn_dirty_dbufs
|
||||
* dn_ranges
|
||||
* phys accounting
|
||||
* dn_allocated_txg
|
||||
* dn_free_txg
|
||||
* dn_assigned_txg
|
||||
* dd_assigned_tx
|
||||
* dn_notxholds
|
||||
* dn_dirtyctx
|
||||
* dn_dirtyctx_firstset
|
||||
* (dn_phys copy fields?)
|
||||
* (dn_phys contents?)
|
||||
* held from:
|
||||
* dnode_*
|
||||
* dbuf_dirty: none
|
||||
* dbuf_sync: none (phys accounting)
|
||||
* dbuf_undirty: none (dn_ranges, dn_dirty_dbufs)
|
||||
* dbuf_write_done: none (phys accounting)
|
||||
* dmu_object_info_from_dnode: none (accounting)
|
||||
* dmu_tx_commit: none
|
||||
* dmu_tx_hold_object_impl: none
|
||||
* dmu_tx_try_assign: dn_notxholds(cv)
|
||||
* dmu_tx_unassign: none
|
||||
*
|
||||
* dd_lock
|
||||
* must be held before:
|
||||
* ds_lock
|
||||
* ancestors' dd_lock
|
||||
* protects:
|
||||
* dd_prop_cbs
|
||||
* dd_sync_*
|
||||
* dd_used_bytes
|
||||
* dd_tempreserved
|
||||
* dd_space_towrite
|
||||
* dd_myname
|
||||
* dd_phys accounting?
|
||||
* held from:
|
||||
* dsl_dir_*
|
||||
* dsl_prop_changed_notify: none (dd_prop_cbs)
|
||||
* dsl_prop_register: none (dd_prop_cbs)
|
||||
* dsl_prop_unregister: none (dd_prop_cbs)
|
||||
* dsl_dataset_block_freeable: none (dd_sync_*)
|
||||
*
|
||||
* os_lock (leaf)
|
||||
* protects:
|
||||
* os_dirty_dnodes
|
||||
* os_free_dnodes
|
||||
* os_dnodes
|
||||
* os_downgraded_dbufs
|
||||
* dn_dirtyblksz
|
||||
* dn_dirty_link
|
||||
* held from:
|
||||
* dnode_create: none (os_dnodes)
|
||||
* dnode_destroy: none (os_dnodes)
|
||||
* dnode_setdirty: none (dn_dirtyblksz, os_*_dnodes)
|
||||
* dnode_free: none (dn_dirtyblksz, os_*_dnodes)
|
||||
*
|
||||
* ds_lock
|
||||
* protects:
|
||||
* ds_objset
|
||||
* ds_open_refcount
|
||||
* ds_snapname
|
||||
* ds_phys accounting
|
||||
* ds_phys userrefs zapobj
|
||||
* ds_reserved
|
||||
* held from:
|
||||
* dsl_dataset_*
|
||||
*
|
||||
* dr_mtx (leaf)
|
||||
* protects:
|
||||
* dr_children
|
||||
* held from:
|
||||
* dbuf_dirty
|
||||
* dbuf_undirty
|
||||
* dbuf_sync_indirect
|
||||
* dnode_new_blkid
|
||||
*/
|
||||
|
||||
struct objset;
|
||||
struct dmu_pool;
|
||||
|
||||
typedef struct dmu_xuio {
|
||||
int next;
|
||||
int cnt;
|
||||
struct arc_buf **bufs;
|
||||
iovec_t *iovp;
|
||||
} dmu_xuio_t;
|
||||
|
||||
typedef struct xuio_stats {
|
||||
/* loaned yet not returned arc_buf */
|
||||
kstat_named_t xuiostat_onloan_rbuf;
|
||||
kstat_named_t xuiostat_onloan_wbuf;
|
||||
/* whether a copy is made when loaning out a read buffer */
|
||||
kstat_named_t xuiostat_rbuf_copied;
|
||||
kstat_named_t xuiostat_rbuf_nocopy;
|
||||
/* whether a copy is made when assigning a write buffer */
|
||||
kstat_named_t xuiostat_wbuf_copied;
|
||||
kstat_named_t xuiostat_wbuf_nocopy;
|
||||
} xuio_stats_t;
|
||||
|
||||
static xuio_stats_t xuio_stats = {
|
||||
{ "onloan_read_buf", KSTAT_DATA_UINT64 },
|
||||
{ "onloan_write_buf", KSTAT_DATA_UINT64 },
|
||||
{ "read_buf_copied", KSTAT_DATA_UINT64 },
|
||||
{ "read_buf_nocopy", KSTAT_DATA_UINT64 },
|
||||
{ "write_buf_copied", KSTAT_DATA_UINT64 },
|
||||
{ "write_buf_nocopy", KSTAT_DATA_UINT64 }
|
||||
};
|
||||
|
||||
#define XUIOSTAT_INCR(stat, val) \
|
||||
atomic_add_64(&xuio_stats.stat.value.ui64, (val))
|
||||
#define XUIOSTAT_BUMP(stat) XUIOSTAT_INCR(stat, 1)
|
||||
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DMU_IMPL_H */
|
183
uts/common/fs/zfs/sys/dmu_objset.h
Normal file
@ -0,0 +1,183 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
/* Portions Copyright 2010 Robert Milkowski */
|
||||
|
||||
#ifndef _SYS_DMU_OBJSET_H
|
||||
#define _SYS_DMU_OBJSET_H
|
||||
|
||||
#include <sys/spa.h>
|
||||
#include <sys/arc.h>
|
||||
#include <sys/txg.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/dnode.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/zil.h>
|
||||
#include <sys/sa.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
extern krwlock_t os_lock;
|
||||
|
||||
struct dsl_dataset;
|
||||
struct dmu_tx;
|
||||
|
||||
#define OBJSET_PHYS_SIZE 2048
|
||||
#define OBJSET_OLD_PHYS_SIZE 1024
|
||||
|
||||
#define OBJSET_BUF_HAS_USERUSED(buf) \
|
||||
(arc_buf_size(buf) > OBJSET_OLD_PHYS_SIZE)
|
||||
|
||||
#define OBJSET_FLAG_USERACCOUNTING_COMPLETE (1ULL<<0)
|
||||
|
||||
typedef struct objset_phys {
|
||||
dnode_phys_t os_meta_dnode;
|
||||
zil_header_t os_zil_header;
|
||||
uint64_t os_type;
|
||||
uint64_t os_flags;
|
||||
char os_pad[OBJSET_PHYS_SIZE - sizeof (dnode_phys_t)*3 -
|
||||
sizeof (zil_header_t) - sizeof (uint64_t)*2];
|
||||
dnode_phys_t os_userused_dnode;
|
||||
dnode_phys_t os_groupused_dnode;
|
||||
} objset_phys_t;
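The `os_pad` expression is easier to follow with concrete numbers. The sketch below uses stand-in types and ASSUMES a 512-byte `dnode_phys_t` and a 192-byte `zil_header_t`; neither size is stated in this header, and all `toy_*` names are hypothetical. Under those assumptions the padding comes to 304 bytes, which places `os_userused_dnode` exactly at `OBJSET_OLD_PHYS_SIZE`, consistent with `OBJSET_BUF_HAS_USERUSED()` testing for a buffer larger than the old size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	OBJSET_PHYS_SIZE	2048
#define	OBJSET_OLD_PHYS_SIZE	1024

/* ASSUMPTION: stand-in sizes; the real types are defined in other headers. */
typedef struct { uint8_t bytes[512]; } toy_dnode_phys_t;
typedef struct { uint8_t bytes[192]; } toy_zil_header_t;

/* Same field order and pad expression as objset_phys_t above. */
typedef struct toy_objset_phys {
	toy_dnode_phys_t os_meta_dnode;
	toy_zil_header_t os_zil_header;
	uint64_t os_type;
	uint64_t os_flags;
	char os_pad[OBJSET_PHYS_SIZE - sizeof (toy_dnode_phys_t)*3 -
	    sizeof (toy_zil_header_t) - sizeof (uint64_t)*2];
	toy_dnode_phys_t os_userused_dnode;
	toy_dnode_phys_t os_groupused_dnode;
} toy_objset_phys_t;

/* 2048 - 3*512 - 192 - 2*8 = 304 under the assumed sizes. */
size_t
toy_pad_size(void)
{
	return (sizeof (((toy_objset_phys_t *)0)->os_pad));
}

/* 512 + 192 + 8 + 8 + 304 = 1024: the old on-disk size. */
size_t
toy_userused_offset(void)
{
	return (offsetof(toy_objset_phys_t, os_userused_dnode));
}

size_t
toy_total_size(void)
{
	return (sizeof (toy_objset_phys_t));
}
```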
|
||||
|
||||
struct objset {
|
||||
/* Immutable: */
|
||||
struct dsl_dataset *os_dsl_dataset;
|
||||
spa_t *os_spa;
|
||||
arc_buf_t *os_phys_buf;
|
||||
objset_phys_t *os_phys;
|
||||
/*
|
||||
* The following "special" dnodes have no parent and are exempt from
|
||||
* dnode_move(), but they root their descendants in this objset using
|
||||
* handles anyway, so that all access to dnodes from dbufs consistently
|
||||
* uses handles.
|
||||
*/
|
||||
dnode_handle_t os_meta_dnode;
|
||||
dnode_handle_t os_userused_dnode;
|
||||
dnode_handle_t os_groupused_dnode;
|
||||
zilog_t *os_zil;
|
||||
|
||||
/* can change, under dsl_dir's locks: */
|
||||
uint8_t os_checksum;
|
||||
uint8_t os_compress;
|
||||
uint8_t os_copies;
|
||||
uint8_t os_dedup_checksum;
|
||||
uint8_t os_dedup_verify;
|
||||
uint8_t os_logbias;
|
||||
uint8_t os_primary_cache;
|
||||
uint8_t os_secondary_cache;
|
||||
uint8_t os_sync;
|
||||
|
||||
/* no lock needed: */
|
||||
struct dmu_tx *os_synctx; /* XXX sketchy */
|
||||
blkptr_t *os_rootbp;
|
||||
zil_header_t os_zil_header;
|
||||
list_t os_synced_dnodes;
|
||||
uint64_t os_flags;
|
||||
|
||||
/* Protected by os_obj_lock */
|
||||
kmutex_t os_obj_lock;
|
||||
uint64_t os_obj_next;
|
||||
|
||||
/* Protected by os_lock */
|
||||
kmutex_t os_lock;
|
||||
list_t os_dirty_dnodes[TXG_SIZE];
|
||||
list_t os_free_dnodes[TXG_SIZE];
|
||||
list_t os_dnodes;
|
||||
list_t os_downgraded_dbufs;
|
||||
|
||||
/* stuff we store for the user */
|
||||
kmutex_t os_user_ptr_lock;
|
||||
void *os_user_ptr;
|
||||
|
||||
/* SA layout/attribute registration */
|
||||
sa_os_t *os_sa;
|
||||
};
|
||||
|
||||
#define DMU_META_OBJSET 0
|
||||
#define DMU_META_DNODE_OBJECT 0
|
||||
#define DMU_OBJECT_IS_SPECIAL(obj) ((int64_t)(obj) <= 0)
|
||||
#define DMU_META_DNODE(os) ((os)->os_meta_dnode.dnh_dnode)
|
||||
#define DMU_USERUSED_DNODE(os) ((os)->os_userused_dnode.dnh_dnode)
|
||||
#define DMU_GROUPUSED_DNODE(os) ((os)->os_groupused_dnode.dnh_dnode)
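`DMU_OBJECT_IS_SPECIAL()` relies on reinterpreting the unsigned object number as signed, so that object 0 and very large values such as `-1ULL` both compare as less than or equal to zero. A minimal sketch follows; `toy_is_special()` is a hypothetical wrapper added only so the macro can be exercised, and the behavior assumes the usual two's-complement conversion.

```c
#include <assert.h>
#include <stdint.h>

#define	DMU_OBJECT_IS_SPECIAL(obj) ((int64_t)(obj) <= 0)

/* Hypothetical wrapper so the macro can be called like a function. */
int
toy_is_special(uint64_t obj)
{
	return (DMU_OBJECT_IS_SPECIAL(obj));
}
```

With this, `toy_is_special(0)` and `toy_is_special(-1ULL)` are true, while an ordinary object number such as 1 is not special.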
|
||||
|
||||
#define DMU_OS_IS_L2CACHEABLE(os) \
|
||||
((os)->os_secondary_cache == ZFS_CACHE_ALL || \
|
||||
(os)->os_secondary_cache == ZFS_CACHE_METADATA)
|
||||
|
||||
/* called from zpl */
|
||||
int dmu_objset_hold(const char *name, void *tag, objset_t **osp);
|
||||
int dmu_objset_own(const char *name, dmu_objset_type_t type,
|
||||
boolean_t readonly, void *tag, objset_t **osp);
|
||||
void dmu_objset_rele(objset_t *os, void *tag);
|
||||
void dmu_objset_disown(objset_t *os, void *tag);
|
||||
int dmu_objset_from_ds(struct dsl_dataset *ds, objset_t **osp);
|
||||
|
||||
int dmu_objset_create(const char *name, dmu_objset_type_t type, uint64_t flags,
|
||||
void (*func)(objset_t *os, void *arg, cred_t *cr, dmu_tx_t *tx), void *arg);
|
||||
int dmu_objset_clone(const char *name, struct dsl_dataset *clone_origin,
|
||||
uint64_t flags);
|
||||
int dmu_objset_destroy(const char *name, boolean_t defer);
|
||||
int dmu_objset_snapshot(char *fsname, char *snapname, char *tag,
|
||||
struct nvlist *props, boolean_t recursive, boolean_t temporary, int fd);
|
||||
void dmu_objset_stats(objset_t *os, nvlist_t *nv);
|
||||
void dmu_objset_fast_stat(objset_t *os, dmu_objset_stats_t *stat);
|
||||
void dmu_objset_space(objset_t *os, uint64_t *refdbytesp, uint64_t *availbytesp,
|
||||
uint64_t *usedobjsp, uint64_t *availobjsp);
|
||||
uint64_t dmu_objset_fsid_guid(objset_t *os);
|
||||
int dmu_objset_find(char *name, int func(const char *, void *), void *arg,
|
||||
int flags);
|
||||
int dmu_objset_find_spa(spa_t *spa, const char *name,
|
||||
int func(spa_t *, uint64_t, const char *, void *), void *arg, int flags);
|
||||
int dmu_objset_prefetch(const char *name, void *arg);
|
||||
void dmu_objset_byteswap(void *buf, size_t size);
|
||||
int dmu_objset_evict_dbufs(objset_t *os);
|
||||
timestruc_t dmu_objset_snap_cmtime(objset_t *os);
|
||||
|
||||
/* called from dsl */
|
||||
void dmu_objset_sync(objset_t *os, zio_t *zio, dmu_tx_t *tx);
|
||||
boolean_t dmu_objset_is_dirty(objset_t *os, uint64_t txg);
|
||||
boolean_t dmu_objset_is_dirty_anywhere(objset_t *os);
|
||||
objset_t *dmu_objset_create_impl(spa_t *spa, struct dsl_dataset *ds,
|
||||
blkptr_t *bp, dmu_objset_type_t type, dmu_tx_t *tx);
|
||||
int dmu_objset_open_impl(spa_t *spa, struct dsl_dataset *ds, blkptr_t *bp,
|
||||
objset_t **osp);
|
||||
void dmu_objset_evict(objset_t *os);
|
||||
void dmu_objset_do_userquota_updates(objset_t *os, dmu_tx_t *tx);
|
||||
void dmu_objset_userquota_get_ids(dnode_t *dn, boolean_t before, dmu_tx_t *tx);
|
||||
boolean_t dmu_objset_userused_enabled(objset_t *os);
|
||||
int dmu_objset_userspace_upgrade(objset_t *os);
|
||||
boolean_t dmu_objset_userspace_present(objset_t *os);
|
||||
|
||||
void dmu_objset_init(void);
|
||||
void dmu_objset_fini(void);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DMU_OBJSET_H */
|
64
uts/common/fs/zfs/sys/dmu_traverse.h
Normal file
@ -0,0 +1,64 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DMU_TRAVERSE_H
|
||||
#define _SYS_DMU_TRAVERSE_H
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/zio.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct dnode_phys;
|
||||
struct dsl_dataset;
|
||||
struct zilog;
|
||||
struct arc_buf;
|
||||
|
||||
typedef int (blkptr_cb_t)(spa_t *spa, zilog_t *zilog, const blkptr_t *bp,
|
||||
struct arc_buf *pbuf, const zbookmark_t *zb, const struct dnode_phys *dnp,
|
||||
void *arg);
|
||||
|
||||
#define TRAVERSE_PRE (1<<0)
|
||||
#define TRAVERSE_POST (1<<1)
|
||||
#define TRAVERSE_PREFETCH_METADATA (1<<2)
|
||||
#define TRAVERSE_PREFETCH_DATA (1<<3)
|
||||
#define TRAVERSE_PREFETCH (TRAVERSE_PREFETCH_METADATA | TRAVERSE_PREFETCH_DATA)
|
||||
#define TRAVERSE_HARD (1<<4)
|
||||
|
||||
/* Special traverse error return value to indicate skipping of children */
|
||||
#define TRAVERSE_VISIT_NO_CHILDREN -1
|
||||
|
||||
int traverse_dataset(struct dsl_dataset *ds,
|
||||
uint64_t txg_start, int flags, blkptr_cb_t func, void *arg);
|
||||
int traverse_pool(spa_t *spa,
|
||||
uint64_t txg_start, int flags, blkptr_cb_t func, void *arg);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DMU_TRAVERSE_H */
|
148
uts/common/fs/zfs/sys/dmu_tx.h
Normal file
@ -0,0 +1,148 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2010 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DMU_TX_H
|
||||
#define _SYS_DMU_TX_H
|
||||
|
||||
#include <sys/inttypes.h>
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/txg.h>
|
||||
#include <sys/refcount.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct dmu_buf_impl;
|
||||
struct dmu_tx_hold;
|
||||
struct dnode_link;
|
||||
struct dsl_pool;
|
||||
struct dnode;
|
||||
struct dsl_dir;
|
||||
|
||||
struct dmu_tx {
|
||||
/*
|
||||
* No synchronization is needed because a tx can only be handled
|
||||
* by one thread.
|
||||
*/
|
||||
list_t tx_holds; /* list of dmu_tx_hold_t */
|
||||
objset_t *tx_objset;
|
||||
struct dsl_dir *tx_dir;
|
||||
struct dsl_pool *tx_pool;
|
||||
uint64_t tx_txg;
|
||||
uint64_t tx_lastsnap_txg;
|
||||
uint64_t tx_lasttried_txg;
|
||||
txg_handle_t tx_txgh;
|
||||
void *tx_tempreserve_cookie;
|
||||
struct dmu_tx_hold *tx_needassign_txh;
|
||||
list_t tx_callbacks; /* list of dmu_tx_callback_t on this dmu_tx */
|
||||
uint8_t tx_anyobj;
|
||||
int tx_err;
|
||||
#ifdef ZFS_DEBUG
|
||||
uint64_t tx_space_towrite;
|
||||
uint64_t tx_space_tofree;
|
||||
uint64_t tx_space_tooverwrite;
|
||||
uint64_t tx_space_tounref;
|
||||
refcount_t tx_space_written;
|
||||
refcount_t tx_space_freed;
|
||||
#endif
|
||||
};
|
||||
|
||||
enum dmu_tx_hold_type {
|
||||
THT_NEWOBJECT,
|
||||
THT_WRITE,
|
||||
THT_BONUS,
|
||||
THT_FREE,
|
||||
THT_ZAP,
|
||||
THT_SPACE,
|
||||
THT_SPILL,
|
||||
THT_NUMTYPES
|
||||
};
|
||||
|
||||
typedef struct dmu_tx_hold {
|
||||
dmu_tx_t *txh_tx;
|
||||
list_node_t txh_node;
|
||||
struct dnode *txh_dnode;
|
||||
uint64_t txh_space_towrite;
|
||||
uint64_t txh_space_tofree;
|
||||
uint64_t txh_space_tooverwrite;
|
||||
uint64_t txh_space_tounref;
|
||||
uint64_t txh_memory_tohold;
|
||||
uint64_t txh_fudge;
|
||||
#ifdef ZFS_DEBUG
|
||||
enum dmu_tx_hold_type txh_type;
|
||||
uint64_t txh_arg1;
|
||||
uint64_t txh_arg2;
|
||||
#endif
|
||||
} dmu_tx_hold_t;
|
||||
|
||||
typedef struct dmu_tx_callback {
|
||||
list_node_t dcb_node; /* linked to tx_callbacks list */
|
||||
dmu_tx_callback_func_t *dcb_func; /* caller function pointer */
|
||||
void *dcb_data; /* caller private data */
|
||||
} dmu_tx_callback_t;
|
||||
|
||||
/*
|
||||
* These routines are declared in dmu.h, and are called by the user.
|
||||
*/
|
||||
dmu_tx_t *dmu_tx_create(objset_t *dd);
|
||||
int dmu_tx_assign(dmu_tx_t *tx, uint64_t txg_how);
|
||||
void dmu_tx_commit(dmu_tx_t *tx);
|
||||
void dmu_tx_abort(dmu_tx_t *tx);
|
||||
uint64_t dmu_tx_get_txg(dmu_tx_t *tx);
|
||||
void dmu_tx_wait(dmu_tx_t *tx);
|
||||
|
||||
void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func,
|
||||
void *dcb_data);
|
||||
void dmu_tx_do_callbacks(list_t *cb_list, int error);
|
||||
|
||||
/*
|
||||
* These routines are defined in dmu_spa.h, and are called by the SPA.
|
||||
*/
|
||||
extern dmu_tx_t *dmu_tx_create_assigned(struct dsl_pool *dp, uint64_t txg);
|
||||
|
||||
/*
|
||||
* These routines are only called by the DMU.
|
||||
*/
|
||||
dmu_tx_t *dmu_tx_create_dd(dsl_dir_t *dd);
|
||||
int dmu_tx_is_syncing(dmu_tx_t *tx);
|
||||
int dmu_tx_private_ok(dmu_tx_t *tx);
|
||||
void dmu_tx_add_new_object(dmu_tx_t *tx, objset_t *os, uint64_t object);
|
||||
void dmu_tx_willuse_space(dmu_tx_t *tx, int64_t delta);
|
||||
void dmu_tx_dirty_buf(dmu_tx_t *tx, struct dmu_buf_impl *db);
|
||||
int dmu_tx_holds(dmu_tx_t *tx, uint64_t object);
|
||||
void dmu_tx_hold_space(dmu_tx_t *tx, uint64_t space);
|
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
#define DMU_TX_DIRTY_BUF(tx, db) dmu_tx_dirty_buf(tx, db)
|
||||
#else
|
||||
#define DMU_TX_DIRTY_BUF(tx, db)
|
||||
#endif
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DMU_TX_H */
|
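Reviewer sketch (not part of the imported file): `enum dmu_tx_hold_type` ends with `THT_NUMTYPES`, the common sentinel idiom for sizing per-type tables. The example below shows one way such a sentinel might be used. The `tht_names` table and `tht_name()` helper are hypothetical and do not appear in the ZFS sources shown here.

```c
#include <assert.h>
#include <string.h>

/* Copied from dmu_tx.h above. */
enum dmu_tx_hold_type {
	THT_NEWOBJECT,
	THT_WRITE,
	THT_BONUS,
	THT_FREE,
	THT_ZAP,
	THT_SPACE,
	THT_SPILL,
	THT_NUMTYPES
};

/* Hypothetical name table; the sentinel gives the array its size. */
static const char *const tht_names[THT_NUMTYPES] = {
	[THT_NEWOBJECT]	= "newobject",
	[THT_WRITE]	= "write",
	[THT_BONUS]	= "bonus",
	[THT_FREE]	= "free",
	[THT_ZAP]	= "zap",
	[THT_SPACE]	= "space",
	[THT_SPILL]	= "spill",
};

/* Map a hold type to a printable name, rejecting out-of-range values. */
static const char *
tht_name(int type)
{
	if (type < 0 || type >= THT_NUMTYPES)
		return ("unknown");
	assert(tht_names[type] != NULL);	/* every type has an entry */
	return (tht_names[type]);
}
```

Because the table is sized by the sentinel, adding an enumerator before `THT_NUMTYPES` grows the table automatically. The assertion would catch a missing name at run time in non-`NDEBUG` builds.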
76
uts/common/fs/zfs/sys/dmu_zfetch.h
Normal file
@ -0,0 +1,76 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _DFETCH_H
|
||||
#define _DFETCH_H
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
extern uint64_t zfetch_array_rd_sz;
|
||||
|
||||
struct dnode; /* so we can reference dnode */
|
||||
|
||||
typedef enum zfetch_dirn {
|
||||
ZFETCH_FORWARD = 1, /* prefetch increasing block numbers */
|
||||
ZFETCH_BACKWARD = -1 /* prefetch decreasing block numbers */
|
||||
} zfetch_dirn_t;
|
||||
|
||||
typedef struct zstream {
|
||||
uint64_t zst_offset; /* offset of starting block in range */
|
||||
uint64_t zst_len; /* length of range, in blocks */
|
||||
zfetch_dirn_t zst_direction; /* direction of prefetch */
|
||||
uint64_t zst_stride; /* length of stride, in blocks */
|
||||
uint64_t zst_ph_offset; /* prefetch offset, in blocks */
|
||||
uint64_t zst_cap; /* prefetch limit (cap), in blocks */
|
||||
kmutex_t zst_lock; /* protects stream */
|
||||
clock_t zst_last; /* lbolt of last prefetch */
|
||||
avl_node_t zst_node; /* embed avl node here */
|
||||
} zstream_t;
|
||||
|
||||
typedef struct zfetch {
|
||||
krwlock_t zf_rwlock; /* protects zfetch structure */
|
||||
list_t zf_stream; /* list of zstream_t's */
|
||||
struct dnode *zf_dnode; /* dnode that owns this zfetch */
|
||||
uint32_t zf_stream_cnt; /* # of active streams */
|
||||
uint64_t zf_alloc_fail; /* # of failed attempts to alloc strm */
|
||||
} zfetch_t;
|
||||
|
||||
void zfetch_init(void);
|
||||
void zfetch_fini(void);
|
||||
|
||||
void dmu_zfetch_init(zfetch_t *, struct dnode *);
|
||||
void dmu_zfetch_rele(zfetch_t *);
|
||||
void dmu_zfetch(zfetch_t *, uint64_t, uint64_t, int);
|
||||
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _DFETCH_H */
|
329
uts/common/fs/zfs/sys/dnode.h
Normal file
@ -0,0 +1,329 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DNODE_H
|
||||
#define _SYS_DNODE_H
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/avl.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/txg.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/refcount.h>
|
||||
#include <sys/dmu_zfetch.h>
|
||||
#include <sys/zrlock.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/*
|
||||
* dnode_hold() flags.
|
||||
*/
|
||||
#define DNODE_MUST_BE_ALLOCATED 1
|
||||
#define DNODE_MUST_BE_FREE 2
|
||||
|
||||
/*
|
||||
* dnode_next_offset() flags.
|
||||
*/
|
||||
#define DNODE_FIND_HOLE 1
|
||||
#define DNODE_FIND_BACKWARDS 2
|
||||
#define DNODE_FIND_HAVELOCK 4
|
||||
|
||||
/*
|
||||
* Fixed constants.
|
||||
*/
|
||||
#define DNODE_SHIFT 9 /* 512 bytes */
|
||||
#define DN_MIN_INDBLKSHIFT 10 /* 1k */
|
||||
#define DN_MAX_INDBLKSHIFT 14 /* 16k */
|
||||
#define DNODE_BLOCK_SHIFT 14 /* 16k */
|
||||
#define DNODE_CORE_SIZE 64 /* 64 bytes for dnode sans blkptrs */
|
||||
#define DN_MAX_OBJECT_SHIFT 48 /* 256 trillion (zfs_fid_t limit) */
|
||||
#define DN_MAX_OFFSET_SHIFT 64 /* 2^64 bytes in a dnode */
|
||||
|
||||
/*
|
||||
* dnode id flags
|
||||
*
|
||||
* Note: a file will never ever have its
|
||||
* ids moved from bonus->spill
|
||||
* and only in a crypto environment would it be on spill
|
||||
*/
|
||||
#define DN_ID_CHKED_BONUS 0x1
|
||||
#define DN_ID_CHKED_SPILL 0x2
|
||||
#define DN_ID_OLD_EXIST 0x4
|
||||
#define DN_ID_NEW_EXIST 0x8
|
||||
|
||||
/*
|
||||
* Derived constants.
|
||||
*/
|
||||
#define DNODE_SIZE (1 << DNODE_SHIFT)
|
||||
#define DN_MAX_NBLKPTR ((DNODE_SIZE - DNODE_CORE_SIZE) >> SPA_BLKPTRSHIFT)
|
||||
#define DN_MAX_BONUSLEN (DNODE_SIZE - DNODE_CORE_SIZE - (1 << SPA_BLKPTRSHIFT))
|
||||
#define DN_MAX_OBJECT (1ULL << DN_MAX_OBJECT_SHIFT)
|
||||
#define DN_ZERO_BONUSLEN (DN_MAX_BONUSLEN + 1)
|
||||
#define DN_KILL_SPILLBLK (1)
|
||||
|
||||
#define DNODES_PER_BLOCK_SHIFT (DNODE_BLOCK_SHIFT - DNODE_SHIFT)
|
||||
#define DNODES_PER_BLOCK (1ULL << DNODES_PER_BLOCK_SHIFT)
|
||||
#define DNODES_PER_LEVEL_SHIFT (DN_MAX_INDBLKSHIFT - SPA_BLKPTRSHIFT)
|
||||
#define DNODES_PER_LEVEL (1ULL << DNODES_PER_LEVEL_SHIFT)
|
||||
|
||||
/* The +2 here is a cheesy way to round up */
|
||||
#define DN_MAX_LEVELS (2 + ((DN_MAX_OFFSET_SHIFT - SPA_MINBLOCKSHIFT) / \
|
||||
(DN_MIN_INDBLKSHIFT - SPA_BLKPTRSHIFT)))
|
||||
|
||||
#define DN_BONUS(dnp) ((void*)((dnp)->dn_bonus + \
|
||||
(((dnp)->dn_nblkptr - 1) * sizeof (blkptr_t))))
|
||||
|
||||
#define DN_USED_BYTES(dnp) (((dnp)->dn_flags & DNODE_FLAG_USED_BYTES) ? \
|
||||
(dnp)->dn_used : (dnp)->dn_used << SPA_MINBLOCKSHIFT)
|
||||
|
||||
#define EPB(blkshift, typeshift) (1 << (blkshift - typeshift))
|
||||
|
||||
struct dmu_buf_impl;
|
||||
struct objset;
|
||||
struct zio;
|
||||
|
||||
enum dnode_dirtycontext {
|
||||
DN_UNDIRTIED,
|
||||
DN_DIRTY_OPEN,
|
||||
DN_DIRTY_SYNC
|
||||
};
|
||||
|
||||
/* Is dn_used in bytes? If not, it's in multiples of SPA_MINBLOCKSIZE. */
|
||||
#define DNODE_FLAG_USED_BYTES (1<<0)
|
||||
#define DNODE_FLAG_USERUSED_ACCOUNTED (1<<1)
|
||||
|
||||
/* Does dnode have a SA spill blkptr in bonus? */
|
||||
#define DNODE_FLAG_SPILL_BLKPTR (1<<2)
|
||||
|
||||
typedef struct dnode_phys {
|
||||
uint8_t dn_type; /* dmu_object_type_t */
|
||||
uint8_t dn_indblkshift; /* ln2(indirect block size) */
|
||||
uint8_t dn_nlevels; /* 1=dn_blkptr->data blocks */
|
||||
uint8_t dn_nblkptr; /* length of dn_blkptr */
|
||||
uint8_t dn_bonustype; /* type of data in bonus buffer */
|
||||
uint8_t dn_checksum; /* ZIO_CHECKSUM type */
|
||||
uint8_t dn_compress; /* ZIO_COMPRESS type */
|
||||
uint8_t dn_flags; /* DNODE_FLAG_* */
|
||||
uint16_t dn_datablkszsec; /* data block size in 512b sectors */
|
||||
uint16_t dn_bonuslen; /* length of dn_bonus */
|
||||
uint8_t dn_pad2[4];
|
||||
|
||||
/* accounting is protected by dn_dirty_mtx */
|
||||
uint64_t dn_maxblkid; /* largest allocated block ID */
|
||||
uint64_t dn_used; /* bytes (or sectors) of disk space */
|
||||
|
||||
uint64_t dn_pad3[4];
|
||||
|
||||
blkptr_t dn_blkptr[1];
|
||||
uint8_t dn_bonus[DN_MAX_BONUSLEN - sizeof (blkptr_t)];
|
||||
blkptr_t dn_spill;
|
||||
} dnode_phys_t;
|
||||
|
||||
typedef struct dnode {
|
||||
/*
|
||||
* dn_struct_rwlock protects the structure of the dnode,
|
||||
* including the number of levels of indirection (dn_nlevels),
|
||||
* dn_maxblkid, and dn_next_*
|
||||
*/
|
||||
krwlock_t dn_struct_rwlock;
|
||||
|
||||
/* Our link on dn_objset->os_dnodes list; protected by os_lock. */
|
||||
list_node_t dn_link;
|
||||
|
||||
/* immutable: */
|
||||
struct objset *dn_objset;
|
||||
uint64_t dn_object;
|
||||
struct dmu_buf_impl *dn_dbuf;
|
||||
struct dnode_handle *dn_handle;
|
||||
dnode_phys_t *dn_phys; /* pointer into dn->dn_dbuf->db.db_data */
|
||||
|
||||
/*
|
||||
* Copies of stuff in dn_phys. They're valid in the open
|
||||
* context (e.g. even before the dnode is first synced).
|
||||
* Where necessary, these are protected by dn_struct_rwlock.
|
||||
*/
|
||||
dmu_object_type_t dn_type; /* object type */
|
||||
uint16_t dn_bonuslen; /* bonus length */
|
||||
uint8_t dn_bonustype; /* bonus type */
|
||||
uint8_t dn_nblkptr; /* number of blkptrs (immutable) */
|
||||
uint8_t dn_checksum; /* ZIO_CHECKSUM type */
|
||||
uint8_t dn_compress; /* ZIO_COMPRESS type */
|
||||
uint8_t dn_nlevels;
|
||||
uint8_t dn_indblkshift;
|
||||
uint8_t dn_datablkshift; /* zero if blksz not power of 2! */
|
||||
uint8_t dn_moved; /* Has this dnode been moved? */
|
||||
uint16_t dn_datablkszsec; /* in 512b sectors */
|
||||
uint32_t dn_datablksz; /* in bytes */
|
||||
uint64_t dn_maxblkid;
|
||||
uint8_t dn_next_nblkptr[TXG_SIZE];
|
||||
uint8_t dn_next_nlevels[TXG_SIZE];
|
||||
uint8_t dn_next_indblkshift[TXG_SIZE];
|
||||
uint8_t dn_next_bonustype[TXG_SIZE];
|
||||
uint8_t dn_rm_spillblk[TXG_SIZE]; /* for removing spill blk */
|
||||
uint16_t dn_next_bonuslen[TXG_SIZE];
|
||||
uint32_t dn_next_blksz[TXG_SIZE]; /* next block size in bytes */
|
||||
|
||||
/* protected by dn_dbufs_mtx; declared here to fill 32-bit hole */
|
||||
uint32_t dn_dbufs_count; /* count of dn_dbufs */
|
||||
|
||||
/* protected by os_lock: */
|
||||
list_node_t dn_dirty_link[TXG_SIZE]; /* next on dataset's dirty */
|
||||
|
||||
/* protected by dn_mtx: */
|
||||
kmutex_t dn_mtx;
|
||||
list_t dn_dirty_records[TXG_SIZE];
|
||||
avl_tree_t dn_ranges[TXG_SIZE];
|
||||
uint64_t dn_allocated_txg;
|
||||
uint64_t dn_free_txg;
|
||||
uint64_t dn_assigned_txg;
|
||||
kcondvar_t dn_notxholds;
|
||||
enum dnode_dirtycontext dn_dirtyctx;
|
||||
uint8_t *dn_dirtyctx_firstset; /* dbg: contents meaningless */
|
||||
|
||||
/* protected by own devices */
|
||||
refcount_t dn_tx_holds;
|
||||
refcount_t dn_holds;
|
||||
|
||||
kmutex_t dn_dbufs_mtx;
|
||||
list_t dn_dbufs; /* descendent dbufs */
|
||||
|
||||
/* protected by dn_struct_rwlock */
|
||||
struct dmu_buf_impl *dn_bonus; /* bonus buffer dbuf */
|
||||
|
||||
boolean_t dn_have_spill; /* have spill or are spilling */
|
||||
|
||||
/* parent IO for current sync write */
|
||||
zio_t *dn_zio;
|
||||
|
||||
/* used in syncing context */
|
||||
uint64_t dn_oldused; /* old phys used bytes */
|
||||
uint64_t dn_oldflags; /* old phys dn_flags */
|
||||
uint64_t dn_olduid, dn_oldgid;
|
||||
uint64_t dn_newuid, dn_newgid;
|
||||
int dn_id_flags;
|
||||
|
||||
/* holds prefetch structure */
|
||||
struct zfetch dn_zfetch;
|
||||
} dnode_t;
|
||||
|
||||
/*
|
||||
* Adds a level of indirection between the dbuf and the dnode to avoid
|
||||
* iterating descendent dbufs in dnode_move(). Handles are not allocated
|
||||
* individually, but as an array of child dnodes in dnode_hold_impl().
|
||||
*/
|
||||
typedef struct dnode_handle {
|
||||
/* Protects dnh_dnode from modification by dnode_move(). */
|
||||
zrlock_t dnh_zrlock;
|
||||
dnode_t *dnh_dnode;
|
||||
} dnode_handle_t;
|
||||
|
||||
typedef struct dnode_children {
|
||||
size_t dnc_count; /* number of children */
|
||||
dnode_handle_t dnc_children[1]; /* sized dynamically */
|
||||
} dnode_children_t;
|
||||
|
||||
typedef struct free_range {
|
||||
avl_node_t fr_node;
|
||||
uint64_t fr_blkid;
|
||||
uint64_t fr_nblks;
|
||||
} free_range_t;
|
||||
|
||||
dnode_t *dnode_special_open(struct objset *dd, dnode_phys_t *dnp,
|
||||
uint64_t object, dnode_handle_t *dnh);
|
||||
void dnode_special_close(dnode_handle_t *dnh);
|
||||
|
||||
void dnode_setbonuslen(dnode_t *dn, int newsize, dmu_tx_t *tx);
|
||||
void dnode_setbonus_type(dnode_t *dn, dmu_object_type_t, dmu_tx_t *tx);
|
||||
void dnode_rm_spill(dnode_t *dn, dmu_tx_t *tx);
|
||||
|
||||
int dnode_hold(struct objset *dd, uint64_t object,
|
||||
void *ref, dnode_t **dnp);
|
||||
int dnode_hold_impl(struct objset *dd, uint64_t object, int flag,
|
||||
void *ref, dnode_t **dnp);
|
||||
boolean_t dnode_add_ref(dnode_t *dn, void *ref);
|
||||
void dnode_rele(dnode_t *dn, void *ref);
|
||||
void dnode_setdirty(dnode_t *dn, dmu_tx_t *tx);
|
||||
void dnode_sync(dnode_t *dn, dmu_tx_t *tx);
|
||||
void dnode_allocate(dnode_t *dn, dmu_object_type_t ot, int blocksize, int ibs,
|
||||
dmu_object_type_t bonustype, int bonuslen, dmu_tx_t *tx);
|
||||
void dnode_reallocate(dnode_t *dn, dmu_object_type_t ot, int blocksize,
|
||||
dmu_object_type_t bonustype, int bonuslen, dmu_tx_t *tx);
|
||||
void dnode_free(dnode_t *dn, dmu_tx_t *tx);
|
||||
void dnode_byteswap(dnode_phys_t *dnp);
|
||||
void dnode_buf_byteswap(void *buf, size_t size);
|
||||
void dnode_verify(dnode_t *dn);
|
||||
int dnode_set_blksz(dnode_t *dn, uint64_t size, int ibs, dmu_tx_t *tx);
|
||||
uint64_t dnode_current_max_length(dnode_t *dn);
|
||||
void dnode_free_range(dnode_t *dn, uint64_t off, uint64_t len, dmu_tx_t *tx);
|
||||
void dnode_clear_range(dnode_t *dn, uint64_t blkid,
|
||||
uint64_t nblks, dmu_tx_t *tx);
|
||||
void dnode_diduse_space(dnode_t *dn, int64_t space);
|
||||
void dnode_willuse_space(dnode_t *dn, int64_t space, dmu_tx_t *tx);
|
||||
void dnode_new_blkid(dnode_t *dn, uint64_t blkid, dmu_tx_t *tx, boolean_t);
|
||||
uint64_t dnode_block_freed(dnode_t *dn, uint64_t blkid);
|
||||
void dnode_init(void);
|
||||
void dnode_fini(void);
|
||||
int dnode_next_offset(dnode_t *dn, int flags, uint64_t *off,
|
||||
int minlvl, uint64_t blkfill, uint64_t txg);
|
||||
void dnode_evict_dbufs(dnode_t *dn);
|
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
|
||||
/*
|
||||
* There should be a ## between the string literal and fmt, to make it
|
||||
* clear that we're joining two strings together, but that piece of shit
|
||||
* gcc doesn't support that preprocessor token.
|
||||
*/
|
||||
#define dprintf_dnode(dn, fmt, ...) do { \
|
||||
if (zfs_flags & ZFS_DEBUG_DPRINTF) { \
|
||||
char __db_buf[32]; \
|
||||
uint64_t __db_obj = (dn)->dn_object; \
|
||||
if (__db_obj == DMU_META_DNODE_OBJECT) \
|
||||
(void) strcpy(__db_buf, "mdn"); \
|
||||
else \
|
||||
(void) snprintf(__db_buf, sizeof (__db_buf), "%lld", \
|
||||
(u_longlong_t)__db_obj);\
|
||||
dprintf_ds((dn)->dn_objset->os_dsl_dataset, "obj=%s " fmt, \
|
||||
__db_buf, __VA_ARGS__); \
|
||||
} \
|
||||
_NOTE(CONSTCOND) } while (0)
|
||||
|
||||
#define DNODE_VERIFY(dn) dnode_verify(dn)
|
||||
#define FREE_VERIFY(db, start, end, tx) free_verify(db, start, end, tx)
|
||||
|
||||
#else
|
||||
|
||||
#define dprintf_dnode(db, fmt, ...)
|
||||
#define DNODE_VERIFY(dn)
|
||||
#define FREE_VERIFY(db, start, end, tx)
|
||||
|
||||
#endif
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DNODE_H */
|
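Reviewer sketch (not part of the imported file): the derived constants in dnode.h can be checked by plain arithmetic. The block below copies the relevant macros and assumes `SPA_MINBLOCKSHIFT` is 9 and `SPA_BLKPTRSHIFT` is 7. Both come from sys/spa.h, which is outside this section, so treat them as assumptions. The helpers `toy_dnode_block()`, `toy_dnode_index()` and `toy_layout_ok()` are hypothetical illustrations.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values from sys/spa.h (not shown in this header). */
#define	SPA_MINBLOCKSHIFT	9	/* 512-byte minimum block */
#define	SPA_BLKPTRSHIFT		7	/* blkptr_t is 128 bytes */

/* Copied from dnode.h above. */
#define	DNODE_SHIFT		9
#define	DN_MIN_INDBLKSHIFT	10
#define	DN_MAX_INDBLKSHIFT	14
#define	DNODE_BLOCK_SHIFT	14
#define	DNODE_CORE_SIZE		64
#define	DN_MAX_OFFSET_SHIFT	64

#define	DNODE_SIZE	(1 << DNODE_SHIFT)
#define	DN_MAX_NBLKPTR	((DNODE_SIZE - DNODE_CORE_SIZE) >> SPA_BLKPTRSHIFT)
#define	DN_MAX_BONUSLEN	(DNODE_SIZE - DNODE_CORE_SIZE - (1 << SPA_BLKPTRSHIFT))
#define	DNODES_PER_BLOCK_SHIFT	(DNODE_BLOCK_SHIFT - DNODE_SHIFT)
#define	DNODES_PER_BLOCK	(1ULL << DNODES_PER_BLOCK_SHIFT)
#define	DNODES_PER_LEVEL_SHIFT	(DN_MAX_INDBLKSHIFT - SPA_BLKPTRSHIFT)
#define	DNODES_PER_LEVEL	(1ULL << DNODES_PER_LEVEL_SHIFT)
#define	DN_MAX_LEVELS	(2 + ((DN_MAX_OFFSET_SHIFT - SPA_MINBLOCKSHIFT) / \
	(DN_MIN_INDBLKSHIFT - SPA_BLKPTRSHIFT)))

/* Hypothetical helper: which 16k dnode block holds this object number? */
static uint64_t
toy_dnode_block(uint64_t object)
{
	return (object >> DNODES_PER_BLOCK_SHIFT);
}

/* Hypothetical helper: slot of the object within its dnode block. */
static uint64_t
toy_dnode_index(uint64_t object)
{
	return (object & (DNODES_PER_BLOCK - 1));
}

/* Core + one blkptr + maximum bonus must exactly fill a 512-byte dnode. */
static int
toy_layout_ok(void)
{
	assert(DNODE_CORE_SIZE + (1 << SPA_BLKPTRSHIFT) + DN_MAX_BONUSLEN ==
	    DNODE_SIZE);
	return (1);
}
```

Under these assumptions a dnode is 512 bytes with at most 3 block pointers or 320 bytes of bonus data, a 16k block holds 32 dnodes, and object 100 would sit in block 3 at slot 4.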
283
uts/common/fs/zfs/sys/dsl_dataset.h
Normal file
@ -0,0 +1,283 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DSL_DATASET_H
|
||||
#define _SYS_DSL_DATASET_H
|
||||
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/txg.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/bplist.h>
|
||||
#include <sys/dsl_synctask.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/dsl_deadlist.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct dsl_dataset;
|
||||
struct dsl_dir;
|
||||
struct dsl_pool;
|
||||
|
||||
#define DS_FLAG_INCONSISTENT (1ULL<<0)
|
||||
#define DS_IS_INCONSISTENT(ds) \
|
||||
((ds)->ds_phys->ds_flags & DS_FLAG_INCONSISTENT)
|
||||
/*
|
||||
* NB: nopromote can not yet be set, but we want support for it in this
|
||||
* on-disk version, so that we don't need to upgrade for it later. It
|
||||
* will be needed when we implement 'zfs split' (where the split off
|
||||
* clone should not be promoted).
|
||||
*/
|
||||
#define DS_FLAG_NOPROMOTE (1ULL<<1)
|
||||
|
||||
/*
|
||||
* DS_FLAG_UNIQUE_ACCURATE is set if ds_unique_bytes has been correctly
|
||||
* calculated for head datasets (starting with SPA_VERSION_UNIQUE_ACCURATE,
|
||||
* refquota/refreservations).
|
||||
*/
|
||||
#define DS_FLAG_UNIQUE_ACCURATE (1ULL<<2)
|
||||
|
||||
/*
|
||||
* DS_FLAG_DEFER_DESTROY is set after 'zfs destroy -d' has been called
|
||||
* on a dataset. This allows the dataset to be destroyed using 'zfs release'.
|
||||
*/
|
||||
#define DS_FLAG_DEFER_DESTROY (1ULL<<3)
|
||||
#define DS_IS_DEFER_DESTROY(ds) \
|
||||
((ds)->ds_phys->ds_flags & DS_FLAG_DEFER_DESTROY)
|
||||
|
||||
/*
|
||||
* DS_FLAG_CI_DATASET is set if the dataset contains a file system whose
|
||||
* name lookups should be performed case-insensitively.
|
||||
*/
|
||||
#define DS_FLAG_CI_DATASET (1ULL<<16)
|
||||
|
||||
typedef struct dsl_dataset_phys {
|
||||
uint64_t ds_dir_obj; /* DMU_OT_DSL_DIR */
|
||||
uint64_t ds_prev_snap_obj; /* DMU_OT_DSL_DATASET */
|
||||
uint64_t ds_prev_snap_txg;
|
||||
uint64_t ds_next_snap_obj; /* DMU_OT_DSL_DATASET */
|
||||
uint64_t ds_snapnames_zapobj; /* DMU_OT_DSL_DS_SNAP_MAP 0 for snaps */
|
||||
uint64_t ds_num_children; /* clone/snap children; ==0 for head */
|
||||
uint64_t ds_creation_time; /* seconds since 1970 */
|
||||
uint64_t ds_creation_txg;
|
||||
uint64_t ds_deadlist_obj; /* DMU_OT_DEADLIST */
|
||||
uint64_t ds_used_bytes;
|
||||
uint64_t ds_compressed_bytes;
|
||||
uint64_t ds_uncompressed_bytes;
|
||||
uint64_t ds_unique_bytes; /* only relevant to snapshots */
|
||||
/*
|
||||
* The ds_fsid_guid is a 56-bit ID that can change to avoid
|
||||
* collisions. The ds_guid is a 64-bit ID that will never
|
||||
* change, so there is a small probability that it will collide.
|
||||
*/
|
||||
uint64_t ds_fsid_guid;
|
||||
uint64_t ds_guid;
|
||||
uint64_t ds_flags; /* DS_FLAG_* */
|
||||
blkptr_t ds_bp;
|
||||
uint64_t ds_next_clones_obj; /* DMU_OT_DSL_CLONES */
|
||||
uint64_t ds_props_obj; /* DMU_OT_DSL_PROPS for snaps */
|
||||
uint64_t ds_userrefs_obj; /* DMU_OT_USERREFS */
|
||||
uint64_t ds_pad[5]; /* pad out to 320 bytes for good measure */
|
||||
} dsl_dataset_phys_t;
|
||||
|
||||
typedef struct dsl_dataset {
|
||||
/* Immutable: */
|
||||
struct dsl_dir *ds_dir;
|
||||
dsl_dataset_phys_t *ds_phys;
|
||||
dmu_buf_t *ds_dbuf;
|
||||
uint64_t ds_object;
|
||||
uint64_t ds_fsid_guid;
|
||||
|
||||
/* only used in syncing context, only valid for non-snapshots: */
|
||||
struct dsl_dataset *ds_prev;
|
||||
|
||||
/* has internal locking: */
|
||||
dsl_deadlist_t ds_deadlist;
|
||||
bplist_t ds_pending_deadlist;
|
||||
|
||||
/* to protect against multiple concurrent incremental recv */
|
||||
kmutex_t ds_recvlock;
|
||||
|
||||
/* protected by lock on pool's dp_dirty_datasets list */
|
||||
txg_node_t ds_dirty_link;
|
||||
list_node_t ds_synced_link;
|
||||
|
||||
/*
|
||||
* ds_phys->ds_<accounting> is also protected by ds_lock.
|
||||
* Protected by ds_lock:
|
||||
*/
|
||||
kmutex_t ds_lock;
|
||||
objset_t *ds_objset;
|
||||
uint64_t ds_userrefs;
|
||||
|
||||
/*
|
||||
* ds_owner is protected by the ds_rwlock and the ds_lock
|
||||
*/
|
||||
krwlock_t ds_rwlock;
|
||||
kcondvar_t ds_exclusive_cv;
|
||||
void *ds_owner;
|
||||
|
||||
/* no locking; only for making guesses */
|
||||
uint64_t ds_trysnap_txg;
|
||||
|
||||
/* for objset_open() */
|
||||
kmutex_t ds_opening_lock;
|
||||
|
||||
uint64_t ds_reserved; /* cached refreservation */
|
||||
uint64_t ds_quota; /* cached refquota */
|
||||
|
||||
/* Protected by ds_lock; keep at end of struct for better locality */
|
||||
char ds_snapname[MAXNAMELEN];
|
||||
} dsl_dataset_t;
|
||||
|
||||
struct dsl_ds_destroyarg {
|
||||
dsl_dataset_t *ds; /* ds to destroy */
|
||||
dsl_dataset_t *rm_origin; /* also remove our origin? */
|
||||
boolean_t is_origin_rm; /* set if removing origin snap */
|
||||
boolean_t defer; /* destroy -d requested? */
|
||||
boolean_t releasing; /* destroying due to release? */
|
||||
boolean_t need_prep; /* do we need to retry due to EBUSY? */
|
||||
};
|
||||
|
||||
/*
|
||||
* The max length of a temporary tag prefix is the number of hex digits
|
||||
* required to express UINT64_MAX plus one for the hyphen.
|
||||
*/
|
||||
#define MAX_TAG_PREFIX_LEN 17
|
||||
|
||||
struct dsl_ds_holdarg {
|
||||
dsl_sync_task_group_t *dstg;
|
||||
char *htag;
|
||||
char *snapname;
|
||||
boolean_t recursive;
|
||||
boolean_t gotone;
|
||||
boolean_t temphold;
|
||||
char failed[MAXPATHLEN];
|
||||
};
|
||||
|
||||
#define dsl_dataset_is_snapshot(ds) \
|
||||
((ds)->ds_phys->ds_num_children != 0)
|
||||
|
||||
#define DS_UNIQUE_IS_ACCURATE(ds) \
|
||||
(((ds)->ds_phys->ds_flags & DS_FLAG_UNIQUE_ACCURATE) != 0)
|
||||
|
||||
int dsl_dataset_hold(const char *name, void *tag, dsl_dataset_t **dsp);
|
||||
int dsl_dataset_hold_obj(struct dsl_pool *dp, uint64_t dsobj,
|
||||
void *tag, dsl_dataset_t **);
|
||||
int dsl_dataset_own(const char *name, boolean_t inconsistentok,
|
||||
void *tag, dsl_dataset_t **dsp);
|
||||
int dsl_dataset_own_obj(struct dsl_pool *dp, uint64_t dsobj,
|
||||
boolean_t inconsistentok, void *tag, dsl_dataset_t **dsp);
|
||||
void dsl_dataset_name(dsl_dataset_t *ds, char *name);
|
||||
void dsl_dataset_rele(dsl_dataset_t *ds, void *tag);
|
||||
void dsl_dataset_disown(dsl_dataset_t *ds, void *tag);
|
||||
void dsl_dataset_drop_ref(dsl_dataset_t *ds, void *tag);
|
||||
boolean_t dsl_dataset_tryown(dsl_dataset_t *ds, boolean_t inconsistentok,
|
||||
void *tag);
|
||||
void dsl_dataset_make_exclusive(dsl_dataset_t *ds, void *tag);
|
||||
void dsl_register_onexit_hold_cleanup(dsl_dataset_t *ds, const char *htag,
|
||||
minor_t minor);
|
||||
uint64_t dsl_dataset_create_sync(dsl_dir_t *pds, const char *lastname,
|
||||
dsl_dataset_t *origin, uint64_t flags, cred_t *, dmu_tx_t *);
|
||||
uint64_t dsl_dataset_create_sync_dd(dsl_dir_t *dd, dsl_dataset_t *origin,
|
||||
uint64_t flags, dmu_tx_t *tx);
|
||||
int dsl_dataset_destroy(dsl_dataset_t *ds, void *tag, boolean_t defer);
|
||||
int dsl_snapshots_destroy(char *fsname, char *snapname, boolean_t defer);
|
||||
dsl_checkfunc_t dsl_dataset_destroy_check;
|
||||
dsl_syncfunc_t dsl_dataset_destroy_sync;
|
||||
dsl_checkfunc_t dsl_dataset_snapshot_check;
|
||||
dsl_syncfunc_t dsl_dataset_snapshot_sync;
|
||||
dsl_syncfunc_t dsl_dataset_user_hold_sync;
|
||||
int dsl_dataset_rename(char *name, const char *newname, boolean_t recursive);
|
||||
int dsl_dataset_promote(const char *name, char *conflsnap);
|
||||
int dsl_dataset_clone_swap(dsl_dataset_t *clone, dsl_dataset_t *origin_head,
|
||||
boolean_t force);
|
||||
int dsl_dataset_user_hold(char *dsname, char *snapname, char *htag,
|
||||
boolean_t recursive, boolean_t temphold, int cleanup_fd);
|
||||
int dsl_dataset_user_hold_for_send(dsl_dataset_t *ds, char *htag,
|
||||
boolean_t temphold);
|
||||
int dsl_dataset_user_release(char *dsname, char *snapname, char *htag,
|
||||
boolean_t recursive);
|
||||
int dsl_dataset_user_release_tmp(struct dsl_pool *dp, uint64_t dsobj,
|
||||
char *htag, boolean_t retry);
|
||||
int dsl_dataset_get_holds(const char *dsname, nvlist_t **nvp);
|
||||
|
||||
blkptr_t *dsl_dataset_get_blkptr(dsl_dataset_t *ds);
|
||||
void dsl_dataset_set_blkptr(dsl_dataset_t *ds, blkptr_t *bp, dmu_tx_t *tx);
|
||||
|
||||
spa_t *dsl_dataset_get_spa(dsl_dataset_t *ds);
|
||||
|
||||
boolean_t dsl_dataset_modified_since_lastsnap(dsl_dataset_t *ds);
|
||||
|
||||
void dsl_dataset_sync(dsl_dataset_t *os, zio_t *zio, dmu_tx_t *tx);
|
||||
|
||||
void dsl_dataset_block_born(dsl_dataset_t *ds, const blkptr_t *bp,
|
||||
dmu_tx_t *tx);
|
||||
int dsl_dataset_block_kill(dsl_dataset_t *ds, const blkptr_t *bp,
|
||||
dmu_tx_t *tx, boolean_t async);
|
||||
boolean_t dsl_dataset_block_freeable(dsl_dataset_t *ds, const blkptr_t *bp,
|
||||
uint64_t blk_birth);
|
||||
uint64_t dsl_dataset_prev_snap_txg(dsl_dataset_t *ds);
|
||||
|
||||
void dsl_dataset_dirty(dsl_dataset_t *ds, dmu_tx_t *tx);
|
||||
void dsl_dataset_stats(dsl_dataset_t *os, nvlist_t *nv);
|
||||
void dsl_dataset_fast_stat(dsl_dataset_t *ds, dmu_objset_stats_t *stat);
|
||||
void dsl_dataset_space(dsl_dataset_t *ds,
|
||||
uint64_t *refdbytesp, uint64_t *availbytesp,
|
||||
uint64_t *usedobjsp, uint64_t *availobjsp);
|
||||
uint64_t dsl_dataset_fsid_guid(dsl_dataset_t *ds);
|
||||
|
||||
int dsl_dsobj_to_dsname(char *pname, uint64_t obj, char *buf);
|
||||
|
||||
int dsl_dataset_check_quota(dsl_dataset_t *ds, boolean_t check_quota,
|
||||
uint64_t asize, uint64_t inflight, uint64_t *used,
|
||||
uint64_t *ref_rsrv);
|
||||
int dsl_dataset_set_quota(const char *dsname, zprop_source_t source,
|
||||
uint64_t quota);
|
||||
dsl_syncfunc_t dsl_dataset_set_quota_sync;
|
||||
int dsl_dataset_set_reservation(const char *dsname, zprop_source_t source,
|
||||
uint64_t reservation);
|
||||
|
||||
int dsl_destroy_inconsistent(const char *dsname, void *arg);
|
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
#define dprintf_ds(ds, fmt, ...) do { \
|
||||
if (zfs_flags & ZFS_DEBUG_DPRINTF) { \
|
||||
char *__ds_name = kmem_alloc(MAXNAMELEN, KM_SLEEP); \
|
||||
dsl_dataset_name(ds, __ds_name); \
|
||||
dprintf("ds=%s " fmt, __ds_name, __VA_ARGS__); \
|
||||
kmem_free(__ds_name, MAXNAMELEN); \
|
||||
} \
|
||||
_NOTE(CONSTCOND) } while (0)
|
||||
#else
|
||||
#define dprintf_ds(dd, fmt, ...)
|
||||
#endif
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DSL_DATASET_H */
|
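Reviewer sketch (not part of the imported file): the comment on `MAX_TAG_PREFIX_LEN` says the value is the number of hex digits in `UINT64_MAX` plus one for the hyphen. The example below checks that arithmetic with `snprintf`. The helper names and the exact `"<hex>-"` format are assumptions made for illustration, since the real formatting code is not shown in this header.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define	MAX_TAG_PREFIX_LEN	17

/* Hypothetical helper: write "<hex id>-" into buf; returns its length. */
static int
toy_tag_prefix(char *buf, size_t len, uint64_t id)
{
	return (snprintf(buf, len, "%" PRIx64 "-", id));
}

/* Length of the longest possible prefix, i.e. for id == UINT64_MAX. */
static int
toy_max_prefix_len(void)
{
	char buf[MAX_TAG_PREFIX_LEN + 1];	/* +1 for the terminating NUL */
	int n = toy_tag_prefix(buf, sizeof (buf), UINT64_MAX);

	assert(n > 0 && n < (int)sizeof (buf));	/* no truncation */
	return (n);
}
```

`UINT64_MAX` prints as sixteen `f` digits, so the longest prefix is 16 + 1 = 17 characters, which matches the constant.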
87
uts/common/fs/zfs/sys/dsl_deadlist.h
Normal file
@ -0,0 +1,87 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DSL_DEADLIST_H
|
||||
#define _SYS_DSL_DEADLIST_H
|
||||
|
||||
#include <sys/bpobj.h>
|
||||
#include <sys/zfs_context.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct dmu_buf;
|
||||
struct dsl_dataset;
|
||||
|
||||
typedef struct dsl_deadlist_phys {
|
||||
uint64_t dl_used;
|
||||
uint64_t dl_comp;
|
||||
uint64_t dl_uncomp;
|
||||
uint64_t dl_pad[37]; /* pad out to 320b for future expansion */
|
||||
} dsl_deadlist_phys_t;
|
||||
|
||||
typedef struct dsl_deadlist {
|
||||
objset_t *dl_os;
|
||||
uint64_t dl_object;
|
||||
avl_tree_t dl_tree;
|
||||
boolean_t dl_havetree;
|
||||
struct dmu_buf *dl_dbuf;
|
||||
dsl_deadlist_phys_t *dl_phys;
|
||||
kmutex_t dl_lock;
|
||||
|
||||
/* if it's the old on-disk format: */
|
||||
bpobj_t dl_bpobj;
|
||||
boolean_t dl_oldfmt;
|
||||
} dsl_deadlist_t;
|
||||
|
||||
typedef struct dsl_deadlist_entry {
|
||||
avl_node_t dle_node;
|
||||
uint64_t dle_mintxg;
|
||||
bpobj_t dle_bpobj;
|
||||
} dsl_deadlist_entry_t;
|
||||
|
||||
void dsl_deadlist_open(dsl_deadlist_t *dl, objset_t *os, uint64_t object);
|
||||
void dsl_deadlist_close(dsl_deadlist_t *dl);
|
||||
uint64_t dsl_deadlist_alloc(objset_t *os, dmu_tx_t *tx);
|
||||
void dsl_deadlist_free(objset_t *os, uint64_t dlobj, dmu_tx_t *tx);
|
||||
void dsl_deadlist_insert(dsl_deadlist_t *dl, const blkptr_t *bp, dmu_tx_t *tx);
|
||||
void dsl_deadlist_add_key(dsl_deadlist_t *dl, uint64_t mintxg, dmu_tx_t *tx);
|
||||
void dsl_deadlist_remove_key(dsl_deadlist_t *dl, uint64_t mintxg, dmu_tx_t *tx);
|
||||
uint64_t dsl_deadlist_clone(dsl_deadlist_t *dl, uint64_t maxtxg,
|
||||
uint64_t mrs_obj, dmu_tx_t *tx);
|
||||
void dsl_deadlist_space(dsl_deadlist_t *dl,
|
||||
uint64_t *usedp, uint64_t *compp, uint64_t *uncompp);
|
||||
void dsl_deadlist_space_range(dsl_deadlist_t *dl,
|
||||
uint64_t mintxg, uint64_t maxtxg,
|
||||
uint64_t *usedp, uint64_t *compp, uint64_t *uncompp);
|
||||
void dsl_deadlist_merge(dsl_deadlist_t *dl, uint64_t obj, dmu_tx_t *tx);
|
||||
void dsl_deadlist_move_bpobj(dsl_deadlist_t *dl, bpobj_t *bpo, uint64_t mintxg,
|
||||
dmu_tx_t *tx);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DSL_DEADLIST_H */
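
The comment on `dl_pad` says the on-disk deadlist header is padded to 320 bytes. A minimal sketch of how that arithmetic can be checked at compile time follows. The `demo_` struct is a local mirror of `dsl_deadlist_phys_t`, not the real type, and the sketch assumes a C11 compiler for `_Static_assert`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of dsl_deadlist_phys_t, for illustration only. */
typedef struct demo_deadlist_phys {
    uint64_t dl_used;
    uint64_t dl_comp;
    uint64_t dl_uncomp;
    uint64_t dl_pad[37];    /* 3 + 37 = 40 words of 8 bytes = 320 bytes */
} demo_deadlist_phys_t;

/* Fails the build if someone adds a field without shrinking the pad. */
_Static_assert(sizeof (demo_deadlist_phys_t) == 320,
    "deadlist phys must stay 320 bytes");

size_t
demo_deadlist_phys_size(void)
{
    return (sizeof (demo_deadlist_phys_t));
}

/* Offset of the pad: where a future field would be carved out. */
size_t
demo_deadlist_pad_offset(void)
{
    return (offsetof(demo_deadlist_phys_t, dl_pad));
}
```

A new field would take its space from the front of `dl_pad`, which keeps the total size and the offsets of existing fields unchanged.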
|
78
uts/common/fs/zfs/sys/dsl_deleg.h
Normal file
@ -0,0 +1,78 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DSL_DELEG_H
|
||||
#define _SYS_DSL_DELEG_H
|
||||
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/dsl_pool.h>
|
||||
#include <sys/zfs_context.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#define ZFS_DELEG_PERM_NONE ""
|
||||
#define ZFS_DELEG_PERM_CREATE "create"
|
||||
#define ZFS_DELEG_PERM_DESTROY "destroy"
|
||||
#define ZFS_DELEG_PERM_SNAPSHOT "snapshot"
|
||||
#define ZFS_DELEG_PERM_ROLLBACK "rollback"
|
||||
#define ZFS_DELEG_PERM_CLONE "clone"
|
||||
#define ZFS_DELEG_PERM_PROMOTE "promote"
|
||||
#define ZFS_DELEG_PERM_RENAME "rename"
|
||||
#define ZFS_DELEG_PERM_MOUNT "mount"
|
||||
#define ZFS_DELEG_PERM_SHARE "share"
|
||||
#define ZFS_DELEG_PERM_SEND "send"
|
||||
#define ZFS_DELEG_PERM_RECEIVE "receive"
|
||||
#define ZFS_DELEG_PERM_ALLOW "allow"
|
||||
#define ZFS_DELEG_PERM_USERPROP "userprop"
|
||||
#define ZFS_DELEG_PERM_VSCAN "vscan"
|
||||
#define ZFS_DELEG_PERM_USERQUOTA "userquota"
|
||||
#define ZFS_DELEG_PERM_GROUPQUOTA "groupquota"
|
||||
#define ZFS_DELEG_PERM_USERUSED "userused"
|
||||
#define ZFS_DELEG_PERM_GROUPUSED "groupused"
|
||||
#define ZFS_DELEG_PERM_HOLD "hold"
|
||||
#define ZFS_DELEG_PERM_RELEASE "release"
|
||||
#define ZFS_DELEG_PERM_DIFF "diff"
|
||||
|
||||
/*
|
||||
* Note: the names of properties that are marked delegatable are also
|
||||
* valid delegated permissions
|
||||
*/
|
||||
|
||||
int dsl_deleg_get(const char *ddname, nvlist_t **nvp);
|
||||
int dsl_deleg_set(const char *ddname, nvlist_t *nvp, boolean_t unset);
|
||||
int dsl_deleg_access(const char *ddname, const char *perm, cred_t *cr);
|
||||
int dsl_deleg_access_impl(struct dsl_dataset *ds, const char *perm, cred_t *cr);
|
||||
void dsl_deleg_set_create_perms(dsl_dir_t *dd, dmu_tx_t *tx, cred_t *cr);
|
||||
int dsl_deleg_can_allow(char *ddname, nvlist_t *nvp, cred_t *cr);
|
||||
int dsl_deleg_can_unallow(char *ddname, nvlist_t *nvp, cred_t *cr);
|
||||
int dsl_deleg_destroy(objset_t *os, uint64_t zapobj, dmu_tx_t *tx);
|
||||
boolean_t dsl_delegation_on(objset_t *os);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DSL_DELEG_H */
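
The note in this header says that delegatable property names are valid permissions alongside the fixed `ZFS_DELEG_PERM_*` strings. The sketch below shows one way such a two-table lookup could work. It is not the ZFS implementation: the `demo_` names are hypothetical, the property table is an illustrative assumption (the real set comes from the property definitions), and the empty `ZFS_DELEG_PERM_NONE` string is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NELEM(a) (sizeof (a) / sizeof ((a)[0]))

/* The fixed permission names listed in dsl_deleg.h. */
const char *const demo_deleg_perms[] = {
    "create", "destroy", "snapshot", "rollback", "clone", "promote",
    "rename", "mount", "share", "send", "receive", "allow", "userprop",
    "vscan", "userquota", "groupquota", "userused", "groupused",
    "hold", "release", "diff"
};

/* Assumption: a tiny sample of properties marked delegatable. */
const char *const demo_delegatable_props[] = { "quota", "compression" };

bool
demo_name_in(const char *name, const char *const *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, table[i]) == 0)
            return (true);
    }
    return (false);
}

/* A name is accepted if it is a fixed permission or a delegatable prop. */
bool
demo_deleg_perm_valid(const char *perm)
{
    assert(perm != NULL);
    return (demo_name_in(perm, demo_deleg_perms,
        DEMO_NELEM(demo_deleg_perms)) ||
        demo_name_in(perm, demo_delegatable_props,
        DEMO_NELEM(demo_delegatable_props)));
}
```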
|
167
uts/common/fs/zfs/sys/dsl_dir.h
Normal file
@ -0,0 +1,167 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DSL_DIR_H
|
||||
#define _SYS_DSL_DIR_H
|
||||
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/dsl_pool.h>
|
||||
#include <sys/dsl_synctask.h>
|
||||
#include <sys/refcount.h>
|
||||
#include <sys/zfs_context.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct dsl_dataset;
|
||||
|
||||
typedef enum dd_used {
|
||||
DD_USED_HEAD,
|
||||
DD_USED_SNAP,
|
||||
DD_USED_CHILD,
|
||||
DD_USED_CHILD_RSRV,
|
||||
DD_USED_REFRSRV,
|
||||
DD_USED_NUM
|
||||
} dd_used_t;
|
||||
|
||||
#define DD_FLAG_USED_BREAKDOWN (1<<0)
|
||||
|
||||
typedef struct dsl_dir_phys {
|
||||
uint64_t dd_creation_time; /* not actually used */
|
||||
uint64_t dd_head_dataset_obj;
|
||||
uint64_t dd_parent_obj;
|
||||
uint64_t dd_origin_obj;
|
||||
uint64_t dd_child_dir_zapobj;
|
||||
/*
|
||||
* how much space our children are accounting for; for leaf
|
||||
* datasets, == physical space used by fs + snaps
|
||||
*/
|
||||
uint64_t dd_used_bytes;
|
||||
uint64_t dd_compressed_bytes;
|
||||
uint64_t dd_uncompressed_bytes;
|
||||
/* Administrative quota setting */
|
||||
uint64_t dd_quota;
|
||||
/* Administrative reservation setting */
|
||||
uint64_t dd_reserved;
|
||||
uint64_t dd_props_zapobj;
|
||||
uint64_t dd_deleg_zapobj; /* dataset delegation permissions */
|
||||
uint64_t dd_flags;
|
||||
uint64_t dd_used_breakdown[DD_USED_NUM];
|
||||
uint64_t dd_clones; /* dsl_dir objects */
|
||||
uint64_t dd_pad[13]; /* pad out to 256 bytes for good measure */
|
||||
} dsl_dir_phys_t;
|
||||
|
||||
struct dsl_dir {
|
||||
/* These are immutable; no lock needed: */
|
||||
uint64_t dd_object;
|
||||
dsl_dir_phys_t *dd_phys;
|
||||
dmu_buf_t *dd_dbuf;
|
||||
dsl_pool_t *dd_pool;
|
||||
|
||||
/* protected by lock on pool's dp_dirty_dirs list */
|
||||
txg_node_t dd_dirty_link;
|
||||
|
||||
/* protected by dp_config_rwlock */
|
||||
dsl_dir_t *dd_parent;
|
||||
|
||||
/* Protected by dd_lock */
|
||||
kmutex_t dd_lock;
|
||||
list_t dd_prop_cbs; /* list of dsl_prop_cb_record_t's */
|
||||
timestruc_t dd_snap_cmtime; /* last time snapshot namespace changed */
|
||||
uint64_t dd_origin_txg;
|
||||
|
||||
/* gross estimate of space used by in-flight tx's */
|
||||
uint64_t dd_tempreserved[TXG_SIZE];
|
||||
/* amount of space we expect to write; == amount of dirty data */
|
||||
int64_t dd_space_towrite[TXG_SIZE];
|
||||
|
||||
/* protected by dd_lock; keep at end of struct for better locality */
|
||||
char dd_myname[MAXNAMELEN];
|
||||
};
|
||||
|
||||
void dsl_dir_close(dsl_dir_t *dd, void *tag);
|
||||
int dsl_dir_open(const char *name, void *tag, dsl_dir_t **, const char **tail);
|
||||
int dsl_dir_open_spa(spa_t *spa, const char *name, void *tag, dsl_dir_t **,
|
||||
const char **tailp);
|
||||
int dsl_dir_open_obj(dsl_pool_t *dp, uint64_t ddobj,
|
||||
const char *tail, void *tag, dsl_dir_t **);
|
||||
void dsl_dir_name(dsl_dir_t *dd, char *buf);
|
||||
int dsl_dir_namelen(dsl_dir_t *dd);
|
||||
uint64_t dsl_dir_create_sync(dsl_pool_t *dp, dsl_dir_t *pds,
|
||||
const char *name, dmu_tx_t *tx);
|
||||
dsl_checkfunc_t dsl_dir_destroy_check;
|
||||
dsl_syncfunc_t dsl_dir_destroy_sync;
|
||||
void dsl_dir_stats(dsl_dir_t *dd, nvlist_t *nv);
|
||||
uint64_t dsl_dir_space_available(dsl_dir_t *dd,
|
||||
dsl_dir_t *ancestor, int64_t delta, int ondiskonly);
|
||||
void dsl_dir_dirty(dsl_dir_t *dd, dmu_tx_t *tx);
|
||||
void dsl_dir_sync(dsl_dir_t *dd, dmu_tx_t *tx);
|
||||
int dsl_dir_tempreserve_space(dsl_dir_t *dd, uint64_t mem,
|
||||
uint64_t asize, uint64_t fsize, uint64_t usize, void **tr_cookiep,
|
||||
dmu_tx_t *tx);
|
||||
void dsl_dir_tempreserve_clear(void *tr_cookie, dmu_tx_t *tx);
|
||||
void dsl_dir_willuse_space(dsl_dir_t *dd, int64_t space, dmu_tx_t *tx);
|
||||
void dsl_dir_diduse_space(dsl_dir_t *dd, dd_used_t type,
|
||||
int64_t used, int64_t compressed, int64_t uncompressed, dmu_tx_t *tx);
|
||||
void dsl_dir_transfer_space(dsl_dir_t *dd, int64_t delta,
|
||||
dd_used_t oldtype, dd_used_t newtype, dmu_tx_t *tx);
|
||||
int dsl_dir_set_quota(const char *ddname, zprop_source_t source,
|
||||
uint64_t quota);
|
||||
int dsl_dir_set_reservation(const char *ddname, zprop_source_t source,
|
||||
uint64_t reservation);
|
||||
int dsl_dir_rename(dsl_dir_t *dd, const char *newname);
|
||||
int dsl_dir_transfer_possible(dsl_dir_t *sdd, dsl_dir_t *tdd, uint64_t space);
|
||||
int dsl_dir_set_reservation_check(void *arg1, void *arg2, dmu_tx_t *tx);
|
||||
boolean_t dsl_dir_is_clone(dsl_dir_t *dd);
|
||||
void dsl_dir_new_refreservation(dsl_dir_t *dd, struct dsl_dataset *ds,
|
||||
uint64_t reservation, cred_t *cr, dmu_tx_t *tx);
|
||||
void dsl_dir_snap_cmtime_update(dsl_dir_t *dd);
|
||||
timestruc_t dsl_dir_snap_cmtime(dsl_dir_t *dd);
|
||||
|
||||
/* internal reserved dir name */
|
||||
#define MOS_DIR_NAME "$MOS"
|
||||
#define ORIGIN_DIR_NAME "$ORIGIN"
|
||||
#define XLATION_DIR_NAME "$XLATION"
|
||||
#define FREE_DIR_NAME "$FREE"
|
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
#define dprintf_dd(dd, fmt, ...) do { \
|
||||
if (zfs_flags & ZFS_DEBUG_DPRINTF) { \
|
||||
char *__ds_name = kmem_alloc(MAXNAMELEN + strlen(MOS_DIR_NAME) + 1, \
|
||||
KM_SLEEP); \
|
||||
dsl_dir_name(dd, __ds_name); \
|
||||
dprintf("dd=%s " fmt, __ds_name, __VA_ARGS__); \
|
||||
kmem_free(__ds_name, MAXNAMELEN + strlen(MOS_DIR_NAME) + 1); \
|
||||
} \
|
||||
_NOTE(CONSTCOND) } while (0)
|
||||
#else
|
||||
#define dprintf_dd(dd, fmt, ...)
|
||||
#endif
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DSL_DIR_H */
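
`dsl_dir_phys_t` carries both a total (`dd_used_bytes`) and a per-category array (`dd_used_breakdown[DD_USED_NUM]`), and the prototypes `dsl_dir_diduse_space()` and `dsl_dir_transfer_space()` suggest two kinds of update: charging space to a category, and moving space between categories. The sketch below models that bookkeeping on a local mirror of the struct. It assumes the total should equal the sum of the breakdown entries whenever `DD_FLAG_USED_BREAKDOWN` is set. That invariant, and all `demo_` names, are assumptions for illustration rather than something this header states.

```c
#include <assert.h>
#include <stdint.h>

typedef enum demo_dd_used {
    DEMO_DD_USED_HEAD,
    DEMO_DD_USED_SNAP,
    DEMO_DD_USED_CHILD,
    DEMO_DD_USED_CHILD_RSRV,
    DEMO_DD_USED_REFRSRV,
    DEMO_DD_USED_NUM
} demo_dd_used_t;

#define DEMO_DD_FLAG_USED_BREAKDOWN (1 << 0)

/* Local mirror of dsl_dir_phys_t: 13 + 5 + 1 + 13 = 32 words = 256 bytes. */
typedef struct demo_dir_phys {
    uint64_t dd_creation_time;
    uint64_t dd_head_dataset_obj;
    uint64_t dd_parent_obj;
    uint64_t dd_origin_obj;
    uint64_t dd_child_dir_zapobj;
    uint64_t dd_used_bytes;
    uint64_t dd_compressed_bytes;
    uint64_t dd_uncompressed_bytes;
    uint64_t dd_quota;
    uint64_t dd_reserved;
    uint64_t dd_props_zapobj;
    uint64_t dd_deleg_zapobj;
    uint64_t dd_flags;
    uint64_t dd_used_breakdown[DEMO_DD_USED_NUM];
    uint64_t dd_clones;
    uint64_t dd_pad[13];
} demo_dir_phys_t;

_Static_assert(sizeof (demo_dir_phys_t) == 256, "dir phys must be 256 bytes");

/* Charge (or, with a negative delta, refund) space to one category. */
void
demo_diduse_space(demo_dir_phys_t *p, demo_dd_used_t type, int64_t used)
{
    p->dd_used_bytes += (uint64_t)used;
    if (p->dd_flags & DEMO_DD_FLAG_USED_BREAKDOWN)
        p->dd_used_breakdown[type] += (uint64_t)used;
}

/* Move space between categories; the total is unchanged. */
void
demo_transfer_space(demo_dir_phys_t *p, int64_t delta,
    demo_dd_used_t oldtype, demo_dd_used_t newtype)
{
    if (p->dd_flags & DEMO_DD_FLAG_USED_BREAKDOWN) {
        p->dd_used_breakdown[oldtype] -= (uint64_t)delta;
        p->dd_used_breakdown[newtype] += (uint64_t)delta;
    }
}

uint64_t
demo_breakdown_sum(const demo_dir_phys_t *p)
{
    uint64_t sum = 0;
    for (int t = 0; t < DEMO_DD_USED_NUM; t++)
        sum += p->dd_used_breakdown[t];
    return (sum);
}
```

The real functions also take a `dmu_tx_t *` and track compressed and uncompressed sizes; both are omitted here to keep the sketch short.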
|
151
uts/common/fs/zfs/sys/dsl_pool.h
Normal file
@ -0,0 +1,151 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DSL_POOL_H
|
||||
#define _SYS_DSL_POOL_H
|
||||
|
||||
#include <sys/spa.h>
|
||||
#include <sys/txg.h>
|
||||
#include <sys/txg_impl.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/dnode.h>
|
||||
#include <sys/ddt.h>
|
||||
#include <sys/arc.h>
|
||||
#include <sys/bpobj.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct objset;
|
||||
struct dsl_dir;
|
||||
struct dsl_dataset;
|
||||
struct dsl_pool;
|
||||
struct dmu_tx;
|
||||
struct dsl_scan;
|
||||
|
||||
/* These macros are for indexing into the zfs_all_blkstats_t. */
|
||||
#define DMU_OT_DEFERRED DMU_OT_NONE
|
||||
#define DMU_OT_TOTAL DMU_OT_NUMTYPES
|
||||
|
||||
typedef struct zfs_blkstat {
|
||||
uint64_t zb_count;
|
||||
uint64_t zb_asize;
|
||||
uint64_t zb_lsize;
|
||||
uint64_t zb_psize;
|
||||
uint64_t zb_gangs;
|
||||
uint64_t zb_ditto_2_of_2_samevdev;
|
||||
uint64_t zb_ditto_2_of_3_samevdev;
|
||||
uint64_t zb_ditto_3_of_3_samevdev;
|
||||
} zfs_blkstat_t;
|
||||
|
||||
typedef struct zfs_all_blkstats {
|
||||
zfs_blkstat_t zab_type[DN_MAX_LEVELS + 1][DMU_OT_TOTAL + 1];
|
||||
} zfs_all_blkstats_t;
|
||||
|
||||
|
||||
typedef struct dsl_pool {
|
||||
/* Immutable */
|
||||
spa_t *dp_spa;
|
||||
struct objset *dp_meta_objset;
|
||||
struct dsl_dir *dp_root_dir;
|
||||
struct dsl_dir *dp_mos_dir;
|
||||
struct dsl_dir *dp_free_dir;
|
||||
struct dsl_dataset *dp_origin_snap;
|
||||
uint64_t dp_root_dir_obj;
|
||||
struct taskq *dp_vnrele_taskq;
|
||||
|
||||
/* No lock needed - sync context only */
|
||||
blkptr_t dp_meta_rootbp;
|
||||
list_t dp_synced_datasets;
|
||||
hrtime_t dp_read_overhead;
|
||||
uint64_t dp_throughput; /* bytes per millisec */
|
||||
uint64_t dp_write_limit;
|
||||
uint64_t dp_tmp_userrefs_obj;
|
||||
bpobj_t dp_free_bpobj;
|
||||
|
||||
struct dsl_scan *dp_scan;
|
||||
|
||||
/* Uses dp_lock */
|
||||
kmutex_t dp_lock;
|
||||
uint64_t dp_space_towrite[TXG_SIZE];
|
||||
uint64_t dp_tempreserved[TXG_SIZE];
|
||||
|
||||
/* Has its own locking */
|
||||
tx_state_t dp_tx;
|
||||
txg_list_t dp_dirty_datasets;
|
||||
txg_list_t dp_dirty_dirs;
|
||||
txg_list_t dp_sync_tasks;
|
||||
|
||||
/*
|
||||
* Protects administrative changes (properties, namespace)
|
||||
* It is only held for write in syncing context. Therefore
|
||||
* syncing context does not need to ever have it for read, since
|
||||
* nobody else could possibly have it for write.
|
||||
*/
|
||||
krwlock_t dp_config_rwlock;
|
||||
|
||||
zfs_all_blkstats_t *dp_blkstats;
|
||||
} dsl_pool_t;
|
||||
|
||||
int dsl_pool_open(spa_t *spa, uint64_t txg, dsl_pool_t **dpp);
|
||||
void dsl_pool_close(dsl_pool_t *dp);
|
||||
dsl_pool_t *dsl_pool_create(spa_t *spa, nvlist_t *zplprops, uint64_t txg);
|
||||
void dsl_pool_sync(dsl_pool_t *dp, uint64_t txg);
|
||||
void dsl_pool_sync_done(dsl_pool_t *dp, uint64_t txg);
|
||||
int dsl_pool_sync_context(dsl_pool_t *dp);
|
||||
uint64_t dsl_pool_adjustedsize(dsl_pool_t *dp, boolean_t netfree);
|
||||
uint64_t dsl_pool_adjustedfree(dsl_pool_t *dp, boolean_t netfree);
|
||||
int dsl_pool_tempreserve_space(dsl_pool_t *dp, uint64_t space, dmu_tx_t *tx);
|
||||
void dsl_pool_tempreserve_clear(dsl_pool_t *dp, int64_t space, dmu_tx_t *tx);
|
||||
void dsl_pool_memory_pressure(dsl_pool_t *dp);
|
||||
void dsl_pool_willuse_space(dsl_pool_t *dp, int64_t space, dmu_tx_t *tx);
|
||||
void dsl_free(dsl_pool_t *dp, uint64_t txg, const blkptr_t *bpp);
|
||||
void dsl_free_sync(zio_t *pio, dsl_pool_t *dp, uint64_t txg,
|
||||
const blkptr_t *bpp);
|
||||
int dsl_read(zio_t *pio, spa_t *spa, const blkptr_t *bpp, arc_buf_t *pbuf,
|
||||
arc_done_func_t *done, void *private, int priority, int zio_flags,
|
||||
uint32_t *arc_flags, const zbookmark_t *zb);
|
||||
int dsl_read_nolock(zio_t *pio, spa_t *spa, const blkptr_t *bpp,
|
||||
arc_done_func_t *done, void *private, int priority, int zio_flags,
|
||||
uint32_t *arc_flags, const zbookmark_t *zb);
|
||||
void dsl_pool_create_origin(dsl_pool_t *dp, dmu_tx_t *tx);
|
||||
void dsl_pool_upgrade_clones(dsl_pool_t *dp, dmu_tx_t *tx);
|
||||
void dsl_pool_upgrade_dir_clones(dsl_pool_t *dp, dmu_tx_t *tx);
|
||||
|
||||
taskq_t *dsl_pool_vnrele_taskq(dsl_pool_t *dp);
|
||||
|
||||
extern int dsl_pool_user_hold(dsl_pool_t *dp, uint64_t dsobj,
|
||||
const char *tag, uint64_t *now, dmu_tx_t *tx);
|
||||
extern int dsl_pool_user_release(dsl_pool_t *dp, uint64_t dsobj,
|
||||
const char *tag, dmu_tx_t *tx);
|
||||
extern void dsl_pool_clean_tmp_userrefs(dsl_pool_t *dp);
|
||||
int dsl_pool_open_special_dir(dsl_pool_t *dp, const char *name, dsl_dir_t **);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DSL_POOL_H */
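
`zfs_all_blkstats_t` is declared as `zab_type[DN_MAX_LEVELS + 1][DMU_OT_TOTAL + 1]`, with `DMU_OT_TOTAL` aliased to `DMU_OT_NUMTYPES`. One plausible reading, assumed here and not stated in the header, is that the extra row and column hold running totals across all levels and all types. The sketch uses small hypothetical `DEMO_` constants in place of the real limits and tracks only two of the `zfs_blkstat_t` counters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_LEVELS 3                   /* stand-in for DN_MAX_LEVELS */
#define DEMO_OT_NUMTYPES 4                  /* stand-in for DMU_OT_NUMTYPES */
#define DEMO_OT_TOTAL DEMO_OT_NUMTYPES      /* mirrors DMU_OT_TOTAL */

typedef struct demo_blkstat {
    uint64_t zb_count;
    uint64_t zb_asize;
} demo_blkstat_t;

typedef struct demo_all_blkstats {
    demo_blkstat_t zab_type[DEMO_MAX_LEVELS + 1][DEMO_OT_TOTAL + 1];
} demo_all_blkstats_t;

void
demo_blkstats_reset(demo_all_blkstats_t *zab)
{
    memset(zab, 0, sizeof (*zab));
}

/*
 * Account one block in four cells: (level, type), (level, total),
 * (all levels, type) and (all levels, total).
 */
void
demo_count_block(demo_all_blkstats_t *zab, int level, int type,
    uint64_t asize)
{
    assert(level >= 0 && level < DEMO_MAX_LEVELS);
    assert(type >= 0 && type < DEMO_OT_NUMTYPES);

    for (int i = 0; i < 4; i++) {
        int l = (i < 2) ? level : DEMO_MAX_LEVELS;
        int t = (i & 1) ? type : DEMO_OT_TOTAL;
        demo_blkstat_t *zb = &zab->zab_type[l][t];

        zb->zb_count++;
        zb->zb_asize += asize;
    }
}
```

With this layout a reader can fetch a per-type, per-level, or grand total with a single array access instead of summing a row or column.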
|
119
uts/common/fs/zfs/sys/dsl_prop.h
Normal file
@ -0,0 +1,119 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DSL_PROP_H
|
||||
#define _SYS_DSL_PROP_H
|
||||
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/dsl_pool.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/dsl_synctask.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct dsl_dataset;
|
||||
struct dsl_dir;
|
||||
|
||||
/* The callback func may not call into the DMU or DSL! */
|
||||
typedef void (dsl_prop_changed_cb_t)(void *arg, uint64_t newval);
|
||||
|
||||
typedef struct dsl_prop_cb_record {
|
||||
list_node_t cbr_node; /* link on dd_prop_cbs */
|
||||
struct dsl_dataset *cbr_ds;
|
||||
const char *cbr_propname;
|
||||
dsl_prop_changed_cb_t *cbr_func;
|
||||
void *cbr_arg;
|
||||
} dsl_prop_cb_record_t;
|
||||
|
||||
typedef struct dsl_props_arg {
|
||||
nvlist_t *pa_props;
|
||||
zprop_source_t pa_source;
|
||||
} dsl_props_arg_t;
|
||||
|
||||
typedef struct dsl_prop_set_arg {
|
||||
const char *psa_name;
|
||||
zprop_source_t psa_source;
|
||||
int psa_intsz;
|
||||
int psa_numints;
|
||||
const void *psa_value;
|
||||
|
||||
/*
|
||||
* Used to handle the special requirements of the quota and reservation
|
||||
* properties.
|
||||
*/
|
||||
uint64_t psa_effective_value;
|
||||
} dsl_prop_setarg_t;
|
||||
|
||||
int dsl_prop_register(struct dsl_dataset *ds, const char *propname,
|
||||
dsl_prop_changed_cb_t *callback, void *cbarg);
|
||||
int dsl_prop_unregister(struct dsl_dataset *ds, const char *propname,
|
||||
dsl_prop_changed_cb_t *callback, void *cbarg);
|
||||
int dsl_prop_numcb(struct dsl_dataset *ds);
|
||||
|
||||
int dsl_prop_get(const char *ddname, const char *propname,
|
||||
int intsz, int numints, void *buf, char *setpoint);
|
||||
int dsl_prop_get_integer(const char *ddname, const char *propname,
|
||||
uint64_t *valuep, char *setpoint);
|
||||
int dsl_prop_get_all(objset_t *os, nvlist_t **nvp);
|
||||
int dsl_prop_get_received(objset_t *os, nvlist_t **nvp);
|
||||
int dsl_prop_get_ds(struct dsl_dataset *ds, const char *propname,
|
||||
int intsz, int numints, void *buf, char *setpoint);
|
||||
int dsl_prop_get_dd(struct dsl_dir *dd, const char *propname,
|
||||
int intsz, int numints, void *buf, char *setpoint,
|
||||
boolean_t snapshot);
|
||||
|
||||
dsl_syncfunc_t dsl_props_set_sync;
|
||||
int dsl_prop_set(const char *ddname, const char *propname,
|
||||
zprop_source_t source, int intsz, int numints, const void *buf);
|
||||
int dsl_props_set(const char *dsname, zprop_source_t source, nvlist_t *nvl);
|
||||
void dsl_dir_prop_set_uint64_sync(dsl_dir_t *dd, const char *name, uint64_t val,
|
||||
dmu_tx_t *tx);
|
||||
|
||||
void dsl_prop_setarg_init_uint64(dsl_prop_setarg_t *psa, const char *propname,
|
||||
zprop_source_t source, uint64_t *value);
|
||||
int dsl_prop_predict_sync(dsl_dir_t *dd, dsl_prop_setarg_t *psa);
|
||||
#ifdef ZFS_DEBUG
|
||||
void dsl_prop_check_prediction(dsl_dir_t *dd, dsl_prop_setarg_t *psa);
|
||||
#define DSL_PROP_CHECK_PREDICTION(dd, psa) \
|
||||
dsl_prop_check_prediction((dd), (psa))
|
||||
#else
|
||||
#define DSL_PROP_CHECK_PREDICTION(dd, psa) /* nothing */
|
||||
#endif
|
||||
|
||||
/* flag first receive on or after SPA_VERSION_RECVD_PROPS */
|
||||
boolean_t dsl_prop_get_hasrecvd(objset_t *os);
|
||||
void dsl_prop_set_hasrecvd(objset_t *os);
|
||||
void dsl_prop_unset_hasrecvd(objset_t *os);
|
||||
|
||||
void dsl_prop_nvlist_add_uint64(nvlist_t *nv, zfs_prop_t prop, uint64_t value);
|
||||
void dsl_prop_nvlist_add_string(nvlist_t *nv,
|
||||
zfs_prop_t prop, const char *value);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DSL_PROP_H */
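
`dsl_prop_register()` and `dsl_prop_unregister()` identify a callback by the triple of property name, function pointer, and argument, as recorded in `dsl_prop_cb_record_t`. A rough userland sketch of that registration model follows. It uses a fixed array instead of the kernel `list_t`, the `demo_` names and the error codes are this sketch's own choices, and nothing here reflects how the DSL actually dispatches changes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_CBS 8

typedef void (demo_prop_changed_cb_t)(void *arg, uint64_t newval);

typedef struct demo_cb_record {
    const char *cbr_propname;
    demo_prop_changed_cb_t *cbr_func;
    void *cbr_arg;
} demo_cb_record_t;

typedef struct demo_registry {
    demo_cb_record_t recs[DEMO_MAX_CBS];
    int nrecs;
} demo_registry_t;

void
demo_registry_init(demo_registry_t *reg)
{
    memset(reg, 0, sizeof (*reg));
}

int
demo_prop_register(demo_registry_t *reg, const char *propname,
    demo_prop_changed_cb_t *cb, void *arg)
{
    if (reg->nrecs == DEMO_MAX_CBS)
        return (ENOSPC);
    reg->recs[reg->nrecs].cbr_propname = propname;
    reg->recs[reg->nrecs].cbr_func = cb;
    reg->recs[reg->nrecs].cbr_arg = arg;
    reg->nrecs++;
    return (0);
}

/* Remove the record matching all three of name, function and argument. */
int
demo_prop_unregister(demo_registry_t *reg, const char *propname,
    demo_prop_changed_cb_t *cb, void *arg)
{
    for (int i = 0; i < reg->nrecs; i++) {
        demo_cb_record_t *r = &reg->recs[i];
        if (r->cbr_func == cb && r->cbr_arg == arg &&
            strcmp(r->cbr_propname, propname) == 0) {
            *r = reg->recs[--reg->nrecs];   /* fill the hole */
            return (0);
        }
    }
    return (ENOENT);
}

int
demo_prop_numcb(const demo_registry_t *reg)
{
    return (reg->nrecs);
}

/* Invoke every callback registered for propname with the new value. */
void
demo_prop_notify(const demo_registry_t *reg, const char *propname,
    uint64_t newval)
{
    for (int i = 0; i < reg->nrecs; i++) {
        if (strcmp(reg->recs[i].cbr_propname, propname) == 0)
            reg->recs[i].cbr_func(reg->recs[i].cbr_arg, newval);
    }
}

/* Example callback: it only stores the value, per the header's warning. */
void
demo_store_cb(void *arg, uint64_t newval)
{
    *(uint64_t *)arg = newval;
}
```

The example callback does nothing but record the value, in keeping with the header's warning that a callback may not call back into the DMU or DSL.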
|
108
uts/common/fs/zfs/sys/dsl_scan.h
Normal file
@ -0,0 +1,108 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DSL_SCAN_H
|
||||
#define _SYS_DSL_SCAN_H
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/ddt.h>
|
||||
#include <sys/bplist.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct objset;
|
||||
struct dsl_dir;
|
||||
struct dsl_dataset;
|
||||
struct dsl_pool;
|
||||
struct dmu_tx;
|
||||
|
||||
/*
|
||||
* All members of this structure must be uint64_t, for byteswap
|
||||
* purposes.
|
||||
*/
|
||||
typedef struct dsl_scan_phys {
|
||||
uint64_t scn_func; /* pool_scan_func_t */
|
||||
uint64_t scn_state; /* dsl_scan_state_t */
|
||||
uint64_t scn_queue_obj;
|
||||
uint64_t scn_min_txg;
|
||||
uint64_t scn_max_txg;
|
||||
uint64_t scn_cur_min_txg;
|
||||
uint64_t scn_cur_max_txg;
|
||||
uint64_t scn_start_time;
|
||||
uint64_t scn_end_time;
|
||||
uint64_t scn_to_examine; /* total bytes to be scanned */
|
||||
uint64_t scn_examined; /* bytes scanned so far */
|
||||
uint64_t scn_to_process;
|
||||
uint64_t scn_processed;
|
||||
uint64_t scn_errors; /* scan I/O error count */
|
||||
uint64_t scn_ddt_class_max;
|
||||
ddt_bookmark_t scn_ddt_bookmark;
|
||||
zbookmark_t scn_bookmark;
|
||||
uint64_t scn_flags; /* dsl_scan_flags_t */
|
||||
} dsl_scan_phys_t;
|
||||
|
||||
#define SCAN_PHYS_NUMINTS (sizeof (dsl_scan_phys_t) / sizeof (uint64_t))
|
||||
|
||||
typedef enum dsl_scan_flags {
|
||||
DSF_VISIT_DS_AGAIN = 1<<0,
|
||||
} dsl_scan_flags_t;
|
||||
|
||||
typedef struct dsl_scan {
|
||||
struct dsl_pool *scn_dp;
|
||||
|
||||
boolean_t scn_pausing;
|
||||
uint64_t scn_restart_txg;
|
||||
uint64_t scn_sync_start_time;
|
||||
zio_t *scn_zio_root;
|
||||
|
||||
/* for debugging / information */
|
||||
uint64_t scn_visited_this_txg;
|
||||
|
||||
dsl_scan_phys_t scn_phys;
|
||||
} dsl_scan_t;
|
||||
|
||||
int dsl_scan_init(struct dsl_pool *dp, uint64_t txg);
|
||||
void dsl_scan_fini(struct dsl_pool *dp);
|
||||
void dsl_scan_sync(struct dsl_pool *, dmu_tx_t *);
|
||||
int dsl_scan_cancel(struct dsl_pool *);
|
||||
int dsl_scan(struct dsl_pool *, pool_scan_func_t);
|
||||
void dsl_resilver_restart(struct dsl_pool *, uint64_t txg);
|
||||
boolean_t dsl_scan_resilvering(struct dsl_pool *dp);
|
||||
boolean_t dsl_dataset_unstable(struct dsl_dataset *ds);
|
||||
void dsl_scan_ddt_entry(dsl_scan_t *scn, enum zio_checksum checksum,
|
||||
ddt_entry_t *dde, dmu_tx_t *tx);
|
||||
void dsl_scan_ds_destroyed(struct dsl_dataset *ds, struct dmu_tx *tx);
|
||||
void dsl_scan_ds_snapshotted(struct dsl_dataset *ds, struct dmu_tx *tx);
|
||||
void dsl_scan_ds_clone_swapped(struct dsl_dataset *ds1, struct dsl_dataset *ds2,
|
||||
struct dmu_tx *tx);
|
||||
boolean_t dsl_scan_active(dsl_scan_t *scn);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DSL_SCAN_H */
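
The comment on `dsl_scan_phys_t` requires every member to be a `uint64_t` "for byteswap purposes", and `SCAN_PHYS_NUMINTS` gives the structure's length in 64-bit words. The sketch below shows why that constraint helps: a structure made only of 64-bit words can be endian-swapped as a flat array, with no per-field knowledge. The `demo_` struct is a shortened hypothetical stand-in, and the swap routine is written out by hand rather than taken from any ZFS source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shortened stand-in for dsl_scan_phys_t: uint64_t members only. */
typedef struct demo_scan_phys {
    uint64_t scn_func;
    uint64_t scn_state;
    uint64_t scn_min_txg;
    uint64_t scn_max_txg;
} demo_scan_phys_t;

#define DEMO_SCAN_PHYS_NUMINTS \
    (sizeof (demo_scan_phys_t) / sizeof (uint64_t))

/* Reverse the byte order of one 64-bit word. */
uint64_t
demo_bswap64(uint64_t x)
{
    x = ((x & 0x00000000FFFFFFFFULL) << 32) | (x >> 32);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) |
        ((x >> 16) & 0x0000FFFF0000FFFFULL);
    x = ((x & 0x00FF00FF00FF00FFULL) << 8) |
        ((x >> 8) & 0x00FF00FF00FF00FFULL);
    return (x);
}

/* Swap the whole structure as an array of NUMINTS words. */
void
demo_scan_phys_byteswap(demo_scan_phys_t *p)
{
    uint64_t words[DEMO_SCAN_PHYS_NUMINTS];

    memcpy(words, p, sizeof (words));
    for (size_t i = 0; i < DEMO_SCAN_PHYS_NUMINTS; i++)
        words[i] = demo_bswap64(words[i]);
    memcpy(p, words, sizeof (words));
}
```

Applying the swap twice restores the original values. A member narrower than 64 bits would break the word-wise loop, which is presumably what the header comment guards against.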
|
79
uts/common/fs/zfs/sys/dsl_synctask.h
Normal file
@ -0,0 +1,79 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DSL_SYNCTASK_H
|
||||
#define _SYS_DSL_SYNCTASK_H
|
||||
|
||||
#include <sys/txg.h>
|
||||
#include <sys/zfs_context.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct dsl_pool;
|
||||
|
||||
typedef int (dsl_checkfunc_t)(void *, void *, dmu_tx_t *);
|
||||
typedef void (dsl_syncfunc_t)(void *, void *, dmu_tx_t *);
|
||||
|
||||
typedef struct dsl_sync_task {
|
||||
list_node_t dst_node;
|
||||
dsl_checkfunc_t *dst_checkfunc;
|
||||
dsl_syncfunc_t *dst_syncfunc;
|
||||
void *dst_arg1;
|
||||
void *dst_arg2;
|
||||
int dst_err;
|
||||
} dsl_sync_task_t;
|
||||
|
||||
typedef struct dsl_sync_task_group {
|
||||
txg_node_t dstg_node;
|
||||
list_t dstg_tasks;
|
||||
struct dsl_pool *dstg_pool;
|
||||
uint64_t dstg_txg;
|
||||
int dstg_err;
|
||||
int dstg_space;
|
||||
boolean_t dstg_nowaiter;
|
||||
} dsl_sync_task_group_t;
|
||||
|
||||
dsl_sync_task_group_t *dsl_sync_task_group_create(struct dsl_pool *dp);
|
||||
void dsl_sync_task_create(dsl_sync_task_group_t *dstg,
|
||||
dsl_checkfunc_t *, dsl_syncfunc_t *,
|
||||
void *arg1, void *arg2, int blocks_modified);
|
||||
int dsl_sync_task_group_wait(dsl_sync_task_group_t *dstg);
|
||||
void dsl_sync_task_group_nowait(dsl_sync_task_group_t *dstg, dmu_tx_t *tx);
|
||||
void dsl_sync_task_group_destroy(dsl_sync_task_group_t *dstg);
|
||||
void dsl_sync_task_group_sync(dsl_sync_task_group_t *dstg, dmu_tx_t *tx);
|
||||
|
||||
int dsl_sync_task_do(struct dsl_pool *dp,
|
||||
dsl_checkfunc_t *checkfunc, dsl_syncfunc_t *syncfunc,
|
||||
void *arg1, void *arg2, int blocks_modified);
|
||||
void dsl_sync_task_do_nowait(struct dsl_pool *dp,
|
||||
dsl_checkfunc_t *checkfunc, dsl_syncfunc_t *syncfunc,
|
||||
void *arg1, void *arg2, int blocks_modified, dmu_tx_t *tx);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_DSL_SYNCTASK_H */
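
This header pairs each task's `dsl_checkfunc_t` (returns an error code) with a `dsl_syncfunc_t` (returns nothing), and a group records a single `dstg_err`. A simplified sketch of that two-phase shape: run every check first, and run the sync functions only if all checks passed. The `dmu_tx_t *` argument, the txg machinery, and waiting are all omitted. The `demo_` names and the quota example are hypothetical, and the all-or-nothing rule is this sketch's assumption about how a group behaves.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_TASKS 4

typedef int (demo_checkfunc_t)(void *, void *);
typedef void (demo_syncfunc_t)(void *, void *);

typedef struct demo_sync_task {
    demo_checkfunc_t *dst_checkfunc;
    demo_syncfunc_t *dst_syncfunc;
    void *dst_arg1;
    void *dst_arg2;
    int dst_err;
} demo_sync_task_t;

typedef struct demo_group {
    demo_sync_task_t dstg_tasks[DEMO_MAX_TASKS];
    int dstg_ntasks;
    int dstg_err;
} demo_group_t;

void
demo_group_init(demo_group_t *g)
{
    g->dstg_ntasks = 0;
    g->dstg_err = 0;
}

void
demo_group_add(demo_group_t *g, demo_checkfunc_t *check,
    demo_syncfunc_t *sync, void *arg1, void *arg2)
{
    assert(g->dstg_ntasks < DEMO_MAX_TASKS);
    demo_sync_task_t *t = &g->dstg_tasks[g->dstg_ntasks++];
    t->dst_checkfunc = check;
    t->dst_syncfunc = sync;
    t->dst_arg1 = arg1;
    t->dst_arg2 = arg2;
    t->dst_err = 0;
}

/* Phase 1: all checks. Phase 2: all syncs, only if no check failed. */
int
demo_group_run(demo_group_t *g)
{
    for (int i = 0; i < g->dstg_ntasks; i++) {
        demo_sync_task_t *t = &g->dstg_tasks[i];
        t->dst_err = t->dst_checkfunc(t->dst_arg1, t->dst_arg2);
        if (t->dst_err != 0 && g->dstg_err == 0)
            g->dstg_err = t->dst_err;
    }
    if (g->dstg_err != 0)
        return (g->dstg_err);
    for (int i = 0; i < g->dstg_ntasks; i++) {
        demo_sync_task_t *t = &g->dstg_tasks[i];
        t->dst_syncfunc(t->dst_arg1, t->dst_arg2);
    }
    return (0);
}

/* Hypothetical task: set a quota unless it is below current usage. */
typedef struct demo_dir {
    uint64_t used;
    uint64_t quota;
} demo_dir_t;

int
demo_set_quota_check(void *arg1, void *arg2)
{
    const demo_dir_t *dd = arg1;
    uint64_t newquota = *(const uint64_t *)arg2;
    return ((newquota != 0 && newquota < dd->used) ? ENOSPC : 0);
}

void
demo_set_quota_sync(void *arg1, void *arg2)
{
    demo_dir_t *dd = arg1;
    dd->quota = *(const uint64_t *)arg2;
}
```

Splitting validation from mutation means the sync function never needs an error path: by the time it runs, its check has already succeeded.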
|
80
uts/common/fs/zfs/sys/metaslab.h
Normal file
@ -0,0 +1,80 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_METASLAB_H
|
||||
#define _SYS_METASLAB_H
|
||||
|
||||
#include <sys/spa.h>
|
||||
#include <sys/space_map.h>
|
||||
#include <sys/txg.h>
|
||||
#include <sys/zio.h>
|
||||
#include <sys/avl.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
extern space_map_ops_t *zfs_metaslab_ops;
|
||||
|
||||
extern metaslab_t *metaslab_init(metaslab_group_t *mg, space_map_obj_t *smo,
|
||||
uint64_t start, uint64_t size, uint64_t txg);
|
||||
extern void metaslab_fini(metaslab_t *msp);
|
||||
extern void metaslab_sync(metaslab_t *msp, uint64_t txg);
|
||||
extern void metaslab_sync_done(metaslab_t *msp, uint64_t txg);
|
||||
extern void metaslab_sync_reassess(metaslab_group_t *mg);
|
||||
|
||||
#define METASLAB_HINTBP_FAVOR 0x0
|
||||
#define METASLAB_HINTBP_AVOID 0x1
|
||||
#define METASLAB_GANG_HEADER 0x2
|
||||
|
||||
extern int metaslab_alloc(spa_t *spa, metaslab_class_t *mc, uint64_t psize,
|
||||
blkptr_t *bp, int ncopies, uint64_t txg, blkptr_t *hintbp, int flags);
|
||||
extern void metaslab_free(spa_t *spa, const blkptr_t *bp, uint64_t txg,
|
||||
boolean_t now);
|
||||
extern int metaslab_claim(spa_t *spa, const blkptr_t *bp, uint64_t txg);
|
||||
|
||||
extern metaslab_class_t *metaslab_class_create(spa_t *spa,
|
||||
space_map_ops_t *ops);
|
||||
extern void metaslab_class_destroy(metaslab_class_t *mc);
|
||||
extern int metaslab_class_validate(metaslab_class_t *mc);
|
||||
|
||||
extern void metaslab_class_space_update(metaslab_class_t *mc,
|
||||
int64_t alloc_delta, int64_t defer_delta,
|
||||
int64_t space_delta, int64_t dspace_delta);
|
||||
extern uint64_t metaslab_class_get_alloc(metaslab_class_t *mc);
|
||||
extern uint64_t metaslab_class_get_space(metaslab_class_t *mc);
|
||||
extern uint64_t metaslab_class_get_dspace(metaslab_class_t *mc);
|
||||
extern uint64_t metaslab_class_get_deferred(metaslab_class_t *mc);
|
||||
|
||||
extern metaslab_group_t *metaslab_group_create(metaslab_class_t *mc,
|
||||
vdev_t *vd);
|
||||
extern void metaslab_group_destroy(metaslab_group_t *mg);
|
||||
extern void metaslab_group_activate(metaslab_group_t *mg);
|
||||
extern void metaslab_group_passivate(metaslab_group_t *mg);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_METASLAB_H */
|
89
uts/common/fs/zfs/sys/metaslab_impl.h
Normal file
@ -0,0 +1,89 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_METASLAB_IMPL_H
|
||||
#define _SYS_METASLAB_IMPL_H
|
||||
|
||||
#include <sys/metaslab.h>
|
||||
#include <sys/space_map.h>
|
||||
#include <sys/vdev.h>
|
||||
#include <sys/txg.h>
|
||||
#include <sys/avl.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct metaslab_class {
|
||||
spa_t *mc_spa;
|
||||
metaslab_group_t *mc_rotor;
|
||||
space_map_ops_t *mc_ops;
|
||||
uint64_t mc_aliquot;
|
||||
uint64_t mc_alloc; /* total allocated space */
|
||||
uint64_t mc_deferred; /* total deferred frees */
|
||||
uint64_t mc_space; /* total space (alloc + free) */
|
||||
uint64_t mc_dspace; /* total deflated space */
|
||||
};
|
||||
|
||||
struct metaslab_group {
|
||||
kmutex_t mg_lock;
|
||||
avl_tree_t mg_metaslab_tree;
|
||||
uint64_t mg_aliquot;
|
||||
uint64_t mg_bonus_area;
|
||||
int64_t mg_bias;
|
||||
int64_t mg_activation_count;
|
||||
metaslab_class_t *mg_class;
|
||||
vdev_t *mg_vd;
|
||||
metaslab_group_t *mg_prev;
|
||||
metaslab_group_t *mg_next;
|
||||
};
|
||||
|
||||
/*
|
||||
* Each metaslab's free space is tracked in a space map object in the MOS,
|
||||
* which is only updated in syncing context. Each time we sync a txg,
|
||||
* we append the allocs and frees from that txg to the space map object.
|
||||
* When the txg is done syncing, metaslab_sync_done() updates ms_smo
|
||||
* to ms_smo_syncing. Everything in ms_smo is always safe to allocate.
|
||||
*/
|
||||
struct metaslab {
|
||||
kmutex_t ms_lock; /* metaslab lock */
|
||||
space_map_obj_t ms_smo; /* synced space map object */
|
||||
space_map_obj_t ms_smo_syncing; /* syncing space map object */
|
||||
space_map_t ms_allocmap[TXG_SIZE]; /* allocated this txg */
|
||||
space_map_t ms_freemap[TXG_SIZE]; /* freed this txg */
|
||||
space_map_t ms_defermap[TXG_DEFER_SIZE]; /* deferred frees */
|
||||
space_map_t ms_map; /* in-core free space map */
|
||||
int64_t ms_deferspace; /* sum of ms_defermap[] space */
|
||||
uint64_t ms_weight; /* weight vs. others in group */
|
||||
metaslab_group_t *ms_group; /* metaslab group */
|
||||
avl_node_t ms_group_node; /* node in metaslab group tree */
|
||||
txg_node_t ms_txg_node; /* per-txg dirty metaslab links */
|
||||
};
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_METASLAB_IMPL_H */
|
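The comment on `struct metaslab` above describes a two-stage handoff: changes are appended while a txg syncs, and only after `metaslab_sync_done()` does the synced copy catch up. The following is a minimal sketch of that idea, not the real implementation. All `toy_*` names are hypothetical, the space map object is reduced to a single byte counter, and the ring size of 4 is an assumption standing in for `TXG_SIZE`.

```c
#include <stdint.h>

/* Assumed ring size; a hypothetical stand-in for TXG_SIZE. */
#define	TOY_TXG_SIZE	4
#define	TOY_TXG_MASK	(TOY_TXG_SIZE - 1)

/* Hypothetical, heavily reduced space map object: just a byte count. */
typedef struct toy_smo {
	uint64_t smo_alloc;
} toy_smo_t;

typedef struct toy_metaslab {
	toy_smo_t ms_smo;		/* synced state, safe to rely on */
	toy_smo_t ms_smo_syncing;	/* state being written this txg */
	int64_t ms_delta[TOY_TXG_SIZE];	/* net alloc (+) / free (-) per txg */
} toy_metaslab_t;

/* Record an allocation (positive) or free (negative) against a txg. */
static void
toy_metaslab_record(toy_metaslab_t *ms, uint64_t txg, int64_t delta)
{
	ms->ms_delta[txg & TOY_TXG_MASK] += delta;
}

/* Syncing context: fold the txg's changes into the syncing copy only. */
static void
toy_metaslab_sync(toy_metaslab_t *ms, uint64_t txg)
{
	int64_t *slot = &ms->ms_delta[txg & TOY_TXG_MASK];

	ms->ms_smo_syncing.smo_alloc =
	    (uint64_t)((int64_t)ms->ms_smo_syncing.smo_alloc + *slot);
	*slot = 0;
}

/* After the txg has synced: publish the syncing copy as the synced one. */
static void
toy_metaslab_sync_done(toy_metaslab_t *ms)
{
	ms->ms_smo = ms->ms_smo_syncing;
}
```

In this sketch the synced copy never reflects a txg that has not finished syncing, which is the property the original comment relies on when it says everything in `ms_smo` is safe to allocate.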
107
uts/common/fs/zfs/sys/refcount.h
Normal file
@ -0,0 +1,107 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_REFCOUNT_H
|
||||
#define _SYS_REFCOUNT_H
|
||||
|
||||
#include <sys/inttypes.h>
|
||||
#include <sys/list.h>
|
||||
#include <sys/zfs_context.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/*
|
||||
* If the reference is held only by the calling function and not any
|
||||
* particular object, use FTAG (which is a string) for the holder_tag.
|
||||
* Otherwise, use the object that holds the reference.
|
||||
*/
|
||||
#define FTAG ((char *)__func__)
|
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
typedef struct reference {
|
||||
list_node_t ref_link;
|
||||
void *ref_holder;
|
||||
uint64_t ref_number;
|
||||
uint8_t *ref_removed;
|
||||
} reference_t;
|
||||
|
||||
typedef struct refcount {
|
||||
kmutex_t rc_mtx;
|
||||
list_t rc_list;
|
||||
list_t rc_removed;
|
||||
int64_t rc_count;
|
||||
int64_t rc_removed_count;
|
||||
} refcount_t;
|
||||
|
||||
/* Note: refcount_t must be initialized with refcount_create() */
|
||||
|
||||
void refcount_create(refcount_t *rc);
|
||||
void refcount_destroy(refcount_t *rc);
|
||||
void refcount_destroy_many(refcount_t *rc, uint64_t number);
|
||||
int refcount_is_zero(refcount_t *rc);
|
||||
int64_t refcount_count(refcount_t *rc);
|
||||
int64_t refcount_add(refcount_t *rc, void *holder_tag);
|
||||
int64_t refcount_remove(refcount_t *rc, void *holder_tag);
|
||||
int64_t refcount_add_many(refcount_t *rc, uint64_t number, void *holder_tag);
|
||||
int64_t refcount_remove_many(refcount_t *rc, uint64_t number, void *holder_tag);
|
||||
void refcount_transfer(refcount_t *dst, refcount_t *src);
|
||||
|
||||
void refcount_init(void);
|
||||
void refcount_fini(void);
|
||||
|
||||
#else /* ZFS_DEBUG */
|
||||
|
||||
typedef struct refcount {
|
||||
uint64_t rc_count;
|
||||
} refcount_t;
|
||||
|
||||
#define refcount_create(rc) ((rc)->rc_count = 0)
|
||||
#define refcount_destroy(rc) ((rc)->rc_count = 0)
|
||||
#define refcount_destroy_many(rc, number) ((rc)->rc_count = 0)
|
||||
#define refcount_is_zero(rc) ((rc)->rc_count == 0)
|
||||
#define refcount_count(rc) ((rc)->rc_count)
|
||||
#define refcount_add(rc, holder) atomic_add_64_nv(&(rc)->rc_count, 1)
|
||||
#define refcount_remove(rc, holder) atomic_add_64_nv(&(rc)->rc_count, -1)
|
||||
#define refcount_add_many(rc, number, holder) \
|
||||
atomic_add_64_nv(&(rc)->rc_count, number)
|
||||
#define refcount_remove_many(rc, number, holder) \
|
||||
atomic_add_64_nv(&(rc)->rc_count, -number)
|
||||
#define refcount_transfer(dst, src) { \
|
||||
uint64_t __tmp = (src)->rc_count; \
|
||||
atomic_add_64(&(src)->rc_count, -__tmp); \
|
||||
atomic_add_64(&(dst)->rc_count, __tmp); \
|
||||
}
|
||||
|
||||
#define refcount_init()
|
||||
#define refcount_fini()
|
||||
|
||||
#endif /* ZFS_DEBUG */
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_REFCOUNT_H */
|
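The non-debug half of refcount.h reduces a reference count to one 64-bit counter updated atomically, and ignores the holder tag. A rough userland sketch of that behaviour follows. It assumes C11 `<stdatomic.h>` as a substitute for the kernel's `atomic_add_64_nv()`, and every `toy_*` / `TOY_*` name is hypothetical rather than part of the header.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the non-debug refcount_t. */
typedef struct toy_refcount {
	_Atomic uint64_t rc_count;
} toy_refcount_t;

/* Same idea as FTAG: tag a hold with the name of the calling function. */
#define	TOY_FTAG	((const void *)__func__)

static inline void
toy_refcount_create(toy_refcount_t *rc)
{
	atomic_init(&rc->rc_count, 0);
}

/* Returns the new count, like atomic_add_64_nv(); the tag is unused here. */
static inline uint64_t
toy_refcount_add(toy_refcount_t *rc, const void *holder)
{
	(void) holder;
	return (atomic_fetch_add(&rc->rc_count, 1) + 1);
}

static inline uint64_t
toy_refcount_remove(toy_refcount_t *rc, const void *holder)
{
	(void) holder;
	return (atomic_fetch_sub(&rc->rc_count, 1) - 1);
}

static inline int
toy_refcount_is_zero(toy_refcount_t *rc)
{
	return (atomic_load(&rc->rc_count) == 0);
}

/* A function-scoped hold: taken and dropped with the same tag. */
static uint64_t
toy_hold_and_release(toy_refcount_t *rc)
{
	(void) toy_refcount_add(rc, TOY_FTAG);
	/* ... work that needs the reference would go here ... */
	return (toy_refcount_remove(rc, TOY_FTAG));
}
```

The tag only matters in the `ZFS_DEBUG` build, where each hold is tracked on a list; keeping the tag in the call sites lets the same code compile against both variants.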
80
uts/common/fs/zfs/sys/rrwlock.h
Normal file
@ -0,0 +1,80 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_RR_RW_LOCK_H
|
||||
#define _SYS_RR_RW_LOCK_H
|
||||
|
||||
#pragma ident "%Z%%M% %I% %E% SMI"
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#include <sys/inttypes.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/refcount.h>
|
||||
|
||||
/*
|
||||
* A reader-writer lock implementation that allows re-entrant reads, but
|
||||
* still gives writers priority on "new" reads.
|
||||
*
|
||||
* See rrwlock.c for more details about the implementation.
|
||||
*
|
||||
* Fields of the rrwlock_t structure:
|
||||
* - rr_lock: protects modification and reading of rrwlock_t fields
|
||||
* - rr_cv: cv for waking up readers or waiting writers
|
||||
* - rr_writer: thread id of the current writer
|
||||
* - rr_anon_rcount: number of active anonymous readers
|
||||
* - rr_linked_rcount: total number of non-anonymous active readers
|
||||
* - rr_writer_wanted: a writer wants the lock
|
||||
*/
|
||||
typedef struct rrwlock {
|
||||
kmutex_t rr_lock;
|
||||
kcondvar_t rr_cv;
|
||||
kthread_t *rr_writer;
|
||||
refcount_t rr_anon_rcount;
|
||||
refcount_t rr_linked_rcount;
|
||||
boolean_t rr_writer_wanted;
|
||||
} rrwlock_t;
|
||||
|
||||
/*
|
||||
* 'tag' is used in reference counting tracking. The
|
||||
* 'tag' must be the same in a rrw_enter() as in its
|
||||
* corresponding rrw_exit().
|
||||
*/
|
||||
void rrw_init(rrwlock_t *rrl);
|
||||
void rrw_destroy(rrwlock_t *rrl);
|
||||
void rrw_enter(rrwlock_t *rrl, krw_t rw, void *tag);
|
||||
void rrw_exit(rrwlock_t *rrl, void *tag);
|
||||
boolean_t rrw_held(rrwlock_t *rrl, krw_t rw);
|
||||
|
||||
#define RRW_READ_HELD(x) rrw_held(x, RW_READER)
|
||||
#define RRW_WRITE_HELD(x) rrw_held(x, RW_WRITER)
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_RR_RW_LOCK_H */
|
170
uts/common/fs/zfs/sys/sa.h
Normal file
@ -0,0 +1,170 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_SA_H
|
||||
#define _SYS_SA_H
|
||||
|
||||
#include <sys/dmu.h>
|
||||
|
||||
/*
|
||||
* Currently available byteswap functions.
|
||||
* If at all possible, new attributes should use
|
||||
* one of the already defined byteswap functions.
|
||||
* If a new byteswap function is added then the
|
||||
* ZPL/Pool version will need to be bumped.
|
||||
*/
|
||||
|
||||
typedef enum sa_bswap_type {
|
||||
SA_UINT64_ARRAY,
|
||||
SA_UINT32_ARRAY,
|
||||
SA_UINT16_ARRAY,
|
||||
SA_UINT8_ARRAY,
|
||||
SA_ACL,
|
||||
} sa_bswap_type_t;
|
||||
|
||||
typedef uint16_t sa_attr_type_t;
|
||||
|
||||
/*
|
||||
* Attribute to register support for.
|
||||
*/
|
||||
typedef struct sa_attr_reg {
|
||||
char *sa_name; /* attribute name */
|
||||
uint16_t sa_length;
|
||||
sa_bswap_type_t sa_byteswap; /* bswap function enum */
|
||||
sa_attr_type_t sa_attr; /* filled in during registration */
|
||||
} sa_attr_reg_t;
|
||||
|
||||
|
||||
typedef void (sa_data_locator_t)(void **, uint32_t *, uint32_t,
|
||||
boolean_t, void *userptr);
|
||||
|
||||
/*
|
||||
* array of attributes to store.
|
||||
*
|
||||
* This array should be treated as opaque/private data.
|
||||
* The SA_ADD_BULK_ATTR() macro should be used for manipulating
|
||||
* the array.
|
||||
*
|
||||
* When sa_replace_all_by_template() is used the attributes
|
||||
* will be stored in the order defined in the array, except that
|
||||
* the attributes may be split between the bonus and the spill buffer
|
||||
*
|
||||
*/
|
||||
typedef struct sa_bulk_attr {
|
||||
void *sa_data;
|
||||
sa_data_locator_t *sa_data_func;
|
||||
uint16_t sa_length;
|
||||
sa_attr_type_t sa_attr;
|
||||
/* the following are private to the sa framework */
|
||||
void *sa_addr;
|
||||
uint16_t sa_buftype;
|
||||
uint16_t sa_size;
|
||||
} sa_bulk_attr_t;
|
||||
|
||||
|
||||
/*
|
||||
* special macro for adding entries for bulk attr support
|
||||
* bulk - sa_bulk_attr_t
|
||||
* count - integer that will be incremented during each add
|
||||
* attr - attribute to manipulate
|
||||
* func - function for accessing data.
|
||||
* data - pointer to data.
|
||||
* len - length of data
|
||||
*/
|
||||
|
||||
#define SA_ADD_BULK_ATTR(b, idx, attr, func, data, len) \
|
||||
{ \
|
||||
b[idx].sa_attr = attr;\
|
||||
b[idx].sa_data_func = func; \
|
||||
b[idx].sa_data = data; \
|
||||
b[idx++].sa_length = len; \
|
||||
}
|
||||
|
||||
typedef struct sa_os sa_os_t;
|
||||
|
||||
typedef enum sa_handle_type {
|
||||
SA_HDL_SHARED,
|
||||
SA_HDL_PRIVATE
|
||||
} sa_handle_type_t;
|
||||
|
||||
struct sa_handle;
|
||||
typedef void *sa_lookup_tab_t;
|
||||
typedef struct sa_handle sa_handle_t;
|
||||
|
||||
typedef void (sa_update_cb_t)(sa_handle_t *, dmu_tx_t *tx);
|
||||
|
||||
int sa_handle_get(objset_t *, uint64_t, void *userp,
|
||||
sa_handle_type_t, sa_handle_t **);
|
||||
int sa_handle_get_from_db(objset_t *, dmu_buf_t *, void *userp,
|
||||
sa_handle_type_t, sa_handle_t **);
|
||||
void sa_handle_destroy(sa_handle_t *);
|
||||
int sa_buf_hold(objset_t *, uint64_t, void *, dmu_buf_t **);
|
||||
void sa_buf_rele(dmu_buf_t *, void *);
|
||||
int sa_lookup(sa_handle_t *, sa_attr_type_t, void *buf, uint32_t buflen);
|
||||
int sa_update(sa_handle_t *, sa_attr_type_t, void *buf,
|
||||
uint32_t buflen, dmu_tx_t *);
|
||||
int sa_remove(sa_handle_t *, sa_attr_type_t, dmu_tx_t *);
|
||||
int sa_bulk_lookup(sa_handle_t *, sa_bulk_attr_t *, int count);
|
||||
int sa_bulk_lookup_locked(sa_handle_t *, sa_bulk_attr_t *, int count);
|
||||
int sa_bulk_update(sa_handle_t *, sa_bulk_attr_t *, int count, dmu_tx_t *);
|
||||
int sa_size(sa_handle_t *, sa_attr_type_t, int *);
|
||||
int sa_update_from_cb(sa_handle_t *, sa_attr_type_t,
|
||||
uint32_t buflen, sa_data_locator_t *, void *userdata, dmu_tx_t *);
|
||||
void sa_object_info(sa_handle_t *, dmu_object_info_t *);
|
||||
void sa_object_size(sa_handle_t *, uint32_t *, u_longlong_t *);
|
||||
void sa_update_user(sa_handle_t *, sa_handle_t *);
|
||||
void *sa_get_userdata(sa_handle_t *);
|
||||
void sa_set_userp(sa_handle_t *, void *);
|
||||
dmu_buf_t *sa_get_db(sa_handle_t *);
|
||||
uint64_t sa_handle_object(sa_handle_t *);
|
||||
boolean_t sa_attr_would_spill(sa_handle_t *, sa_attr_type_t, int size);
|
||||
void sa_register_update_callback(objset_t *, sa_update_cb_t *);
|
||||
int sa_setup(objset_t *, uint64_t, sa_attr_reg_t *, int, sa_attr_type_t **);
|
||||
void sa_tear_down(objset_t *);
|
||||
int sa_replace_all_by_template(sa_handle_t *, sa_bulk_attr_t *,
|
||||
int, dmu_tx_t *);
|
||||
int sa_replace_all_by_template_locked(sa_handle_t *, sa_bulk_attr_t *,
|
||||
int, dmu_tx_t *);
|
||||
boolean_t sa_enabled(objset_t *);
|
||||
void sa_cache_init();
|
||||
void sa_cache_fini();
|
||||
int sa_set_sa_object(objset_t *, uint64_t);
|
||||
int sa_hdrsize(void *);
|
||||
void sa_handle_lock(sa_handle_t *);
|
||||
void sa_handle_unlock(sa_handle_t *);
|
||||
|
||||
#ifdef _KERNEL
|
||||
int sa_lookup_uio(sa_handle_t *, sa_attr_type_t, uio_t *);
|
||||
#endif
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_SA_H */
|
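The `SA_ADD_BULK_ATTR()` macro in sa.h fills one slot of a `sa_bulk_attr_t` array and post-increments the caller's index. The sketch below shows that pattern in isolation. The macro body is copied from the header; the struct is trimmed to its public fields, `boolean_t` is replaced by `int` so the block compiles in userland, and the attribute numbers and `toy_fill_bulk()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t sa_attr_type_t;

/* Locator signature from sa.h, with boolean_t replaced by int. */
typedef void (sa_data_locator_t)(void **, uint32_t *, uint32_t,
    int, void *userptr);

/* Public fields only; the framework-private fields are omitted. */
typedef struct sa_bulk_attr {
	void			*sa_data;
	sa_data_locator_t	*sa_data_func;
	uint16_t		sa_length;
	sa_attr_type_t		sa_attr;
} sa_bulk_attr_t;

/* Copied from sa.h: fills b[idx], then increments idx. */
#define	SA_ADD_BULK_ATTR(b, idx, attr, func, data, len) \
{ \
	b[idx].sa_attr = attr; \
	b[idx].sa_data_func = func; \
	b[idx].sa_data = data; \
	b[idx++].sa_length = len; \
}

/* Hypothetical attribute numbers, for illustration only. */
enum { TOY_ATTR_SIZE = 0, TOY_ATTR_MODE = 1 };

/* Builds a two-entry bulk array and returns how many entries were added. */
static int
toy_fill_bulk(sa_bulk_attr_t *bulk, uint64_t *size, uint64_t *mode)
{
	int count = 0;

	SA_ADD_BULK_ATTR(bulk, count, TOY_ATTR_SIZE, NULL, size,
	    sizeof (*size));
	SA_ADD_BULK_ATTR(bulk, count, TOY_ATTR_MODE, NULL, mode,
	    sizeof (*mode));
	return (count);
}
```

In the real interface the resulting array and count would presumably be handed to `sa_bulk_lookup()` or `sa_bulk_update()`. Because the macro expands to a bare block, it should not be used as the unbraced body of an `if` that is followed by an `else`.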
287
uts/common/fs/zfs/sys/sa_impl.h
Normal file
@ -0,0 +1,287 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_SA_IMPL_H
|
||||
#define _SYS_SA_IMPL_H
|
||||
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/refcount.h>
|
||||
#include <sys/list.h>
|
||||
|
||||
/*
|
||||
* Array of known attributes and their
|
||||
* various characteristics.
|
||||
*/
|
||||
typedef struct sa_attr_table {
|
||||
sa_attr_type_t sa_attr;
|
||||
uint8_t sa_registered;
|
||||
uint16_t sa_length;
|
||||
sa_bswap_type_t sa_byteswap;
|
||||
char *sa_name;
|
||||
} sa_attr_table_t;
|
||||
|
||||
/*
|
||||
* Zap attribute format for attribute registration
|
||||
*
|
||||
* 64 56 48 40 32 24 16 8 0
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* | unused | len | bswap | attr num |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
*
|
||||
* Zap attribute format for layout information.
|
||||
*
|
||||
* layout information is stored as an array of attribute numbers
|
||||
* The name of the attribute is the layout number (0, 1, 2, ...)
|
||||
*
|
||||
* 16 0
|
||||
* +---- ---+
|
||||
* | attr # |
|
||||
* +--------+
|
||||
* | attr # |
|
||||
* +--- ----+
|
||||
* ......
|
||||
*
|
||||
*/
|
||||
|
||||
#define ATTR_BSWAP(x) BF32_GET(x, 16, 8)
|
||||
#define ATTR_LENGTH(x) BF32_GET(x, 24, 16)
|
||||
#define ATTR_NUM(x) BF32_GET(x, 0, 16)
|
||||
#define ATTR_ENCODE(x, attr, length, bswap) \
|
||||
{ \
|
||||
BF64_SET(x, 24, 16, length); \
|
||||
BF64_SET(x, 16, 8, bswap); \
|
||||
BF64_SET(x, 0, 16, attr); \
|
||||
}
|
||||
|
||||
#define TOC_OFF(x) BF32_GET(x, 0, 23)
|
||||
#define TOC_ATTR_PRESENT(x) BF32_GET(x, 31, 1)
|
||||
#define TOC_LEN_IDX(x) BF32_GET(x, 24, 4)
|
||||
#define TOC_ATTR_ENCODE(x, len_idx, offset) \
|
||||
{ \
|
||||
BF32_SET(x, 31, 1, 1); \
|
||||
BF32_SET(x, 24, 7, len_idx); \
|
||||
BF32_SET(x, 0, 24, offset); \
|
||||
}
|
||||
|
||||
#define SA_LAYOUTS "LAYOUTS"
|
||||
#define SA_REGISTRY "REGISTRY"
|
||||
|
||||
/*
|
||||
* Each unique layout will have its own table
|
||||
* sa_lot (layout_table)
|
||||
*/
|
||||
typedef struct sa_lot {
|
||||
avl_node_t lot_num_node;
|
||||
avl_node_t lot_hash_node;
|
||||
uint64_t lot_num;
|
||||
uint64_t lot_hash;
|
||||
sa_attr_type_t *lot_attrs; /* array of attr #'s */
|
||||
uint32_t lot_var_sizes; /* how many aren't fixed size */
|
||||
uint32_t lot_attr_count; /* total attr count */
|
||||
list_t lot_idx_tab; /* should be only a couple of entries */
|
||||
int lot_instance; /* used with lot_hash to identify entry */
|
||||
} sa_lot_t;
|
||||
|
||||
/* index table of offsets */
|
||||
typedef struct sa_idx_tab {
|
||||
list_node_t sa_next;
|
||||
sa_lot_t *sa_layout;
|
||||
uint16_t *sa_variable_lengths;
|
||||
refcount_t sa_refcount;
|
||||
uint32_t *sa_idx_tab; /* array of offsets */
|
||||
} sa_idx_tab_t;
|
||||
|
||||
/*
|
||||
* Since the offset/index information into the actual data
|
||||
* will usually be identical we can share that information with
|
||||
* all handles that have the exact same offsets.
|
||||
*
|
||||
* You would typically only have a large number of different tables of
|
||||
* contents if you had several variable-sized attributes.
|
||||
*
|
||||
* Two AVL trees are used to track the attribute layout numbers.
|
||||
* one is keyed by number and will be consulted when a DMU_OT_SA
|
||||
* object is first read. The second tree is keyed by the hash signature
|
||||
* of the attributes and will be consulted when an attribute is added
|
||||
* to determine if we already have an instance of that layout. Both
|
||||
* of these trees are interconnected. The only difference is that
|
||||
* when an entry is found in the "hash" tree the list of attributes will
|
||||
* need to be compared against the list of attributes you have in hand.
|
||||
* The assumption is that typically attributes will just be updated and
|
||||
* adding a completely new attribute is a very rare operation.
|
||||
*/
|
||||
struct sa_os {
|
||||
kmutex_t sa_lock;
|
||||
boolean_t sa_need_attr_registration;
|
||||
boolean_t sa_force_spill;
|
||||
uint64_t sa_master_obj;
|
||||
uint64_t sa_reg_attr_obj;
|
||||
uint64_t sa_layout_attr_obj;
|
||||
int sa_num_attrs;
|
||||
sa_attr_table_t *sa_attr_table; /* private attr table */
|
||||
sa_update_cb_t *sa_update_cb;
|
||||
avl_tree_t sa_layout_num_tree; /* keyed by layout number */
|
||||
avl_tree_t sa_layout_hash_tree; /* keyed by layout hash value */
|
||||
int sa_user_table_sz;
|
||||
sa_attr_type_t *sa_user_table; /* user name->attr mapping table */
|
||||
};
|
||||
|
||||
/*
|
||||
* header for all bonus and spill buffers.
|
||||
* The header has a fixed portion with a variable number
|
||||
* of "lengths" depending on the number of variable sized
|
||||
* attributes which are determined by the "layout number"
|
||||
*/
|
||||
|
||||
#define SA_MAGIC 0x2F505A /* ZFS SA */
|
||||
typedef struct sa_hdr_phys {
|
||||
uint32_t sa_magic;
|
||||
uint16_t sa_layout_info; /* Encoded with hdrsize and layout number */
|
||||
uint16_t sa_lengths[1]; /* optional sizes for variable length attrs */
|
||||
/* ... Data follows the lengths. */
|
||||
} sa_hdr_phys_t;
|
||||
|
||||
/*
|
||||
* sa_hdr_phys -> sa_layout_info
|
||||
*
|
||||
* 16 10 0
|
||||
* +--------+-------+
|
||||
* | hdrsz |layout |
|
||||
* +--------+-------+
|
||||
*
|
||||
* Bits 0-9 are the layout number
|
||||
* Bits 10-15 are the size of the header.
|
||||
* The hdrsize is the number * 8
|
||||
*
|
||||
* For example.
|
||||
* hdrsz of 1 ==> 8 byte header
|
||||
* 2 ==> 16 byte header
|
||||
*
|
||||
*/
|
||||
|
||||
#define SA_HDR_LAYOUT_NUM(hdr) BF32_GET(hdr->sa_layout_info, 0, 10)
|
||||
#define SA_HDR_SIZE(hdr) BF32_GET_SB(hdr->sa_layout_info, 10, 16, 3, 0)
|
||||
#define SA_HDR_LAYOUT_INFO_ENCODE(x, num, size) \
|
||||
{ \
|
||||
BF32_SET_SB(x, 10, 6, 3, 0, size); \
|
||||
BF32_SET(x, 0, 10, num); \
|
||||
}
|
||||
|
||||
typedef enum sa_buf_type {
|
||||
SA_BONUS = 1,
|
||||
SA_SPILL = 2
|
||||
} sa_buf_type_t;
|
||||
|
||||
typedef enum sa_data_op {
|
||||
SA_LOOKUP,
|
||||
SA_UPDATE,
|
||||
SA_ADD,
|
||||
SA_REPLACE,
|
||||
SA_REMOVE
|
||||
} sa_data_op_t;
|
||||
|
||||
/*
|
||||
* Opaque handle used for most sa functions
|
||||
*
|
||||
* This needs to be kept as small as possible.
|
||||
*/
|
||||
|
||||
struct sa_handle {
|
||||
kmutex_t sa_lock;
|
||||
dmu_buf_t *sa_bonus;
|
||||
dmu_buf_t *sa_spill;
|
||||
objset_t *sa_os;
|
||||
void *sa_userp;
|
||||
sa_idx_tab_t *sa_bonus_tab; /* idx of bonus */
|
||||
sa_idx_tab_t *sa_spill_tab; /* only present if spill activated */
|
||||
};
|
||||
|
||||
#define SA_GET_DB(hdl, type) \
|
||||
(dmu_buf_impl_t *)((type == SA_BONUS) ? hdl->sa_bonus : hdl->sa_spill)
|
||||
|
||||
#define SA_GET_HDR(hdl, type) \
|
||||
((sa_hdr_phys_t *)((dmu_buf_impl_t *)(SA_GET_DB(hdl, \
|
||||
type))->db.db_data))
|
||||
|
||||
#define SA_IDX_TAB_GET(hdl, type) \
|
||||
(type == SA_BONUS ? hdl->sa_bonus_tab : hdl->sa_spill_tab)
|
||||
|
||||
#define IS_SA_BONUSTYPE(a) \
|
||||
((a == DMU_OT_SA) ? B_TRUE : B_FALSE)
|
||||
|
||||
#define SA_BONUSTYPE_FROM_DB(db) \
|
||||
(dmu_get_bonustype((dmu_buf_t *)db))
|
||||
|
||||
#define SA_BLKPTR_SPACE (DN_MAX_BONUSLEN - sizeof (blkptr_t))
|
||||
|
||||
#define SA_LAYOUT_NUM(x, type) \
|
||||
((!IS_SA_BONUSTYPE(type) ? 0 : (((IS_SA_BONUSTYPE(type)) && \
|
||||
((SA_HDR_LAYOUT_NUM(x)) == 0)) ? 1 : SA_HDR_LAYOUT_NUM(x))))
|
||||
|
||||
|
||||
#define SA_REGISTERED_LEN(sa, attr) sa->sa_attr_table[attr].sa_length
|
||||
|
||||
#define SA_ATTR_LEN(sa, idx, attr, hdr) ((SA_REGISTERED_LEN(sa, attr) == 0) ?\
|
||||
hdr->sa_lengths[TOC_LEN_IDX(idx->sa_idx_tab[attr])] : \
|
||||
SA_REGISTERED_LEN(sa, attr))
|
||||
|
||||
#define SA_SET_HDR(hdr, num, size) \
|
||||
{ \
|
||||
hdr->sa_magic = SA_MAGIC; \
|
||||
SA_HDR_LAYOUT_INFO_ENCODE(hdr->sa_layout_info, num, size); \
|
||||
}
|
||||
|
||||
#define SA_ATTR_INFO(sa, idx, hdr, attr, bulk, type, hdl) \
|
||||
{ \
|
||||
bulk.sa_size = SA_ATTR_LEN(sa, idx, attr, hdr); \
|
||||
bulk.sa_buftype = type; \
|
||||
bulk.sa_addr = \
|
||||
(void *)((uintptr_t)TOC_OFF(idx->sa_idx_tab[attr]) + \
|
||||
(uintptr_t)hdr); \
|
||||
}
|
||||
|
||||
#define SA_HDR_SIZE_MATCH_LAYOUT(hdr, tb) \
|
||||
(SA_HDR_SIZE(hdr) == (sizeof (sa_hdr_phys_t) + \
|
||||
(tb->lot_var_sizes > 1 ? P2ROUNDUP((tb->lot_var_sizes - 1) * \
|
||||
sizeof (uint16_t), 8) : 0)))
|
||||
|
||||
int sa_add_impl(sa_handle_t *, sa_attr_type_t,
|
||||
uint32_t, sa_data_locator_t, void *, dmu_tx_t *);
|
||||
|
||||
void sa_register_update_callback_locked(objset_t *, sa_update_cb_t *);
|
||||
int sa_size_locked(sa_handle_t *, sa_attr_type_t, int *);
|
||||
|
||||
void sa_default_locator(void **, uint32_t *, uint32_t, boolean_t, void *);
|
||||
int sa_attr_size(sa_os_t *, sa_idx_tab_t *, sa_attr_type_t,
|
||||
uint16_t *, sa_hdr_phys_t *);
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_SA_IMPL_H */
|
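The `sa_layout_info` field packs a 10-bit layout number and a 6-bit header size (stored in units of 8 bytes) into 16 bits. The sketch below walks through that encoding in userland. The `BF32_*` macros are copied from spa.h, `P2PHASE` is written out as it is commonly defined in `<sys/sysmacros.h>` (an assumption here), and the `toy_*` helpers are hypothetical. The decoder uses a field width of 6 to mirror the encoder.

```c
#include <stdint.h>

/* Assumed definition, as in <sys/sysmacros.h>: x modulo a power of two. */
#define	P2PHASE(x, align)	((x) & ((align) - 1))

/* Bitfield helpers copied from spa.h. */
#define	BF32_DECODE(x, low, len)	P2PHASE((x) >> (low), 1U << (len))
#define	BF32_ENCODE(x, low, len)	(P2PHASE((x), 1U << (len)) << (low))
#define	BF32_GET(x, low, len)		BF32_DECODE(x, low, len)
#define	BF32_SET(x, low, len, val) \
	((x) ^= BF32_ENCODE((x >> low) ^ (val), low, len))
#define	BF32_GET_SB(x, low, len, shift, bias) \
	((BF32_GET(x, low, len) + (bias)) << (shift))
#define	BF32_SET_SB(x, low, len, shift, bias, val) \
	BF32_SET(x, low, len, ((val) >> (shift)) - (bias))

/* Pack a layout number and a header size in bytes (a multiple of 8). */
static uint16_t
toy_layout_info_encode(uint32_t num, uint32_t size)
{
	uint32_t info = 0;

	BF32_SET_SB(info, 10, 6, 3, 0, size);	/* bits 10-15: size / 8 */
	BF32_SET(info, 0, 10, num);		/* bits 0-9: layout number */
	return ((uint16_t)info);
}

static uint32_t
toy_layout_num(uint16_t info)
{
	return (BF32_GET((uint32_t)info, 0, 10));
}

/* Header size in bytes: the stored 6-bit value shifted left by 3. */
static uint32_t
toy_hdr_size(uint16_t info)
{
	return (BF32_GET_SB((uint32_t)info, 10, 6, 3, 0));
}
```

For instance, layout 2 with a 16-byte header stores 2 in both fields, giving `(2 << 10) | 2`. The XOR-based `BF32_SET` replaces a field in place without clearing it first, which is why setting the second field leaves the first one intact.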
706
uts/common/fs/zfs/sys/spa.h
Normal file
@ -0,0 +1,706 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_SPA_H
|
||||
#define _SYS_SPA_H
|
||||
|
||||
#include <sys/avl.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/nvpair.h>
|
||||
#include <sys/sysmacros.h>
|
||||
#include <sys/types.h>
|
||||
#include <sys/fs/zfs.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Forward references that lots of things need.
|
||||
*/
|
||||
typedef struct spa spa_t;
|
||||
typedef struct vdev vdev_t;
|
||||
typedef struct metaslab metaslab_t;
|
||||
typedef struct metaslab_group metaslab_group_t;
|
||||
typedef struct metaslab_class metaslab_class_t;
|
||||
typedef struct zio zio_t;
|
||||
typedef struct zilog zilog_t;
|
||||
typedef struct spa_aux_vdev spa_aux_vdev_t;
|
||||
typedef struct ddt ddt_t;
|
||||
typedef struct ddt_entry ddt_entry_t;
|
||||
struct dsl_pool;
|
||||
|
||||
/*
|
||||
* General-purpose 32-bit and 64-bit bitfield encodings.
|
||||
*/
|
||||
#define BF32_DECODE(x, low, len) P2PHASE((x) >> (low), 1U << (len))
|
||||
#define BF64_DECODE(x, low, len) P2PHASE((x) >> (low), 1ULL << (len))
|
||||
#define BF32_ENCODE(x, low, len) (P2PHASE((x), 1U << (len)) << (low))
|
||||
#define BF64_ENCODE(x, low, len) (P2PHASE((x), 1ULL << (len)) << (low))
|
||||
|
||||
#define BF32_GET(x, low, len) BF32_DECODE(x, low, len)
|
||||
#define BF64_GET(x, low, len) BF64_DECODE(x, low, len)
|
||||
|
||||
#define BF32_SET(x, low, len, val) \
|
||||
((x) ^= BF32_ENCODE((x >> low) ^ (val), low, len))
|
||||
#define BF64_SET(x, low, len, val) \
|
||||
((x) ^= BF64_ENCODE((x >> low) ^ (val), low, len))
|
||||
|
||||
#define BF32_GET_SB(x, low, len, shift, bias) \
|
||||
((BF32_GET(x, low, len) + (bias)) << (shift))
|
||||
#define BF64_GET_SB(x, low, len, shift, bias) \
|
||||
((BF64_GET(x, low, len) + (bias)) << (shift))
|
||||
|
||||
#define BF32_SET_SB(x, low, len, shift, bias, val) \
|
||||
BF32_SET(x, low, len, ((val) >> (shift)) - (bias))
|
||||
#define BF64_SET_SB(x, low, len, shift, bias, val) \
|
||||
BF64_SET(x, low, len, ((val) >> (shift)) - (bias))
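The bitfield macros above can be exercised in isolation. The following is a minimal sketch, not part of the commit: it copies the 64-bit macros, assumes `P2PHASE` is the usual power-of-two modulo from `sys/sysmacros.h`, and adds two hypothetical helpers (`demo_size_roundtrip`, `demo_set_field`) to show that a shifted and biased size survives a set/get round trip without disturbing neighbouring bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition of P2PHASE: x modulo a power-of-two alignment. */
#define	P2PHASE(x, align)	((x) & ((align) - 1))

/* 64-bit bitfield macros, copied from the header above. */
#define	BF64_DECODE(x, low, len)	P2PHASE((x) >> (low), 1ULL << (len))
#define	BF64_ENCODE(x, low, len)	(P2PHASE((x), 1ULL << (len)) << (low))
#define	BF64_GET(x, low, len)		BF64_DECODE(x, low, len)
#define	BF64_SET(x, low, len, val)	\
	((x) ^= BF64_ENCODE((x >> low) ^ (val), low, len))
#define	BF64_GET_SB(x, low, len, shift, bias)	\
	((BF64_GET(x, low, len) + (bias)) << (shift))
#define	BF64_SET_SB(x, low, len, shift, bias, val)	\
	BF64_SET(x, low, len, ((val) >> (shift)) - (bias))

/*
 * Hypothetical helper: store a byte size in a 16-bit field at bit 16
 * (shift 9, bias 1, the same parameters BP_SET_PSIZE uses) and read it back.
 */
static uint64_t
demo_size_roundtrip(uint64_t word, uint64_t size)
{
	assert((size & 511) == 0 && size != 0);	/* whole 512-byte sectors */
	BF64_SET_SB(word, 16, 16, 9, 1, size);
	return (BF64_GET_SB(word, 16, 16, 9, 1));
}

/* Hypothetical helper: overwrite one field and return the whole word. */
static uint64_t
demo_set_field(uint64_t word, int low, int len, uint64_t val)
{
	BF64_SET(word, low, len, val);
	return (word);
}
```

Because `BF64_SET` XORs in only the difference between the old and new field values, bits outside `[low, low + len)` are left unchanged; `demo_set_field(~0ULL, 32, 8, 0x5a)` changes a single byte of the word.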
|
||||
|
||||
/*
|
||||
* We currently support nine block sizes, from 512 bytes to 128K.
|
||||
* We could go higher, but the benefits are near-zero and the cost
|
||||
* of COWing a giant block to modify one byte would become excessive.
|
||||
*/
|
||||
#define SPA_MINBLOCKSHIFT 9
|
||||
#define SPA_MAXBLOCKSHIFT 17
|
||||
#define SPA_MINBLOCKSIZE (1ULL << SPA_MINBLOCKSHIFT)
|
||||
#define SPA_MAXBLOCKSIZE (1ULL << SPA_MAXBLOCKSHIFT)
|
||||
|
||||
#define SPA_BLOCKSIZES (SPA_MAXBLOCKSHIFT - SPA_MINBLOCKSHIFT + 1)
|
||||
|
||||
/*
|
||||
* Size of block to hold the configuration data (a packed nvlist)
|
||||
*/
|
||||
#define SPA_CONFIG_BLOCKSIZE (1 << 14)
|
||||
|
||||
/*
|
||||
* The DVA size encodings for LSIZE and PSIZE support blocks up to 32MB.
|
||||
* The ASIZE encoding should be at least 64 times larger (6 more bits)
|
||||
* to support up to 4-way RAID-Z mirror mode with worst-case gang block
|
||||
* overhead, three DVAs per bp, plus one more bit in case we do anything
|
||||
* else that expands the ASIZE.
|
||||
*/
|
||||
#define SPA_LSIZEBITS 16 /* LSIZE up to 32M (2^16 * 512) */
|
||||
#define SPA_PSIZEBITS 16 /* PSIZE up to 32M (2^16 * 512) */
|
||||
#define SPA_ASIZEBITS 24 /* ASIZE up to 64 times larger */
|
||||
|
||||
/*
|
||||
* All SPA data is represented by 128-bit data virtual addresses (DVAs).
|
||||
* The members of the dva_t should be considered opaque outside the SPA.
|
||||
*/
|
||||
typedef struct dva {
|
||||
uint64_t dva_word[2];
|
||||
} dva_t;
|
||||
|
||||
/*
|
||||
* Each block has a 256-bit checksum -- strong enough for cryptographic hashes.
|
||||
*/
|
||||
typedef struct zio_cksum {
|
||||
uint64_t zc_word[4];
|
||||
} zio_cksum_t;
|
||||
|
||||
/*
|
||||
* Each block is described by its DVAs, time of birth, checksum, etc.
|
||||
* The word-by-word, bit-by-bit layout of the blkptr is as follows:
|
||||
*
|
||||
* 64 56 48 40 32 24 16 8 0
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* 0 | vdev1 | GRID | ASIZE |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* 1 |G| offset1 |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* 2 | vdev2 | GRID | ASIZE |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* 3 |G| offset2 |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* 4 | vdev3 | GRID | ASIZE |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* 5 |G| offset3 |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* 6 |BDX|lvl| type | cksum | comp | PSIZE | LSIZE |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* 7 | padding |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* 8 | padding |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* 9 | physical birth txg |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* a | logical birth txg |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* b | fill count |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* c | checksum[0] |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* d | checksum[1] |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* e | checksum[2] |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
* f | checksum[3] |
|
||||
* +-------+-------+-------+-------+-------+-------+-------+-------+
|
||||
*
|
||||
* Legend:
|
||||
*
|
||||
* vdev virtual device ID
|
||||
* offset offset into virtual device
|
||||
* LSIZE logical size
|
||||
* PSIZE physical size (after compression)
|
||||
* ASIZE allocated size (including RAID-Z parity and gang block headers)
|
||||
* GRID RAID-Z layout information (reserved for future use)
|
||||
* cksum checksum function
|
||||
* comp compression function
|
||||
* G gang block indicator
|
||||
* B byteorder (endianness)
|
||||
* D dedup
|
||||
* X unused
|
||||
* lvl level of indirection
|
||||
* type DMU object type
|
||||
* phys birth txg of block allocation; zero if same as logical birth txg
|
||||
* log. birth transaction group in which the block was logically born
|
||||
* fill count number of non-zero blocks under this bp
|
||||
* checksum[4] 256-bit checksum of the data this bp describes
|
||||
*/
|
||||
#define SPA_BLKPTRSHIFT 7 /* blkptr_t is 128 bytes */
|
||||
#define SPA_DVAS_PER_BP 3 /* Number of DVAs in a bp */
|
||||
|
||||
typedef struct blkptr {
|
||||
dva_t blk_dva[SPA_DVAS_PER_BP]; /* Data Virtual Addresses */
|
||||
uint64_t blk_prop; /* size, compression, type, etc */
|
||||
uint64_t blk_pad[2]; /* Extra space for the future */
|
||||
uint64_t blk_phys_birth; /* txg when block was allocated */
|
||||
uint64_t blk_birth; /* transaction group at birth */
|
||||
uint64_t blk_fill; /* fill count */
|
||||
zio_cksum_t blk_cksum; /* 256-bit checksum */
|
||||
} blkptr_t;
|
||||
|
||||
/*
|
||||
* Macros to get and set fields in a bp or DVA.
|
||||
*/
|
||||
#define DVA_GET_ASIZE(dva) \
|
||||
BF64_GET_SB((dva)->dva_word[0], 0, 24, SPA_MINBLOCKSHIFT, 0)
|
||||
#define DVA_SET_ASIZE(dva, x) \
|
||||
BF64_SET_SB((dva)->dva_word[0], 0, 24, SPA_MINBLOCKSHIFT, 0, x)
|
||||
|
||||
#define DVA_GET_GRID(dva) BF64_GET((dva)->dva_word[0], 24, 8)
|
||||
#define DVA_SET_GRID(dva, x) BF64_SET((dva)->dva_word[0], 24, 8, x)
|
||||
|
||||
#define DVA_GET_VDEV(dva) BF64_GET((dva)->dva_word[0], 32, 32)
|
||||
#define DVA_SET_VDEV(dva, x) BF64_SET((dva)->dva_word[0], 32, 32, x)
|
||||
|
||||
#define DVA_GET_OFFSET(dva) \
|
||||
BF64_GET_SB((dva)->dva_word[1], 0, 63, SPA_MINBLOCKSHIFT, 0)
|
||||
#define DVA_SET_OFFSET(dva, x) \
|
||||
BF64_SET_SB((dva)->dva_word[1], 0, 63, SPA_MINBLOCKSHIFT, 0, x)
|
||||
|
||||
#define DVA_GET_GANG(dva) BF64_GET((dva)->dva_word[1], 63, 1)
|
||||
#define DVA_SET_GANG(dva, x) BF64_SET((dva)->dva_word[1], 63, 1, x)
|
||||
|
||||
#define BP_GET_LSIZE(bp) \
|
||||
BF64_GET_SB((bp)->blk_prop, 0, 16, SPA_MINBLOCKSHIFT, 1)
|
||||
#define BP_SET_LSIZE(bp, x) \
|
||||
BF64_SET_SB((bp)->blk_prop, 0, 16, SPA_MINBLOCKSHIFT, 1, x)
|
||||
|
||||
#define BP_GET_PSIZE(bp) \
|
||||
BF64_GET_SB((bp)->blk_prop, 16, 16, SPA_MINBLOCKSHIFT, 1)
|
||||
#define BP_SET_PSIZE(bp, x) \
|
||||
BF64_SET_SB((bp)->blk_prop, 16, 16, SPA_MINBLOCKSHIFT, 1, x)
|
||||
|
||||
#define BP_GET_COMPRESS(bp) BF64_GET((bp)->blk_prop, 32, 8)
|
||||
#define BP_SET_COMPRESS(bp, x) BF64_SET((bp)->blk_prop, 32, 8, x)
|
||||
|
||||
#define BP_GET_CHECKSUM(bp) BF64_GET((bp)->blk_prop, 40, 8)
|
||||
#define BP_SET_CHECKSUM(bp, x) BF64_SET((bp)->blk_prop, 40, 8, x)
|
||||
|
||||
#define BP_GET_TYPE(bp) BF64_GET((bp)->blk_prop, 48, 8)
|
||||
#define BP_SET_TYPE(bp, x) BF64_SET((bp)->blk_prop, 48, 8, x)
|
||||
|
||||
#define BP_GET_LEVEL(bp) BF64_GET((bp)->blk_prop, 56, 5)
|
||||
#define BP_SET_LEVEL(bp, x) BF64_SET((bp)->blk_prop, 56, 5, x)
|
||||
|
||||
#define BP_GET_PROP_BIT_61(bp) BF64_GET((bp)->blk_prop, 61, 1)
|
||||
#define BP_SET_PROP_BIT_61(bp, x) BF64_SET((bp)->blk_prop, 61, 1, x)
|
||||
|
||||
#define BP_GET_DEDUP(bp) BF64_GET((bp)->blk_prop, 62, 1)
|
||||
#define BP_SET_DEDUP(bp, x) BF64_SET((bp)->blk_prop, 62, 1, x)
|
||||
|
||||
#define BP_GET_BYTEORDER(bp) (0 - BF64_GET((bp)->blk_prop, 63, 1))
|
||||
#define BP_SET_BYTEORDER(bp, x) BF64_SET((bp)->blk_prop, 63, 1, x)
|
||||
|
||||
#define BP_PHYSICAL_BIRTH(bp) \
|
||||
((bp)->blk_phys_birth ? (bp)->blk_phys_birth : (bp)->blk_birth)
|
||||
|
||||
#define BP_SET_BIRTH(bp, logical, physical) \
|
||||
{ \
|
||||
(bp)->blk_birth = (logical); \
|
||||
(bp)->blk_phys_birth = ((logical) == (physical) ? 0 : (physical)); \
|
||||
}
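The physical birth txg is stored as zero whenever it equals the logical birth txg, and `BP_PHYSICAL_BIRTH` undoes that on read. A small sketch of the convention follows; `demo_bp_t` is a hypothetical stand-in that keeps only the two birth fields of `blkptr_t`, and the `demo_*` helpers are illustrative names, not part of the header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced block pointer: only the two birth fields. */
typedef struct demo_bp {
	uint64_t blk_phys_birth;	/* txg when block was allocated */
	uint64_t blk_birth;		/* transaction group at birth */
} demo_bp_t;

/* Copied from the header above. */
#define	BP_PHYSICAL_BIRTH(bp)	\
	((bp)->blk_phys_birth ? (bp)->blk_phys_birth : (bp)->blk_birth)

#define	BP_SET_BIRTH(bp, logical, physical)	\
{						\
	(bp)->blk_birth = (logical);		\
	(bp)->blk_phys_birth = ((logical) == (physical) ? 0 : (physical)); \
}

/* Returns the physical birth txg as a reader of the bp would see it. */
static uint64_t
demo_phys_birth(uint64_t logical, uint64_t physical)
{
	demo_bp_t bp = { 0, 0 };

	BP_SET_BIRTH(&bp, logical, physical);
	return (BP_PHYSICAL_BIRTH(&bp));
}

/* Returns the raw value actually stored in blk_phys_birth. */
static uint64_t
demo_stored_phys(uint64_t logical, uint64_t physical)
{
	demo_bp_t bp = { 0, 0 };

	BP_SET_BIRTH(&bp, logical, physical);
	return (bp.blk_phys_birth);
}
```

With equal txgs the stored field stays zero while the accessor still reports the logical txg; the field is populated only when the two differ.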
|
||||
|
||||
#define BP_GET_ASIZE(bp) \
|
||||
(DVA_GET_ASIZE(&(bp)->blk_dva[0]) + DVA_GET_ASIZE(&(bp)->blk_dva[1]) + \
|
||||
DVA_GET_ASIZE(&(bp)->blk_dva[2]))
|
||||
|
||||
#define BP_GET_UCSIZE(bp) \
|
||||
((BP_GET_LEVEL(bp) > 0 || dmu_ot[BP_GET_TYPE(bp)].ot_metadata) ? \
|
||||
BP_GET_PSIZE(bp) : BP_GET_LSIZE(bp))
|
||||
|
||||
#define BP_GET_NDVAS(bp) \
|
||||
(!!DVA_GET_ASIZE(&(bp)->blk_dva[0]) + \
|
||||
!!DVA_GET_ASIZE(&(bp)->blk_dva[1]) + \
|
||||
!!DVA_GET_ASIZE(&(bp)->blk_dva[2]))
|
||||
|
||||
#define BP_COUNT_GANG(bp) \
|
||||
(DVA_GET_GANG(&(bp)->blk_dva[0]) + \
|
||||
DVA_GET_GANG(&(bp)->blk_dva[1]) + \
|
||||
DVA_GET_GANG(&(bp)->blk_dva[2]))
|
||||
|
||||
#define DVA_EQUAL(dva1, dva2) \
|
||||
((dva1)->dva_word[1] == (dva2)->dva_word[1] && \
|
||||
(dva1)->dva_word[0] == (dva2)->dva_word[0])
|
||||
|
||||
#define BP_EQUAL(bp1, bp2) \
|
||||
(BP_PHYSICAL_BIRTH(bp1) == BP_PHYSICAL_BIRTH(bp2) && \
|
||||
DVA_EQUAL(&(bp1)->blk_dva[0], &(bp2)->blk_dva[0]) && \
|
||||
DVA_EQUAL(&(bp1)->blk_dva[1], &(bp2)->blk_dva[1]) && \
|
||||
DVA_EQUAL(&(bp1)->blk_dva[2], &(bp2)->blk_dva[2]))
|
||||
|
||||
#define ZIO_CHECKSUM_EQUAL(zc1, zc2) \
|
||||
(0 == (((zc1).zc_word[0] - (zc2).zc_word[0]) | \
|
||||
((zc1).zc_word[1] - (zc2).zc_word[1]) | \
|
||||
((zc1).zc_word[2] - (zc2).zc_word[2]) | \
|
||||
((zc1).zc_word[3] - (zc2).zc_word[3])))
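`ZIO_CHECKSUM_EQUAL` compares all four words without branching: each subtraction is zero only when the corresponding words match, and OR-ing the differences yields zero only when every word matches. The sketch below copies the macro and `zio_cksum_t`; `demo_equal` is a hypothetical helper that builds two checksums differing at most in the last word.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the header above. */
typedef struct zio_cksum {
	uint64_t zc_word[4];
} zio_cksum_t;

#define	ZIO_CHECKSUM_EQUAL(zc1, zc2)	\
	(0 == (((zc1).zc_word[0] - (zc2).zc_word[0]) |	\
	((zc1).zc_word[1] - (zc2).zc_word[1]) |		\
	((zc1).zc_word[2] - (zc2).zc_word[2]) |		\
	((zc1).zc_word[3] - (zc2).zc_word[3])))

#define	ZIO_SET_CHECKSUM(zcp, w0, w1, w2, w3)	\
{						\
	(zcp)->zc_word[0] = w0;			\
	(zcp)->zc_word[1] = w1;			\
	(zcp)->zc_word[2] = w2;			\
	(zcp)->zc_word[3] = w3;			\
}

/*
 * Hypothetical helper: two checksums share words 0..2 and carry the
 * given values in word 3; returns 1 if the macro reports them equal.
 */
static int
demo_equal(uint64_t a3, uint64_t b3)
{
	zio_cksum_t a, b;

	ZIO_SET_CHECKSUM(&a, 1, 2, 3, a3);
	ZIO_SET_CHECKSUM(&b, 1, 2, 3, b3);
	return (ZIO_CHECKSUM_EQUAL(a, b));
}
```

Unsigned subtraction wraps rather than overflowing, so a non-matching word always contributes a non-zero term regardless of which operand is larger.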
|
||||
|
||||
#define DVA_IS_VALID(dva) (DVA_GET_ASIZE(dva) != 0)
|
||||
|
||||
#define ZIO_SET_CHECKSUM(zcp, w0, w1, w2, w3) \
|
||||
{ \
|
||||
(zcp)->zc_word[0] = w0; \
|
||||
(zcp)->zc_word[1] = w1; \
|
||||
(zcp)->zc_word[2] = w2; \
|
||||
(zcp)->zc_word[3] = w3; \
|
||||
}
|
||||
|
||||
#define BP_IDENTITY(bp) (&(bp)->blk_dva[0])
|
||||
#define BP_IS_GANG(bp) DVA_GET_GANG(BP_IDENTITY(bp))
|
||||
#define BP_IS_HOLE(bp) ((bp)->blk_birth == 0)
|
||||
|
||||
/* BP_IS_RAIDZ(bp) assumes no block compression */
|
||||
#define BP_IS_RAIDZ(bp) (DVA_GET_ASIZE(&(bp)->blk_dva[0]) > \
|
||||
BP_GET_PSIZE(bp))
|
||||
|
||||
#define BP_ZERO(bp) \
|
||||
{ \
|
||||
(bp)->blk_dva[0].dva_word[0] = 0; \
|
||||
(bp)->blk_dva[0].dva_word[1] = 0; \
|
||||
(bp)->blk_dva[1].dva_word[0] = 0; \
|
||||
(bp)->blk_dva[1].dva_word[1] = 0; \
|
||||
(bp)->blk_dva[2].dva_word[0] = 0; \
|
||||
(bp)->blk_dva[2].dva_word[1] = 0; \
|
||||
(bp)->blk_prop = 0; \
|
||||
(bp)->blk_pad[0] = 0; \
|
||||
(bp)->blk_pad[1] = 0; \
|
||||
(bp)->blk_phys_birth = 0; \
|
||||
(bp)->blk_birth = 0; \
|
||||
(bp)->blk_fill = 0; \
|
||||
ZIO_SET_CHECKSUM(&(bp)->blk_cksum, 0, 0, 0, 0); \
|
||||
}
|
||||
|
||||
/*
|
||||
* Note: the byteorder is either 0 or -1, both of which are palindromes.
|
||||
* This simplifies the endianness handling a bit.
|
||||
*/
|
||||
#ifdef _BIG_ENDIAN
|
||||
#define ZFS_HOST_BYTEORDER (0ULL)
|
||||
#else
|
||||
#define ZFS_HOST_BYTEORDER (-1ULL)
|
||||
#endif
|
||||
|
||||
#define BP_SHOULD_BYTESWAP(bp) (BP_GET_BYTEORDER(bp) != ZFS_HOST_BYTEORDER)
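The "palindrome" remark means that 0 and -1 read the same before and after a byte swap, so the byteorder value itself never needs swapping. A hedged sketch of that property follows; `demo_bswap64`, `DEMO_GET_BYTEORDER`, and `demo_should_byteswap` are hypothetical names, and the host byteorder is passed as a parameter rather than taken from `_BIG_ENDIAN`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit byte swap, written out byte by byte. */
static uint64_t
demo_bswap64(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (x & 0xff);
		x >>= 8;
	}
	return (r);
}

/* Same result as BP_GET_BYTEORDER: bit 63 expanded to 0 or all ones. */
#define	DEMO_GET_BYTEORDER(prop)	(0 - (((uint64_t)(prop)) >> 63))

/* Mirrors BP_SHOULD_BYTESWAP with the host byteorder made explicit. */
static int
demo_should_byteswap(uint64_t prop, uint64_t host_byteorder)
{
	return (DEMO_GET_BYTEORDER(prop) != host_byteorder);
}
```

A block written little-endian (bit 63 set) compares equal to `-1ULL` on a little-endian host and needs no swap; the same block on a big-endian host, where the host value is `0ULL`, does.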
|
||||
|
||||
#define BP_SPRINTF_LEN 320
|
||||
|
||||
/*
|
||||
* This macro allows code sharing between zfs, libzpool, and mdb.
|
||||
* 'func' is either snprintf() or mdb_snprintf().
|
||||
* 'ws' (whitespace) can be ' ' for single-line format, '\n' for multi-line.
|
||||
*/
|
||||
#define SPRINTF_BLKPTR(func, ws, buf, bp, type, checksum, compress) \
|
||||
{ \
|
||||
static const char *copyname[] = \
|
||||
{ "zero", "single", "double", "triple" }; \
|
||||
int size = BP_SPRINTF_LEN; \
|
||||
int len = 0; \
|
||||
int copies = 0; \
|
||||
\
|
||||
if (bp == NULL) { \
|
||||
len = func(buf + len, size - len, "<NULL>"); \
|
||||
} else if (BP_IS_HOLE(bp)) { \
|
||||
len = func(buf + len, size - len, "<hole>"); \
|
||||
} else { \
|
||||
for (int d = 0; d < BP_GET_NDVAS(bp); d++) { \
|
||||
const dva_t *dva = &bp->blk_dva[d]; \
|
||||
if (DVA_IS_VALID(dva)) \
|
||||
copies++; \
|
||||
len += func(buf + len, size - len, \
|
||||
"DVA[%d]=<%llu:%llx:%llx>%c", d, \
|
||||
(u_longlong_t)DVA_GET_VDEV(dva), \
|
||||
(u_longlong_t)DVA_GET_OFFSET(dva), \
|
||||
(u_longlong_t)DVA_GET_ASIZE(dva), \
|
||||
ws); \
|
||||
} \
|
||||
if (BP_IS_GANG(bp) && \
|
||||
DVA_GET_ASIZE(&bp->blk_dva[2]) <= \
|
||||
DVA_GET_ASIZE(&bp->blk_dva[1]) / 2) \
|
||||
copies--; \
|
||||
len += func(buf + len, size - len, \
|
||||
"[L%llu %s] %s %s %s %s %s %s%c" \
|
||||
"size=%llxL/%llxP birth=%lluL/%lluP fill=%llu%c" \
|
||||
"cksum=%llx:%llx:%llx:%llx", \
|
||||
(u_longlong_t)BP_GET_LEVEL(bp), \
|
||||
type, \
|
||||
checksum, \
|
||||
compress, \
|
||||
BP_GET_BYTEORDER(bp) == 0 ? "BE" : "LE", \
|
||||
BP_IS_GANG(bp) ? "gang" : "contiguous", \
|
||||
BP_GET_DEDUP(bp) ? "dedup" : "unique", \
|
||||
copyname[copies], \
|
||||
ws, \
|
||||
(u_longlong_t)BP_GET_LSIZE(bp), \
|
||||
(u_longlong_t)BP_GET_PSIZE(bp), \
|
||||
(u_longlong_t)bp->blk_birth, \
|
||||
(u_longlong_t)BP_PHYSICAL_BIRTH(bp), \
|
||||
(u_longlong_t)bp->blk_fill, \
|
||||
ws, \
|
||||
(u_longlong_t)bp->blk_cksum.zc_word[0], \
|
||||
(u_longlong_t)bp->blk_cksum.zc_word[1], \
|
||||
(u_longlong_t)bp->blk_cksum.zc_word[2], \
|
||||
(u_longlong_t)bp->blk_cksum.zc_word[3]); \
|
||||
} \
|
||||
ASSERT(len < size); \
|
||||
}
|
||||
|
||||
#include <sys/dmu.h>
|
||||
|
||||
#define BP_GET_BUFC_TYPE(bp) \
|
||||
(((BP_GET_LEVEL(bp) > 0) || (dmu_ot[BP_GET_TYPE(bp)].ot_metadata)) ? \
|
||||
ARC_BUFC_METADATA : ARC_BUFC_DATA)
|
||||
|
||||
typedef enum spa_import_type {
|
||||
SPA_IMPORT_EXISTING,
|
||||
SPA_IMPORT_ASSEMBLE
|
||||
} spa_import_type_t;
|
||||
|
||||
/* state manipulation functions */
|
||||
extern int spa_open(const char *pool, spa_t **, void *tag);
|
||||
extern int spa_open_rewind(const char *pool, spa_t **, void *tag,
|
||||
nvlist_t *policy, nvlist_t **config);
|
||||
extern int spa_get_stats(const char *pool, nvlist_t **config,
|
||||
char *altroot, size_t buflen);
|
||||
extern int spa_create(const char *pool, nvlist_t *config, nvlist_t *props,
|
||||
const char *history_str, nvlist_t *zplprops);
|
||||
extern int spa_import_rootpool(char *devpath, char *devid);
|
||||
extern int spa_import(const char *pool, nvlist_t *config, nvlist_t *props,
|
||||
uint64_t flags);
|
||||
extern nvlist_t *spa_tryimport(nvlist_t *tryconfig);
|
||||
extern int spa_destroy(char *pool);
|
||||
extern int spa_export(char *pool, nvlist_t **oldconfig, boolean_t force,
|
||||
boolean_t hardforce);
|
||||
extern int spa_reset(char *pool);
|
||||
extern void spa_async_request(spa_t *spa, int flag);
|
||||
extern void spa_async_unrequest(spa_t *spa, int flag);
|
||||
extern void spa_async_suspend(spa_t *spa);
|
||||
extern void spa_async_resume(spa_t *spa);
|
||||
extern spa_t *spa_inject_addref(char *pool);
|
||||
extern void spa_inject_delref(spa_t *spa);
|
||||
extern void spa_scan_stat_init(spa_t *spa);
|
||||
extern int spa_scan_get_stats(spa_t *spa, pool_scan_stat_t *ps);
|
||||
|
||||
#define SPA_ASYNC_CONFIG_UPDATE 0x01
|
||||
#define SPA_ASYNC_REMOVE 0x02
|
||||
#define SPA_ASYNC_PROBE 0x04
|
||||
#define SPA_ASYNC_RESILVER_DONE 0x08
|
||||
#define SPA_ASYNC_RESILVER 0x10
|
||||
#define SPA_ASYNC_AUTOEXPAND 0x20
|
||||
#define SPA_ASYNC_REMOVE_DONE 0x40
|
||||
#define SPA_ASYNC_REMOVE_STOP 0x80
|
||||
|
||||
/*
|
||||
* Controls the behavior of spa_vdev_remove().
|
||||
*/
|
||||
#define SPA_REMOVE_UNSPARE 0x01
|
||||
#define SPA_REMOVE_DONE 0x02
|
||||
|
||||
/* device manipulation */
|
||||
extern int spa_vdev_add(spa_t *spa, nvlist_t *nvroot);
|
||||
extern int spa_vdev_attach(spa_t *spa, uint64_t guid, nvlist_t *nvroot,
|
||||
int replacing);
|
||||
extern int spa_vdev_detach(spa_t *spa, uint64_t guid, uint64_t pguid,
|
||||
int replace_done);
|
||||
extern int spa_vdev_remove(spa_t *spa, uint64_t guid, boolean_t unspare);
|
||||
extern boolean_t spa_vdev_remove_active(spa_t *spa);
|
||||
extern int spa_vdev_setpath(spa_t *spa, uint64_t guid, const char *newpath);
|
||||
extern int spa_vdev_setfru(spa_t *spa, uint64_t guid, const char *newfru);
|
||||
extern int spa_vdev_split_mirror(spa_t *spa, char *newname, nvlist_t *config,
|
||||
nvlist_t *props, boolean_t exp);
|
||||
|
||||
/* spare state (which is global across all pools) */
|
||||
extern void spa_spare_add(vdev_t *vd);
|
||||
extern void spa_spare_remove(vdev_t *vd);
|
||||
extern boolean_t spa_spare_exists(uint64_t guid, uint64_t *pool, int *refcnt);
|
||||
extern void spa_spare_activate(vdev_t *vd);
|
||||
|
||||
/* L2ARC state (which is global across all pools) */
|
||||
extern void spa_l2cache_add(vdev_t *vd);
|
||||
extern void spa_l2cache_remove(vdev_t *vd);
|
||||
extern boolean_t spa_l2cache_exists(uint64_t guid, uint64_t *pool);
|
||||
extern void spa_l2cache_activate(vdev_t *vd);
|
||||
extern void spa_l2cache_drop(spa_t *spa);
|
||||
|
||||
/* scanning */
|
||||
extern int spa_scan(spa_t *spa, pool_scan_func_t func);
|
||||
extern int spa_scan_stop(spa_t *spa);
|
||||
|
||||
/* spa syncing */
|
||||
extern void spa_sync(spa_t *spa, uint64_t txg); /* only for DMU use */
|
||||
extern void spa_sync_allpools(void);
|
||||
|
||||
/*
|
||||
* DEFERRED_FREE must be large enough that regular blocks are not
|
||||
* deferred. XXX so can't we change it back to 1?
|
||||
*/
|
||||
#define SYNC_PASS_DEFERRED_FREE 2 /* defer frees after this pass */
|
||||
#define SYNC_PASS_DONT_COMPRESS 4 /* don't compress after this pass */
|
||||
#define SYNC_PASS_REWRITE 1 /* rewrite new bps after this pass */
|
||||
|
||||
/* spa namespace global mutex */
|
||||
extern kmutex_t spa_namespace_lock;
|
||||
|
||||
/*
|
||||
* SPA configuration functions in spa_config.c
|
||||
*/
|
||||
|
||||
#define SPA_CONFIG_UPDATE_POOL 0
|
||||
#define SPA_CONFIG_UPDATE_VDEVS 1
|
||||
|
||||
extern void spa_config_sync(spa_t *, boolean_t, boolean_t);
|
||||
extern void spa_config_load(void);
|
||||
extern nvlist_t *spa_all_configs(uint64_t *);
|
||||
extern void spa_config_set(spa_t *spa, nvlist_t *config);
|
||||
extern nvlist_t *spa_config_generate(spa_t *spa, vdev_t *vd, uint64_t txg,
|
||||
int getstats);
|
||||
extern void spa_config_update(spa_t *spa, int what);
|
||||
|
||||
/*
|
||||
* Miscellaneous SPA routines in spa_misc.c
|
||||
*/
|
||||
|
||||
/* Namespace manipulation */
|
||||
extern spa_t *spa_lookup(const char *name);
|
||||
extern spa_t *spa_add(const char *name, nvlist_t *config, const char *altroot);
|
||||
extern void spa_remove(spa_t *spa);
|
||||
extern spa_t *spa_next(spa_t *prev);
|
||||
|
||||
/* Refcount functions */
|
||||
extern void spa_open_ref(spa_t *spa, void *tag);
|
||||
extern void spa_close(spa_t *spa, void *tag);
|
||||
extern boolean_t spa_refcount_zero(spa_t *spa);
|
||||
|
||||
#define SCL_NONE 0x00
|
||||
#define SCL_CONFIG 0x01
|
||||
#define SCL_STATE 0x02
|
||||
#define SCL_L2ARC 0x04 /* hack until L2ARC 2.0 */
|
||||
#define SCL_ALLOC 0x08
|
||||
#define SCL_ZIO 0x10
|
||||
#define SCL_FREE 0x20
|
||||
#define SCL_VDEV 0x40
|
||||
#define SCL_LOCKS 7
|
||||
#define SCL_ALL ((1 << SCL_LOCKS) - 1)
|
||||
#define SCL_STATE_ALL (SCL_STATE | SCL_L2ARC | SCL_ZIO)
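The `SCL_*` values are single-bit flags, so `SCL_ALL` is simply the low `SCL_LOCKS` bits set and lock sets can be combined with bitwise OR. The sketch below copies the definitions and adds two hypothetical helpers (`demo_count_locks`, `demo_holds`) to make the arithmetic concrete; it does not model the actual locking done by `spa_config_enter()`.

```c
#include <assert.h>

/* Copied from the header above. */
#define	SCL_NONE	0x00
#define	SCL_CONFIG	0x01
#define	SCL_STATE	0x02
#define	SCL_L2ARC	0x04
#define	SCL_ALLOC	0x08
#define	SCL_ZIO		0x10
#define	SCL_FREE	0x20
#define	SCL_VDEV	0x40
#define	SCL_LOCKS	7
#define	SCL_ALL		((1 << SCL_LOCKS) - 1)
#define	SCL_STATE_ALL	(SCL_STATE | SCL_L2ARC | SCL_ZIO)

/* Hypothetical helper: number of individual locks named in a mask. */
static int
demo_count_locks(int locks)
{
	int n = 0;

	for (int bit = 0; bit < SCL_LOCKS; bit++) {
		if (locks & (1 << bit))
			n++;
	}
	return (n);
}

/* Hypothetical helper: 1 if every lock in 'wanted' is also in 'held'. */
static int
demo_holds(int held, int wanted)
{
	return ((held & wanted) == wanted);
}
```

`SCL_ALL` evaluates to `0x7f`, covering all seven flags, and `SCL_STATE_ALL` to `0x16`.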
|
||||
|
||||
/* Pool configuration locks */
|
||||
extern int spa_config_tryenter(spa_t *spa, int locks, void *tag, krw_t rw);
|
||||
extern void spa_config_enter(spa_t *spa, int locks, void *tag, krw_t rw);
|
||||
extern void spa_config_exit(spa_t *spa, int locks, void *tag);
|
||||
extern int spa_config_held(spa_t *spa, int locks, krw_t rw);
|
||||
|
||||
/* Pool vdev add/remove lock */
|
||||
extern uint64_t spa_vdev_enter(spa_t *spa);
|
||||
extern uint64_t spa_vdev_config_enter(spa_t *spa);
|
||||
extern void spa_vdev_config_exit(spa_t *spa, vdev_t *vd, uint64_t txg,
|
||||
int error, char *tag);
|
||||
extern int spa_vdev_exit(spa_t *spa, vdev_t *vd, uint64_t txg, int error);
|
||||
|
||||
/* Pool vdev state change lock */
|
||||
extern void spa_vdev_state_enter(spa_t *spa, int oplock);
|
||||
extern int spa_vdev_state_exit(spa_t *spa, vdev_t *vd, int error);
|
||||
|
||||
/* Log state */
|
||||
typedef enum spa_log_state {
|
||||
SPA_LOG_UNKNOWN = 0, /* unknown log state */
|
||||
SPA_LOG_MISSING, /* missing log(s) */
|
||||
SPA_LOG_CLEAR, /* clear the log(s) */
|
||||
SPA_LOG_GOOD, /* log(s) are good */
|
||||
} spa_log_state_t;
|
||||
|
||||
extern spa_log_state_t spa_get_log_state(spa_t *spa);
|
||||
extern void spa_set_log_state(spa_t *spa, spa_log_state_t state);
|
||||
extern int spa_offline_log(spa_t *spa);
|
||||
|
||||
/* Log claim callback */
|
||||
extern void spa_claim_notify(zio_t *zio);
|
||||
|
||||
/* Accessor functions */
|
||||
extern boolean_t spa_shutting_down(spa_t *spa);
|
||||
extern struct dsl_pool *spa_get_dsl(spa_t *spa);
|
||||
extern blkptr_t *spa_get_rootblkptr(spa_t *spa);
|
||||
extern void spa_set_rootblkptr(spa_t *spa, const blkptr_t *bp);
|
||||
extern void spa_altroot(spa_t *, char *, size_t);
|
||||
extern int spa_sync_pass(spa_t *spa);
|
||||
extern char *spa_name(spa_t *spa);
|
||||
extern uint64_t spa_guid(spa_t *spa);
|
||||
extern uint64_t spa_last_synced_txg(spa_t *spa);
|
||||
extern uint64_t spa_first_txg(spa_t *spa);
|
||||
extern uint64_t spa_syncing_txg(spa_t *spa);
|
||||
extern uint64_t spa_version(spa_t *spa);
|
||||
extern pool_state_t spa_state(spa_t *spa);
|
||||
extern spa_load_state_t spa_load_state(spa_t *spa);
|
||||
extern uint64_t spa_freeze_txg(spa_t *spa);
|
||||
extern uint64_t spa_get_asize(spa_t *spa, uint64_t lsize);
|
||||
extern uint64_t spa_get_dspace(spa_t *spa);
|
||||
extern void spa_update_dspace(spa_t *spa);
|
||||
extern uint64_t spa_version(spa_t *spa);
|
||||
extern boolean_t spa_deflate(spa_t *spa);
|
||||
extern metaslab_class_t *spa_normal_class(spa_t *spa);
|
||||
extern metaslab_class_t *spa_log_class(spa_t *spa);
|
||||
extern int spa_max_replication(spa_t *spa);
|
||||
extern int spa_prev_software_version(spa_t *spa);
|
||||
extern int spa_busy(void);
|
||||
extern uint8_t spa_get_failmode(spa_t *spa);
|
||||
extern boolean_t spa_suspended(spa_t *spa);
|
||||
extern uint64_t spa_bootfs(spa_t *spa);
|
||||
extern uint64_t spa_delegation(spa_t *spa);
|
||||
extern objset_t *spa_meta_objset(spa_t *spa);
|
||||
|
||||
/* Miscellaneous support routines */
|
||||
extern int spa_rename(const char *oldname, const char *newname);
|
||||
extern spa_t *spa_by_guid(uint64_t pool_guid, uint64_t device_guid);
|
||||
extern boolean_t spa_guid_exists(uint64_t pool_guid, uint64_t device_guid);
|
||||
extern char *spa_strdup(const char *);
|
||||
extern void spa_strfree(char *);
|
||||
extern uint64_t spa_get_random(uint64_t range);
|
||||
extern uint64_t spa_generate_guid(spa_t *spa);
|
||||
extern void sprintf_blkptr(char *buf, const blkptr_t *bp);
|
||||
extern void spa_freeze(spa_t *spa);
|
||||
extern void spa_upgrade(spa_t *spa, uint64_t version);
|
||||
extern void spa_evict_all(void);
|
||||
extern vdev_t *spa_lookup_by_guid(spa_t *spa, uint64_t guid,
|
||||
boolean_t l2cache);
|
||||
extern boolean_t spa_has_spare(spa_t *, uint64_t guid);
|
||||
extern uint64_t dva_get_dsize_sync(spa_t *spa, const dva_t *dva);
|
||||
extern uint64_t bp_get_dsize_sync(spa_t *spa, const blkptr_t *bp);
|
||||
extern uint64_t bp_get_dsize(spa_t *spa, const blkptr_t *bp);
|
||||
extern boolean_t spa_has_slogs(spa_t *spa);
|
||||
extern boolean_t spa_is_root(spa_t *spa);
|
||||
extern boolean_t spa_writeable(spa_t *spa);
|
||||
|
||||
extern int spa_mode(spa_t *spa);
|
||||
extern uint64_t strtonum(const char *str, char **nptr);
|
||||
|
||||
/* history logging */
|
||||
typedef enum history_log_type {
|
||||
LOG_CMD_POOL_CREATE,
|
||||
LOG_CMD_NORMAL,
|
||||
LOG_INTERNAL
|
||||
} history_log_type_t;
|
||||
|
||||
typedef struct history_arg {
|
||||
char *ha_history_str;
|
||||
history_log_type_t ha_log_type;
|
||||
history_internal_events_t ha_event;
|
||||
char *ha_zone;
|
||||
uid_t ha_uid;
|
||||
} history_arg_t;
|
||||
|
||||
extern char *spa_his_ievent_table[];
|
||||
|
||||
extern void spa_history_create_obj(spa_t *spa, dmu_tx_t *tx);
|
||||
extern int spa_history_get(spa_t *spa, uint64_t *offset, uint64_t *len_read,
|
||||
char *his_buf);
|
||||
extern int spa_history_log(spa_t *spa, const char *his_buf,
|
||||
history_log_type_t what);
|
||||
extern void spa_history_log_internal(history_internal_events_t event,
|
||||
spa_t *spa, dmu_tx_t *tx, const char *fmt, ...);
|
||||
extern void spa_history_log_version(spa_t *spa, history_internal_events_t evt);
|
||||
|
||||
/* error handling */
|
||||
struct zbookmark;
|
||||
extern void spa_log_error(spa_t *spa, zio_t *zio);
|
||||
extern void zfs_ereport_post(const char *class, spa_t *spa, vdev_t *vd,
|
||||
zio_t *zio, uint64_t stateoroffset, uint64_t length);
|
||||
extern void zfs_post_remove(spa_t *spa, vdev_t *vd);
|
||||
extern void zfs_post_state_change(spa_t *spa, vdev_t *vd);
|
||||
extern void zfs_post_autoreplace(spa_t *spa, vdev_t *vd);
|
||||
extern uint64_t spa_get_errlog_size(spa_t *spa);
|
||||
extern int spa_get_errlog(spa_t *spa, void *uaddr, size_t *count);
|
||||
extern void spa_errlog_rotate(spa_t *spa);
|
||||
extern void spa_errlog_drain(spa_t *spa);
|
||||
extern void spa_errlog_sync(spa_t *spa, uint64_t txg);
|
||||
extern void spa_get_errlists(spa_t *spa, avl_tree_t *last, avl_tree_t *scrub);
|
||||
|
||||
/* vdev cache */
|
||||
extern void vdev_cache_stat_init(void);
|
||||
extern void vdev_cache_stat_fini(void);
|
||||
|
||||
/* Initialization and termination */
|
||||
extern void spa_init(int flags);
|
||||
extern void spa_fini(void);
|
||||
extern void spa_boot_init(void);
|
||||
|
||||
/* properties */
|
||||
extern int spa_prop_set(spa_t *spa, nvlist_t *nvp);
|
||||
extern int spa_prop_get(spa_t *spa, nvlist_t **nvp);
|
||||
extern void spa_prop_clear_bootfs(spa_t *spa, uint64_t obj, dmu_tx_t *tx);
|
||||
extern void spa_configfile_set(spa_t *, nvlist_t *, boolean_t);
|
||||
|
||||
/* asynchronous event notification */
|
||||
extern void spa_event_notify(spa_t *spa, vdev_t *vdev, const char *name);
|
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
#define dprintf_bp(bp, fmt, ...) do { \
|
||||
if (zfs_flags & ZFS_DEBUG_DPRINTF) { \
|
||||
char *__blkbuf = kmem_alloc(BP_SPRINTF_LEN, KM_SLEEP); \
|
||||
sprintf_blkptr(__blkbuf, (bp)); \
|
||||
dprintf(fmt " %s\n", __VA_ARGS__, __blkbuf); \
|
||||
kmem_free(__blkbuf, BP_SPRINTF_LEN); \
|
||||
} \
|
||||
_NOTE(CONSTCOND) } while (0)
|
||||
#else
|
||||
#define dprintf_bp(bp, fmt, ...)
|
||||
#endif
|
||||
|
||||
extern int spa_mode_global; /* mode, e.g. FREAD | FWRITE */
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_SPA_H */
|
42
uts/common/fs/zfs/sys/spa_boot.h
Normal file
@ -0,0 +1,42 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_SPA_BOOT_H
|
||||
#define _SYS_SPA_BOOT_H
|
||||
|
||||
#include <sys/nvpair.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
extern char *spa_get_bootprop(char *prop);
|
||||
extern void spa_free_bootprop(char *prop);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_SPA_BOOT_H */
|
235
uts/common/fs/zfs/sys/spa_impl.h
Normal file
@ -0,0 +1,235 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_SPA_IMPL_H
|
||||
#define _SYS_SPA_IMPL_H
|
||||
|
||||
#include <sys/spa.h>
|
||||
#include <sys/vdev.h>
|
||||
#include <sys/metaslab.h>
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/dsl_pool.h>
|
||||
#include <sys/uberblock_impl.h>
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/avl.h>
|
||||
#include <sys/refcount.h>
|
||||
#include <sys/bplist.h>
|
||||
#include <sys/bpobj.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
typedef struct spa_error_entry {
|
||||
zbookmark_t se_bookmark;
|
||||
char *se_name;
|
||||
avl_node_t se_avl;
|
||||
} spa_error_entry_t;
|
||||
|
||||
typedef struct spa_history_phys {
|
||||
uint64_t sh_pool_create_len; /* ending offset of zpool create */
|
||||
uint64_t sh_phys_max_off; /* physical EOF */
|
||||
uint64_t sh_bof; /* logical BOF */
|
||||
uint64_t sh_eof; /* logical EOF */
|
||||
uint64_t sh_records_lost; /* num of records overwritten */
|
||||
} spa_history_phys_t;
|
||||
|
||||
struct spa_aux_vdev {
|
||||
uint64_t sav_object; /* MOS object for device list */
|
||||
nvlist_t *sav_config; /* cached device config */
|
||||
vdev_t **sav_vdevs; /* devices */
|
||||
int sav_count; /* number devices */
|
||||
boolean_t sav_sync; /* sync the device list */
|
||||
nvlist_t **sav_pending; /* pending device additions */
|
||||
uint_t sav_npending; /* # pending devices */
|
||||
};
|
||||
|
||||
typedef struct spa_config_lock {
|
||||
kmutex_t scl_lock;
|
||||
kthread_t *scl_writer;
|
||||
int scl_write_wanted;
|
||||
kcondvar_t scl_cv;
|
||||
refcount_t scl_count;
|
||||
} spa_config_lock_t;
|
||||
|
||||
typedef struct spa_config_dirent {
|
||||
list_node_t scd_link;
|
||||
char *scd_path;
|
||||
} spa_config_dirent_t;
|
||||
|
||||
enum zio_taskq_type {
|
||||
ZIO_TASKQ_ISSUE = 0,
|
||||
ZIO_TASKQ_ISSUE_HIGH,
|
||||
ZIO_TASKQ_INTERRUPT,
|
||||
ZIO_TASKQ_INTERRUPT_HIGH,
|
||||
ZIO_TASKQ_TYPES
|
||||
};
|
||||
|
||||
/*
|
||||
* State machine for the zpool-poolname process. The state transitions
|
||||
* are done as follows:
|
||||
*
|
||||
* From To Routine
|
||||
* PROC_NONE -> PROC_CREATED spa_activate()
|
||||
* PROC_CREATED -> PROC_ACTIVE spa_thread()
|
||||
* PROC_ACTIVE -> PROC_DEACTIVATE spa_deactivate()
|
||||
* PROC_DEACTIVATE -> PROC_GONE spa_thread()
|
||||
* PROC_GONE -> PROC_NONE spa_deactivate()
|
||||
*/
|
||||
typedef enum spa_proc_state {
|
||||
SPA_PROC_NONE, /* spa_proc = &p0, no process created */
|
||||
SPA_PROC_CREATED, /* spa_activate() has proc, is waiting */
|
||||
SPA_PROC_ACTIVE, /* taskqs created, spa_proc set */
|
||||
SPA_PROC_DEACTIVATE, /* spa_deactivate() requests process exit */
|
||||
SPA_PROC_GONE /* spa_thread() is exiting, spa_proc = &p0 */
|
||||
} spa_proc_state_t;
|
||||
|
||||
struct spa {
|
||||
/*
|
||||
* Fields protected by spa_namespace_lock.
|
||||
*/
|
||||
char spa_name[MAXNAMELEN]; /* pool name */
|
||||
avl_node_t spa_avl; /* node in spa_namespace_avl */
|
||||
nvlist_t *spa_config; /* last synced config */
|
||||
nvlist_t *spa_config_syncing; /* currently syncing config */
|
||||
nvlist_t *spa_config_splitting; /* config for splitting */
|
||||
nvlist_t *spa_load_info; /* info and errors from load */
|
||||
uint64_t spa_config_txg; /* txg of last config change */
|
||||
int spa_sync_pass; /* iterate-to-convergence */
|
||||
pool_state_t spa_state; /* pool state */
|
||||
int spa_inject_ref; /* injection references */
|
||||
uint8_t spa_sync_on; /* sync threads are running */
|
||||
spa_load_state_t spa_load_state; /* current load operation */
|
||||
uint64_t spa_import_flags; /* import specific flags */
|
||||
taskq_t *spa_zio_taskq[ZIO_TYPES][ZIO_TASKQ_TYPES];
|
||||
dsl_pool_t *spa_dsl_pool;
|
||||
metaslab_class_t *spa_normal_class; /* normal data class */
|
||||
metaslab_class_t *spa_log_class; /* intent log data class */
|
||||
uint64_t spa_first_txg; /* first txg after spa_open() */
|
||||
uint64_t spa_final_txg; /* txg of export/destroy */
|
||||
uint64_t spa_freeze_txg; /* freeze pool at this txg */
|
||||
uint64_t spa_load_max_txg; /* best initial ub_txg */
|
||||
uint64_t spa_claim_max_txg; /* highest claimed birth txg */
|
||||
timespec_t spa_loaded_ts; /* 1st successful open time */
|
||||
objset_t *spa_meta_objset; /* copy of dp->dp_meta_objset */
|
||||
txg_list_t spa_vdev_txg_list; /* per-txg dirty vdev list */
|
||||
vdev_t *spa_root_vdev; /* top-level vdev container */
|
||||
uint64_t spa_load_guid; /* initial guid for spa_load */
|
||||
list_t spa_config_dirty_list; /* vdevs with dirty config */
|
||||
list_t spa_state_dirty_list; /* vdevs with dirty state */
|
||||
spa_aux_vdev_t spa_spares; /* hot spares */
|
||||
spa_aux_vdev_t spa_l2cache; /* L2ARC cache devices */
|
||||
uint64_t spa_config_object; /* MOS object for pool config */
|
||||
uint64_t spa_config_generation; /* config generation number */
|
||||
uint64_t spa_syncing_txg; /* txg currently syncing */
|
||||
bpobj_t spa_deferred_bpobj; /* deferred-free bplist */
|
||||
bplist_t spa_free_bplist[TXG_SIZE]; /* bplist of stuff to free */
|
||||
uberblock_t spa_ubsync; /* last synced uberblock */
|
||||
uberblock_t spa_uberblock; /* current uberblock */
|
||||
boolean_t spa_extreme_rewind; /* rewind past deferred frees */
|
||||
uint64_t spa_last_io; /* lbolt of last non-scan I/O */
|
||||
kmutex_t spa_scrub_lock; /* resilver/scrub lock */
|
||||
uint64_t spa_scrub_inflight; /* in-flight scrub I/Os */
|
||||
kcondvar_t spa_scrub_io_cv; /* scrub I/O completion */
|
||||
uint8_t spa_scrub_active; /* active or suspended? */
|
||||
uint8_t spa_scrub_type; /* type of scrub we're doing */
|
||||
uint8_t spa_scrub_finished; /* indicator to rotate logs */
|
||||
uint8_t spa_scrub_started; /* started since last boot */
|
||||
uint8_t spa_scrub_reopen; /* scrub doing vdev_reopen */
|
||||
uint64_t spa_scan_pass_start; /* start time per pass/reboot */
|
||||
uint64_t spa_scan_pass_exam; /* examined bytes per pass */
|
||||
kmutex_t spa_async_lock; /* protect async state */
|
||||
kthread_t *spa_async_thread; /* thread doing async task */
|
||||
int spa_async_suspended; /* async tasks suspended */
|
||||
kcondvar_t spa_async_cv; /* wait for thread_exit() */
|
||||
uint16_t spa_async_tasks; /* async task mask */
|
||||
char *spa_root; /* alternate root directory */
|
||||
uint64_t spa_ena; /* spa-wide ereport ENA */
|
||||
int spa_last_open_failed; /* error if last open failed */
|
||||
uint64_t spa_last_ubsync_txg; /* "best" uberblock txg */
|
||||
uint64_t spa_last_ubsync_txg_ts; /* timestamp from that ub */
|
||||
uint64_t spa_load_txg; /* ub txg that loaded */
|
||||
uint64_t spa_load_txg_ts; /* timestamp from that ub */
|
||||
uint64_t spa_load_meta_errors; /* verify metadata err count */
|
||||
uint64_t spa_load_data_errors; /* verify data err count */
|
||||
uint64_t spa_verify_min_txg; /* start txg of verify scrub */
|
||||
kmutex_t spa_errlog_lock; /* error log lock */
|
||||
uint64_t spa_errlog_last; /* last error log object */
|
||||
uint64_t spa_errlog_scrub; /* scrub error log object */
|
||||
kmutex_t spa_errlist_lock; /* error list/ereport lock */
|
||||
avl_tree_t spa_errlist_last; /* last error list */
|
||||
avl_tree_t spa_errlist_scrub; /* scrub error list */
|
||||
uint64_t spa_deflate; /* should we deflate? */
|
||||
uint64_t spa_history; /* history object */
|
||||
kmutex_t spa_history_lock; /* history lock */
|
||||
vdev_t *spa_pending_vdev; /* pending vdev additions */
|
||||
kmutex_t spa_props_lock; /* property lock */
|
||||
uint64_t spa_pool_props_object; /* object for properties */
|
||||
uint64_t spa_bootfs; /* default boot filesystem */
|
||||
uint64_t spa_failmode; /* failure mode for the pool */
|
||||
uint64_t spa_delegation; /* delegation on/off */
|
||||
list_t spa_config_list; /* previous cache file(s) */
|
||||
zio_t *spa_async_zio_root; /* root of all async I/O */
|
||||
zio_t *spa_suspend_zio_root; /* root of all suspended I/O */
|
||||
kmutex_t spa_suspend_lock; /* protects suspend_zio_root */
|
||||
kcondvar_t spa_suspend_cv; /* notification of resume */
|
||||
uint8_t spa_suspended; /* pool is suspended */
|
||||
uint8_t spa_claiming; /* pool is doing zil_claim() */
|
||||
boolean_t spa_is_root; /* pool is root */
|
||||
int spa_minref; /* num refs when first opened */
|
||||
int spa_mode; /* FREAD | FWRITE */
|
||||
spa_log_state_t spa_log_state; /* log state */
|
||||
uint64_t spa_autoexpand; /* lun expansion on/off */
|
||||
ddt_t *spa_ddt[ZIO_CHECKSUM_FUNCTIONS]; /* in-core DDTs */
|
||||
uint64_t spa_ddt_stat_object; /* DDT statistics */
|
||||
uint64_t spa_dedup_ditto; /* dedup ditto threshold */
|
||||
uint64_t spa_dedup_checksum; /* default dedup checksum */
|
||||
uint64_t spa_dspace; /* dspace in normal class */
|
||||
kmutex_t spa_vdev_top_lock; /* dueling offline/remove */
|
||||
kmutex_t spa_proc_lock; /* protects spa_proc* */
|
||||
kcondvar_t spa_proc_cv; /* spa_proc_state transitions */
|
||||
spa_proc_state_t spa_proc_state; /* see definition */
|
||||
struct proc *spa_proc; /* "zpool-poolname" process */
|
||||
uint64_t spa_did; /* if procp != p0, did of t1 */
|
||||
boolean_t spa_autoreplace; /* autoreplace set in open */
|
||||
int spa_vdev_locks; /* locks grabbed */
|
||||
uint64_t spa_creation_version; /* version at pool creation */
|
||||
uint64_t spa_prev_software_version;
|
||||
/*
|
||||
* spa_refcnt & spa_config_lock must be the last elements
|
||||
* because refcount_t changes size based on compilation options.
|
||||
* In order for the MDB module to function correctly, the other
|
||||
* fields must remain in the same location.
|
||||
*/
|
||||
spa_config_lock_t spa_config_lock[SCL_LOCKS]; /* config changes */
|
||||
refcount_t spa_refcount; /* number of opens */
|
||||
};
|
||||
|
||||
extern const char *spa_config_path;
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_SPA_IMPL_H */
|
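The state machine documented above for `spa_proc_state_t` is a single cycle, so each state has exactly one legal successor. A minimal sketch of a transition check follows. The `demo_*` names are hypothetical and exist only for this illustration; nothing here is part of `spa_impl.h`, and the real code changes `spa_proc_state` while holding `spa_proc_lock`, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of spa_proc_state_t, for illustration only. */
typedef enum demo_proc_state {
	DEMO_PROC_NONE,		/* no process created */
	DEMO_PROC_CREATED,	/* process exists, activation is waiting */
	DEMO_PROC_ACTIVE,	/* taskqs created */
	DEMO_PROC_DEACTIVATE,	/* process exit requested */
	DEMO_PROC_GONE		/* process thread is exiting */
} demo_proc_state_t;

/* Return the only legal successor of s, following the table in the comment. */
static demo_proc_state_t
demo_proc_next(demo_proc_state_t s)
{
	assert(s <= DEMO_PROC_GONE);

	switch (s) {
	case DEMO_PROC_NONE:
		return (DEMO_PROC_CREATED);	/* spa_activate() */
	case DEMO_PROC_CREATED:
		return (DEMO_PROC_ACTIVE);	/* spa_thread() */
	case DEMO_PROC_ACTIVE:
		return (DEMO_PROC_DEACTIVATE);	/* spa_deactivate() */
	case DEMO_PROC_DEACTIVATE:
		return (DEMO_PROC_GONE);	/* spa_thread() */
	case DEMO_PROC_GONE:
	default:
		return (DEMO_PROC_NONE);	/* spa_deactivate() */
	}
}

/* True if moving from `from` to `to` follows the documented cycle. */
static bool
demo_proc_transition_ok(demo_proc_state_t from, demo_proc_state_t to)
{
	return (demo_proc_next(from) == to);
}
```

A caller could assert `demo_proc_transition_ok(old, new)` before each state change to catch a skipped step, such as going from `NONE` straight to `ACTIVE`.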
179
uts/common/fs/zfs/sys/space_map.h
Normal file
@ -0,0 +1,179 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_SPACE_MAP_H
|
||||
#define _SYS_SPACE_MAP_H
|
||||
|
||||
#include <sys/avl.h>
|
||||
#include <sys/dmu.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
typedef struct space_map_ops space_map_ops_t;
|
||||
|
||||
typedef struct space_map {
|
||||
avl_tree_t sm_root; /* AVL tree of map segments */
|
||||
uint64_t sm_space; /* sum of all segments in the map */
|
||||
uint64_t sm_start; /* start of map */
|
||||
uint64_t sm_size; /* size of map */
|
||||
uint8_t sm_shift; /* unit shift */
|
||||
uint8_t sm_pad[3]; /* unused */
|
||||
uint8_t sm_loaded; /* map loaded? */
|
||||
uint8_t sm_loading; /* map loading? */
|
||||
kcondvar_t sm_load_cv; /* map load completion */
|
||||
space_map_ops_t *sm_ops; /* space map block picker ops vector */
|
||||
avl_tree_t *sm_pp_root; /* picker-private AVL tree */
|
||||
void *sm_ppd; /* picker-private data */
|
||||
kmutex_t *sm_lock; /* pointer to lock that protects map */
|
||||
} space_map_t;
|
||||
|
||||
typedef struct space_seg {
|
||||
avl_node_t ss_node; /* AVL node */
|
||||
avl_node_t ss_pp_node; /* AVL picker-private node */
|
||||
uint64_t ss_start; /* starting offset of this segment */
|
||||
uint64_t ss_end; /* ending offset (non-inclusive) */
|
||||
} space_seg_t;
|
||||
|
||||
typedef struct space_ref {
|
||||
avl_node_t sr_node; /* AVL node */
|
||||
uint64_t sr_offset; /* offset (start or end) */
|
||||
int64_t sr_refcnt; /* associated reference count */
|
||||
} space_ref_t;
|
||||
|
||||
typedef struct space_map_obj {
|
||||
uint64_t smo_object; /* on-disk space map object */
|
||||
uint64_t smo_objsize; /* size of the object */
|
||||
uint64_t smo_alloc; /* space allocated from the map */
|
||||
} space_map_obj_t;
|
||||
|
||||
struct space_map_ops {
|
||||
void (*smop_load)(space_map_t *sm);
|
||||
void (*smop_unload)(space_map_t *sm);
|
||||
uint64_t (*smop_alloc)(space_map_t *sm, uint64_t size);
|
||||
void (*smop_claim)(space_map_t *sm, uint64_t start, uint64_t size);
|
||||
void (*smop_free)(space_map_t *sm, uint64_t start, uint64_t size);
|
||||
uint64_t (*smop_max)(space_map_t *sm);
|
||||
boolean_t (*smop_fragmented)(space_map_t *sm);
|
||||
};
|
||||
|
||||
/*
|
||||
* debug entry
|
||||
*
|
||||
*    1     3          10                     50
*  ,---+--------+------------+---------------------------------.
*  | 1 | action |  syncpass  |        txg (lower bits)         |
*  `---+--------+------------+---------------------------------'
*   63  62    60 59        50 49                              0
|
||||
*
|
||||
*
|
||||
*
|
||||
* non-debug entry
|
||||
*
|
||||
*    1                47                 1           15
*  ,-----------------------------------------------------------.
*  | 0 |   offset (sm_shift units)    | type |       run       |
*  `-----------------------------------------------------------'
*   63  62                          17   16   15              0
|
||||
*/
|
||||
|
||||
/* All this stuff takes and returns bytes */
|
||||
#define SM_RUN_DECODE(x) (BF64_DECODE(x, 0, 15) + 1)
|
||||
#define SM_RUN_ENCODE(x) BF64_ENCODE((x) - 1, 0, 15)
|
||||
#define SM_TYPE_DECODE(x) BF64_DECODE(x, 15, 1)
|
||||
#define SM_TYPE_ENCODE(x) BF64_ENCODE(x, 15, 1)
|
||||
#define SM_OFFSET_DECODE(x) BF64_DECODE(x, 16, 47)
|
||||
#define SM_OFFSET_ENCODE(x) BF64_ENCODE(x, 16, 47)
|
||||
#define SM_DEBUG_DECODE(x) BF64_DECODE(x, 63, 1)
|
||||
#define SM_DEBUG_ENCODE(x) BF64_ENCODE(x, 63, 1)
|
||||
|
||||
#define SM_DEBUG_ACTION_DECODE(x) BF64_DECODE(x, 60, 3)
|
||||
#define SM_DEBUG_ACTION_ENCODE(x) BF64_ENCODE(x, 60, 3)
|
||||
|
||||
#define SM_DEBUG_SYNCPASS_DECODE(x) BF64_DECODE(x, 50, 10)
|
||||
#define SM_DEBUG_SYNCPASS_ENCODE(x) BF64_ENCODE(x, 50, 10)
|
||||
|
||||
#define SM_DEBUG_TXG_DECODE(x) BF64_DECODE(x, 0, 50)
|
||||
#define SM_DEBUG_TXG_ENCODE(x) BF64_ENCODE(x, 0, 50)
|
||||
|
||||
#define SM_RUN_MAX SM_RUN_DECODE(~0ULL)
|
||||
|
||||
#define SM_ALLOC 0x0
|
||||
#define SM_FREE 0x1
|
||||
|
||||
/*
|
||||
* The data for a given space map can be kept on blocks of any size.
|
||||
* Larger blocks entail fewer i/o operations, but they also cause the
|
||||
* DMU to keep more data in-core, and also to waste more i/o bandwidth
|
||||
* when only a few blocks have changed since the last transaction group.
|
||||
* This could use a lot more research, but for now, set the freelist
|
||||
* block size to 4k (2^12).
|
||||
*/
|
||||
#define SPACE_MAP_BLOCKSHIFT 12
|
||||
|
||||
typedef void space_map_func_t(space_map_t *sm, uint64_t start, uint64_t size);
|
||||
|
||||
extern void space_map_create(space_map_t *sm, uint64_t start, uint64_t size,
|
||||
uint8_t shift, kmutex_t *lp);
|
||||
extern void space_map_destroy(space_map_t *sm);
|
||||
extern void space_map_add(space_map_t *sm, uint64_t start, uint64_t size);
|
||||
extern void space_map_remove(space_map_t *sm, uint64_t start, uint64_t size);
|
||||
extern boolean_t space_map_contains(space_map_t *sm,
|
||||
uint64_t start, uint64_t size);
|
||||
extern void space_map_vacate(space_map_t *sm,
|
||||
space_map_func_t *func, space_map_t *mdest);
|
||||
extern void space_map_walk(space_map_t *sm,
|
||||
space_map_func_t *func, space_map_t *mdest);
|
||||
|
||||
extern void space_map_load_wait(space_map_t *sm);
|
||||
extern int space_map_load(space_map_t *sm, space_map_ops_t *ops,
|
||||
uint8_t maptype, space_map_obj_t *smo, objset_t *os);
|
||||
extern void space_map_unload(space_map_t *sm);
|
||||
|
||||
extern uint64_t space_map_alloc(space_map_t *sm, uint64_t size);
|
||||
extern void space_map_claim(space_map_t *sm, uint64_t start, uint64_t size);
|
||||
extern void space_map_free(space_map_t *sm, uint64_t start, uint64_t size);
|
||||
extern uint64_t space_map_maxsize(space_map_t *sm);
|
||||
|
||||
extern void space_map_sync(space_map_t *sm, uint8_t maptype,
|
||||
space_map_obj_t *smo, objset_t *os, dmu_tx_t *tx);
|
||||
extern void space_map_truncate(space_map_obj_t *smo,
|
||||
objset_t *os, dmu_tx_t *tx);
|
||||
|
||||
extern void space_map_ref_create(avl_tree_t *t);
|
||||
extern void space_map_ref_destroy(avl_tree_t *t);
|
||||
extern void space_map_ref_add_seg(avl_tree_t *t,
|
||||
uint64_t start, uint64_t end, int64_t refcnt);
|
||||
extern void space_map_ref_add_map(avl_tree_t *t,
|
||||
space_map_t *sm, int64_t refcnt);
|
||||
extern void space_map_ref_generate_map(avl_tree_t *t,
|
||||
space_map_t *sm, int64_t minref);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_SPACE_MAP_H */
|
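The `SM_*_ENCODE` and `SM_*_DECODE` macros in `space_map.h` pack a non-debug entry into one 64-bit word: a 47-bit offset at bit 16, a 1-bit type at bit 15, and a 15-bit run stored as `run - 1` at bit 0. The sketch below shows that packing. `BF64_ENCODE` and `BF64_DECODE` are defined elsewhere and are not shown in this diff, so the `demo_bf64_*` helpers are stand-ins that assume the usual meaning, a field of `len` bits starting at bit `low`. All `demo_*` and `DEMO_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define	DEMO_SM_ALLOC	0x0
#define	DEMO_SM_FREE	0x1

/* Assumed semantics of BF64_ENCODE: place the low `len` bits of x at `low`. */
static uint64_t
demo_bf64_encode(uint64_t x, unsigned low, unsigned len)
{
	assert(len > 0 && len < 64 && low + len <= 64);
	return ((x & ((1ULL << len) - 1)) << low);
}

/* Assumed semantics of BF64_DECODE: extract `len` bits starting at `low`. */
static uint64_t
demo_bf64_decode(uint64_t x, unsigned low, unsigned len)
{
	assert(len > 0 && len < 64 && low + len <= 64);
	return ((x >> low) & ((1ULL << len) - 1));
}

/* Build a non-debug entry; run is stored biased by one, so 1..32768 fits. */
static uint64_t
demo_sm_entry(uint64_t offset, uint64_t type, uint64_t run)
{
	assert(run >= 1 && run <= 32768);
	return (demo_bf64_encode(offset, 16, 47) |
	    demo_bf64_encode(type, 15, 1) |
	    demo_bf64_encode(run - 1, 0, 15));
}

static uint64_t
demo_sm_offset(uint64_t e)
{
	return (demo_bf64_decode(e, 16, 47));
}

static uint64_t
demo_sm_type(uint64_t e)
{
	return (demo_bf64_decode(e, 15, 1));
}

static uint64_t
demo_sm_run(uint64_t e)
{
	return (demo_bf64_decode(e, 0, 15) + 1);
}

/* Bit 63 distinguishes debug entries (1) from non-debug entries (0). */
static int
demo_sm_is_debug(uint64_t e)
{
	return ((int)demo_bf64_decode(e, 63, 1));
}
```

Because the run is stored as `run - 1`, a zero-length run cannot be represented, and decoding an all-ones word yields the largest run, 2^15. This matches how `SM_RUN_MAX` is defined as `SM_RUN_DECODE(~0ULL)`.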
131
uts/common/fs/zfs/sys/txg.h
Normal file
@ -0,0 +1,131 @@
|
||||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2010 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_TXG_H
|
||||
#define _SYS_TXG_H
|
||||
|
||||
#include <sys/spa.h>
|
||||
#include <sys/zfs_context.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#define TXG_CONCURRENT_STATES 3 /* open, quiescing, syncing */
|
||||
#define TXG_SIZE 4 /* next power of 2 */
|
||||
#define TXG_MASK (TXG_SIZE - 1) /* mask for size */
|
||||
#define TXG_INITIAL TXG_SIZE /* initial txg */
|
||||
#define TXG_IDX (txg & TXG_MASK)
|
||||
|
||||
/* Number of txgs worth of frees we defer adding to in-core spacemaps */
|
||||
#define TXG_DEFER_SIZE 2
|
||||
|
||||
#define TXG_WAIT 1ULL
|
||||
#define TXG_NOWAIT 2ULL
|
||||
|
||||
typedef struct tx_cpu tx_cpu_t;
|
||||
|
||||
typedef struct txg_handle {
|
||||
tx_cpu_t *th_cpu;
|
||||
uint64_t th_txg;
|
||||
} txg_handle_t;
|
||||
|
||||
typedef struct txg_node {
|
||||
struct txg_node *tn_next[TXG_SIZE];
|
||||
uint8_t tn_member[TXG_SIZE];
|
||||
} txg_node_t;
|
||||
|
||||
typedef struct txg_list {
|
||||
kmutex_t tl_lock;
|
||||
size_t tl_offset;
|
||||
txg_node_t *tl_head[TXG_SIZE];
|
||||
} txg_list_t;
|
||||
|
||||
struct dsl_pool;
|
||||
|
||||
extern void txg_init(struct dsl_pool *dp, uint64_t txg);
|
||||
extern void txg_fini(struct dsl_pool *dp);
|
||||
extern void txg_sync_start(struct dsl_pool *dp);
|
||||
extern void txg_sync_stop(struct dsl_pool *dp);
|
||||
extern uint64_t txg_hold_open(struct dsl_pool *dp, txg_handle_t *txghp);
|
||||
extern void txg_rele_to_quiesce(txg_handle_t *txghp);
|
||||
extern void txg_rele_to_sync(txg_handle_t *txghp);
|
||||
extern void txg_register_callbacks(txg_handle_t *txghp, list_t *tx_callbacks);
|
||||
|
||||
/*
|
||||
* Delay the caller by the specified number of ticks or until
|
||||
* the txg closes (whichever comes first). This is intended
|
||||
* to be used to throttle writers when the system nears its
|
||||
* capacity.
|
||||
*/
|
||||
extern void txg_delay(struct dsl_pool *dp, uint64_t txg, int ticks);
|
||||
|
||||
/*
|
||||
* Wait until the given transaction group has finished syncing.
|
||||
* Try to make this happen as soon as possible (e.g., kick off any
|
||||
* necessary syncs immediately). If txg==0, wait for the currently open
|
||||
* txg to finish syncing.
|
||||
*/
|
||||
extern void txg_wait_synced(struct dsl_pool *dp, uint64_t txg);
|
||||
|
||||
/*
|
||||
* Wait until the given transaction group, or one after it, is
|
||||
* the open transaction group. Try to make this happen as soon
|
||||
* as possible (e.g., kick off any necessary syncs immediately).
|
||||
* If txg == 0, wait for the next open txg.
|
||||
*/
|
||||
extern void txg_wait_open(struct dsl_pool *dp, uint64_t txg);
|
||||
|
||||
/*
|
||||
* Returns TRUE if we are "backed up" waiting for the syncing
|
||||
* transaction to complete; otherwise returns FALSE.
|
||||
*/
|
||||
extern boolean_t txg_stalled(struct dsl_pool *dp);
|
||||
|
||||
/* returns TRUE if someone is waiting for the next txg to sync */
|
||||
extern boolean_t txg_sync_waiting(struct dsl_pool *dp);
|
||||
|
||||
/*
|
||||
* Per-txg object lists.
|
||||
*/
|
||||
|
||||
#define TXG_CLEAN(txg) ((txg) - 1)
|
||||
|
||||
extern void txg_list_create(txg_list_t *tl, size_t offset);
|
||||
extern void txg_list_destroy(txg_list_t *tl);
|
||||
extern int txg_list_empty(txg_list_t *tl, uint64_t txg);
|
||||
extern int txg_list_add(txg_list_t *tl, void *p, uint64_t txg);
|
||||
extern int txg_list_add_tail(txg_list_t *tl, void *p, uint64_t txg);
|
||||
extern void *txg_list_remove(txg_list_t *tl, uint64_t txg);
|
||||
extern void *txg_list_remove_this(txg_list_t *tl, void *p, uint64_t txg);
|
||||
extern int txg_list_member(txg_list_t *tl, void *p, uint64_t txg);
|
||||
extern void *txg_list_head(txg_list_t *tl, uint64_t txg);
|
||||
extern void *txg_list_next(txg_list_t *tl, void *p, uint64_t txg);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_TXG_H */
|
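`txg.h` sizes its per-txg arrays with `TXG_SIZE` (4, the next power of two above the three concurrent states) and picks a slot with `txg & TXG_MASK`. The sketch below illustrates why that works: any three consecutive txg numbers land in three different slots, so the open, quiescing, and syncing txgs never share an array entry. The `demo_*` and `DEMO_*` names are hypothetical copies made for this example, and the mapping of the three states to `txg`, `txg - 1`, and `txg - 2` is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define	DEMO_TXG_SIZE	4			/* mirrors TXG_SIZE */
#define	DEMO_TXG_MASK	(DEMO_TXG_SIZE - 1)	/* mirrors TXG_MASK */

/* Slot in a TXG_SIZE-element array used for a given txg number. */
static unsigned
demo_txg_idx(uint64_t txg)
{
	return ((unsigned)(txg & DEMO_TXG_MASK));
}

/*
 * Return 1 if the open txg and the two txgs before it (assumed here to be
 * the quiescing and syncing txgs) occupy three distinct slots.
 */
static int
demo_txg_slots_distinct(uint64_t open_txg)
{
	unsigned a, b, c;

	assert(open_txg >= 2);
	a = demo_txg_idx(open_txg);
	b = demo_txg_idx(open_txg - 1);
	c = demo_txg_idx(open_txg - 2);
	return (a != b && b != c && a != c);
}
```

The mask only works because `DEMO_TXG_SIZE` is a power of two; with a size of 3 the expression `txg & (size - 1)` would skip a slot, which is presumably why the header rounds 3 up to 4.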
Some files were not shown because too many files have changed in this diff