First cut at a geom(4) manpage.
The mdoc markup and all spelling errors in this file are fair game for anyone with more doc-clue than me.
This commit is contained in:
parent
dff418f166
commit
1fb6046185
@@ -43,6 +43,7 @@ MAN= aac.4 \
|
||||
fdc.4 \
|
||||
fpa.4 \
|
||||
fxp.4 \
|
||||
geom.4 \
|
||||
gif.4 \
|
||||
gusc.4 \
|
||||
gx.4 \
|
||||
|
311
share/man/man4/geom.4
Normal file
@@ -0,0 +1,311 @@
|
||||
.\"
|
||||
.\" Copyright (c) 2002 Poul-Henning Kamp
|
||||
.\" Copyright (c) 2002 Networks Associates Technology, Inc.
|
||||
.\" All rights reserved.
|
||||
.\"
|
||||
.\" This software was developed for the FreeBSD Project by Poul-Henning Kamp
|
||||
.\" and NAI Labs, the Security Research Division of Network Associates, Inc.
|
||||
.\" under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the
|
||||
.\" DARPA CHATS research program.
|
||||
.\"
|
||||
.\" Redistribution and use in source and binary forms, with or without
|
||||
.\" modification, are permitted provided that the following conditions
|
||||
.\" are met:
|
||||
.\" 1. Redistributions of source code must retain the above copyright
|
||||
.\" notice, this list of conditions and the following disclaimer.
|
||||
.\" 2. Redistributions in binary form must reproduce the above copyright
|
||||
.\" notice, this list of conditions and the following disclaimer in the
|
||||
.\" documentation and/or other materials provided with the distribution.
|
||||
.\" 3. The names of the authors may not be used to endorse or promote
|
||||
.\" products derived from this software without specific prior written
|
||||
.\" permission.
|
||||
.\"
|
||||
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
|
||||
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
|
||||
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
||||
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
||||
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
||||
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
||||
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
.\" SUCH DAMAGE.
|
||||
.\"
|
||||
.\" $FreeBSD$
|
||||
.\"
|
||||
.Dd March 27, 2002
|
||||
.Os FreeBSD 5.0
|
||||
.Dt GEOM 4
|
||||
.Sh NAME
|
||||
.Nm GEOM
|
||||
.Nd modular disk I/O request transformation framework
|
||||
.Sh DESCRIPTION
|
||||
The GEOM framework provides an infrastructure in which modules
|
||||
can perform transformations on disk I/O requests on their path from
|
||||
the upper kernel to the device drivers and back.
|
||||
.Pp
|
||||
Transformations in a GEOM context range from the simple geometric
displacement performed in typical disklabel modules, through RAID
algorithms and device multipath resolution, to full-blown cryptographic
protection of the stored data.
|
||||
.Pp
|
||||
Compared to traditional "volume management", GEOM differs from most
|
||||
and in some cases all previous implementations in the following ways:
|
||||
.Bl -bullet
|
||||
.It
|
||||
GEOM is extensible. It is trivially simple to write a new class
|
||||
of transformation and it will not be given stepchild treatment. If
|
||||
someone for some reason wanted to mount IBM MVS diskpacks, a class
|
||||
recognizing and configuring their VTOC information would be a trivial
|
||||
matter.
|
||||
.It
|
||||
GEOM is topologically agnostic. Most volume management implementations
have very strict notions of how classes can fit together; very often
a single fixed hierarchy is provided, for instance subdisk - plex -
volume.
|
||||
.El
|
||||
.Pp
|
||||
Being extensible means that new transformations are treated no differently
|
||||
than existing transformations.
|
||||
.Pp
|
||||
Fixed hierarchies are bad because they make it impossible to express
|
||||
the intent efficiently.
|
||||
In the fixed hierarchy above it is not possible to mirror two
physical disks and then partition the mirror into subdisks. Instead,
one is forced to make subdisks on the physical volumes and to mirror
these two and two, resulting in a much more complex configuration.
GEOM, on the other hand, does not care in which order things are done;
the only restriction is that cycles in the graph will not be allowed.
|
||||
.Pp
|
||||
.Sh "TERMINOLOGY AND TOPOLOGY"
|
||||
GEOM is quite object-oriented and consequently the terminology
borrows a lot of context and semantics from the OO vocabulary:
|
||||
.Pp
|
||||
A "class", represented by the data structure "g_class", implements one
particular kind of transformation. Typical examples are MBR disk
partition, BSD disklabel or RAID5 classes.
|
||||
.Pp
|
||||
An instance of a class is called a "geom" and is represented by the
data structure "g_geom". In a typical i386 FreeBSD system, there
will be one geom of class MBR for each disk.
|
||||
.Pp
|
||||
A "provider", represented by the data structure "g_provider", is
|
||||
the front gate at which a geom offers service.
|
||||
A provider is "a disk-like thing which appears in /dev" - a logical
disk in other words.
All providers have three main properties: name, sectorsize and size.
|
||||
.Pp
|
||||
A "consumer" is the backdoor through which a geom connects to another
geom's provider and through which I/O requests are sent.
|
||||
.Pp
|
||||
The topological relationships between these entities are as follows:
|
||||
.Bl -bullet
|
||||
.It
|
||||
A class has zero or more geom instances.
|
||||
.It
|
||||
A geom has exactly one class it is derived from.
|
||||
.It
|
||||
A geom has zero or more consumers.
|
||||
.It
|
||||
A geom has zero or more providers.
|
||||
.It
|
||||
A consumer can be attached to zero or one providers.
|
||||
.It
|
||||
A provider can have zero or more consumers attached.
|
||||
.El
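.Pp
The relationships listed above can be sketched as a toy userland model.
This is not the kernel's g_class, g_geom, g_provider or g_consumer code;
every class and function name below is a hypothetical stand-in chosen
for illustration, and only the cardinalities from the list are enforced.

```python
# Toy model of the GEOM topology rules; all names here are illustrative
# assumptions, not the kernel data structures.

class GClass:
    def __init__(self, name):
        self.name = name
        self.geoms = []            # a class has zero or more geom instances


class Geom:
    def __init__(self, gclass, name):
        self.gclass = gclass       # a geom is derived from exactly one class
        self.name = name
        self.consumers = []        # zero or more consumers
        self.providers = []        # zero or more providers
        gclass.geoms.append(self)


class Provider:
    def __init__(self, geom, name):
        self.geom = geom
        self.name = name
        self.consumers = []        # zero or more attached consumers
        geom.providers.append(self)


class Consumer:
    def __init__(self, geom):
        self.geom = geom
        self.provider = None       # attached to zero or one providers
        geom.consumers.append(self)


def attach(consumer, provider):
    """Attach a consumer to a provider; a consumer may hold only one."""
    if consumer.provider is not None:
        raise ValueError("consumer is already attached to a provider")
    consumer.provider = provider
    provider.consumers.append(consumer)


def detach(consumer):
    """Break the bond between a consumer and its provider."""
    if consumer.provider is None:
        raise ValueError("consumer is not attached")
    consumer.provider.consumers.remove(consumer)
    consumer.provider = None
```

In this sketch a second attach on the same consumer raises an error,
while a provider accepts any number of consumers.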
|
||||
.Pp
|
||||
All geoms have a rank number assigned, which is used to detect and
prevent loops in the acyclic directed graph. This rank number is
assigned as follows:
|
||||
.Bl -enum
|
||||
.It
|
||||
A geom with no attached consumers has rank=1.
|
||||
.It
|
||||
A geom with attached consumers has a rank one higher than the
|
||||
highest rank of the geoms of the providers its consumers are
|
||||
attached to.
|
||||
.El
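.Pp
The two rank rules can be sketched as a small recursive function.
This is a minimal illustration, assuming the graph is given as a
hypothetical dictionary that maps each geom name to the geoms whose
providers its consumers are attached to; the kernel's own bookkeeping
is not shown here.

```python
def rank(geom, below, _path=()):
    """Return the rank of `geom`.

    `below` maps a geom name to the list of geoms whose providers its
    consumers are attached to (an assumed representation).
    """
    if geom in _path:
        # A geom reachable from itself would make the graph cyclic.
        raise ValueError("cycle detected at " + geom)
    lower = below.get(geom, [])
    if not lower:
        return 1                                   # rule 1: no attached consumers
    # rule 2: one higher than the highest rank found below
    return 1 + max(rank(g, below, _path + (geom,)) for g in lower)


# Two disks, a mirror across both, an MBR geom on the mirror and a
# BSD geom on top of the MBR geom.
example = {
    "da0": [],
    "da1": [],
    "mirror": ["da0", "da1"],
    "mbr": ["mirror"],
    "bsd": ["mbr"],
}
```

With this example the disks have rank 1, the mirror rank 2, the MBR
geom rank 3 and the BSD geom rank 4.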
|
||||
.Sh "SPECIAL TOPOLOGICAL MANEUVERS"
|
||||
In addition to the straightforward attach, which attaches a consumer
to a provider, and detach, which breaks the bond, a number of special
topological maneuvers exist to facilitate configuration and to
improve the overall flexibility.
|
||||
.Pp
|
||||
.Em TASTING
|
||||
is a process which happens whenever a new class or new provider
is created, and it is the class's chance to automatically configure an
instance on providers which it recognizes as its own.
A typical example is the MBR disk-partition class, which will look for
the MBR table in the first sector and, if found and validated, will
instantiate a geom to multiplex according to the contents of the MBR.
|
||||
.Pp
|
||||
A new class will be offered all existing providers in turn and a new
|
||||
provider will be offered to all classes in turn.
|
||||
.Pp
|
||||
Exactly what a class does to recognize if it should accept the offered
provider is not defined by GEOM, but the sensible set of options is:
|
||||
.Bl -bullet
|
||||
.It
|
||||
Examine specific data structures on the disk.
|
||||
.It
|
||||
Examine properties like sectorsize or mediasize for the provider.
|
||||
.It
|
||||
Examine the rank number of the provider's geom.
|
||||
.It
|
||||
Examine the class name of the provider's geom.
|
||||
.El
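.Pp
As an illustration of tasting by on-disk data and provider properties,
the sketch below checks the first sector for the two-byte MBR boot
signature (0x55, 0xAA at byte offset 510).
The function names and the dictionary layout of a provider are
assumptions made for this example; a real class would validate far
more than the signature.

```python
MBR_SIGNATURE = b"\x55\xaa"     # boot signature at byte offset 510


def taste_mbr(sectorsize, first_sector):
    """Return True if the offered provider looks like it carries an MBR."""
    # Property check: the sector must be large enough to hold an MBR.
    if sectorsize < 512 or len(first_sector) < 512:
        return False
    # Data-structure check: look for the signature in the first sector.
    return first_sector[510:512] == MBR_SIGNATURE


def offer(provider, tasters):
    """Offer one provider to every class in turn.

    `provider` is a dict with "sectorsize" and "first_sector" keys and
    `tasters` maps a class name to its taste function (both hypothetical).
    Returns the names of the classes that accepted the provider.
    """
    return [name for name, taste in tasters.items()
            if taste(provider["sectorsize"], provider["first_sector"])]
```

A blank sector is rejected, while a sector ending in the signature is
accepted and would lead the class to instantiate a geom.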
|
||||
.Pp
|
||||
.Em ORPHANIZATION
|
||||
is the process by which a provider is removed while
it is potentially still in use.
|
||||
.Pp
|
||||
When a geom marks a provider as an orphan, all future I/O requests will
|
||||
"bounce" on the provider with an error code set by the geom. Any
|
||||
consumers attached to the provider will receive notification about
|
||||
the orphanization and need to take appropriate action.
|
||||
.Pp
|
||||
A geom which came into being as a result of a normal taste operation
should self-destruct unless it has a way to keep functioning. Geoms
like disklabels and stripes should therefore self-destruct, whereas
RAID5 or mirror geoms can continue to function as long as they do
not lose quorum.
|
||||
.Pp
|
||||
When a provider is orphaned, this does not result in any immediate
change in the topology: any attached consumers are still attached and
any opened paths are still open. It is the responsibility of the
geoms above to close and detach as soon as this can happen.
|
||||
.Pp
|
||||
The typical scenario is that a device driver notices a disk has
gone and orphans the provider for it.
The geoms on top receive the orphanization event and orphan all
their providers in turn.
Providers which are not attached to are destroyed right away.
Eventually, at the top level, the geom which interfaces
to DEVFS receives an orphan event on its consumer; it
calls destroy_dev(9), does an explicit close if the
device was open, and then detaches its consumer.
The provider below is now no longer attached to and can be
destroyed. If the geom has no more providers, it can detach
its consumer and self-destruct, and so the carnage passes back
down the tree until the original provider is detached from
and can be destroyed by the geom serving the device driver.
|
||||
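.Pp
The order of events in this scenario can be sketched for a simple
linear stack of geoms.
The function below is only a trace generator under that simplifying
assumption (one provider and one consumer per geom); its name and the
wording of the log entries are made up for illustration.

```python
def orphan_cascade(chain):
    """Trace an orphanization through a linear stack of geoms.

    `chain` lists geom names bottom-up, for example
    ["disk", "MBR", "DEV"], where "DEV" stands for the geom that
    interfaces to DEVFS.
    """
    log = []
    # Upward pass: each geom orphans its provider and the geom whose
    # consumer is attached to that provider receives the event.
    for lower, upper in zip(chain, chain[1:]):
        log.append(f"{lower} orphans its provider; {upper} is notified")
    # Downward pass: starting at the top, each geom detaches its
    # consumer, which lets the geom below destroy its provider.
    for upper, lower in zip(reversed(chain), reversed(chain[:-1])):
        log.append(f"{upper} detaches its consumer; "
                   f"{lower} destroys its now-unattached provider")
    return log
```

For a stack of three geoms the trace has two upward notifications
followed by two downward teardown steps, ending at the geom that
serves the device driver.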
.Pp
|
||||
While this approach seems byzantine, it does provide the maximum
flexibility in handling disappearing devices.
|
||||
.Pp
|
||||
.Em SPOILING
|
||||
is a special case of orphanization used to protect
|
||||
against stale metadata.
|
||||
It is probably easiest to understand spoiling by going through
|
||||
an example.
|
||||
.Pp
|
||||
Imagine a disk, "da0", on top of which an MBR geom provides
"da0s1" and "da0s2", and on top of "da0s1" a BSD geom provides
"da0s1a" through "da0s1e". Both the MBR and BSD geoms have
autoconfigured based on data structures on the disk media.
|
||||
Now imagine the case where "da0" is opened for writing and those
|
||||
data structures are modified or overwritten: now the geoms would
|
||||
be operating on stale metadata unless some notification system
|
||||
can inform them otherwise.
|
||||
To avoid this situation, when the open of "da0" for write happens,
|
||||
all attached consumers are told about this, and geoms like
|
||||
MBR and BSD will self-destruct as a result.
|
||||
When "da0" is closed again, it will be offered for tasting again
|
||||
and if the data structures for MBR and BSD are still there, new
|
||||
geoms will instantiate themselves anew.
|
||||
.Pp
|
||||
Now for the fine print:
|
||||
.Pp
|
||||
If any of the paths through the MBR or BSD module were open, they
would have opened downwards with an exclusive bit, rendering it
impossible to open "da0" for writing in that case. Conversely,
the requested exclusive bit would render it impossible to open a
path through the MBR geom while "da0" is open for writing.
|
||||
.Pp
|
||||
From this it also follows that changing the size of open geoms can
|
||||
only be done through their cooperation.
|
||||
.Pp
|
||||
Finally: the spoiling only happens when the write count goes from
|
||||
zero to non-zero and the retasting only when the write count goes
|
||||
back to zero.
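.Pp
The write-count rule can be sketched as follows.
This is a toy model, assuming a provider that simply records the
events it would send; the class and method names are hypothetical.

```python
class WriteTrackedProvider:
    """Records when spoiling and retasting would be triggered."""

    def __init__(self, name):
        self.name = name
        self.write_count = 0
        self.events = []

    def open_for_write(self):
        self.write_count += 1
        if self.write_count == 1:
            # zero -> non-zero: attached consumers are told to spoil
            self.events.append("spoil")

    def close_write(self):
        if self.write_count == 0:
            raise ValueError("provider is not open for writing")
        self.write_count -= 1
        if self.write_count == 0:
            # back to zero: the provider is offered for tasting again
            self.events.append("retaste")
```

A second writer opening the provider causes no further spoiling, and
retasting waits until the last writer has closed.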
|
||||
.Pp
|
||||
.Em INSERT/DELETE
|
||||
is a very special operation which allows a new geom
to be instantiated between a consumer and a provider attached to
each other and to be removed again.
|
||||
.Pp
|
||||
To understand the utility of this, imagine a provider
being mounted as a filesystem.
Between the DEVFS geom's consumer and its provider we insert
a mirror module which configures itself with one mirror
copy and consequently is transparent to the I/O requests
on the path.
We can now configure yet another mirror copy on the mirror geom,
request a synchronization and finally drop the first mirror
copy.
We have now in essence moved a mounted filesystem from one
disk to another while it was being used.
At this point the mirror geom can be deleted from the path
again; it has served its purpose.
|
||||
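.Pp
The migration described above can be traced step by step with a toy
model.
The Mirror class and the migrate function are hypothetical names used
only for this sketch, and synchronization is assumed to complete as
soon as a copy is added.

```python
class Mirror:
    """A mirror geom reduced to its list of copies."""

    def __init__(self, first_copy):
        self.copies = [first_copy]      # one copy: transparent to I/O

    def add_copy(self, disk):
        # Synchronization is assumed to finish immediately in this model.
        self.copies.append(disk)

    def drop_copy(self, disk):
        if self.copies == [disk]:
            raise ValueError("cannot drop the last mirror copy")
        self.copies.remove(disk)

    def __str__(self):
        return "MIRROR(" + ",".join(self.copies) + ")"


def migrate(top, old_disk, new_disk):
    """Return the path from `top` down to the disk after every step."""
    steps = [f"{top} -> {old_disk}"]            # initial path
    mirror = Mirror(old_disk)                   # INSERT the mirror geom
    steps.append(f"{top} -> {mirror}")
    mirror.add_copy(new_disk)                   # second copy, synchronized
    steps.append(f"{top} -> {mirror}")
    mirror.drop_copy(old_disk)                  # drop the first copy
    steps.append(f"{top} -> {mirror}")
    steps.append(f"{top} -> {mirror.copies[0]}")  # DELETE the mirror geom
    return steps
```

The first and last entries of the trace differ only in the disk at
the bottom of the path, which is the point of the maneuver.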
.Pp
|
||||
.Em CONFIGURE
|
||||
is the process where the administrator issues instructions
for a particular class to instantiate itself. There are multiple
ways to express intent in this case: a particular provider can be
specified with a level of override, forcing for instance a BSD
disklabel module to attach to a provider which was not found palatable
during the TASTE operation.
|
||||
.Pp
|
||||
Finally, I/O is the reason we even do this: it concerns itself with
|
||||
sending I/O requests through the graph.
|
||||
.Pp
|
||||
.Em "I/O REQUESTS" ,
represented by struct bio, originate at a consumer,
are scheduled on its attached provider and, when processed, are returned
to the consumer.
|
||||
It is important to realize that the struct bio which
|
||||
enters through the provider of a particular geom does not "come
|
||||
out on the other side".
|
||||
Even simple transformations like MBR and BSD will clone the
|
||||
struct bio, modify the clone and schedule the clone on their
|
||||
own consumer.
|
||||
Note that cloning the struct bio does not involve cloning the
|
||||
actual data area specified in the I/O request.
|
||||
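.Pp
The clone-and-modify pattern can be sketched with a toy request
object.
The Bio class below is a stand-in with assumed field names, not the
kernel's struct bio; the point is that the clone gets its own offset
while sharing the parent's data buffer.

```python
class Bio:
    """Toy I/O request: command, byte offset, length and a data buffer."""

    def __init__(self, cmd, offset, length, data, parent=None):
        self.cmd = cmd
        self.offset = offset
        self.length = length
        self.data = data          # shared buffer, never copied
        self.parent = parent      # request this one was cloned from


def clone_bio(bio):
    """Clone the request header only; the data area is shared."""
    return Bio(bio.cmd, bio.offset, bio.length, bio.data, parent=bio)


def translate_for_slice(bio, slice_start):
    """What a simple partitioning geom does: clone, then shift the offset.

    `slice_start` is the byte offset of the slice on the underlying
    provider (an assumed parameter for this example).
    """
    child = clone_bio(bio)
    child.offset += slice_start
    return child
```

The original request keeps its offset, so it can be completed back
to the consumer unchanged once the clone has been processed.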
.Pp
|
||||
In total six different I/O requests exist in GEOM: read, write,
delete, format, get attribute and set attribute.
|
||||
.Pp
|
||||
Read and write are pretty self-explanatory.
|
||||
.Pp
|
||||
Delete indicates that a certain range of data is no longer used
|
||||
and that it can be erased or freed as the underlying technology
|
||||
supports.
|
||||
Technologies like flash adaptation layers can arrange to erase
the relevant blocks before they are reassigned, and
cryptographic devices may want to fill random bits into the
range to reduce the amount of data available for attack.
|
||||
.Pp
|
||||
It is important to recognize that a delete indication is not a
|
||||
request and consequently there is no guarantee that the data actually
|
||||
will be erased or made unavailable unless guaranteed by specific
|
||||
geoms in the graph. If "secure delete" semantics are required, a
|
||||
geom should be pushed which converts delete indications into (a
|
||||
sequence of) write requests.
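.Pp
A geom providing "secure delete" semantics could be sketched as a
function that rewrites requests on their way down.
The tuple layout of a request and the default overwrite pattern are
assumptions made for this example only.

```python
def secure_delete(request, patterns=(b"\x00",)):
    """Translate one request into the requests sent to the geom below.

    A request is modelled as a tuple: ("delete", offset, length) for a
    delete indication, or ("write", offset, data) for a write.
    Delete indications become one write per overwrite pattern; every
    other request is passed through untouched.
    """
    cmd, offset, arg = request
    if cmd != "delete":
        return [request]
    length = arg
    # Each pattern is a single byte repeated over the whole range.
    return [("write", offset, pattern * length) for pattern in patterns]
```

With more than one pattern a single delete indication turns into a
sequence of writes over the same range.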
|
||||
.Pp
|
||||
Get attribute and set attribute support inspection and manipulation
of out-of-band attributes on a particular provider or path.
Attributes are named by ASCII strings and they will be discussed in
a separate section below.
|
||||
.Pp
|
||||
(stay tuned while the author rests his brain and fingers: more to come.)
|
||||
.Sh HISTORY
|
||||
This software was developed for the FreeBSD Project by Poul-Henning Kamp
|
||||
and NAI Labs, the Security Research Division of Network Associates, Inc.
|
||||
under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the
|
||||
DARPA CHATS research program.
|
||||
.Pp
|
||||
The first precursor for GEOM was a gruesome hack to Minix 1.2 and was
|
||||
never distributed. An earlier attempt to implement a less general scheme in FreeBSD never succeeded.
|
||||
.Sh AUTHORS
|
||||
.An "Poul-Henning Kamp" Aq phk@FreeBSD.org
|