.\" Hey, Emacs, edit this file in -*- nroff-fill -*- mode
|
||
.\"-
|
||
.\" Copyright (c) 1997, 1998
|
||
.\" Nan Yang Computer Services Limited. All rights reserved.
|
||
.\"
|
||
.\" This software is distributed under the so-called ``Berkeley
|
||
.\" License'':
|
||
.\"
|
||
.\" Redistribution and use in source and binary forms, with or without
|
||
.\" modification, are permitted provided that the following conditions
|
||
.\" are met:
|
||
.\" 1. Redistributions of source code must retain the above copyright
|
||
.\" notice, this list of conditions and the following disclaimer.
|
||
.\" 2. Redistributions in binary form must reproduce the above copyright
|
||
.\" notice, this list of conditions and the following disclaimer in the
|
||
.\" documentation and/or other materials provided with the distribution.
|
||
.\" 3. All advertising materials mentioning features or use of this software
|
||
.\" must display the following acknowledgement:
|
||
.\" This product includes software developed by Nan Yang Computer
|
||
.\" Services Limited.
|
||
.\" 4. Neither the name of the Company nor the names of its contributors
|
||
.\" may be used to endorse or promote products derived from this software
|
||
.\" without specific prior written permission.
|
||
.\"
|
||
.\" This software is provided ``as is'', and any express or implied
|
||
.\" warranties, including, but not limited to, the implied warranties of
|
||
.\" merchantability and fitness for a particular purpose are disclaimed.
|
||
.\" In no event shall the company or contributors be liable for any
|
||
.\" direct, indirect, incidental, special, exemplary, or consequential
|
||
.\" damages (including, but not limited to, procurement of substitute
|
||
.\" goods or services; loss of use, data, or profits; or business
|
||
.\" interruption) however caused and on any theory of liability, whether
|
||
.\" in contract, strict liability, or tort (including negligence or
|
||
.\" otherwise) arising in any way out of the use of this software, even if
|
||
.\" advised of the possibility of such damage.
|
||
.\"
|
||
.\" $FreeBSD$
|
||
.\"
|
||
.Dd 28 March 1999
|
||
.Dt vinum 8
|
||
.Os
|
||
.Sh NAME
|
||
.Nm vinum
|
||
.Nd Logical Volume Manager control program
|
||
.Sh SYNOPSIS
|
||
.Nm
|
||
.Op command
|
||
.Op Fl options
|
||
.Sh COMMANDS
|
||
.Cd attach Ar plex Ar volume
|
||
.Op Nm rename
|
||
.Cd attach Ar subdisk Ar plex Ar [offset]
|
||
.Op Nm rename
|
||
.in +1i
|
||
Attach a plex to a volume, or a subdisk to a plex.
|
||
.in
|
||
.\" XXX remove this
|
||
.Nm concat
|
||
.Op Fl f
|
||
.Op Fl n Ar name
|
||
.Op Fl v
|
||
.Ar drives
|
||
.in +1i
|
||
Create a concatenated volume from the specified drives.
|
||
.in
|
||
.Cd create
|
||
.Op Fl f
|
||
.Ar description-file
|
||
.in +1i
|
||
Create a volume as described in
|
||
.Ar description-file
|
||
.in
|
||
.\" XXX remove this
|
||
.Cd debug
|
||
.in +1i
|
||
Cause the volume manager to enter the kernel debugger.
|
||
.in
|
||
.Cd debug
|
||
.Ar flags
|
||
.in +1i
|
||
Set debugging flags.
|
||
.in
|
||
.Cd detach
|
||
.Op Fl f
|
||
.Op Ar plex | subdisk
|
||
.in +1i
|
||
Detach a plex or subdisk from the volume or plex to which it is attached.
|
||
.in
|
||
.Cd info
|
||
.Op Fl v
|
||
.in +1i
|
||
List information about volume manager state.
|
||
.in
|
||
.Cd init
|
||
.Op Fl v
|
||
.Op Fl w
|
||
.Ar plex
|
||
.in +1i
|
||
.\" XXX
|
||
Initialize a plex by writing zeroes to all its subdisks.
|
||
.in
|
||
.Cd label
|
||
.Ar volume
|
||
.in +1i
|
||
Create a volume label.
|
||
.in
|
||
.Cd list
|
||
.Op Fl r
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Op Fl V
|
||
.Op volume | plex | subdisk
|
||
.in +1i
|
||
List information about specified objects.
|
||
.in
|
||
.Cd l
|
||
.Op Fl r
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Op Fl V
|
||
.Op volume | plex | subdisk
|
||
.in +1i
|
||
List information about specified objects (alternative to
|
||
.Cd list
|
||
command)
|
||
.in
|
||
.Cd ld
|
||
.Op Fl r
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Op Fl V
|
||
.Op drive
|
||
.in +1i
|
||
List information about drives.
|
||
.in
|
||
.Cd ls
|
||
.Op Fl r
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Op Fl V
|
||
.Op subdisk
|
||
.in +1i
|
||
List information about subdisks.
|
||
.in
|
||
.Cd lp
|
||
.Op Fl r
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Op Fl V
|
||
.Op plex
|
||
.in +1i
|
||
List information about plexes.
|
||
.in
|
||
.Cd lv
|
||
.Op Fl r
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Op Fl V
|
||
.Op volume
|
||
.in +1i
|
||
List information about volumes.
|
||
.in
|
||
.Cd makedev
|
||
.in +1i
|
||
Remake the device nodes in
|
||
.Ar /dev/vinum .
|
||
.in
|
||
.Nm mirror
|
||
.Op Fl f
|
||
.Op Fl n Ar name
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Ar drives
|
||
.in +1i
|
||
Create a mirrored volume from the specified drives.
|
||
.in
|
||
.Cd printconfig
|
||
.Op Pa file
|
||
.in +1i
|
||
Write a copy of the current configuration to
|
||
.Pa file .
|
||
.in
|
||
.Cd quit
|
||
.in +1i
|
||
Exit the
|
||
.Nm
|
||
program when running in interactive mode. Normally this would be done by
|
||
entering the
|
||
.Ar EOF
|
||
character.
|
||
.in
|
||
.Cd read
|
||
.Ar disk Op disk...
|
||
.in +1i
|
||
Read the
|
||
.Nm
|
||
configuration from the specified disks.
|
||
.in
|
||
.Cd rename Op Fl r
|
||
.Ar [ drive | subdisk | plex | volume ]
|
||
.Ar newname
|
||
.in +1i
|
||
Change the name of the specified object.
|
||
.ig
|
||
XXX
|
||
.in
|
||
.Cd replace
|
||
.Ar [ subdisk | plex ]
|
||
.Ar newobject
|
||
.in +1i
|
||
Replace the object with an identical other object. XXX not implemented yet.
|
||
..
|
||
.in
|
||
.Cd resetconfig
|
||
.in +1i
|
||
Reset the complete
|
||
.Nm
|
||
configuration.
|
||
.in
|
||
.Cd resetstats
|
||
.Op Fl r
|
||
.Op volume | plex | subdisk
|
||
.in +1i
|
||
Reset statistics counters for the specified objects, or for all objects if none
|
||
are specified.
|
||
.in
|
||
.Cd rm
|
||
.Op Fl f
|
||
.Op Fl r
|
||
.Ar volume | plex | subdisk
|
||
.in +1i
|
||
Remove an object.
|
||
.in
|
||
.Cd saveconfig
|
||
.in +1i
|
||
Save
|
||
.Nm
|
||
configuration to disk.
|
||
.in
|
||
.ig
|
||
XXX
|
||
.Cd set
|
||
.Op Fl f
|
||
.Ar state
|
||
.Ar volume | plex | subdisk | disk
|
||
.in +1i
|
||
Set the state of the object to \fIstate\fP\|
|
||
.in
|
||
..
|
||
.Cd setdaemon
|
||
.Op value
|
||
.in +1i
|
||
Set daemon configuration.
|
||
.in
|
||
.Cd setstate
|
||
.Ar state
|
||
.Op Ar volume | plex | subdisk | drive
|
||
.in +1i
|
||
Set state without influencing other objects, for diagnostic purposes only.
|
||
.in
|
||
.Cd start
|
||
.in +1i
|
||
Read configuration from all vinum drives.
|
||
.in
|
||
.Cd start
|
||
.Op Fl w
|
||
.Op volume | plex | subdisk
|
||
.in +1i
|
||
Allow the system to access the objects.
|
||
.in
|
||
.Cd stop
|
||
.Op Fl f
|
||
.Op volume | plex | subdisk
|
||
.in +1i
|
||
Terminate access to the objects, or stop
|
||
.Nm
|
||
if no parameters are specified.
|
||
.in
|
||
.Nm stripe
|
||
.Op Fl f
|
||
.Op Fl n Ar name
|
||
.Op Fl v
|
||
.Ar drives
|
||
.in +1i
|
||
Create a striped volume from the specified drives.
|
||
.in
|
||
.Sh DESCRIPTION
|
||
.Nm
|
||
is a utility program to communicate with the \fBVinum\fP\| logical volume
|
||
manager. See
|
||
.Xr vinum 4
|
||
for more information about the volume manager.
|
||
.Xr vinum 8
|
||
is designed either for interactive use, when started without command line
|
||
arguments, or to execute a single command if the command is supplied on the
|
||
command line. In interactive mode,
|
||
.Nm
|
||
maintains a command line history.
|
||
.Ss OPTIONS
|
||
.Nm
|
||
commands may optionally be followed by an option. Any of the following options
|
||
may be specified with any command, but in some cases they do not make any
|
||
difference: in these cases, the options are ignored. For example, the
|
||
.Nm stop
|
||
command ignores the
|
||
.Fl v
|
||
and
|
||
.Fl V
|
||
options.
|
||
.Bl -hang
|
||
.It Fl f
|
||
The
|
||
.Fl f
|
||
.if t (``force'')
|
||
.if n ("force")
|
||
option overrides safety checks. Use with extreme care. This option is for
|
||
emergency use only. For example, the command
|
||
.Bd -unfilled -offset indent
|
||
rm -f myvolume
|
||
.Ed
|
||
.Pp
|
||
removes
|
||
.Ar myvolume
|
||
even if it is open. Any subsequent access to the volume will almost certainly
|
||
cause a panic.
|
||
.It Fl n Ar name
|
||
Use the
|
||
.Fl n
|
||
option to specify a volume name to the simplified configuration commands
|
||
.Nm concat ,
|
||
.Nm mirror
|
||
and
|
||
.Nm stripe .
|
||
.It Fl r
|
||
The
|
||
.Fl r
|
||
.if t (``recursive'')
|
||
.if n ("recursive")
|
||
option is used by the list commands to display information not
|
||
only about the specified objects, but also about subordinate objects. For
|
||
example, in conjunction with the
|
||
.Nm lv
|
||
command, the
|
||
.Fl r
|
||
option will also show information about the plexes and subdisks belonging to the
|
||
volume.
|
||
.It Fl s
|
||
The
|
||
.Fl s
|
||
.if t (``statistics'')
|
||
.if n ("statistics")
|
||
option is used by the list commands to display statistical information. The
|
||
.Nm mirror
|
||
command also uses this flag to specify that it should create striped plexes.
|
||
.It Fl v
|
||
The
|
||
.Fl v
|
||
.if t (``verbose'')
|
||
.if n ("verbose")
|
||
option can be used to request more detailed information.
|
||
.It Fl V
|
||
The
|
||
.Fl V
|
||
.if t (``Very verbose'')
|
||
.if n ("Very verbose")
|
||
option can be used to request more detailed information than the
|
||
.Fl v
|
||
option provides.
|
||
.It Fl w
|
||
The
|
||
.Fl w
|
||
.if t (``wait'')
|
||
.if n ("wait")
|
||
option tells
|
||
.Nm
|
||
to wait for completion of commands which normally run in the background, such as
|
||
.Nm init .
|
||
.El
|
||
.Pp
|
||
.Ss COMMANDS IN DETAIL
|
||
.Pp
|
||
.Nm
|
||
commands perform the following functions:
|
||
.Bl -hang
|
||
.It Nm attach Ar plex Ar volume
|
||
.Op Nm rename
|
||
.if n .sp -1v
|
||
.if t .sp -.6v
|
||
.It Nm attach Ar subdisk Ar plex Ar [offset]
|
||
.Op Nm rename
|
||
.sp
|
||
.Nm
|
||
.Ar attach
|
||
inserts the specified plex or subdisk in a volume or plex. In the case of a
|
||
subdisk, an offset in the plex may be specified. If it is not, the subdisk will
|
||
be attached at the first possible location. After attaching a plex to a
|
||
non-empty volume,
|
||
.Nm
|
||
reintegrates the plex.
|
||
.Pp
|
||
If the keyword
|
||
.Nm rename
|
||
is specified,
|
||
.Nm
|
||
renames the object (and in the case of a plex, any subordinate subdisks) to fit
|
||
in with the default
|
||
.Nm
|
||
naming convention.
|
||
.Pp
|
||
A number of considerations apply to attaching subdisks:
|
||
.Bl -bullet
|
||
.It
|
||
Subdisks can normally only be attached to concatenated plexes.
|
||
.It
|
||
If a striped or RAID-5 plex is missing a subdisk (for example after drive
|
||
failure), it should be replaced by a subdisk of the same size only.
|
||
.It
|
||
In order to add further subdisks to a striped or RAID-5 plex, use the
|
||
.Fl f
|
||
(force) option. This will corrupt the data in the plex.
|
||
.\"No other attachment of
|
||
.\"subdisks is currently allowed for striped and RAID-5 plexes.
|
||
.It
|
||
For concatenated plexes, the
|
||
.Ar offset
|
||
parameter specifies the offset in blocks from the beginning of the plex. For
|
||
striped and RAID-5 plexes, it specifies the offset of the first block of the
|
||
subdisk: in other words, the offset is the numerical position of the subdisk
|
||
multiplied by the stripe size. For example, in a plex of stripe size 256k, the
|
||
first subdisk will have offset 0, the second offset 256k, the third 512k, etc.
|
||
This calculation ignores parity blocks in RAID-5 plexes.
|
||
.El
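.Pp
As a worked illustration of the offset rule for striped and RAID-5 plexes (the
number of subdisks here is hypothetical), the offsets in a plex with a stripe
size of 256k are the subdisk position multiplied by the stripe size:
.Bd -literal -offset indent
subdisk 0:  0 * 256k =    0
subdisk 1:  1 * 256k = 256k
subdisk 2:  2 * 256k = 512k
subdisk 3:  3 * 256k = 768k
.Ed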
|
||
.It Nm concat
|
||
.Op Fl f
|
||
.Op Fl n Ar name
|
||
.Op Fl v
|
||
.Ar drives
|
||
.br
|
||
The
|
||
.Nm concat
|
||
command provides a simplified alternative to the
|
||
.Nm create
|
||
command for creating volumes with a single concatenated plex. The largest
|
||
contiguous space available on each drive is used to create the subdisks for the
|
||
plexes.
|
||
.Pp
|
||
Normally, the
|
||
.Nm concat
|
||
command creates an arbitrary name for the volume and its components. The name
|
||
is composed of the text
|
||
.Ar vinum
|
||
and a small integer, for example
|
||
.Ar vinum3 .
|
||
You can override this with the
|
||
.Fl n Ar name
|
||
option, which assigns the name specified to the volume. The plexes and subdisks
|
||
are named after the volume in the default manner.
|
||
.Pp
|
||
There is no choice of name for the drives. If the drives have already been
|
||
initialized as
|
||
.Nm
|
||
drives, the name remains. Otherwise the drives are given names starting with
|
||
the text
|
||
.Ar vinumdrive
|
||
and a small integer, for example
|
||
.Ar vinumdrive7 .
|
||
As with the
|
||
.Nm create
|
||
command, the
|
||
.Fl f
|
||
option can be used to specify that a previous name should be overwritten. The
|
||
.Fl v
|
||
option is used to specify verbose output.
|
||
.Pp
|
||
See the section SIMPLIFIED CONFIGURATION below for some examples of this
|
||
command.
|
||
.It Nm create Op Fl f Ar description-file
|
||
.sp
|
||
.Nm
|
||
.Ar create
|
||
is used to create any object. In view of the relatively complicated
|
||
relationship and the potential dangers involved in creating a
|
||
.Nm
|
||
object, there is no interactive interface to this function. If you do not
|
||
specify a file name,
|
||
.Nm
|
||
starts an editor on a temporary file. If the environment variable
|
||
.Ev EDITOR
|
||
is set,
|
||
.Nm
|
||
starts this editor. If not, it defaults to
|
||
.Nm vi .
|
||
See the section CONFIGURATION FILE below for more information on the format of
|
||
this file.
|
||
.Pp
|
||
Note that the
|
||
.Nm
|
||
.Ar create
|
||
function is additive: if you run it multiple times, you will create multiple
|
||
copies of all unnamed objects.
|
||
.Pp
|
||
Normally the
|
||
.Nm create
|
||
command will not change the names of existing
|
||
.Nm
|
||
drives, in order to avoid accidentally erasing them. The correct way to dispose
|
||
of no longer wanted
|
||
.Nm
|
||
drives is to reset the configuration with the
|
||
.Nm resetconfig
|
||
command. In some cases, however, it may be necessary to create new data on
|
||
.Nm
|
||
drives which can no longer be started. In this case, use the
|
||
.Nm create Fl f
|
||
command.
|
||
.It Nm debug
|
||
.Pp
|
||
.Nm
|
||
.Ar debug
|
||
is used to enter the remote kernel debugger. It is only activated if
|
||
.Nm
|
||
is built with the
|
||
.Ar VINUMDEBUG
|
||
option. This option will stop the execution of the operating system until the
|
||
kernel debugger is exited. If remote debugging is set and there is no remote
|
||
connection for a kernel debugger, it will be necessary to reset the system and
|
||
reboot in order to leave the debugger.
|
||
.It Nm debug
|
||
.Ar flags
|
||
.Pp
|
||
Set a bit mask of internal debugging flags. These will change without warning
|
||
as the product matures; to be certain, read the header file
|
||
.Pa sys/dev/vinumvar.h .
|
||
The bit mask is composed of the following values:
|
||
.Bl -hang
|
||
.It DEBUG_ADDRESSES (1)
|
||
.br
|
||
Show buffer information during requests
|
||
.It DEBUG_NUMOUTPUT (2)
|
||
.br
|
||
Show the value of
|
||
.Dv vp->v_numoutput .
|
||
.It DEBUG_RESID (4)
|
||
.br
|
||
Go into debugger in
|
||
.Fn complete_rqe .
|
||
.It DEBUG_LASTREQS (8)
|
||
.br
|
||
Keep a circular buffer of last requests.
|
||
.It DEBUG_REVIVECONFLICT (16)
|
||
.br
|
||
Print info about revive conflicts.
|
||
.It DEBUG_EOFINFO (32)
|
||
.br
|
||
Print information about internal state when returning an EOF on a striped plex.
|
||
.It DEBUG_MEMFREE (64)
|
||
.br
|
||
Maintain a circular list of the last memory areas freed by the memory allocator.
|
||
.It DEBUG_REMOTEGDB (256)
|
||
.br
|
||
Go into remote
|
||
.Ic gdb
|
||
when the
|
||
.Nm debug
|
||
command is issued.
|
||
.It DEBUG_WARNINGS (512)
|
||
.br
|
||
Print some warnings about minor problems in the implementation.
|
||
.El
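.Pp
The values are added together to form the mask.
As a sketch, and assuming that the
.Ar flags
argument is accepted as a decimal number (an assumption not confirmed by this
page), the following command would set DEBUG_LASTREQS (8) and DEBUG_WARNINGS
(512), since 8 + 512 = 520:
.Bd -literal -offset indent
debug 520
.Ed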
|
||
.It Nm detach Op Fl f
|
||
.Ar plex
|
||
.if n .sp -1v
|
||
.if t .sp -.6v
|
||
.It Nm detach Op Fl f
|
||
.Ar subdisk
|
||
.sp
|
||
.Nm
|
||
.Ar detach
|
||
removes the specified plex or subdisk from the volume or plex to which it is
|
||
attached. If removing the object would impair the data integrity of the volume,
|
||
the operation will fail unless the
|
||
.Fl f
|
||
option is specified. If the object is named after the object above it (for
|
||
example, subdisk vol1.p7.s0 attached to plex vol1.p7), the name will be changed
|
||
by prepending the text
|
||
.if t ``ex-''
|
||
.if n "ex-"
|
||
(for example, ex-vol1.p7.s0). If necessary, the name will be truncated in the
|
||
process.
|
||
.Pp
|
||
.Nm detach
|
||
does not reduce the number of subdisks in a striped or RAID-5 plex. Instead,
|
||
the subdisk is marked absent, and can later be replaced with the
|
||
.Nm attach
|
||
command.
|
||
.It Nm info
|
||
.br
|
||
.Nm
|
||
.Ar info
|
||
displays information about
|
||
.Nm
|
||
memory usage. This is intended primarily for debugging. With the
|
||
.Fl v
|
||
option, it will give detailed information about the memory areas in use.
|
||
.Pp
|
||
With the
|
||
.Fl V
|
||
option,
|
||
.Ar info
|
||
displays information about up to the last 64 I/O requests handled by the
|
||
.Nm
|
||
driver. This information is only collected if debug flag 8 is set. The format
|
||
looks like:
|
||
.Pp
|
||
.Bd -literal
|
||
vinum -> info -V
|
||
Flags: 0x200 1 opens
|
||
Total of 38 blocks malloced, total memory: 16460
|
||
Maximum allocs: 56, malloc table at 0xf0f72dbc
|
||
|
||
Time Event Buf Dev Offset Bytes SD SDoff Doffset Goffset
|
||
|
||
14:40:00.637758 1VS Write 0xf2361f40 91.3 0x10 16384
|
||
14:40:00.639280 2LR Write 0xf2361f40 91.3 0x10 16384
|
||
14:40:00.639294 3RQ Read 0xf2361f40 4.39 0x104109 8192 19 0 0 0
|
||
14:40:00.639455 3RQ Read 0xf2361f40 4.23 0xd2109 8192 17 0 0 0
|
||
14:40:00.639529 3RQ Read 0xf2361f40 4.15 0x6e109 8192 16 0 0 0
|
||
14:40:00.652978 4DN Read 0xf2361f40 4.39 0x104109 8192 19 0 0 0
|
||
14:40:00.667040 4DN Read 0xf2361f40 4.15 0x6e109 8192 16 0 0 0
|
||
14:40:00.668556 4DN Read 0xf2361f40 4.23 0xd2109 8192 17 0 0 0
|
||
14:40:00.669777 6RP Write 0xf2361f40 4.39 0x104109 8192 19 0 0 0
|
||
14:40:00.685547 4DN Write 0xf2361f40 4.39 0x104109 8192 19 0 0 0
|
||
11:11:14.975184 Lock 0xc2374210 2 0x1f8001
|
||
11:11:15.018400 7VS Write 0xc2374210 0x7c0 32768 10
|
||
11:11:15.018456 8LR Write 0xc2374210 13.39 0xcc0c9 32768
|
||
11:11:15.046229 Unlock 0xc2374210 2 0x1f8001
|
||
.Ed
|
||
.Pp
|
||
The
|
||
.Ar Buf
|
||
field always contains the address of the user buffer header. This can be used
|
||
to identify the requests associated with a user request, though this is not 100%
|
||
reliable: theoretically two requests in sequence could use the same buffer
|
||
header, though this is not common. The beginning of a request can be identified
|
||
by the event
|
||
.Ar 1VS
|
||
or
|
||
.Ar 7VS .
|
||
The first example above shows the requests involved in a user request. The
|
||
second is a subdisk I/O request with locking.
|
||
.Pp
|
||
The
|
||
.Ar Event
|
||
field contains information related to the sequence of events in the request
|
||
chain. The digit
|
||
.Ar 1
|
||
to
|
||
.Ar 8
|
||
indicates the approximate sequence of events, and the two-letter abbreviation is
|
||
a mnemonic for the location:
|
||
.Bl -hang
|
||
.It 1VS
|
||
(vinumstrategy) shows information about the user request on entry to
|
||
.Fn vinumstrategy .
|
||
The device number is the
|
||
.Nm
|
||
device, and offset and length are the user parameters. This is always the
|
||
beginning of a request sequence.
|
||
.It 2LR
|
||
(launch_requests) shows the user request just prior to launching the low-level
|
||
.Nm
|
||
requests in the function
|
||
.Fn launch_requests .
|
||
The parameters should be the same as in the
|
||
.Ar 1VS
|
||
information.
|
||
.Pp
|
||
In the following requests,
|
||
.Ar Dev
|
||
is the device number of the associated disk partition,
|
||
.Ar Offset
|
||
is the offset from the beginning of the partition,
|
||
.Ar SD
|
||
is the subdisk index in
|
||
.Dv vinum_conf ,
|
||
.Ar SDoff
|
||
is the offset from the beginning of the subdisk,
|
||
.Ar Doffset
|
||
is the offset of the associated data request, and
|
||
.Ar Goffset
|
||
is the offset of the associated group request, where applicable.
|
||
.It 3RQ
|
||
(request) shows one of possibly several low-level
|
||
.Nm
|
||
requests which are launched to satisfy the high-level request. This information
|
||
is also logged in
|
||
.Fn launch_requests .
|
||
.It 4DN
|
||
(done) is called from
|
||
.Fn complete_rqe ,
|
||
showing the completion of a request. This completion should match a request
|
||
launched either at stage
|
||
.Ar 3RQ
|
||
from
|
||
.Fn launch_requests ,
|
||
or from
|
||
.Fn complete_raid5_write
|
||
at stage
|
||
.Ar 5RD
|
||
or
|
||
.Ar 6RP .
|
||
.It 5RD
|
||
(RAID-5 data) is called from
|
||
.Fn complete_raid5_write
|
||
and represents the data written to a RAID-5 data stripe after calculating
|
||
parity.
|
||
.It 6RP
|
||
(RAID-5 parity) is called from
|
||
.Fn complete_raid5_write
|
||
and represents the data written to a RAID-5 parity stripe after calculating
|
||
parity.
|
||
.It 7VS
|
||
shows a subdisk I/O request. These requests are usually internal to
|
||
.Nm
|
||
for operations like initialization or rebuilding plexes.
|
||
.It 8LR
|
||
shows the low-level operation generated for a subdisk I/O request.
|
||
.It Lockwait
|
||
specifies that the process is waiting for a range lock. The parameters are the
|
||
buffer header associated with the request, the plex number and the block number.
|
||
For internal reasons the block number is one higher than the address of the
|
||
beginning of the stripe.
|
||
.It Lock
|
||
specifies that a range lock has been obtained. The parameters are the same as
|
||
for Lockwait.
|
||
.It Unlock
|
||
specifies that a range lock has been released. The parameters are the same as
|
||
for Lockwait.
|
||
.El
|
||
.\" XXX
|
||
.It Nm init Op Fl w
|
||
.Ar plex
|
||
.Pp
|
||
.Nm
|
||
.Ar init
|
||
initializes a plex by writing zeroes to all its subdisks. This is the only way
|
||
to ensure consistent data in a plex. You must perform this initialization
|
||
before using a RAID-5 plex. It is also recommended for other new plexes.
|
||
.Nm
|
||
initializes all subdisks of a plex in parallel. Since this operation can take a
|
||
long time, it is normally performed in the background. If you want to wait for
|
||
completion of the command, use the
|
||
.Fl w
|
||
(wait) option.
|
||
.Nm
|
||
prints a console message when the initialization is complete.
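.Pp
For example, to initialize a plex with the hypothetical name raid.p0 and
return only when the initialization has finished:
.Bd -literal -offset indent
init -w raid.p0
.Ed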
|
||
.It Nm label
|
||
.Ar volume
|
||
.Pp
|
||
The
|
||
.Nm label
|
||
command writes a
|
||
.Ar ufs
|
||
style volume label on a volume. It is a simple alternative to an appropriate
|
||
call to
|
||
.Ar disklabel .
|
||
This is needed because some
|
||
.Ar ufs
|
||
commands still read the disk to find the label instead of using the correct
|
||
.Ar ioctl
|
||
call to access it.
|
||
.Nm
|
||
maintains a volume label separately from the volume data, so this command is not
|
||
needed for
|
||
.Ar newfs .
|
||
This command is deprecated.
|
||
.Pp
|
||
.It Nm list
|
||
.Op Fl r
|
||
.Op Fl V
|
||
.Op volume | plex | subdisk
|
||
.if n .sp -1v
|
||
.if t .sp -.6v
|
||
.It Nm l
|
||
.Op Fl r
|
||
.Op Fl V
|
||
.Op volume | plex | subdisk
|
||
.if n .sp -1v
|
||
.if t .sp -.6v
|
||
.It Nm ld
|
||
.Op Fl r
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Op Fl V
|
||
.Op drive
|
||
.if n .sp -1v
|
||
.if t .sp -.6v
|
||
.It Nm ls
|
||
.Op Fl r
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Op Fl V
|
||
.Op subdisk
|
||
.if n .sp -1v
|
||
.if t .sp -.6v
|
||
.It Nm lp
|
||
.Op Fl r
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Op Fl V
|
||
.Op plex
|
||
.if n .sp -1v
|
||
.if t .sp -.6v
|
||
.It Nm lv
|
||
.Op Fl r
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Op Fl V
|
||
.Op volume
|
||
.Pp
|
||
.Ar list
|
||
is used to show information about the specified object. If the argument is
|
||
omitted, information is shown about all objects known to
|
||
.Nm vinum .
|
||
The
|
||
.Ar l
|
||
command is a synonym for
|
||
.Ar list .
|
||
.Pp
|
||
The
|
||
.Fl r
|
||
option relates to volumes and plexes: if specified, it recursively lists
|
||
information for the subdisks and (for a volume) plexes subordinate to the
|
||
objects. The
|
||
.Ar lv ,
|
||
.Ar lp ,
|
||
.Ar ls
|
||
and
|
||
.Ar ld
|
||
commands list only volumes, plexes, subdisks and drives respectively. This is
|
||
particularly useful when used without parameters.
|
||
.Pp
|
||
The
|
||
.Fl s
|
||
option causes
|
||
.Nm
|
||
to output device statistics, the
|
||
.Fl v
|
||
(verbose) option causes some additional information to be output, and the
|
||
.Fl V
|
||
option causes considerable additional information to be output.
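.Pp
For example, assuming a volume with the hypothetical name myvolume, the
following command lists the volume together with its plexes and subdisks and
includes statistics for each:
.Bd -literal -offset indent
lv -r -s myvolume
.Ed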
|
||
.It Nm makedev
|
||
.br
|
||
The
|
||
.Nm makedev
|
||
command removes the directory /dev/vinum and recreates it with device nodes
|
||
which reflect the current configuration. This command is not intended for
|
||
general use, and is provided for emergency use only.
|
||
.Pp
|
||
.It Nm mirror
|
||
.Op Fl f
|
||
.Op Fl n Ar name
|
||
.Op Fl s
|
||
.Op Fl v
|
||
.Ar drives
|
||
.br
|
||
The
|
||
.Nm mirror
|
||
command provides a simplified alternative to the
|
||
.Nm create
|
||
command for creating mirrored volumes. Without any options, it creates a RAID-1
|
||
(mirrored) volume with two concatenated plexes. The largest contiguous space
|
||
available on each drive is used to create the subdisks for the plexes. The
|
||
first plex is built from the odd-numbered drives in the list, and the second
|
||
plex is built from the even-numbered drives. If the drives are of different
|
||
sizes, the plexes will be of different sizes.
|
||
.Pp
|
||
If the
|
||
.Fl s
|
||
option is provided,
|
||
.Nm mirror
|
||
builds striped plexes with a stripe size of 256 kB. The size of the subdisks in
|
||
each plex is the size of the smallest contiguous storage available on any of the
|
||
drives which form the plex. Again, the plexes may differ in size.
|
||
.Pp
|
||
Normally, the
|
||
.Nm mirror
|
||
command creates an arbitrary name for the volume and its components. The name
|
||
is composed of the text
|
||
.Ar vinum
|
||
and a small integer, for example
|
||
.Ar vinum3 .
|
||
You can override this with the
|
||
.Fl n Ar name
|
||
option, which assigns the name specified to the volume. The plexes and subdisks
|
||
are named after the volume in the default manner.
|
||
.Pp
|
||
There is no choice of name for the drives. If the drives have already been
|
||
initialized as
|
||
.Nm
|
||
drives, the name remains. Otherwise the drives are given names starting with
|
||
the text
|
||
.Ar vinumdrive
|
||
and a small integer, for example
|
||
.Ar vinumdrive7 .
|
||
As with the
|
||
.Nm create
|
||
command, the
|
||
.Fl f
|
||
option can be used to specify that a previous name should be overwritten. The
|
||
.Fl v
|
||
option is used to specify verbose output.
|
||
.Pp
|
||
See the section SIMPLIFIED CONFIGURATION below for some examples of this
|
||
command.
|
||
.It Nm printconfig Op Pa file
|
||
Write a copy of the current configuration to
|
||
.Pa file
|
||
in a format that can be used to recreate the
|
||
.Nm
|
||
configuration. Unlike the configuration saved on disk, it includes definitions
|
||
of the drives. If you omit
|
||
.Pa file ,
|
||
.Nm
|
||
writes the list to
|
||
.Pa stdout .
|
||
.It Nm quit
|
||
Exit the
|
||
.Nm
|
||
program when running in interactive mode. Normally this would be done by
|
||
entering the
|
||
.Ar EOF
|
||
character.
|
||
.It Nm read
|
||
.Ar disk Op disk...
|
||
.Pp
|
||
The
|
||
.Nm read
|
||
command scans the specified disks for
|
||
.Nm
|
||
partitions containing previously created configuration information. It reads
|
||
the configuration in order from the most recently updated to least recently
|
||
updated configuration.
|
||
.Nm
|
||
maintains an up-to-date copy of all configuration information on each disk
|
||
partition. You must specify all of the slices in a configuration as the
|
||
parameter to this command.
|
||
.Pp
|
||
The
|
||
.Nm read
|
||
command is intended to selectively load a
|
||
.Nm
|
||
configuration on a system which has other
|
||
.Nm
|
||
partitions. If you want to start all partitions on the system, it is easier to
|
||
use the
|
||
.Nm start
|
||
command.
|
||
.Pp
|
||
If
|
||
.Nm
|
||
encounters any errors during this command, it will turn off automatic
|
||
configuration update to avoid corrupting the copies on disk. This will also
|
||
happen if the configuration on disk indicates a configuration error (for
|
||
example, subdisks which do not have a valid space specification). You can turn
|
||
the updates on again with the
|
||
.Nm setdaemon
|
||
and
|
||
.Nm saveconfig
|
||
commands. Reset bit 4 of the daemon options mask to re-enable configuration
|
||
saves.
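.Pp
A minimal sketch of this recovery sequence, assuming that the mask is given as
a decimal number and that no other daemon option bits need to remain set:
.Bd -literal -offset indent
setdaemon 0
saveconfig
.Ed
.Pp
The first command clears bit 4 (along with every other bit in the mask), and
the second writes the current configuration to disk.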
|
||
.It Nm rename
|
||
.Op Fl r
|
||
.Ar [ drive | subdisk | plex | volume ]
|
||
.Ar newname
|
||
.Pp
|
||
Change the name of the specified object. If the
|
||
.Fl r
|
||
option is specified, subordinate objects will be named by the default rules:
|
||
plex names will be formed by appending .p\f(BInumber\fP to the volume name, and
|
||
subdisk names will be formed by appending .s\f(BInumber\fP to the plex name.
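.Pp
As an illustration of these rules (all names are hypothetical), consider a
volume vol1 with two plexes of one subdisk each.
The command
.Bd -literal -offset indent
rename -r vol1 data
.Ed
.Pp
would give the plexes names of the form data.p0 and data.p1, and the subdisks
names of the form data.p0.s0 and data.p1.s0, assuming that numbering starts at
0.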
|
||
.It Nm replace
|
||
.Ar [ subdisk | plex ]
|
||
.Ar newobject
|
||
.Pp
|
||
Replace the object with an identical other object. This command has not yet
|
||
been implemented.
|
||
.It Nm resetconfig
|
||
.Pp
|
||
The
|
||
.Nm resetconfig
|
||
command completely obliterates the
|
||
.Nm
|
||
configuration on a system. Use this command only when you want to completely
|
||
delete the configuration.
|
||
.Nm
|
||
will ask for confirmation: you must type in the words NO FUTURE exactly
|
||
as shown:
|
||
.Bd -unfilled -offset indent
|
||
# \f(CBvinum resetconfig\f(CW
|
||
|
||
WARNING! This command will completely wipe out your vinum
|
||
configuration. All data will be lost. If you really want
|
||
to do this, enter the text
|
||
|
||
NO FUTURE
|
||
Enter text -> \f(BINO FUTURE\fP
|
||
Vinum configuration obliterated
|
||
.Ed
|
||
.ft R
|
||
.Pp
|
||
As the message suggests, this is a last-ditch command. Don't use it unless you
|
||
have an existing configuration which you never want to see again.
|
||
.It Nm resetstats
|
||
.Op Fl r
|
||
.Op volume | plex | subdisk
|
||
.Pp
|
||
.Nm
|
||
maintains a number of statistical counters for each object. See the header file
|
||
.Pa vinumvar.h
|
||
for more information.
|
||
.\" XXX put it in here when it's finalized
|
||
Use the
|
||
.Nm resetstats
|
||
command to reset these counters. In conjunction with the
|
||
.Fl r
|
||
option,
|
||
.Nm
|
||
also resets the counters of subordinate objects.
|
||
.It Nm rm
|
||
.Op Fl f
|
||
.Op Fl r
|
||
.Ar volume | plex | subdisk
|
||
.Pp
|
||
.Nm rm
|
||
removes an object from the
|
||
.Nm
|
||
configuration. Once an object has been removed, there is no way to recover it.
|
||
Normally
|
||
.Nm
|
||
performs a large amount of consistency checking before removing an object. The
|
||
.Fl f
|
||
option tells
|
||
.Nm
|
||
to omit this checking and remove the object anyway. Use this option with great
|
||
care: it can result in total loss of data on a volume.
|
||
.Pp
|
||
Normally,
|
||
.Nm
|
||
refuses to remove a volume or plex if it has subordinate plexes or subdisks
|
||
respectively. You can tell
|
||
.Nm
|
||
to remove the object anyway by using the
|
||
.Fl f
|
||
flag, or you can cause
|
||
.Nm
|
||
to remove the subordinate objects as well by using the
|
||
.Fl r
|
||
(recursive) flag. If you remove a volume with the
|
||
.Fl r
|
||
flag, it will remove both the plexes and the subdisks which belong to the
|
||
plexes.
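.Pp
For example, assuming a volume with the hypothetical name myvolume, the
following command removes the volume together with its plexes and their
subdisks, while still performing the normal consistency checks:
.Bd -literal -offset indent
rm -r myvolume
.Ed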
|
||
.It Nm saveconfig
|
||
.Pp
|
||
Save the current configuration to disk. This is primarily a maintenance
|
||
function. For example, if an error occurs on startup, updates will be
|
||
disabled. When you reenable them, the configuration is not automatically saved
|
||
to disk. Use this command to save the configuration.
|
||
.ig
|
||
.It Nm set
|
||
.Op Fl f
|
||
.Ar state
|
||
.Ar volume | plex | subdisk | disk
|
||
.Nm set
|
||
sets the state of the specified object to one of the valid states (see OBJECT
|
||
STATES below). Normally
|
||
.Nm
|
||
performs a large amount of consistency checking before making the change. The
|
||
.Fl f
|
||
option tells
|
||
.Nm
|
||
to omit this checking and perform the change anyway. Use this option with great
|
||
care: it can result in total loss of data on a volume.
|
||
.\"XXX
|
||
.Nm This command has not yet been implemented.
|
||
..
|
||
.It Nm setdaemon
|
||
.Op value
|
||
.Pp
|
||
.Nm setdaemon
|
||
sets a variable bitmask for the
|
||
.Nm
|
||
daemon. This command is temporary and will be replaced. Currently, the bit mask
|
||
may contain the bits 1 (log every action to syslog) and 4 (don't update
|
||
configuration). Option bit 4 can be useful for error recovery.
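.Pp
A possible recovery sequence is sketched below. It assumes that no other option
bits need to remain set, since a mask of 0 also turns off logging (bit 1).
Clearing the mask re-enables configuration updates, and
.Nm saveconfig
then writes the configuration to disk:
.Bd -literal
vinum -> setdaemon 0
vinum -> saveconfig
.Ed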
|
||
.It Nm setstate
|
||
.Ar state
|
||
.Op Ar volume | plex | subdisk | drive
|
||
.Pp
|
||
.Nm setstate
|
||
sets the state of the specified objects to the specified state. This bypasses
|
||
the usual consistency mechanism of
|
||
.Nm
|
||
and should be used only for recovery purposes. It is possible to crash the
|
||
system by incorrect use of this command.
|
||
.It Nm start
|
||
.Op Fl w
|
||
.Op volume | plex | subdisk
|
||
.Pp
|
||
.Nm start
|
||
starts (brings into the
|
||
.Ar up
|
||
state) one or more
|
||
.Nm
|
||
objects.
|
||
.Pp
|
||
If no object names are specified,
|
||
.Nm
|
||
scans the disks known to the system for
|
||
.Nm
|
||
drives and then reads in the configuration as described under the
|
||
.Nm read
|
||
command. The
|
||
.Nm
|
||
drive contains a header with all information about the data stored on the drive,
|
||
including the names of the other drives which are required in order to represent
|
||
plexes and volumes.
|
||
.Pp
|
||
If
|
||
.Nm
|
||
encounters any errors during this command, it will turn off automatic
|
||
configuration update to avoid corrupting the copies on disk. This will also
|
||
happen if the configuration on disk indicates a configuration error (for
|
||
example, subdisks which do not have a valid space specification). You can turn
|
||
the updates on again with the
|
||
.Nm setdaemon
|
||
and
|
||
.Nm saveconfig
|
||
commands. Reset bit 4 of the daemon options mask to re-enable configuration
|
||
saves.
|
||
.Pp
|
||
If object names are specified,
|
||
.Nm
|
||
starts them. Normally this operation is only of use with subdisks. The action
|
||
depends on the current state of the object:
|
||
.Bl -bullet
|
||
.It
|
||
If the
|
||
object is already in the
|
||
.Ar up
|
||
state,
|
||
.Nm
|
||
does nothing.
|
||
.It
|
||
If the object is a subdisk in the
|
||
.Ar down
|
||
or
|
||
.Ar reborn
|
||
states,
|
||
.Nm
|
||
changes it to the
|
||
.Ar up
|
||
state.
|
||
.It
|
||
If the object is a subdisk in the
|
||
.Ar empty
|
||
state, the change depends on the subdisk. If it is part of a plex which is part
|
||
of a volume which contains other plexes,
|
||
.Nm
|
||
places the subdisk in the
|
||
.Ar reviving
|
||
state and attempts to copy the data from the volume. When the operation
|
||
completes, the subdisk is set into the
|
||
.Ar up
|
||
state. If it is part of a plex which is part of a volume which contains no
|
||
other plexes, or if it is not part of a plex,
|
||
.Nm
|
||
brings it into the
|
||
.Ar up
|
||
state immediately.
|
||
.It
|
||
If the object is a subdisk in the
|
||
.Ar reviving
|
||
state,
|
||
.Nm
|
||
continues the
|
||
.Ar revive
|
||
operation offline. When the operation completes, the subdisk is set into the
|
||
.Ar up
|
||
state.
|
||
.El
|
||
.Pp
|
||
When a subdisk comes into the
|
||
.Ar up
|
||
state,
|
||
.Nm
|
||
automatically checks the state of any plex and volume to which it may belong and
|
||
changes their state where appropriate.
|
||
.Pp
|
||
If the object is a volume or a plex,
|
||
.Nm start
|
||
currently has no effect: it checks the state of the subordinate subdisks (and
|
||
plexes in the case of a volume) and sets the state of the object accordingly.
|
||
In a later version, this operation will cause the subdisks
|
||
.Pp
|
||
To start a plex in a multi-plex volume, the data must be copied from another
|
||
plex in the volume. Since this frequently takes a long time, it is normally
|
||
done in the background. If you want to wait for this operation to complete (for
|
||
example, if you are performing this operation in a script), use the
|
||
.Fl w
|
||
flag.
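.Pp
For example, a script might start a subdisk and wait until any copy operation
has finished before continuing. The subdisk name
.Ar mirror.p1.s0
is only an illustration:
.Bd -literal
# vinum start -w mirror.p1.s0
.Ed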
|
||
.It Nm stop
|
||
.Op Fl f
|
||
.Op volume | plex | subdisk
|
||
.Pp
|
||
If no parameters are specified,
|
||
.Nm stop
|
||
removes the
|
||
.Nm
|
||
kld and stops
|
||
.Xr vinum 8 .
|
||
This can only be done if no objects are active. In particular, the
|
||
.Fl f
|
||
flag does not override this requirement. This command can only work if
|
||
.Nm
|
||
has been loaded as a kld, since it is not possible to unload a statically
|
||
configured driver.
|
||
.Nm
|
||
.Nm stop
|
||
will fail if
|
||
.Nm
|
||
is statically configured.
|
||
.Pp
|
||
If object names are specified,
|
||
.Nm stop
|
||
disables access to the objects. If the objects have subordinate objects, the
subordinate objects must either already be inactive (stopped or in error), or
|
||
the
|
||
.Fl r
|
||
and
|
||
.Fl f
|
||
flags must be specified. This command does not remove the objects from the
|
||
configuration. They can be accessed again after a
|
||
.Nm start
|
||
command.
|
||
.Pp
|
||
By default,
|
||
.Nm
|
||
does not stop active objects. For example, you cannot stop a plex which is
|
||
attached to an active volume, and you cannot stop a volume which is open. The
|
||
.Fl f
|
||
option tells
|
||
.Nm
|
||
to omit this checking and stop the object anyway. Use this option with great
|
||
care and understanding: used incorrectly, it can result in serious data
|
||
corruption.
|
||
.It Nm stripe
|
||
.Op Fl f
|
||
.Op Fl n Ar name
|
||
.Op Fl v
|
||
.Ar drives
|
||
.br
|
||
The
|
||
.Nm stripe
|
||
command provides a simplified alternative to the
|
||
.Nm create
|
||
command for creating volumes with a single striped plex. The size of the
|
||
subdisks is the size of the largest contiguous space available on all the
|
||
specified drives. The stripe size is fixed at 256 kB.
|
||
.Pp
|
||
Normally, the
|
||
.Nm stripe
|
||
command creates an arbitrary name for the volume and its components. The name
|
||
is composed of the text
|
||
.Ar vinum
|
||
and a small integer, for example
|
||
.Ar vinum3 .
|
||
You can override this with the
|
||
.Fl n Ar name
|
||
option, which assigns the name specified to the volume. The plexes and subdisks
|
||
are named after the volume in the default manner.
|
||
.Pp
|
||
There is no choice of name for the drives. If the drives have already been
|
||
initialized as
|
||
.Nm
|
||
drives, the name remains. Otherwise the drives are given names starting with
|
||
the text
|
||
.Ar vinumdrive
|
||
and a small integer, for example
|
||
.Ar vinumdrive7 .
|
||
As with the
|
||
.Nm create
|
||
command, the
|
||
.Fl f
|
||
option can be used to specify that a previous name should be overwritten. The
|
||
.Fl v
|
||
option requests verbose output.
|
||
.Pp
|
||
See the section SIMPLIFIED CONFIGURATION below for some examples of this
|
||
command.
|
||
.El
|
||
.Sh SIMPLIFIED CONFIGURATION
|
||
This section describes a simplified interface to
|
||
.Nm
|
||
configuration using the
|
||
.Nm concat ,
|
||
.Nm mirror
|
||
and
|
||
.Nm stripe
|
||
commands. These commands create convenient configurations for the more common
|
||
situations, but they are not as flexible as the
|
||
.Nm create
|
||
command.
|
||
.Pp
|
||
See above for the description of the commands. Here are some examples, all
|
||
performed with the same collection of disks. Note that the first drive,
|
||
.Pa /dev/da1h ,
|
||
is smaller than the others. This has an effect on the sizes chosen for each
|
||
kind of subdisk.
|
||
.Pp
|
||
The following examples all use the
|
||
.Fl v
|
||
option to show the commands passed to the system, and also to list the structure
|
||
of the volume. Without the
|
||
.Fl v
|
||
option, these commands produce no output.
|
||
.Ss Volume with a single concatenated plex
|
||
Use a volume with a single concatenated plex for the largest possible storage
|
||
without resilience to drive failures:
|
||
.Bd -literal
vinum -> concat -v /dev/da1h /dev/da2h /dev/da3h /dev/da4h
volume vinum0
  plex name vinum0.p0 org concat
drive vinumdrive0 device /dev/da1h
    sd name vinum0.p0.s0 drive vinumdrive0 size 0
drive vinumdrive1 device /dev/da2h
    sd name vinum0.p0.s1 drive vinumdrive1 size 0
drive vinumdrive2 device /dev/da3h
    sd name vinum0.p0.s2 drive vinumdrive2 size 0
drive vinumdrive3 device /dev/da4h
    sd name vinum0.p0.s3 drive vinumdrive3 size 0
V vinum0         State: up  Plexes:   1  Size: 2134 MB
P vinum0.p0    C State: up  Subdisks: 4  Size: 2134 MB
S vinum0.p0.s0   State: up  PO:    0  B  Size:  414 MB
S vinum0.p0.s1   State: up  PO:  414 MB  Size:  573 MB
S vinum0.p0.s2   State: up  PO:  988 MB  Size:  573 MB
S vinum0.p0.s3   State: up  PO: 1561 MB  Size:  573 MB
.Ed
|
||
.Pp
|
||
In this case, the complete space on all four disks was used, giving a volume
|
||
2134 MB in size.
|
||
.Ss Volume with a single striped plex
|
||
A volume with a single striped plex may give better performance than a
|
||
concatenated plex, but restrictions on striped plexes can mean that the volume
|
||
is smaller. It will also not be resilient to a drive failure:
|
||
.Bd -literal
vinum -> stripe -v /dev/da1h /dev/da2h /dev/da3h /dev/da4h
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume vinum0
  plex name vinum0.p0 org striped 256k
    sd name vinum0.p0.s0 drive vinumdrive0 size 849825b
    sd name vinum0.p0.s1 drive vinumdrive1 size 849825b
    sd name vinum0.p0.s2 drive vinumdrive2 size 849825b
    sd name vinum0.p0.s3 drive vinumdrive3 size 849825b
V vinum0         State: up  Plexes:   1  Size: 1659 MB
P vinum0.p0    S State: up  Subdisks: 4  Size: 1659 MB
S vinum0.p0.s0   State: up  PO:    0  B  Size:  414 MB
S vinum0.p0.s1   State: up  PO:  256 kB  Size:  414 MB
S vinum0.p0.s2   State: up  PO:  512 kB  Size:  414 MB
S vinum0.p0.s3   State: up  PO:  768 kB  Size:  414 MB
.Ed
|
||
.Pp
|
||
In this case, the size of the subdisks has been limited to that of the smallest
available drive, so the resulting volume is only 1659 MB in size.
|
||
.Ss Mirrored volume with two concatenated plexes
|
||
For more reliability, use a mirrored, concatenated volume:
|
||
.Bd -literal
vinum -> mirror -v -n mirror /dev/da1h /dev/da2h /dev/da3h /dev/da4h
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume mirror setupstate
  plex name mirror.p0 org concat
    sd name mirror.p0.s0 drive vinumdrive0 size 0b
    sd name mirror.p0.s1 drive vinumdrive2 size 0b
  plex name mirror.p1 org concat
    sd name mirror.p1.s0 drive vinumdrive1 size 0b
    sd name mirror.p1.s1 drive vinumdrive3 size 0b
V mirror         State: up  Plexes:   2  Size: 1146 MB
P mirror.p0    C State: up  Subdisks: 2  Size:  988 MB
P mirror.p1    C State: up  Subdisks: 2  Size: 1146 MB
S mirror.p0.s0   State: up  PO:    0  B  Size:  414 MB
S mirror.p0.s1   State: up  PO:  414 MB  Size:  573 MB
S mirror.p1.s0   State: up  PO:    0  B  Size:  573 MB
S mirror.p1.s1   State: up  PO:  573 MB  Size:  573 MB
.Ed
|
||
.Pp
|
||
This example specifies the name of the volume:
|
||
.Ar mirror .
|
||
Since one drive is smaller than the others, the two plexes are of different
|
||
size, and the last 158 MB of the volume is non-resilient. To ensure complete
|
||
reliability in such a situation, use the
|
||
.Nm create
|
||
command to create a volume with 988 MB.
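.Pp
A configuration file along the following lines, passed to the
.Nm create
command, would give both plexes the same size. This is only a sketch: the
subdisk lengths are the rounded sizes from the listing above (414 MB and
573 MB), so each plex comes out at 987 MB, slightly less than the figure
mentioned. As in the generated configuration, the
.Nm setupstate
keyword assumes that the contents of the two plexes may be treated as
consistent:
.Bd -literal
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume mirror setupstate
  plex name mirror.p0 org concat
    sd name mirror.p0.s0 drive vinumdrive0 length 414m
    sd name mirror.p0.s1 drive vinumdrive2 length 573m
  plex name mirror.p1 org concat
    sd name mirror.p1.s0 drive vinumdrive1 length 414m
    sd name mirror.p1.s1 drive vinumdrive3 length 573m
.Ed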
|
||
.Ss Mirrored volume with two striped plexes
|
||
Alternatively, use the
|
||
.Fl s
|
||
option to create a mirrored volume with two striped plexes:
|
||
.Bd -literal
vinum -> mirror -v -n raid10 -s /dev/da1h /dev/da2h /dev/da3h /dev/da4h
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume raid10 setupstate
  plex name raid10.p0 org striped 256k
    sd name raid10.p0.s0 drive vinumdrive0 size 849825b
    sd name raid10.p0.s1 drive vinumdrive2 size 849825b
  plex name raid10.p1 org striped 256k
    sd name raid10.p1.s0 drive vinumdrive1 size 1173665b
    sd name raid10.p1.s1 drive vinumdrive3 size 1173665b
V raid10         State: up  Plexes:   2  Size: 1146 MB
P raid10.p0    S State: up  Subdisks: 2  Size:  829 MB
P raid10.p1    S State: up  Subdisks: 2  Size: 1146 MB
S raid10.p0.s0   State: up  PO:    0  B  Size:  414 MB
S raid10.p0.s1   State: up  PO:  256 kB  Size:  414 MB
S raid10.p1.s0   State: up  PO:    0  B  Size:  573 MB
S raid10.p1.s1   State: up  PO:  256 kB  Size:  573 MB
.Ed
|
||
.Pp
|
||
In this case, the usable part of the volume is even smaller, since the first
|
||
plex has shrunk to match the smallest drive.
|
||
.Ss CONFIGURATION FILE
|
||
.Nm
|
||
requires that all parameters to the
.Nm create
command be in a configuration file. Entries in the configuration file
|
||
define volumes, plexes and subdisks, and may be in free format, except that each
|
||
entry must be on a single line.
|
||
.Pp
|
||
.Ss Scale factors
|
||
Some configuration file parameters specify a size (lengths, stripe sizes).
|
||
These values can be specified as bytes, or one of the following scale factors
|
||
may be appended:
|
||
.Bl -hang
|
||
.It s
|
||
specifies that the value is a number of sectors of 512 bytes.
|
||
.It k
|
||
specifies that the value is a number of kilobytes (1024 bytes).
|
||
.It m
|
||
specifies that the value is a number of megabytes (1048576 bytes).
|
||
.It g
|
||
specifies that the value is a number of gigabytes (1073741824 bytes).
|
||
.It b
|
||
is used for compatibility with VERITAS. It stands for blocks of 512 bytes.
|
||
This abbreviation is confusing, since the word ``block'' is used with different
|
||
meanings, and its use is deprecated.
|
||
.El
|
||
.Pp
|
||
For example, the value 16777216 bytes can also be written as
|
||
.Nm 16m ,
|
||
.Nm 16384k
|
||
or
|
||
.Nm 32768s .
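.Pp
The arithmetic behind this example, using the scale factors listed above:
.Bd -literal
16m    = 16    * 1048576 bytes = 16777216 bytes
16384k = 16384 * 1024 bytes    = 16777216 bytes
32768s = 32768 * 512 bytes     = 16777216 bytes
.Ed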
|
||
.Pp
|
||
The configuration file can contain the following entries:
|
||
.Pp
|
||
.Bl -hang -width 4n
|
||
.It Nm drive Ar name devicename
|
||
.Op options
|
||
.Pp
|
||
Define a drive. The options are:
|
||
.Pp
|
||
.Bl -hang -width 18n
|
||
.It Nm device Ar devicename
|
||
Specify the device on which the drive resides.
|
||
.Ar devicename
|
||
must be the name of a disk partition, for example
|
||
.Pa /dev/da1e
|
||
or
|
||
.Pa /dev/wd3s2h ,
|
||
and it must be of type
|
||
.Nm vinum .
|
||
Do not use the
|
||
.Nm c
|
||
partition, which is reserved for the complete disk.
|
||
.It Nm hotspare
|
||
Define the drive to be a
|
||
.Do
|
||
hot spare
|
||
.Dc
|
||
drive, which is maintained to automatically replace a failed drive.
|
||
.Nm
|
||
does not allow this drive to be used for any other purpose. In particular, it
|
||
is not possible to create subdisks on it. This functionality has not been
|
||
completely implemented.
|
||
.El
|
||
.It Nm volume
|
||
.Ar name
|
||
.Op options
|
||
.Pp
|
||
Define a volume with name
|
||
.Ar name .
|
||
.Pp
|
||
Options are:
|
||
.Pp
|
||
.Bl -hang -width 18n
|
||
.It Nm plex Ar plexname
|
||
Add the specified plex to the volume. If
|
||
.Ar plexname
|
||
is specified as
|
||
.Ar * ,
|
||
.Nm
|
||
will look for the definition of the plex as the next possible entry in the
|
||
configuration file after the definition of the volume.
|
||
.It Nm readpol Ar policy
|
||
Define a
|
||
.Ar read policy
|
||
for the volume.
|
||
.Ar policy
|
||
may be either
|
||
.Nm round
|
||
or
|
||
.Nm prefer Ar plexname .
|
||
.Nm
|
||
satisfies a read request from only one of the plexes. A
|
||
.Ar round
|
||
read policy specifies that each read should be performed from a different plex
|
||
in \fIround-robin\fR\| fashion. A
|
||
.Ar prefer
|
||
read policy reads from the specified plex every time.
|
||
.It Nm setupstate
|
||
.Pp
|
||
When creating a multi-plex volume, assume that the contents of all the plexes
|
||
are consistent. This is normally not the case, and the correct procedure is to use the
|
||
.Nm init
|
||
command to first bring them to a consistent state. In the case of striped and
|
||
concatenated plexes, however, it does not normally cause problems to leave them
|
||
inconsistent: when using a volume for a file system or a swap partition, the
|
||
previous contents of the disks are not of interest, so they may be ignored.
|
||
If you want to take this risk, use this keyword. It will only apply to the
|
||
plexes defined immediately after the volume in the configuration file. If you
|
||
add plexes to a volume at a later time, you must integrate them.
|
||
.Pp
|
||
Note that you \fImust\fP\| use the
|
||
.Nm init
|
||
command with RAID-5 plexes: otherwise extreme data corruption will result if one
|
||
subdisk fails.
|
||
.fi
|
||
.El
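.Pp
As a sketch of how these options combine, the following fragment defines a
two-plex volume which always reads from its first plex. The drive, volume and
plex names are illustrative only, and the sizes are arbitrary:
.Bd -literal
drive d1 device /dev/da2e
drive d2 device /dev/da3e
volume data readpol prefer data.p0
  plex name data.p0 org concat
    sd length 512m drive d1
  plex name data.p1 org concat
    sd length 512m drive d2
.Ed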
|
||
.It Nm plex Op options
|
||
.Pp
|
||
Define a plex. Unlike a volume, a plex does not need a name. The options may
|
||
be:
|
||
.Pp
|
||
.Bl -hang -width 18n
|
||
.It Nm name Ar plexname
|
||
Specify the name of the plex. Note that you must use the keyword
|
||
.Ar name
|
||
when naming a plex or subdisk.
|
||
.sp
|
||
.It Nm org Ar organization Op stripesize
|
||
.Pp
|
||
Specify the organization of the plex.
|
||
.Ar organization
|
||
can be one of
|
||
.Ar concat ,
|
||
.Ar striped
|
||
or
|
||
.Ar raid5 .
|
||
For
|
||
.Ar striped
|
||
and
|
||
.Ar raid5
|
||
plexes, the parameter
|
||
.Ar stripesize
|
||
must be specified, while for
|
||
.Ar concat
|
||
it must be omitted. For type
|
||
.Ar striped ,
|
||
it specifies the width of each stripe. For type
|
||
.Ar raid5 ,
|
||
it specifies the size of a group. A group is a portion of a plex which
|
||
stores the parity bits all in the same subdisk. The stripe size must be a factor of the plex size (in
|
||
other words, the result of dividing the plex size by the stripe size must be an
|
||
integer), and it must be a multiple of a disk sector (512 bytes).
|
||
.sp
|
||
For optimum performance, stripes should be at least 128 kB in size: anything
|
||
smaller will result in a significant increase in I/O activity due to mapping of
|
||
individual requests over multiple disks. The performance improvement due to the
|
||
increased number of concurrent transfers caused by this mapping will not make up
|
||
for the performance drop due to the increase in latency. A good guideline for
|
||
stripe size is between 256 kB and 512 kB.
|
||
.Pp
|
||
A striped plex must have at least two subdisks (otherwise it is a concatenated
|
||
plex), and each must be the same size. A RAID-5 plex must have at least three
|
||
subdisks, and each must be the same size. In practice, a RAID-5 plex should
|
||
have at least 5 subdisks.
|
||
.Pp
|
||
.It Nm volume Ar volname
|
||
Add the plex to the specified volume. If no
|
||
.Nm volume
|
||
keyword is specified, the plex will be added to the last volume mentioned in the
|
||
configuration file.
|
||
.sp
|
||
.It Nm sd Ar sdname Ar offset
|
||
Add the specified subdisk to the plex at offset
|
||
.Ar offset .
|
||
.br
|
||
.fi
|
||
.El
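.Pp
For illustration, the following fragment defines a named striped plex with a
stripe size of 256 kB and attaches it explicitly to a volume. All names are
hypothetical. Each subdisk is 512 MB long, so the plex is 1024 MB in size,
which is an exact multiple (4096 times) of the stripe size:
.Bd -literal
drive d1 device /dev/da2e
drive d2 device /dev/da3e
volume fast
  plex name fast.p0 org striped 256k volume fast
    sd length 512m drive d1
    sd length 512m drive d2
.Ed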
|
||
.It Nm subdisk Op options
|
||
.Pp
|
||
Define a subdisk. Options may be:
|
||
.Pp
|
||
.Bl -hang -width 18n
|
||
.nf
|
||
.sp
|
||
.It Nm name Ar name
|
||
Specify the name of a subdisk. It is not necessary to specify a name for a
|
||
subdisk\(emsee OBJECT NAMING above. Note that you must specify the keyword
|
||
.Ar name
|
||
if you wish to name a subdisk.
|
||
.sp
|
||
.It Nm plexoffset Ar offset
|
||
Specify the starting offset of the subdisk in the plex. If not specified,
|
||
.Nm
|
||
allocates the space immediately after the previous subdisk, if any, or otherwise
|
||
at the beginning of the plex.
|
||
.sp
|
||
.It Nm driveoffset Ar offset
|
||
Specify the starting offset of the subdisk in the drive. If not specified,
|
||
.Nm
|
||
allocates the first contiguous
|
||
.Ar length
|
||
bytes of free space on the drive.
|
||
.sp
|
||
.It Nm length Ar length
|
||
Specify the length of the subdisk. This keyword must be specified. There is no
|
||
default, but the value 0 may be specified to mean
|
||
.if t ``use the largest available contiguous free area on the drive''.
|
||
.if n "use the largest available contiguous free area on the drive".
|
||
If the drive is empty, this means that the entire drive will be used for the
|
||
subdisk.
|
||
.Nm length
|
||
may be shortened to
|
||
.Nm len .
|
||
.sp
|
||
.It Nm plex Ar plex
|
||
Specify the plex to which the subdisk belongs. By default, the subdisk belongs
|
||
to the last plex specified.
|
||
.sp
|
||
.It Nm drive Ar drive
|
||
Specify the drive on which the subdisk resides. By default, the subdisk resides
|
||
on the last drive specified.
|
||
.br
|
||
.fi
|
||
.El
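.Pp
A sketch of fully specified subdisk entries, again with hypothetical names.
The second subdisk is placed explicitly behind the first one in a concatenated
plex, and uses the short form
.Nm len :
.Bd -literal
drive d1 device /dev/da2e
drive d2 device /dev/da3e
volume data
  plex name data.p0 org concat
    sd name data.p0.s0 drive d1 length 512m plexoffset 0
    sd name data.p0.s1 drive d2 len 512m plexoffset 512m plex data.p0
.Ed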
|
||
.El
|
||
.Sh EXAMPLE CONFIGURATION FILE
|
||
.Bd -literal
# Sample vinum configuration file
#
# Our drives
drive drive1 device /dev/da1h
drive drive2 device /dev/da2h
drive drive3 device /dev/da3h
drive drive4 device /dev/da4h
drive drive5 device /dev/da5h
drive drive6 device /dev/da6h
# A volume with one striped plex
volume tinyvol
  plex org striped 512b
    sd length 64m drive drive2
    sd length 64m drive drive4
volume stripe
  plex org striped 512b
    sd length 512m drive drive2
    sd length 512m drive drive4
# Two plexes
volume concat
  plex org concat
    sd length 100m drive drive2
    sd length 50m drive drive4
  plex org concat
    sd length 150m drive drive4
# A volume with one striped plex and one concatenated plex
volume strcon
  plex org striped 512b
    sd length 100m drive drive2
    sd length 100m drive drive4
  plex org concat
    sd length 150m drive drive2
    sd length 50m drive drive4
# a volume with a RAID-5 and a striped plex
# note that the subdisks of the RAID-5 plex are longer in total by
# the length of one subdisk, which is used for parity
volume vol5
  plex org striped 64k
    sd length 1000m drive drive2
    sd length 1000m drive drive4
  plex org raid5 32k
    sd length 500m drive drive1
    sd length 500m drive drive2
    sd length 500m drive drive3
    sd length 500m drive drive4
    sd length 500m drive drive5
.Ed
|
||
.Ss DRIVE LAYOUT CONSIDERATIONS
|
||
.Nm
|
||
drives are currently BSD disk partitions. They must be of type
|
||
.Ar vinum
|
||
in order to avoid overwriting data used for other purposes. Use
|
||
.Nm disklabel
|
||
.Ar -e
|
||
to edit a partition type definition. The following display shows a typical
|
||
partition layout as shown by
|
||
.Nm disklabel :
|
||
.Bd -literal
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:    81920   344064    4.2BSD        0     0     0   # (Cyl.  240*- 297*)
  b:   262144    81920      swap                        # (Cyl.   57*- 240*)
  c:  4226725        0    unused        0     0         # (Cyl.    0 - 2955*)
  e:    81920        0    4.2BSD        0     0     0   # (Cyl.    0 - 57*)
  f:  1900000   425984    4.2BSD        0     0     0   # (Cyl.  297*- 1626*)
  g:  1900741  2325984     vinum        0     0     0   # (Cyl. 1626*- 2955*)
.Ed
|
||
.sp
|
||
In this example, partition
|
||
.Nm g
|
||
may be used as a
|
||
.Nm
|
||
partition. Partitions
|
||
.Nm a ,
|
||
.Nm e
|
||
and
|
||
.Nm f
|
||
may be used as
|
||
.Nm UFS
|
||
file systems or
|
||
.Nm ccd
|
||
partitions. Partition
|
||
.Nm b
|
||
is a swap partition, and partition
|
||
.Nm c
|
||
represents the whole disk and should not be used for any other purpose.
|
||
.Pp
|
||
.Nm
|
||
uses the first 265 sectors on each partition for configuration information, so
|
||
the maximum size of a subdisk is 265 sectors smaller than the drive.
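.Pp
Applied to partition
.Nm g
in the example above, and assuming sectors of 512 bytes, this gives:
.Bd -literal
1900741 sectors - 265 sectors = 1900476 sectors
1900476 sectors * 512 bytes   = 973043712 bytes
.Ed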
|
||
.Sh LOG FILE
|
||
.Nm
|
||
maintains a log file, by default
|
||
.Pa /var/tmp/vinum_history ,
|
||
in which it keeps track of the commands issued to
|
||
.Nm vinum .
|
||
You can override the name of this file by setting the environment variable
|
||
.Ev VINUM_HISTORY
|
||
to the name of the file.
|
||
.Pp
|
||
Each message in the log file is preceded by a date. The default format is
|
||
.Li %e %b %Y %H:%M:%S .
|
||
See
|
||
.Xr strftime 3
|
||
for further details of the format string. It can be overridden by the
|
||
environment variable
|
||
.Ev VINUM_DATEFORMAT .
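.Pp
For example, in a Bourne-style shell the two variables might be set as follows
before starting
.Nm .
The file name and the date format shown here are arbitrary choices for
illustration:
.Bd -literal
# VINUM_HISTORY=/var/log/vinum_history
# VINUM_DATEFORMAT="%Y-%m-%d %H:%M:%S"
# export VINUM_HISTORY VINUM_DATEFORMAT
# vinum
.Ed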
|
||
.Sh HOW TO SET UP VINUM
|
||
This section gives practical advice about how to implement a
|
||
.Nm
|
||
system.
|
||
.Ss Where to put the data
|
||
The first choice you need to make is where to put the data. You need dedicated
|
||
disk partitions for
|
||
.Nm vinum .
|
||
They should be partitions, not devices, and they should not be partition
|
||
.Nm c .
|
||
For example, good names are
|
||
.Pa /dev/da0e
|
||
or
|
||
.Pa /dev/wd3s4a .
|
||
Bad names are
|
||
.Pa /dev/da0
|
||
and
|
||
.Pa /dev/da0s1 ,
|
||
both of which represent a device, not a partition, and
|
||
.Pa /dev/wd1c ,
|
||
which represents a complete disk and should be of type
|
||
.Nm unused .
|
||
See the example under DRIVE LAYOUT CONSIDERATIONS above.
|
||
.Ss Designing volumes
|
||
The way you set up
|
||
.Nm
|
||
volumes depends on your intentions. There are a number of possibilities:
|
||
.Bl -enum
|
||
.It
|
||
You may want to join up a number of small disks to make a reasonably sized file
|
||
system. For example, if you had five small drives and wanted to use all the
|
||
space for a single volume, you might write a configuration file like:
|
||
.Bd -literal -offset 4n
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
drive d4 device /dev/da5e
drive d5 device /dev/da6e
volume bigger
  plex org concat
    sd length 0 drive d1
    sd length 0 drive d2
    sd length 0 drive d3
    sd length 0 drive d4
    sd length 0 drive d5
.Ed
|
||
.Pp
|
||
In this case, you specify the length of the subdisks as 0, which means
|
||
.if t ``use the largest area of free space that you can find on the drive''.
|
||
.if n "use the largest area of free space that you can find on the drive".
|
||
If the subdisk is the only subdisk on the drive, it will use all available
|
||
space.
|
||
.It
|
||
You want to set up
|
||
.Nm
|
||
to obtain additional resilience against disk failures. You have the choice of
|
||
RAID-1, also called
|
||
.if t ``mirroring'', or RAID-5, also called ``parity''.
|
||
.if n "mirroring", or RAID-5, also called "parity".
|
||
.Pp
|
||
To set up mirroring, create multiple plexes in a volume. For example, to create
|
||
a mirrored volume of 2 GB, you might create the following configuration file:
|
||
.Bd -literal -offset 4n
drive d1 device /dev/da2e
drive d2 device /dev/da3e
volume mirror
  plex org concat
    sd length 2g drive d1
  plex org concat
    sd length 2g drive d2
.Ed
|
||
.Pp
|
||
When creating mirrored drives, it is important to ensure that the data from each
|
||
plex is on a different physical disk so that
|
||
.Nm
|
||
can access the complete address space of the volume even if a drive fails.
|
||
Note that each plex requires as much storage as the complete volume: in this
|
||
example, the volume has a size of 2 GB, but each plex (and each subdisk)
|
||
requires 2 GB, so the total disk storage requirement is 4 GB.
|
||
.Pp
|
||
To set up RAID-5, create a single plex of type
|
||
.Ar raid5 .
|
||
For example, to create an equivalent resilient volume of 2 GB, you might use the
|
||
following configuration file:
|
||
.Bd -literal -offset 4n
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
drive d4 device /dev/da5e
drive d5 device /dev/da6e
volume raid
  plex org raid5 512k
    sd length 512m drive d1
    sd length 512m drive d2
    sd length 512m drive d3
    sd length 512m drive d4
    sd length 512m drive d5
.Ed
|
||
.Pp
|
||
RAID-5 plexes require at least three subdisks, one of which is used for storing
|
||
parity information and is lost for data storage. The more disks you use, the
|
||
greater the proportion of the disk storage can be used for data storage. In
|
||
this example, the total storage usage is 2.5 GB, compared to 4 GB for a mirrored
|
||
configuration. If you were to use the minimum of only three disks, you would
|
||
require 3 GB to store the information, for example:
|
||
.Bd -literal -offset 4n
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
volume raid
  plex org raid5 512k
    sd length 1g drive d1
    sd length 1g drive d2
    sd length 1g drive d3
.Ed
|
||
.Pp
|
||
As with creating mirrored drives, it is important to ensure that the data from
|
||
each subdisk is on a different physical disk so that
|
||
.Nm
|
||
can access the complete address space of the volume even if a drive fails.
|
||
.It
|
||
You want to set up
|
||
.Nm
|
||
to allow more concurrent access to a file system. In many cases, access to a
|
||
file system is limited by the speed of the disk. By spreading the volume across
|
||
multiple disks, you can increase the throughput in multi-access environments.
|
||
This technique shows little or no performance improvement in single-access
|
||
environments.
|
||
.Nm
|
||
uses a technique called
|
||
.if t ``striping'',
|
||
.if n "striping",
|
||
or sometimes RAID-0, to increase this concurrency of access. The name RAID-0 is
|
||
misleading: striping does not provide any redundancy or additional reliability.
|
||
In fact, it decreases the reliability, since the failure of a single disk will
|
||
render the volume useless, and the more disks you have, the more likely it is
|
||
that one of them will fail.
|
||
.Pp
|
||
To implement striping, use a
|
||
.Ar striped
|
||
plex:
|
||
.Bd -literal -offset 4n
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
drive d4 device /dev/da5e
volume raid
  plex org striped 512k
    sd length 512m drive d1
    sd length 512m drive d2
    sd length 512m drive d3
    sd length 512m drive d4
.Ed
|
||
.Pp
|
||
A striped plex must have at least two subdisks, but the increase in performance
|
||
is greater if you have a larger number of disks.
|
||
.It
|
||
You may want to have the best of both worlds and have both resilience and
|
||
performance. This is sometimes called RAID-10 (a combination of RAID-1 and
|
||
RAID-0), though again this name is misleading. With
|
||
.Nm
|
||
you can do this with the following configuration file:
|
||
.Bd -literal -offset 4n
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
drive d4 device /dev/da5e
volume raid
  plex org striped 512k
    sd length 512m drive d1
    sd length 512m drive d2
    sd length 512m drive d3
    sd length 512m drive d4
  plex org striped 512k
    sd length 512m drive d4
    sd length 512m drive d3
    sd length 512m drive d2
    sd length 512m drive d1
.Ed
|
||
.Pp
|
||
Here the plexes are striped, increasing performance, and there are two of them,
|
||
increasing reliability. Note that this example shows the subdisks of the second
|
||
plex in reverse order from the first plex. This is for performance reasons and
|
||
will be discussed below.
|
||
.El
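.Pp
The storage requirements quoted for the resilient configurations above can be
checked as follows. A RAID-5 plex with n subdisks of equal size stores data on
the equivalent of n - 1 of them, and the remaining space holds parity:
.Bd -literal
mirror:           2 plexes x 2 GB  = 4 GB raw;    2 GB usable
RAID-5, 5 disks:  5 x 512 MB       = 2.5 GB raw;  4 x 512 MB = 2 GB usable
RAID-5, 3 disks:  3 x 1 GB         = 3 GB raw;    2 x 1 GB   = 2 GB usable
.Ed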
|
||
.Ss Creating the volumes
|
||
Once you have created your configuration files, start
|
||
.Nm
|
||
and create the volumes. In this example, the configuration is in the file
|
||
.Pa configfile :
|
||
.Bd -literal
# vinum create -v configfile
1: drive d1 device /dev/da2e
2: drive d2 device /dev/da3e
3: volume mirror
4: plex org concat
5: sd length 2g drive d1
6: plex org concat
7: sd length 2g drive d2
Configuration summary

Drives:         2 (4 configured)
Volumes:        1 (4 configured)
Plexes:         2 (8 configured)
Subdisks:       2 (16 configured)

Drive d1:       Device /dev/da2e
                Created on vinum.lemis.com at Tue Mar 23 12:30:31 1999
                Config last updated Tue Mar 23 14:30:32 1999
                Size: 60105216000 bytes (57320 MB)
                Used: 2147619328 bytes (2048 MB)
                Available: 57957596672 bytes (55272 MB)
                State: up
                Last error: none
Drive d2:       Device /dev/da3e
                Created on vinum.lemis.com at Tue Mar 23 12:30:32 1999
                Config last updated Tue Mar 23 14:30:33 1999
                Size: 60105216000 bytes (57320 MB)
                Used: 2147619328 bytes (2048 MB)
                Available: 57957596672 bytes (55272 MB)
                State: up
                Last error: none

Volume mirror:  Size: 2147483648 bytes (2048 MB)
                State: up
                Flags:
                2 plexes
                Read policy: round robin

Plex mirror.p0: Size: 2147483648 bytes (2048 MB)
                Subdisks: 1
                State: up
                Organization: concat
                Part of volume mirror
Plex mirror.p1: Size: 2147483648 bytes (2048 MB)
                Subdisks: 1
                State: up
                Organization: concat
                Part of volume mirror

Subdisk mirror.p0.s0:
                Size: 2147483648 bytes (2048 MB)
                State: up
                Plex mirror.p0 at offset 0

Subdisk mirror.p1.s0:
                Size: 2147483648 bytes (2048 MB)
                State: up
                Plex mirror.p1 at offset 0
.Ed
|
||
.Pp
The
.Fl v
flag tells
.Nm
to list the file as it configures. Subsequently it lists the current
configuration in the same format as the
.Nm list Fl v
command.
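.Pp
The numbered lines at the top of the listing reproduce the configuration file.
A minimal sketch of how such a file might be written before running
.Nm create ,
assuming the file name
.Pa configfile
and the drives used in this example:
.Bd -literal -offset 4n
# cat > configfile << EOF
drive d1 device /dev/da2e
drive d2 device /dev/da3e
volume mirror
  plex org concat
    sd length 2g drive d1
  plex org concat
    sd length 2g drive d2
EOF
# vinum create -v configfile
.Ed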
.Ss Creating more volumes
Once you have created the
.Nm
volumes,
.Nm
keeps track of them in its internal configuration files. You do not need to
create them again. In particular, if you run the
.Nm create
command again, you will create additional objects:
.Bd -literal
.if t .ps -2
# vinum create sampleconfig
Configuration summary

Drives:         2 (4 configured)
Volumes:        1 (4 configured)
Plexes:         4 (8 configured)
Subdisks:       4 (16 configured)

D d1            State: up    Device /dev/da2e    Avail: 53224/57320 MB (92%)
D d2            State: up    Device /dev/da3e    Avail: 53224/57320 MB (92%)

V mirror        State: up    Plexes: 4           Size: 2048 MB

P mirror.p0   C State: up    Subdisks: 1         Size: 2048 MB
P mirror.p1   C State: up    Subdisks: 1         Size: 2048 MB
P mirror.p2   C State: up    Subdisks: 1         Size: 2048 MB
P mirror.p3   C State: up    Subdisks: 1         Size: 2048 MB

S mirror.p0.s0  State: up    PO: 0 B             Size: 2048 MB
S mirror.p1.s0  State: up    PO: 0 B             Size: 2048 MB
S mirror.p2.s0  State: up    PO: 0 B             Size: 2048 MB
S mirror.p3.s0  State: up    PO: 0 B             Size: 2048 MB
.if t .ps
.Ed
.Pp
As this example (this time with the
.Fl f
flag) shows, re-running the
.Nm create
command has created two new plexes, each with a new subdisk, giving the volume
four plexes in total. If you want to add other volumes, create new
configuration files for them. They do not need to define again the drives that
.Nm
already knows about. For example, to create a volume
.Pa raid
on the four drives
.Pa /dev/da1e ,
.Pa /dev/da2e ,
.Pa /dev/da3e
and
.Pa /dev/da4e ,
you only need to mention the other two:
.Bd -literal
drive d3 device /dev/da1e
drive d4 device /dev/da4e
volume raid
  plex org raid5 512k
    sd size 2g drive d1
    sd size 2g drive d2
    sd size 2g drive d3
    sd size 2g drive d4
.Ed
.Pp
With this configuration file, we get:
.Bd -literal
# vinum create newconfig
Configuration summary

Drives:         4 (4 configured)
Volumes:        2 (4 configured)
Plexes:         5 (8 configured)
Subdisks:       8 (16 configured)

D d1            State: up     Device /dev/da2e   Avail: 51176/57320 MB (89%)
D d2            State: up     Device /dev/da3e   Avail: 53220/57320 MB (89%)
D d3            State: up     Device /dev/da1e   Avail: 53224/57320 MB (92%)
D d4            State: up     Device /dev/da4e   Avail: 53224/57320 MB (92%)

V mirror        State: down   Plexes: 4          Size: 2048 MB
V raid          State: down   Plexes: 1          Size: 6144 MB

P mirror.p0   C State: init   Subdisks: 1        Size: 2048 MB
P mirror.p1   C State: init   Subdisks: 1        Size: 2048 MB
P mirror.p2   C State: init   Subdisks: 1        Size: 2048 MB
P mirror.p3   C State: init   Subdisks: 1        Size: 2048 MB
P raid.p0    R5 State: init   Subdisks: 4        Size: 6144 MB

S mirror.p0.s0  State: up     PO: 0 B            Size: 2048 MB
S mirror.p1.s0  State: up     PO: 0 B            Size: 2048 MB
S mirror.p2.s0  State: up     PO: 0 B            Size: 2048 MB
S mirror.p3.s0  State: up     PO: 0 B            Size: 2048 MB
S raid.p0.s0    State: empty  PO: 0 B            Size: 2048 MB
S raid.p0.s1    State: empty  PO: 512 kB         Size: 2048 MB
S raid.p0.s2    State: empty  PO: 1024 kB        Size: 2048 MB
S raid.p0.s3    State: empty  PO: 1536 kB        Size: 2048 MB
.Ed
.Pp
Note the size of the RAID-5 plex: it is only 6 GB, although together its
components use 8 GB of disk space. This is because the equivalent of one
subdisk is used for storing parity data.
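.Pp
The arithmetic behind this figure, using the values from the listing above
(four subdisks of 2048 MB each), can be sketched as follows:
.Bd -literal -offset 4n
subdisks         = 4
subdisk size     = 2048 MB
space used       = 4 * 2048 MB       = 8192 MB
parity           = 1 * 2048 MB       = 2048 MB
usable plex size = (4 - 1) * 2048 MB = 6144 MB
.Ed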
.Ss Restarting Vinum
On rebooting the system, start
.Nm
with the
.Nm start
command:
.Bd -literal
# vinum start
.Ed
.Pp
This will start all the
.Nm
drives in the system. If for some reason you wish to start only some of them,
use the
.Nm read
command.
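.Pp
As a sketch only: to start just the drives d1 and d2 of the earlier examples,
the command might look like the following.
The
.Nm read
command takes disk slices, not partitions, as arguments (see GOTCHAS), so the
sketch assumes that these drives are on the slices
.Pa /dev/da2
and
.Pa /dev/da3 ,
names inferred from the partitions /dev/da2e and /dev/da3e used above:
.Bd -literal -offset 4n
# vinum read /dev/da2 /dev/da3
.Ed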
.Ss Performance considerations
A number of misconceptions exist about how to set up a RAID array for best
performance. In particular, most systems use far too small a stripe size. The
following discussion applies to all RAID systems, not just to
.Nm vinum .
.Pp
The FreeBSD block I/O system issues requests of between 0.5 kB and 60 kB; a
typical mix is somewhere around 8 kB. You can't stop any striping system from
breaking a request into two physical requests, and if you do it wrong it can be
broken into several. This will result in a significant drop in performance: the
decrease in transfer time per disk is offset by the order-of-magnitude greater
increase in latency.
.Pp
With modern disk sizes and the FreeBSD block I/O system, you can expect to have
a reasonably small number of fragmented requests with a stripe size between 256
kB and 512 kB; with correct RAID implementations there is no obvious reason not
to increase the size to 2 or 4 MB on a large disk.
.Pp
The easiest way to consider the impact of any transfer in a multi-access system
is to look at it from the point of view of the potential bottleneck, the disk
subsystem: how much total disk time does the transfer use? Since just about
everything is cached, the time relationship between the request and its
completion is not so important: the important parameter is the total time that
the request keeps the disks active, the time when the disks are not available to
perform other transfers. As a result, it doesn't really matter if the transfers
are happening at the same time or different times. In practical terms, the time
we're looking at is the sum of the total latency (positioning time and
rotational latency, or the time it takes for the data to arrive under the disk
heads) and the total transfer time. For a given transfer to disks of the same
speed, the transfer time depends only on the total size of the transfer.
.Pp
Consider a typical news article or web page of 24 kB, which will probably be
read in a single I/O. Take disks with a transfer rate of 6 MB/s and an average
positioning time of 8 ms, and a file system with 4 kB blocks. Since it's 24 kB,
we don't have to worry about fragments, so the file will start on a 4 kB
boundary. The number of transfers required depends on where the block starts:
on average it's (S + F - 1) / S, where S is the stripe size in file system
blocks, and F is the file size in file system blocks.
.Pp
.Bl -enum
.It
Stripe size of 4 kB. You'll have 6 transfers. Total subsystem load: 48 ms
latency, 4 ms transfer, 52 ms total.
.It
Stripe size of 8 kB. On average, you'll have 3.5 transfers. Total subsystem
load: 28 ms latency, 4 ms transfer, 32 ms total.
.It
Stripe size of 16 kB. On average, you'll have 2.25 transfers. Total subsystem
load: 18 ms latency, 4 ms transfer, 22 ms total.
.It
Stripe size of 256 kB. On average, you'll have 1.08 transfers. Total subsystem
load: 8.6 ms latency, 4 ms transfer, 12.6 ms total.
.It
Stripe size of 4 MB. On average, you'll have 1.005 transfers. Total subsystem
load: 8.04 ms latency, 4 ms transfer, 12.04 ms total.
.El
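.Pp
The average number of transfers and the resulting disk load can be worked out
with a short
.Xr sh 1
function.
This is a sketch under the assumptions of the example (24 kB file, 4 kB file
system blocks, 8 ms positioning time per transfer, and 6 MB/s taken as
6000 kB/s, which gives a transfer time of about 4 ms); the function name
.Li avg_load
is made up for this illustration:
.Bd -literal -offset 4n
#!/bin/sh
# avg_load STRIPE_KB: average disk load for one 24 kB read.
avg_load() {
    awk -v stripe="$1" 'BEGIN {
        S = stripe / 4            # stripe size in file system blocks
        F = 24 / 4                # file size in file system blocks
        n = (S + F - 1) / S       # average number of transfers
        lat = n * 8               # total positioning time in ms
        xfer = 24 / 6000 * 1000   # transfer time in ms
        print sprintf("%d kB: %.4f transfers, %.2f ms latency, %.2f ms total",
            stripe, n, lat, lat + xfer)
    }'
}

# Print one line for each stripe size discussed in the text.
for kb in 4 8 16 256 4096; do
    avg_load "$kb"
done
.Ed
.Pp
As the stripe size grows, the average number of transfers approaches 1 and the
positioning time stops dominating the total.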
.Pp
It appears that some hardware RAID systems have problems with large stripes:
they appear to always transfer a complete stripe to or from disk, so that a
large stripe size will have an adverse effect on performance.
.Nm
does not suffer from this problem: it optimizes all disk transfers and does not
transfer unneeded data.
.Pp
Note that no well-known benchmark program tests true multi-access conditions
(more than 100 concurrent users), so it is difficult to demonstrate the validity
of these statements.
.Pp
Given these considerations, the following factors affect the performance of a
.Nm
volume:
.Bl -bullet
.It
Striping improves performance for multiple access only, since it increases the
chance of individual requests being on different drives.
.It
Concatenating UFS file systems across multiple drives can also improve
performance for multiple file access, since UFS divides a file system into
cylinder groups and attempts to keep files in a single cylinder group. In
general, it is not as effective as striping.
.It
Mirroring can improve multi-access performance for reads, since by default
.Nm
issues consecutive reads to consecutive plexes.
.It
Mirroring decreases performance for all writes, whether multi-access or single
access, since the data must be written to both plexes. This explains the
subdisk layout in the example of a mirroring configuration above: if the
corresponding subdisk in each plex is on a different physical disk, the write
commands can be issued in parallel, whereas if they are on the same physical
disk, they will be performed sequentially.
.It
RAID-5 reads have essentially the same considerations as striped reads, unless
the striped plex is part of a mirrored volume, in which case the performance of
the mirrored volume will be better.
.It
RAID-5 writes are approximately 25% of the speed of striped writes: to perform
the write,
.Nm
must first read the data block and the corresponding parity block, perform some
calculations and write back the parity block and the data block, four times as
many transfers as for writing a striped plex. On the other hand, this is offset
by the cost of mirroring, so writes to a volume with a single RAID-5 plex are
approximately half the speed of writes to a correctly configured volume with two
striped plexes.
.It
When the
.Nm
configuration changes (for example, adding or removing objects, or the change of
state of one of the objects),
.Nm
writes up to 128 kB of updated configuration to each drive. The larger the
number of drives, the longer this takes.
.El
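.Pp
The write figures given in this list follow from counting transfers per
logical write.
The following sketch uses only the transfer counts stated above and ignores
everything else, such as the parity calculation itself:
.Bd -literal -offset 4n
one striped plex:    1 transfer  (write data)
two striped plexes:  2 transfers (write data to each plex)
one RAID-5 plex:     4 transfers (read data, read parity,
                                  write data, write parity)

RAID-5 compared with one striped plex:   1 / 4 = 25% of the speed
RAID-5 compared with two striped plexes: 2 / 4 = 50% of the speed
.Ed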
.Ss Creating file systems on Vinum volumes
You do not need to run
.Nm disklabel
before creating a file system on a
.Nm
volume. Just run
.Nm newfs
against the raw device. Use the
.Fl v
option to state that the device is not divided into partitions. For example, to
create a file system on volume
.Pa mirror ,
enter the following command:
.Bd -literal -offset 4n
# newfs -v /dev/vinum/rmirror
.Ed
.Pp
Note the name
.Pa rmirror ,
indicating the raw device.
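.Pp
Once the file system exists, it can be mounted like any other.
A sketch, assuming that the block device for the volume is
.Pa /dev/vinum/mirror
(the raw device name without the leading r) and that
.Pa /mnt
is a suitable mount point:
.Bd -literal -offset 4n
# mount /dev/vinum/mirror /mnt
.Ed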
.Sh Other considerations
A number of other considerations apply to
.Nm
configuration:
.Bl -bullet
.It
There is no advantage in creating multiple drives on a single disk. Each drive
uses 131.5 kB of data for label and configuration information, and performance
will suffer when the configuration changes. Use appropriately sized subdisks
instead.
.It
It is possible to increase the size of a concatenated
.Nm
plex, but currently the size of striped and RAID-5 plexes cannot be increased.
Currently the size of an existing UFS file system also cannot be increased, but
it is planned to make both plexes and file systems extensible.
.El
.Sh GOTCHAS
The following points are not bugs, and they have good reasons for existing, but
they have been shown to cause confusion. Each is discussed in the appropriate
section above.
.Bl -enum
.It
.Nm
will not create a device on UFS partitions. Instead, it will return an error
message
.if t ``wrong partition type''.
.if n "wrong partition type".
The partition type must be
.Ar vinum .
.It
When you create a volume with multiple plexes,
.Nm
does not automatically initialize the plexes. This means that the contents are
not known, but they are certainly not consistent. As a result, by default
.Nm
sets the state of all newly-created plexes except the first to
.Ar stale .
In order to synchronize them with the first plex, you must
.Nm start
their subdisks, which causes
.Nm
to copy the data from a plex which is in the
.Ar up
state. Depending on the size of the subdisks involved, this can take a long
time.
.Pp
In practice, people aren't too interested in what was in the plex when it was
created, and other volume managers cheat by setting them
.Ar up
anyway.
.Nm
provides two ways to ensure that newly created plexes are
.Ar up :
.Bl -bullet
.It
Create the plexes and then synchronize them with
.Nm vinum start .
.It
Create the volume (not the plex) with the keyword
.Ar setupstate ,
which tells
.Nm
to ignore any possible inconsistency and set the plexes to be
.Ar up .
.El
.It
Some of the commands currently supported by
.Nm
are not really needed. For reasons which I don't understand, however, I find
that users frequently try the
.Nm label
and
.Nm resetconfig
commands, though especially
.Nm resetconfig
outputs all sorts of dire warnings. Don't use these commands unless you have a
good reason to do so.
.It
Some state transitions are not very intuitive. In fact, it's not clear whether
this is a bug or a feature. If you find that you can't start an object in some
strange state, such as a
.Ar reborn
subdisk, try first to get it into
.Ar stopped
state, with the
.Nm stop
or
.Nm stop Fl f
commands. If that works, you should then be able to start it. If you find
that this is the only way to get out of a position where easier methods fail,
please report the situation.
.It
If you build the kernel module with the
.Ar -DVINUMDEBUG
option, you must also build
.Xr vinum 8
with the
.Ar -DVINUMDEBUG
option, since the size of some data objects used by both components depends on
this option. If you don't do so, commands will fail with the message
.Ar Invalid argument ,
and a console message such as the following will be logged:
.Pp
.Bd -literal
vinumioctl: invalid ioctl from process 247 (vinum): c0e44642
.Ed
.Pp
This error may also occur if you use old versions of the kld or of the userland
program.
.It
.Nm
drives are UNIX disk partitions and should have the partition type
.Ar vinum .
This is different from
.Nm ccd ,
which expects partitions of type
.Ar 4.2BSD .
This behaviour of ccd is an invitation to shoot yourself in the foot: with
.Nm ccd
you can easily overwrite a file system.
.Nm
will not permit this.
.Pp
For similar reasons, the
.Nm vinum Ar start
command will not accept a drive on partition
.Ar c .
Partition
.Ar c
is used by the system to represent the whole disk, and must be of type
.Ar unused .
Clearly there is a conflict here, which
.Nm
resolves by not using the
.Ar c
partition.
.It
The
.Nm vinum Ar read
command has a particularly emetic syntax. Once it was the only way to start
.Nm vinum ,
but now the preferred method is with
.Nm vinum Ar start .
.Nm vinum Ar read
should be used for maintenance purposes only. Note that its syntax has changed,
and the arguments must be disk slices, such as
.Pa /dev/da0 ,
not partitions such as
.Pa /dev/da0e .
.El
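.Pp
The two ways of bringing newly created plexes
.Ar up
that are described in the list above can be sketched as follows.
The volume, subdisk and drive names are taken from the mirror example earlier
in this page and are only illustrative.
Either synchronize the second plex by starting its subdisk after creating the
volume:
.Bd -literal -offset 4n
# vinum start mirror.p1.s0
.Ed
.Pp
or, if the previous contents of the plexes do not matter, add
.Ar setupstate
to the volume definition in the configuration file:
.Bd -literal -offset 4n
drive d1 device /dev/da2e
drive d2 device /dev/da3e
volume mirror setupstate
  plex org concat
    sd length 2g drive d1
  plex org concat
    sd length 2g drive d2
.Ed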
.\"XXX.Sh BUGS
.Sh FILES
.Ar /dev/vinum
- directory with device nodes for
.Nm
objects.
.br
.Ar /dev/vinum/control
- control device for
.Nm vinum
.br
.Ar /dev/vinum/plex
- directory containing device nodes for
.Nm
plexes.
.br
.Ar /dev/vinum/sd
- directory containing device nodes for
.Nm
subdisks.
.Sh ENVIRONMENT
.Bl -hang
.It VINUM_HISTORY
The name of the log file, by default /var/log/vinum_history.
.It VINUM_DATEFORMAT
The format of dates in the log file, by default %e %b %Y %H:%M:%S.
.It EDITOR
The name of the editor to use for editing configuration files, by default
.Nm vi .
.El
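.Pp
For example, the log file and its date format could be changed for a single
invocation as follows.
The log file name and the date format are arbitrary choices for this sketch,
and it assumes that the format string is interpreted as described in
.Xr strftime 3 :
.Bd -literal
# env VINUM_HISTORY=/var/tmp/vinum_history VINUM_DATEFORMAT="%Y-%m-%d %H:%M:%S" vinum list
.Ed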
.Sh SEE ALSO
.Xr strftime 3 ,
.Xr vinum 4 ,
.Xr disklabel 8 ,
.Xr newfs 8 ,
.Pa http://www.lemis.com/vinum.html ,
.Pa http://www.lemis.com/vinum-debugging.html .
.Sh AUTHOR
.An Greg Lehey Aq grog@lemis.com .
.Sh HISTORY
The
.Nm
command first appeared in
.Fx 3.0 .
The RAID-5 component of
.Nm
was developed for Cybernet Inc.
.Pa www.cybernet.com
for its NetMAX product.