defined by the SNIA Common RAID Disk Data Format Specification v2.0.
Supports multiple volumes per array and multiple partitions per disk.
Supports standard big-endian and Adaptec's little-endian byte ordering.
Supports all single-layer RAID levels. Dual-layer RAID levels except
RAID10 are not supported now because of GEOM RAID design limitations.
Some work is still to be done, but the present code already manages basic
interoperation with RAID BIOS of the Adaptec 1430SA SATA RAID controller.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
- Implement "configure" command to allow switching operation mode of
running device on-fly without destroying and recreation.
- Implement Active/Read mode as hybrid of Active/Active and Active/Passive.
In this mode all paths not marked FAIL may handle reads same time,
but unlike Active/Active only one path handles write requests at any
point in time. It allows to closer follow original write request order
if above layers need it for data consistency (not waiting for requisite
write completion before sending dependent write).
- Hide duplicate messages about device status change.
- Remove periodic thread wake up with 10Hz rate.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
reduce the size of the partition in the example from 128 blocks to 94
blocks so it will end on a 128-block boundary. Also remove the -b
option from the next example.
MFC after: 3 weeks
existing sections to refer to the new one. Rearrange partitioning scheme
list so MBR and EBR types are together. Also add several corrections for
grammar, clarity, and consistency.
Approved by: gjb (mentor)
MFC after: 1 week
- Improved locking and destruction process to fix crashes.
- Improved "automatic" configuration method to make it consistent and safe
by reading metadata back from all specified paths after writing to one.
- Added provider size check to reduce chance of ordering conflict with
other GEOM classes.
- Added "manual" configuration method without using on-disk metadata.
- Added "add" and "remove" commands to allow manage paths manually.
- Failed paths are no longer dropped from geom, but only marked as FAIL
and excluded from I/O operations.
- Automatically restore failed paths when all others paths are marked
as failed, for example, because of device-caused (not transport) errors.
- Added "fail" and "restore" commands to manually control FAIL flag.
- geom is now destroyed on last path disconnection.
- Added optional Active/Active mode support. Unlike Active/Passive
mode, load evenly distributed between all working paths. If supported by
the device, it allows to significantly improve performance, utilizing
bandwidth of all paths. It is controlled by -A option during creation.
Disabled by default now.
- Improved `status` and `list` commands output.
Sponsored by: iXsystems, inc.
MFC after: 1 month
- add support for volumes above 2TiB with Promise metadata format;
- enforse and document other limitations:
- Intel and Promise metadata formats do not support disks above 2TiB;
- NVIDIA metadata format does not support volumes above 2TiB.
Sponsored by: iXsystems, Inc.
MFC after: 2 weeks
with older FreeBSD versions:
- Add -V option to 'geli init' to specify version number. If no -V is given
the most recent version is used.
- If -V is given don't allow to use features not supported by this version.
- Print version in 'geli list' output.
- Update manual page and add table describing which GELI version is
supported by which FreeBSD version, so one can use it when preparing GELI
device for older FreeBSD version.
Inspired by: Garrett Cooper <yanegomi@gmail.com>
MFC after: 3 days
given GEOM provider or if not providers are given it will print versions
supported by userland geli(8) utility and by ELI GEOM class.
MFC after: 3 days
on a disk with non zero stripesize (e.g. disks with 4k sector size)[1].
Also do not use automatic alignment when size is exactly specified, but
an alignment is not. Use automatic alignment only for case when user
omits both "-s" and "-a" options.
Reported by: Mikael Fridh <frimik at gmail> [1]
Approved by: re (kib)
MFC after: 1 week
bootstrap code images used to boot from MBR, GPT, BSD and VTOC8
schemes.
Reviewed by: marius (previous version)
Approved by: re (kib)
MFC after: 1 week
gpart_write_partcode_vtoc8 does access out of range of allocated memory.
Check size of bootcode before writing it.
Pointed out by: ru
MFC after: 1 week
When user wants have specific alignment - do what user wants.
Use stripesize as alignment value in case, when some of gpart's
arguments are ommitted for automatic calculation.
Suggested by: mav
partition offsets. If user requests specific alignment and
provider's stripesize is not zero, then use a least common multiple
from the stripesize and user specified value.
Also fix "gpart resize" implementation: do not try to align the partition
size, because the start offset may be not aligned. Instead align the
end offset and then calculate size. Also use stripesize and stripeoffset
for "gpart resize" command.
with geometry. And they do recalculation of user specified parameters.
MBR, PC98, VTOC8, EBR schemes are doing that. For these schemes an
auto alignment feature (ie. gpart add -a alignment) would not work.
But it can work for GPT and BSD schemes. BSD scheme usualy is created
inside MBR, so we can use knowledge about offset of MBR partition to
calculate aligned values for BSD partitions.
Use "offset" attribute of the parent provider for better alignment.
MFC after: 2 weeks
Add new RAID GEOM class, that is going to replace ataraid(4) in supporting
various BIOS-based software RAIDs. Unlike ataraid(4) this implementation
does not depend on legacy ata(4) subsystem and can be used with any disk
drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4)
with `options ATA_CAM`). To make code more readable and extensible, this
implementation follows modular design, including core part and two sets
of modules, implementing support for different metadata formats and RAID
levels.
Support for such popular metadata formats is now implemented:
Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.
Such RAID levels are now supported:
RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT.
For any all of these RAID levels and metadata formats this class supports
full cycle of volume operations: reading, writing, creation, deletion,
disk removal and insertion, rebuilding, dirty shutdown detection
and resynchronization, bad sector recovery, faulty disks tracking,
hot-spare disks. For Intel and Promise formats there is support multiple
volumes per disk set.
Look graid(8) manual page for additional details.
Co-authored by: imp
Sponsored by: Cisco Systems, Inc. and iXsystems, Inc.
Make `geom XXX list` and `geom XXX status` outputs more consistent:
Add -a options to print all geoms, not only ones with providers.
Add -g option for `status` to report geom's names, not provider's.
Make `status` by default report provider's status (if present), not geom's.
Make `status` report consumer's statuses, not only "synchronized" field.
partitions instead of partition's indexes. This may be useful with
GPT partitioning scheme or EBR without GEOM_PART_EBR_COMPAT option.
MFC after: 2 weeks
does restore them only when -l option is specified [1]. Make number of
entries field in backup format optional. Document -l and -r options of
`gpart show` action.
Suggested by: pjd [1]
MFC after: 1 week
big sector size. When gctl error is set gctl_has_param() always returns
'false', which prevents geli(8) from finding some arguments and also masks
an error, which is generates in such case.
MFC after: 3 days
This was needed for recover implementation.
Implement the recover command for GPT. Now GPT will marked as
corrupt when any of three types of corruption will be detected:
1. Damaged primary GPT header or table
2. Damaged secondary GPT header or table
3. Secondary header is not located in the last LBA
Marked GPT becomes read-only. Any changes with corrupt table
are prohibited. Only "destroy" and "recover" commands are allowed.
Discussed with: geom@ (mostly silence)
Tested by: Ilya A. Arhipov
Approved by: mav (mentor)
MFC after: 2 weeks
Before this change if you wanted to suspend your laptop and be sure that your
encryption keys are safe, you had to stop all processes that use file system
stored on encrypted device, unmount the file system and detach geli provider.
This isn't very handy. If you are a lucky user of a laptop where suspend/resume
actually works with FreeBSD (I'm not!) you most likely want to suspend your
laptop, because you don't want to start everything over again when you turn
your laptop back on.
And this is where geli suspend/resume steps in. When you execute:
# geli suspend -a
geli will wait for all in-flight I/O requests, suspend new I/O requests, remove
all geli sensitive data from the kernel memory (like encryption keys) and will
wait for either 'geli resume' or 'geli detach'.
Now with no keys in memory you can suspend your laptop without stopping any
processes or unmounting any file systems.
When you resume your laptop you have to resume geli devices using 'geli resume'
command. You need to provide your passphrase, etc. again so the keys can be
restored and suspended I/O requests released.
Of course you need to remember that 'geli suspend' won't clear file system
cache and other places where data from your geli-encrypted file system might be
present. But to get rid of those stopping processes and unmounting file system
won't help either - you have to turn your laptop off. Be warned.
Also note, that suspending geli device which contains file system with geli
utility (or anything used by 'geli resume') is not very good idea, as you won't
be able to resume it - when you execute geli(8), the kernel will try to read it
and this read I/O request will be suspended.
This is especially useful for things like installers, where regular
geli prompt can't be used.
- Add support for specifing multiple -K or -k options, so there is no
need to cat all keyfiles and read them from standard input.
Requested by: Kris Moore <kris@pcbsd.org>, thompsa
MFC after: 2 weeks
This option doesn't passed to kernel and handled in user-space.
With -F option gpart creates new "delete" request for each
partition in table. Each request has flags="X" that disables
auto-commit feature. Last request is the original "destroy" request.
It has own flags and can have disabled or enabled auto-commit feature.
If error is occurred when deleting partitions, then new "undo" request
is created and all changes will be rolled back.
Approved by: kib (mentor)
to growing the filesystem.
Refuse to attach providers where the metadata provider size is
wrong. This makes post-boot attaches behave consistently with
pre-boot attaches. Also refuse to restore metadata to a provider
of the wrong size without the new -f switch. The new -f switch
forces the metadata restoration despite the provider size, and
updates the provider size in the restored metadata to the correct
value.
Helped by: pjd
Reviewed by: pjd
allow the option to be specified multiple times. This will help to
implement things like passing multiple keyfiles to geli(8) instead of
cat(1)ing them all into stdin and reading from there using one '-k -'
option.
understand everything correctly, we don't really need it.
- Provide default numeric value as strings. This allows to simplify
a lot of code.
- Bump version number.
- ad0 was referred to as da0
- wrong parameter -s instead of -a in example
- use double quotes consistently
PR: docs/150082
Submitted by: N.J. Mann <njm@njm.me.uk>
MFC after: 2 weeks
of add verb. Mention about maximum size limit for "freebsd-boot"
partition. It should be smaller than 545 KB (hardcoded in pmbr).
Show usage of SI unit suffixes in example.
Approved by: mav (mentor)
MFC after: 1 week
Move code that converts params from humanized numbers to sectors count
to subr.c and adjust comment.
Add post-processing for "size" and "start offset" params in gpart,
now they are properly converted to sectors count with known sector size
that can be greater that 512 bytes.
Also replace "unsigned long long" type to "off_t" for unify code since
it used for medium size in libgeom(3) and DIOCGMEDIASIZE ioctl.
PR: bin/146277
Reviewed by: marcel (previous version)
Approved by: kib (mentor)
MFC after: 1 month
- remove stray argument [1]
- remove stray whitespace
- use canonical wording for the HISTORY section
PR: docs/147119 [1]
Submitted by: Alexander Best <alexbestms@wwu.de> [1]
MFC after: 1 week
- Add information regarding VTOC8 bootrstrap code and how it's handled with
r208777 in place.
- Document the mapping of partition types to VTOC8 tags.
- Add examples for VTOC8 to the respective section.
- Eliminated hard sentence breaks.
Reviewed by: marcel (slightly buggy version)
MFC after: 3 days
file to be of maximum size.
- Add special handling required for SMI/VTOC8 disklabel partcode, i.e. avoid
overwriting the label when writing the bootstrap code to the partition
starting at 0 and install it to all partitions when the -i option is omitted
just like geom_sunlabel(4) and sunlabel(8) do by default.
- Add missing prototypes.
- Add const where applicable.
Reviewed by: marcel
MFC after: 3 days
in a device independent manner. Also include an example anticipatory
scheduler, gsched_rr, which gives very nice performance improvements
in presence of competing random access patterns.
This is joint work with Fabio Checconi, developed last year
and presented at BSDCan 2009. You can find details in the
README file or at
http://info.iet.unipi.it/~luigi/geom_sched/
to support various storage boxes which really aren't active-active.
We only write the label on the *first* provider. For all other providers
we just "add" the disk. This also allows for an "add" verb.
A usage implication is that you should specificy the currently active
storage path as the first provider.
Note that this does not add RDAC-like functionality, but better allows for
autovolumefailover configurations (additional checkins elsewhere will support
this).
Sponsored by: Panasas
MFC after: 1 month
Note that due to e.g. write throttling ('wdrain'), it can stall all the disk
I/O instead of just the device it's configured for. Using it for removable
media is therefore not a good idea.
Reviewed by: pjd (earlier version)
when trees were big and FAST mode was enabled by default.
So small block size doesn't benefits linear I/O operations in FAST and
significantly slowdowns in ECONOMIC (default) mode. For single stream random
I/Os so small block doesn't give much benefits, as access time is usually
bigger then transfer time there. Same time it requires all heads to seek
together for every single request, reducing performance on parallel load.
"split" is very ineffective for devices with rotating media as HDDs.
To be effective, it needs that transfer time reduction due to block
splitting was bigger then access time increase due to non-sequential
access. For modern HDDs I was able to reproduce it only with read sizes
of 2MB and above, which is almost not applicable in real life.
"load" algorithm same time is more universal and effective now.
Reviewed by: pjd
GEOM_PART does not exist in the kernel, and 2) the GEOM in
question does not exist.
Additionally abort in case of programming errors that result
in neither the class nor geom not being present in the gctl
request.
Submitted by: "Andrey V. Elsukov" <bu7cher@yandex.ru>
Approved by: re (kib)
gpart(8). LBAs in particular are ugly. The ganularity is a sector,
but users expect byte granularity when specifying the size or offset
with a SI unit. Handle LBAs specially to deal with this.
attributes. The start and end more accurately describe the
space taken by a partition. The offset and size are used to
describe the effective (usable) storage of that partition.
modify the pointer argument passed to it. This triggered an assert in malloc
when a geom command being run under the livefs environment.
PR: bin/130632
Submitted by: Dimitry Andric <dimitry -at- andric.com>
Pointy hat to: me
MFC after: 2 days
* Better wording of sections dealing with physical storage
* A new section on assumptions gvirstor has on its consumer devices
(components) and its interaction with file systems
* Improved markup (by hrs@)
Reviewed by: hrs
Approved by: gnn (mentor)
might do subsequent reads from other providers. This stopped geli (and
probably other classes using g_metadata_store as well) from being put on top
of gvinum raid5 volumes.
Note:
The reason it fails in the gvinum raid5 case is that gvinum will read back the
old parity stripe before calculating the new parity stripe to be written out
again. The write will then fail because the underlying disk to be read is
opened write only.
MFC after: 1 week
formatting a number in a human-friendly way.
Note that with this commit a megabyte changed from 1000000 to
1048576 and a 80G disk is now printed as being 75G in size.
This is deliberate. It's consistent with the core of geom(8).
However, the original choice for a megabyte being 1000000 was
on purpose and matches what disk vendors put on the box. The
consistency is considered more important.
Submitted by: delphij
once it is lost, all data is gone.
Option '-B none' can by used to prevent backup. Option '-B path' can be
used to backup metadata to a different file than the default, which is
/var/backups/<prov>.eli.
The 'geli init' command also prints backup file location and gives short
procedure how to restore metadata.
The 'geli setkey' command now warns that even after passphrase change or keys
update there could be version of the master key encrypted with old
keys/passphrase in the backup file.
Add regression tests to verify that new functionality works as expected.
Update other regression tests so they don't create backup files.
Reviewed by: keramida, rink
Dedicated to: a friend who lost 400GB of his live by accidentally overwritting geli metadata
MFC after: 2 weeks
RELEASE_CRUNCH builds use NO_MAN anyway, so this change is primarily
to avoid that developers have to set NO_MAN manually when they build
the static variant.