partition offsets. If user requests specific alignment and
provider's stripesize is not zero, then use a least common multiple
from the stripesize and user specified value.
Also fix "gpart resize" implementation: do not try to align the partition
size, because the start offset may be not aligned. Instead align the
end offset and then calculate size. Also use stripesize and stripeoffset
for "gpart resize" command.
with geometry. And they do recalculation of user specified parameters.
MBR, PC98, VTOC8, EBR schemes are doing that. For these schemes an
auto alignment feature (ie. gpart add -a alignment) would not work.
But it can work for GPT and BSD schemes. BSD scheme usualy is created
inside MBR, so we can use knowledge about offset of MBR partition to
calculate aligned values for BSD partitions.
Use "offset" attribute of the parent provider for better alignment.
MFC after: 2 weeks
Add new RAID GEOM class, that is going to replace ataraid(4) in supporting
various BIOS-based software RAIDs. Unlike ataraid(4) this implementation
does not depend on legacy ata(4) subsystem and can be used with any disk
drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4)
with `options ATA_CAM`). To make code more readable and extensible, this
implementation follows modular design, including core part and two sets
of modules, implementing support for different metadata formats and RAID
levels.
Support for such popular metadata formats is now implemented:
Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.
Such RAID levels are now supported:
RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT.
For any all of these RAID levels and metadata formats this class supports
full cycle of volume operations: reading, writing, creation, deletion,
disk removal and insertion, rebuilding, dirty shutdown detection
and resynchronization, bad sector recovery, faulty disks tracking,
hot-spare disks. For Intel and Promise formats there is support multiple
volumes per disk set.
Look graid(8) manual page for additional details.
Co-authored by: imp
Sponsored by: Cisco Systems, Inc. and iXsystems, Inc.
Make `geom XXX list` and `geom XXX status` outputs more consistent:
Add -a options to print all geoms, not only ones with providers.
Add -g option for `status` to report geom's names, not provider's.
Make `status` by default report provider's status (if present), not geom's.
Make `status` report consumer's statuses, not only "synchronized" field.
partitions instead of partition's indexes. This may be useful with
GPT partitioning scheme or EBR without GEOM_PART_EBR_COMPAT option.
MFC after: 2 weeks
does restore them only when -l option is specified [1]. Make number of
entries field in backup format optional. Document -l and -r options of
`gpart show` action.
Suggested by: pjd [1]
MFC after: 1 week
big sector size. When gctl error is set gctl_has_param() always returns
'false', which prevents geli(8) from finding some arguments and also masks
an error, which is generates in such case.
MFC after: 3 days
This was needed for recover implementation.
Implement the recover command for GPT. Now GPT will marked as
corrupt when any of three types of corruption will be detected:
1. Damaged primary GPT header or table
2. Damaged secondary GPT header or table
3. Secondary header is not located in the last LBA
Marked GPT becomes read-only. Any changes with corrupt table
are prohibited. Only "destroy" and "recover" commands are allowed.
Discussed with: geom@ (mostly silence)
Tested by: Ilya A. Arhipov
Approved by: mav (mentor)
MFC after: 2 weeks
Before this change if you wanted to suspend your laptop and be sure that your
encryption keys are safe, you had to stop all processes that use file system
stored on encrypted device, unmount the file system and detach geli provider.
This isn't very handy. If you are a lucky user of a laptop where suspend/resume
actually works with FreeBSD (I'm not!) you most likely want to suspend your
laptop, because you don't want to start everything over again when you turn
your laptop back on.
And this is where geli suspend/resume steps in. When you execute:
# geli suspend -a
geli will wait for all in-flight I/O requests, suspend new I/O requests, remove
all geli sensitive data from the kernel memory (like encryption keys) and will
wait for either 'geli resume' or 'geli detach'.
Now with no keys in memory you can suspend your laptop without stopping any
processes or unmounting any file systems.
When you resume your laptop you have to resume geli devices using 'geli resume'
command. You need to provide your passphrase, etc. again so the keys can be
restored and suspended I/O requests released.
Of course you need to remember that 'geli suspend' won't clear file system
cache and other places where data from your geli-encrypted file system might be
present. But to get rid of those stopping processes and unmounting file system
won't help either - you have to turn your laptop off. Be warned.
Also note, that suspending geli device which contains file system with geli
utility (or anything used by 'geli resume') is not very good idea, as you won't
be able to resume it - when you execute geli(8), the kernel will try to read it
and this read I/O request will be suspended.
This is especially useful for things like installers, where regular
geli prompt can't be used.
- Add support for specifing multiple -K or -k options, so there is no
need to cat all keyfiles and read them from standard input.
Requested by: Kris Moore <kris@pcbsd.org>, thompsa
MFC after: 2 weeks
This option doesn't passed to kernel and handled in user-space.
With -F option gpart creates new "delete" request for each
partition in table. Each request has flags="X" that disables
auto-commit feature. Last request is the original "destroy" request.
It has own flags and can have disabled or enabled auto-commit feature.
If error is occurred when deleting partitions, then new "undo" request
is created and all changes will be rolled back.
Approved by: kib (mentor)
to growing the filesystem.
Refuse to attach providers where the metadata provider size is
wrong. This makes post-boot attaches behave consistently with
pre-boot attaches. Also refuse to restore metadata to a provider
of the wrong size without the new -f switch. The new -f switch
forces the metadata restoration despite the provider size, and
updates the provider size in the restored metadata to the correct
value.
Helped by: pjd
Reviewed by: pjd
allow the option to be specified multiple times. This will help to
implement things like passing multiple keyfiles to geli(8) instead of
cat(1)ing them all into stdin and reading from there using one '-k -'
option.
understand everything correctly, we don't really need it.
- Provide default numeric value as strings. This allows to simplify
a lot of code.
- Bump version number.
- ad0 was referred to as da0
- wrong parameter -s instead of -a in example
- use double quotes consistently
PR: docs/150082
Submitted by: N.J. Mann <njm@njm.me.uk>
MFC after: 2 weeks
of add verb. Mention about maximum size limit for "freebsd-boot"
partition. It should be smaller than 545 KB (hardcoded in pmbr).
Show usage of SI unit suffixes in example.
Approved by: mav (mentor)
MFC after: 1 week
Move code that converts params from humanized numbers to sectors count
to subr.c and adjust comment.
Add post-processing for "size" and "start offset" params in gpart,
now they are properly converted to sectors count with known sector size
that can be greater that 512 bytes.
Also replace "unsigned long long" type to "off_t" for unify code since
it used for medium size in libgeom(3) and DIOCGMEDIASIZE ioctl.
PR: bin/146277
Reviewed by: marcel (previous version)
Approved by: kib (mentor)
MFC after: 1 month
- remove stray argument [1]
- remove stray whitespace
- use canonical wording for the HISTORY section
PR: docs/147119 [1]
Submitted by: Alexander Best <alexbestms@wwu.de> [1]
MFC after: 1 week
- Add information regarding VTOC8 bootrstrap code and how it's handled with
r208777 in place.
- Document the mapping of partition types to VTOC8 tags.
- Add examples for VTOC8 to the respective section.
- Eliminated hard sentence breaks.
Reviewed by: marcel (slightly buggy version)
MFC after: 3 days
file to be of maximum size.
- Add special handling required for SMI/VTOC8 disklabel partcode, i.e. avoid
overwriting the label when writing the bootstrap code to the partition
starting at 0 and install it to all partitions when the -i option is omitted
just like geom_sunlabel(4) and sunlabel(8) do by default.
- Add missing prototypes.
- Add const where applicable.
Reviewed by: marcel
MFC after: 3 days
in a device independent manner. Also include an example anticipatory
scheduler, gsched_rr, which gives very nice performance improvements
in presence of competing random access patterns.
This is joint work with Fabio Checconi, developed last year
and presented at BSDCan 2009. You can find details in the
README file or at
http://info.iet.unipi.it/~luigi/geom_sched/
to support various storage boxes which really aren't active-active.
We only write the label on the *first* provider. For all other providers
we just "add" the disk. This also allows for an "add" verb.
A usage implication is that you should specificy the currently active
storage path as the first provider.
Note that this does not add RDAC-like functionality, but better allows for
autovolumefailover configurations (additional checkins elsewhere will support
this).
Sponsored by: Panasas
MFC after: 1 month
Note that due to e.g. write throttling ('wdrain'), it can stall all the disk
I/O instead of just the device it's configured for. Using it for removable
media is therefore not a good idea.
Reviewed by: pjd (earlier version)
when trees were big and FAST mode was enabled by default.
So small block size doesn't benefits linear I/O operations in FAST and
significantly slowdowns in ECONOMIC (default) mode. For single stream random
I/Os so small block doesn't give much benefits, as access time is usually
bigger then transfer time there. Same time it requires all heads to seek
together for every single request, reducing performance on parallel load.
"split" is very ineffective for devices with rotating media as HDDs.
To be effective, it needs that transfer time reduction due to block
splitting was bigger then access time increase due to non-sequential
access. For modern HDDs I was able to reproduce it only with read sizes
of 2MB and above, which is almost not applicable in real life.
"load" algorithm same time is more universal and effective now.
Reviewed by: pjd
GEOM_PART does not exist in the kernel, and 2) the GEOM in
question does not exist.
Additionally abort in case of programming errors that result
in neither the class nor geom not being present in the gctl
request.
Submitted by: "Andrey V. Elsukov" <bu7cher@yandex.ru>
Approved by: re (kib)
gpart(8). LBAs in particular are ugly. The ganularity is a sector,
but users expect byte granularity when specifying the size or offset
with a SI unit. Handle LBAs specially to deal with this.
attributes. The start and end more accurately describe the
space taken by a partition. The offset and size are used to
describe the effective (usable) storage of that partition.
modify the pointer argument passed to it. This triggered an assert in malloc
when a geom command being run under the livefs environment.
PR: bin/130632
Submitted by: Dimitry Andric <dimitry -at- andric.com>
Pointy hat to: me
MFC after: 2 days
* Better wording of sections dealing with physical storage
* A new section on assumptions gvirstor has on its consumer devices
(components) and its interaction with file systems
* Improved markup (by hrs@)
Reviewed by: hrs
Approved by: gnn (mentor)
might do subsequent reads from other providers. This stopped geli (and
probably other classes using g_metadata_store as well) from being put on top
of gvinum raid5 volumes.
Note:
The reason it fails in the gvinum raid5 case is that gvinum will read back the
old parity stripe before calculating the new parity stripe to be written out
again. The write will then fail because the underlying disk to be read is
opened write only.
MFC after: 1 week
formatting a number in a human-friendly way.
Note that with this commit a megabyte changed from 1000000 to
1048576 and a 80G disk is now printed as being 75G in size.
This is deliberate. It's consistent with the core of geom(8).
However, the original choice for a megabyte being 1000000 was
on purpose and matches what disk vendors put on the box. The
consistency is considered more important.
Submitted by: delphij
once it is lost, all data is gone.
Option '-B none' can by used to prevent backup. Option '-B path' can be
used to backup metadata to a different file than the default, which is
/var/backups/<prov>.eli.
The 'geli init' command also prints backup file location and gives short
procedure how to restore metadata.
The 'geli setkey' command now warns that even after passphrase change or keys
update there could be version of the master key encrypted with old
keys/passphrase in the backup file.
Add regression tests to verify that new functionality works as expected.
Update other regression tests so they don't create backup files.
Reviewed by: keramida, rink
Dedicated to: a friend who lost 400GB of his live by accidentally overwritting geli metadata
MFC after: 2 weeks
RELEASE_CRUNCH builds use NO_MAN anyway, so this change is primarily
to avoid that developers have to set NO_MAN manually when they build
the static variant.
The -l option changes the output to show the partition label, if applicable
and when present. The -r option changes the output to show the raw (i.e.
scheme-specific) partition types.
o gctl_delete_param() -- intended for parameters that are consumed
by geom(8) itself and which should not be passed to the kernel.
o gctl_has_param() -- intended to check if optional parameters are
present.
Both are needed by gpart(8) to process the optional parameter for
writing bootcode to a partition (as part of the bootcode verb).
However, the kernel is itself not involved in this matter and the
parameter needs to be removed from the request destined for the
kernel.
and define STATIC_GEOM_CLASSES when building the rescue binary. This way
geom can more easily be part of other crunched binaries, as it requires
only a Makefile change.
providers with limited physical storage and add physical storage as
needed.
Submitted by: Ivan Voras
Sponsored by: Google Summer of Code 2006
Approved by: re (kensmith)
This allows to use numbers in human-readable form in many geom(8)
utilities. Such a simple change and makes live so much nicer.
Some examples:
gstripe label -s 16k
gmirror label -s 4k
gnop create -o 1g -s 128m -S 2k
gjournal label -s 2g
geli label -i 128k -s 4k
Approved by: re (kensmith)
previous commit and that introduced optional parameters.
Existing classes (like geli(8)) use empty strings by default
and expect the parameter to be passed to the kernel as such.
Also, the default value of a string argument can be NULL.
Fix both cases by making the optional parameter conditional
upon gc_argname being set and making sure to test for NULL
before dereferencing the pointer.
Reported by: brueffer@
In order to support gpart(8), geom(8) needs to support a named
argument. Also, optional string parameters are a requirement.
Both have been added to the infrastructure. The former required
all existing classes to be adjusted.
which size is not multiple of sector size.
Reported by: Eric Anderson <anderson@centtech.com>
- Improve wording in error message. I'm sorry, I don't remember who
submitted this one.
to problems when the geli device is used with file system or as a swap.
Hopefully will prevent problems like kern/98742 in the future.
MFC after: 1 week
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.
The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguishor at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".
The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.
During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.
Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.
There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.
Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
- First configured key is based only on keyfile (no passphrase).
- Device is attached.
- User changes first key (setkey) from keyfile to passphrase and doesn't
specify number of iterations (with -i option).
...geli(8) won't store calculated number of iterations in metadata.
This result in device beeing unaccesable after detach.
One can recover from this situation by guessing number of iterations
generated, storing it in metadata and trying to attach device.
Recovery procedure isn't nice, but one's data is not lost.
Reported by: Thomas Nickl <T.Nickl@gmx.net>
MFC after: 1 week
gmirror and graid3 in a way that it is not resynchronized after a
power failure or system crash.
It is safe when gjournal is running on top of gmirror/graid3.
read requests to its consumer. It has been developed to address
the problem of a horrible read performance of a 64k blocksize FS
residing on a RAID3 array with 8 data components, where a single
disk component would only get 8k read requests, thus effectively
killing disk performance under high load. Documentation will be
provided later. I'd like to thank Vsevolod Lobko for his bright
ideas, and Pawel Jakub Dawidek for helping me fix the nasty bug.
- Print proper error message when argument is specified twice.
Before the change it was detected properly, because of how
G_OPT_DONE() macro worked.
- Use err(3) functions where appropriate.
- Add some assertions.
- Bump version number, because G_TYPE_BOOL addition breaks API and ABI.
Changes: 98721,98722,98723,101360,106985
- after killing all attached providers, all providers are then detached
and operation is repeated for those who were attached,
- we don't want to remove keys for read-only attached providers, we only
want to detach them.
MFC after: 1 week
Now, encryption algorithm is given using '-e' option, not '-a'.
The '-a' option is now used to specify authentication algorithm.
Supported by: Wheel Sp. z o.o. (http://www.wheel.pl)
supported for a moment.
- Don't allow to use -i when no passphrase is given. Now if iterations is
equal to -1 (not set), we know that we should not ask for the passphrase
on boot.
It still doesn't handle situation when one key is protected with
passphrase and the other is not. There is no quick fix for this.
The complete solution will be to make number of iterations a per-key
value. Because this need metadata format change and is only needed for
devices attached on boot, I'll leave it as it is for now.
MFC after: 3 days
- number of read I/O requests,
- number of write I/O requests,
- number of read bytes,
- number of written bytes.
Add 'reset' subcommand for resetting statistics.
value (intmax_t) and boolean (int).
Based on that provide three functions:
- gctl_get_ascii()
- gctl_get_int()
- gctl_get_intmax()
- Hide gctl_get_param() function, as it is only used internally in
subr.c.
- Allow to provide argument name as (fmt, ...).
- Assert geom(8) bugs (missing argument is a geom(8) bug).
- Clean-up and simplify the code by using new functions and assumtions
(no more checking for missing argument).
Tested by: regression tests