to growing the filesystem.
Refuse to attach providers where the metadata provider size is
wrong. This makes post-boot attaches behave consistently with
pre-boot attaches. Also refuse to restore metadata to a provider
of the wrong size without the new -f switch. The new -f switch
forces the metadata restoration despite the provider size, and
updates the provider size in the restored metadata to the correct
value.
Helped by: pjd
Reviewed by: pjd
allow the option to be specified multiple times. This will help to
implement things like passing multiple keyfiles to geli(8) instead of
cat(1)ing them all into stdin and reading from there using one '-k -'
option.
understand everything correctly, we don't really need it.
- Provide default numeric value as strings. This allows to simplify
a lot of code.
- Bump version number.
- ad0 was referred to as da0
- wrong parameter -s instead of -a in example
- use double quotes consistently
PR: docs/150082
Submitted by: N.J. Mann <njm@njm.me.uk>
MFC after: 2 weeks
of add verb. Mention about maximum size limit for "freebsd-boot"
partition. It should be smaller than 545 KB (hardcoded in pmbr).
Show usage of SI unit suffixes in example.
Approved by: mav (mentor)
MFC after: 1 week
Move code that converts params from humanized numbers to sectors count
to subr.c and adjust comment.
Add post-processing for "size" and "start offset" params in gpart,
now they are properly converted to sectors count with known sector size
that can be greater that 512 bytes.
Also replace "unsigned long long" type to "off_t" for unify code since
it used for medium size in libgeom(3) and DIOCGMEDIASIZE ioctl.
PR: bin/146277
Reviewed by: marcel (previous version)
Approved by: kib (mentor)
MFC after: 1 month
- remove stray argument [1]
- remove stray whitespace
- use canonical wording for the HISTORY section
PR: docs/147119 [1]
Submitted by: Alexander Best <alexbestms@wwu.de> [1]
MFC after: 1 week
- Add information regarding VTOC8 bootrstrap code and how it's handled with
r208777 in place.
- Document the mapping of partition types to VTOC8 tags.
- Add examples for VTOC8 to the respective section.
- Eliminated hard sentence breaks.
Reviewed by: marcel (slightly buggy version)
MFC after: 3 days
file to be of maximum size.
- Add special handling required for SMI/VTOC8 disklabel partcode, i.e. avoid
overwriting the label when writing the bootstrap code to the partition
starting at 0 and install it to all partitions when the -i option is omitted
just like geom_sunlabel(4) and sunlabel(8) do by default.
- Add missing prototypes.
- Add const where applicable.
Reviewed by: marcel
MFC after: 3 days
in a device independent manner. Also include an example anticipatory
scheduler, gsched_rr, which gives very nice performance improvements
in presence of competing random access patterns.
This is joint work with Fabio Checconi, developed last year
and presented at BSDCan 2009. You can find details in the
README file or at
http://info.iet.unipi.it/~luigi/geom_sched/
to support various storage boxes which really aren't active-active.
We only write the label on the *first* provider. For all other providers
we just "add" the disk. This also allows for an "add" verb.
A usage implication is that you should specificy the currently active
storage path as the first provider.
Note that this does not add RDAC-like functionality, but better allows for
autovolumefailover configurations (additional checkins elsewhere will support
this).
Sponsored by: Panasas
MFC after: 1 month
Note that due to e.g. write throttling ('wdrain'), it can stall all the disk
I/O instead of just the device it's configured for. Using it for removable
media is therefore not a good idea.
Reviewed by: pjd (earlier version)
when trees were big and FAST mode was enabled by default.
So small block size doesn't benefits linear I/O operations in FAST and
significantly slowdowns in ECONOMIC (default) mode. For single stream random
I/Os so small block doesn't give much benefits, as access time is usually
bigger then transfer time there. Same time it requires all heads to seek
together for every single request, reducing performance on parallel load.
"split" is very ineffective for devices with rotating media as HDDs.
To be effective, it needs that transfer time reduction due to block
splitting was bigger then access time increase due to non-sequential
access. For modern HDDs I was able to reproduce it only with read sizes
of 2MB and above, which is almost not applicable in real life.
"load" algorithm same time is more universal and effective now.
Reviewed by: pjd
GEOM_PART does not exist in the kernel, and 2) the GEOM in
question does not exist.
Additionally abort in case of programming errors that result
in neither the class nor geom not being present in the gctl
request.
Submitted by: "Andrey V. Elsukov" <bu7cher@yandex.ru>
Approved by: re (kib)
gpart(8). LBAs in particular are ugly. The ganularity is a sector,
but users expect byte granularity when specifying the size or offset
with a SI unit. Handle LBAs specially to deal with this.
attributes. The start and end more accurately describe the
space taken by a partition. The offset and size are used to
describe the effective (usable) storage of that partition.
modify the pointer argument passed to it. This triggered an assert in malloc
when a geom command being run under the livefs environment.
PR: bin/130632
Submitted by: Dimitry Andric <dimitry -at- andric.com>
Pointy hat to: me
MFC after: 2 days
* Better wording of sections dealing with physical storage
* A new section on assumptions gvirstor has on its consumer devices
(components) and its interaction with file systems
* Improved markup (by hrs@)
Reviewed by: hrs
Approved by: gnn (mentor)
might do subsequent reads from other providers. This stopped geli (and
probably other classes using g_metadata_store as well) from being put on top
of gvinum raid5 volumes.
Note:
The reason it fails in the gvinum raid5 case is that gvinum will read back the
old parity stripe before calculating the new parity stripe to be written out
again. The write will then fail because the underlying disk to be read is
opened write only.
MFC after: 1 week
formatting a number in a human-friendly way.
Note that with this commit a megabyte changed from 1000000 to
1048576 and a 80G disk is now printed as being 75G in size.
This is deliberate. It's consistent with the core of geom(8).
However, the original choice for a megabyte being 1000000 was
on purpose and matches what disk vendors put on the box. The
consistency is considered more important.
Submitted by: delphij
once it is lost, all data is gone.
Option '-B none' can by used to prevent backup. Option '-B path' can be
used to backup metadata to a different file than the default, which is
/var/backups/<prov>.eli.
The 'geli init' command also prints backup file location and gives short
procedure how to restore metadata.
The 'geli setkey' command now warns that even after passphrase change or keys
update there could be version of the master key encrypted with old
keys/passphrase in the backup file.
Add regression tests to verify that new functionality works as expected.
Update other regression tests so they don't create backup files.
Reviewed by: keramida, rink
Dedicated to: a friend who lost 400GB of his live by accidentally overwritting geli metadata
MFC after: 2 weeks
RELEASE_CRUNCH builds use NO_MAN anyway, so this change is primarily
to avoid that developers have to set NO_MAN manually when they build
the static variant.
The -l option changes the output to show the partition label, if applicable
and when present. The -r option changes the output to show the raw (i.e.
scheme-specific) partition types.
o gctl_delete_param() -- intended for parameters that are consumed
by geom(8) itself and which should not be passed to the kernel.
o gctl_has_param() -- intended to check if optional parameters are
present.
Both are needed by gpart(8) to process the optional parameter for
writing bootcode to a partition (as part of the bootcode verb).
However, the kernel is itself not involved in this matter and the
parameter needs to be removed from the request destined for the
kernel.
and define STATIC_GEOM_CLASSES when building the rescue binary. This way
geom can more easily be part of other crunched binaries, as it requires
only a Makefile change.
providers with limited physical storage and add physical storage as
needed.
Submitted by: Ivan Voras
Sponsored by: Google Summer of Code 2006
Approved by: re (kensmith)
This allows to use numbers in human-readable form in many geom(8)
utilities. Such a simple change and makes live so much nicer.
Some examples:
gstripe label -s 16k
gmirror label -s 4k
gnop create -o 1g -s 128m -S 2k
gjournal label -s 2g
geli label -i 128k -s 4k
Approved by: re (kensmith)
previous commit and that introduced optional parameters.
Existing classes (like geli(8)) use empty strings by default
and expect the parameter to be passed to the kernel as such.
Also, the default value of a string argument can be NULL.
Fix both cases by making the optional parameter conditional
upon gc_argname being set and making sure to test for NULL
before dereferencing the pointer.
Reported by: brueffer@
In order to support gpart(8), geom(8) needs to support a named
argument. Also, optional string parameters are a requirement.
Both have been added to the infrastructure. The former required
all existing classes to be adjusted.
which size is not multiple of sector size.
Reported by: Eric Anderson <anderson@centtech.com>
- Improve wording in error message. I'm sorry, I don't remember who
submitted this one.
to problems when the geli device is used with file system or as a swap.
Hopefully will prevent problems like kern/98742 in the future.
MFC after: 1 week
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.
The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguishor at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".
The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.
During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.
Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.
There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.
Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
- First configured key is based only on keyfile (no passphrase).
- Device is attached.
- User changes first key (setkey) from keyfile to passphrase and doesn't
specify number of iterations (with -i option).
...geli(8) won't store calculated number of iterations in metadata.
This result in device beeing unaccesable after detach.
One can recover from this situation by guessing number of iterations
generated, storing it in metadata and trying to attach device.
Recovery procedure isn't nice, but one's data is not lost.
Reported by: Thomas Nickl <T.Nickl@gmx.net>
MFC after: 1 week
gmirror and graid3 in a way that it is not resynchronized after a
power failure or system crash.
It is safe when gjournal is running on top of gmirror/graid3.
read requests to its consumer. It has been developed to address
the problem of a horrible read performance of a 64k blocksize FS
residing on a RAID3 array with 8 data components, where a single
disk component would only get 8k read requests, thus effectively
killing disk performance under high load. Documentation will be
provided later. I'd like to thank Vsevolod Lobko for his bright
ideas, and Pawel Jakub Dawidek for helping me fix the nasty bug.
- Print proper error message when argument is specified twice.
Before the change it was detected properly, because of how
G_OPT_DONE() macro worked.
- Use err(3) functions where appropriate.
- Add some assertions.
- Bump version number, because G_TYPE_BOOL addition breaks API and ABI.
Changes: 98721,98722,98723,101360,106985
- after killing all attached providers, all providers are then detached
and operation is repeated for those who were attached,
- we don't want to remove keys for read-only attached providers, we only
want to detach them.
MFC after: 1 week
Now, encryption algorithm is given using '-e' option, not '-a'.
The '-a' option is now used to specify authentication algorithm.
Supported by: Wheel Sp. z o.o. (http://www.wheel.pl)
supported for a moment.
- Don't allow to use -i when no passphrase is given. Now if iterations is
equal to -1 (not set), we know that we should not ask for the passphrase
on boot.
It still doesn't handle situation when one key is protected with
passphrase and the other is not. There is no quick fix for this.
The complete solution will be to make number of iterations a per-key
value. Because this need metadata format change and is only needed for
devices attached on boot, I'll leave it as it is for now.
MFC after: 3 days
- number of read I/O requests,
- number of write I/O requests,
- number of read bytes,
- number of written bytes.
Add 'reset' subcommand for resetting statistics.
value (intmax_t) and boolean (int).
Based on that provide three functions:
- gctl_get_ascii()
- gctl_get_int()
- gctl_get_intmax()
- Hide gctl_get_param() function, as it is only used internally in
subr.c.
- Allow to provide argument name as (fmt, ...).
- Assert geom(8) bugs (missing argument is a geom(8) bug).
- Clean-up and simplify the code by using new functions and assumtions
(no more checking for missing argument).
Tested by: regression tests
ignore "no such file" errors only, which I wanted to do.
Because of this I ignored all other errors on dlopen(3) failure as well,
which isn't good.
Fix this situation by calling access(2) on library file first and ignore
only ENOENT error. This allows to report all the rest of dlopen(3) errors.
MFC after: 3 days
metadata is equal to -1. if we then wanted to attach provider (or change
keys) and forget about '-p' flag it failed on assertion (quite ok, without
assertion it could call PKCS#5v2 with 4294967295 iterations).
Instead of failing on assertion, remind about '-p' flag.
MFC after: 3 days
usage for a subcommand, so no 'usage' function has to be implemented
in class library.
- Bump version number as it breaks ABI, but don't provide backward
compatibility, because there are probably no external consumers of this
geom(8).
This allows to print more precise usage for standard commands and simplify
class libraries a bit.
MFC after: 1 week
warning on 64-bit platforms. Explicitly cast these values to int
to work around this issue, as these values are tend to be small.
Spotted by: ia64 tinderbox
providers.
This prevents from listing geoms like <name>.sync which can be confusing.
It still allows to show details about it by giving its name when listing.
MFC after: 1 week
shared-last-sector problem.
After this change, even if there is more than one provider with the same
last sector, the proper one will be chosen based on its size.
It still doesn't fix the 'c' partition problem (when da0s1 can be confused
with da0s1c) and situation when 'a' partition starts at offset 0
(then da0s1a can be confused with da0s1 and da0s1c). One can use '-h'
option there, when creating device or avoid sharing last sector.
Actually, when providers share the same last sector and their size is equal,
they provide exactly the same data, so the name (da0s1, da0s1a, da0s1c)
isn't important at all.
- Provide backward compatibility.
- Update copyright's year.
MFC after: 1 week
the given providers. Without even one of the configured components there
should be no way to get the secret.
Supported by: WHEEL Sp. z o.o.
http://www.wheel.pl
After this change, when component is disconnected because of an I/O error,
it will not be connected and synchronized automatically, it will be logged
as broken and skipped. Autosynchronization can occur, when component is
disconnected (on orphan event) and connected again - there were no I/O
error, so there is no need to not connected the component, but when there were
writes while it wasn't connected, it will be synchronized.
This fix cases, when component is disconnected because of I/O error and can be
connected again and again.
- Bump version number.
- Implement backward compatibility mechanism. After this change when metadata in
old version is detected, it is automatically upgraded to the new (current)
version.
After this change, when component is disconnected because of an I/O error,
it will not be connected and synchronized automatically, it will be logged
as broken and skipped. Autosynchronization can occur, when component is
disconnected (on orphan event) and connected again - there were no I/O
error, so there is no need to not connected the component, but when there were
writes while it wasn't connected, it will be synchronized.
This fix cases, when component is disconnected because of I/O error and can be
connected again and again.
- Bump version number.
- Add version change history.
- Implement backward compatibility mechanism. After this change when metadata in
old version is detected, it is automatically upgraded to the new (current)
version.
verification of regular data when device is in complete state.
On verification error, EIO error is returned for the bio and sysctl
kern.geom.raid3.stat.parity_mismatch is increased.
Suggested by: phk
as well, even if device is in complete state.
I observe 40% of speed-up with this option for random read operations,
but slowdown for sequential reads.
Basically, without this option reading from a RAID3 device built from 5
components (c0-c4) looks like this:
Request no. Used components
1 c0+c1+c2+c3
2 c0+c1+c2+c3
3 c0+c1+c2+c3
With the new feature:
Request no. Used components
1 c0+c1+c2+c3
2 (c1^c2^c3^c4)+c1+c2+c3
3 c0+(c0^c2^c3^c4)+c2+c3
4 c0+c1+(c0^c1^c3^c4)+c3
5 c0+c1+c2+(c0^c1^c2^c4)
6 c0+c1+c2+c3
[...]
It allows to fix problems when last provider's sector is shared between few
providers.
- Bump version number for CONCAT and STRIPE and add code for backward
compatibility.
- Do not bump version number of MIRROR, as it wasn't officially introduced yet.
Even if someone started to play with it, there is no big deal, because
wrong MD5 sum of metadata will deny those providers.
- Update manual pages.
- Add version history to g_(stripe|concat).h files.
new problem shows up: symblic links (<libname>.so) are created under
/usr/lib/ now, instead of under /lib/geom/ where geom(8) looks for them.
Introduce a workaround to fix this by teaching geom(8) to open libraries
via /lib/geom/<libname>.so.<major_number> instead of /lib/geom/<libname>.so.
features. The gmirror(8) utility should be used for control of this class.
There is no manual page yet, but I'm working on it with keramida@.
Many useful tests provided by: simon (thank you!)
Some ideas from: scottl, simon, phk
provider.
- Bump version number.
This allows for a quite interesting trick. One can setup a stripe with
stripe size of 512 bytes and create transparent provider on top of it
with sector size equal to <ndisks> * 512. The result will be something
like RAID3 without parity disk (every access will touch all disks).