arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.
The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguishor at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".
The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.
During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.
Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.
There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.
Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
- First configured key is based only on keyfile (no passphrase).
- Device is attached.
- User changes first key (setkey) from keyfile to passphrase and doesn't
specify number of iterations (with -i option).
...geli(8) won't store calculated number of iterations in metadata.
This result in device beeing unaccesable after detach.
One can recover from this situation by guessing number of iterations
generated, storing it in metadata and trying to attach device.
Recovery procedure isn't nice, but one's data is not lost.
Reported by: Thomas Nickl <T.Nickl@gmx.net>
MFC after: 1 week
gmirror and graid3 in a way that it is not resynchronized after a
power failure or system crash.
It is safe when gjournal is running on top of gmirror/graid3.
read requests to its consumer. It has been developed to address
the problem of a horrible read performance of a 64k blocksize FS
residing on a RAID3 array with 8 data components, where a single
disk component would only get 8k read requests, thus effectively
killing disk performance under high load. Documentation will be
provided later. I'd like to thank Vsevolod Lobko for his bright
ideas, and Pawel Jakub Dawidek for helping me fix the nasty bug.
- Print proper error message when argument is specified twice.
Before the change it was detected properly, because of how
G_OPT_DONE() macro worked.
- Use err(3) functions where appropriate.
- Add some assertions.
- Bump version number, because G_TYPE_BOOL addition breaks API and ABI.
Changes: 98721,98722,98723,101360,106985
- after killing all attached providers, all providers are then detached
and operation is repeated for those who were attached,
- we don't want to remove keys for read-only attached providers, we only
want to detach them.
MFC after: 1 week
Now, encryption algorithm is given using '-e' option, not '-a'.
The '-a' option is now used to specify authentication algorithm.
Supported by: Wheel Sp. z o.o. (http://www.wheel.pl)
supported for a moment.
- Don't allow to use -i when no passphrase is given. Now if iterations is
equal to -1 (not set), we know that we should not ask for the passphrase
on boot.
It still doesn't handle situation when one key is protected with
passphrase and the other is not. There is no quick fix for this.
The complete solution will be to make number of iterations a per-key
value. Because this need metadata format change and is only needed for
devices attached on boot, I'll leave it as it is for now.
MFC after: 3 days
- number of read I/O requests,
- number of write I/O requests,
- number of read bytes,
- number of written bytes.
Add 'reset' subcommand for resetting statistics.
value (intmax_t) and boolean (int).
Based on that provide three functions:
- gctl_get_ascii()
- gctl_get_int()
- gctl_get_intmax()
- Hide gctl_get_param() function, as it is only used internally in
subr.c.
- Allow to provide argument name as (fmt, ...).
- Assert geom(8) bugs (missing argument is a geom(8) bug).
- Clean-up and simplify the code by using new functions and assumtions
(no more checking for missing argument).
Tested by: regression tests
ignore "no such file" errors only, which I wanted to do.
Because of this I ignored all other errors on dlopen(3) failure as well,
which isn't good.
Fix this situation by calling access(2) on library file first and ignore
only ENOENT error. This allows to report all the rest of dlopen(3) errors.
MFC after: 3 days
metadata is equal to -1. if we then wanted to attach provider (or change
keys) and forget about '-p' flag it failed on assertion (quite ok, without
assertion it could call PKCS#5v2 with 4294967295 iterations).
Instead of failing on assertion, remind about '-p' flag.
MFC after: 3 days
usage for a subcommand, so no 'usage' function has to be implemented
in class library.
- Bump version number as it breaks ABI, but don't provide backward
compatibility, because there are probably no external consumers of this
geom(8).
This allows to print more precise usage for standard commands and simplify
class libraries a bit.
MFC after: 1 week
warning on 64-bit platforms. Explicitly cast these values to int
to work around this issue, as these values are tend to be small.
Spotted by: ia64 tinderbox
providers.
This prevents from listing geoms like <name>.sync which can be confusing.
It still allows to show details about it by giving its name when listing.
MFC after: 1 week
shared-last-sector problem.
After this change, even if there is more than one provider with the same
last sector, the proper one will be chosen based on its size.
It still doesn't fix the 'c' partition problem (when da0s1 can be confused
with da0s1c) and situation when 'a' partition starts at offset 0
(then da0s1a can be confused with da0s1 and da0s1c). One can use '-h'
option there, when creating device or avoid sharing last sector.
Actually, when providers share the same last sector and their size is equal,
they provide exactly the same data, so the name (da0s1, da0s1a, da0s1c)
isn't important at all.
- Provide backward compatibility.
- Update copyright's year.
MFC after: 1 week
the given providers. Without even one of the configured components there
should be no way to get the secret.
Supported by: WHEEL Sp. z o.o.
http://www.wheel.pl
After this change, when component is disconnected because of an I/O error,
it will not be connected and synchronized automatically, it will be logged
as broken and skipped. Autosynchronization can occur, when component is
disconnected (on orphan event) and connected again - there were no I/O
error, so there is no need to not connected the component, but when there were
writes while it wasn't connected, it will be synchronized.
This fix cases, when component is disconnected because of I/O error and can be
connected again and again.
- Bump version number.
- Implement backward compatibility mechanism. After this change when metadata in
old version is detected, it is automatically upgraded to the new (current)
version.
After this change, when component is disconnected because of an I/O error,
it will not be connected and synchronized automatically, it will be logged
as broken and skipped. Autosynchronization can occur, when component is
disconnected (on orphan event) and connected again - there were no I/O
error, so there is no need to not connected the component, but when there were
writes while it wasn't connected, it will be synchronized.
This fix cases, when component is disconnected because of I/O error and can be
connected again and again.
- Bump version number.
- Add version change history.
- Implement backward compatibility mechanism. After this change when metadata in
old version is detected, it is automatically upgraded to the new (current)
version.
verification of regular data when device is in complete state.
On verification error, EIO error is returned for the bio and sysctl
kern.geom.raid3.stat.parity_mismatch is increased.
Suggested by: phk
as well, even if device is in complete state.
I observe 40% of speed-up with this option for random read operations,
but slowdown for sequential reads.
Basically, without this option reading from a RAID3 device built from 5
components (c0-c4) looks like this:
Request no. Used components
1 c0+c1+c2+c3
2 c0+c1+c2+c3
3 c0+c1+c2+c3
With the new feature:
Request no. Used components
1 c0+c1+c2+c3
2 (c1^c2^c3^c4)+c1+c2+c3
3 c0+(c0^c2^c3^c4)+c2+c3
4 c0+c1+(c0^c1^c3^c4)+c3
5 c0+c1+c2+(c0^c1^c2^c4)
6 c0+c1+c2+c3
[...]
It allows to fix problems when last provider's sector is shared between few
providers.
- Bump version number for CONCAT and STRIPE and add code for backward
compatibility.
- Do not bump version number of MIRROR, as it wasn't officially introduced yet.
Even if someone started to play with it, there is no big deal, because
wrong MD5 sum of metadata will deny those providers.
- Update manual pages.
- Add version history to g_(stripe|concat).h files.
new problem shows up: symblic links (<libname>.so) are created under
/usr/lib/ now, instead of under /lib/geom/ where geom(8) looks for them.
Introduce a workaround to fix this by teaching geom(8) to open libraries
via /lib/geom/<libname>.so.<major_number> instead of /lib/geom/<libname>.so.
features. The gmirror(8) utility should be used for control of this class.
There is no manual page yet, but I'm working on it with keramida@.
Many useful tests provided by: simon (thank you!)
Some ideas from: scottl, simon, phk
provider.
- Bump version number.
This allows for a quite interesting trick. One can setup a stripe with
stripe size of 512 bytes and create transparent provider on top of it
with sector size equal to <ndisks> * 512. The result will be something
like RAID3 without parity disk (every access will touch all disks).
This class is used for detecting volume labels on file systems:
UFS, MSDOSFS (FAT12, FAT16, FAT32) and ISO9660.
It also provide native labelization (there is no need for file system).
g_label_ufs.c is based on geom_vol_ffs from Gordon Tetlow.
g_label_msdos.c and g_label_iso9660.c are probably hacks, I just found
where volume labels are stored and I use those offsets here,
but with this class it should be easy to do it as it should be done by
someone who know how.
Implementing volume labels detection for other file systems also should
be trivial.
New providers are created in those directories:
/dev/ufs/ (UFS1, UFS2)
/dev/msdosfs/ (FAT12, FAT16, FAT32)
/dev/iso9660/ (ISO9660)
/dev/label/ (native labels, configured with glabel(8))
Manual page cleanups and some comments inside were submitted by
Simon L. Nielsen, who was, as always, very helpful. Thanks!
- g_lcm() - calculates Least Common Multiple of two given values,
it is helpful when we need to find sector size for provider
which is based on disks with different sector size;
- g_get_mediasize() - returns media size of given provider;
- g_get_sectorsize() - returns sector size of given provider;
Those function aren't used now, but are used by geom_mirror which will be
committed soon.
GEOM classes. CONCAT should be 100% compatible with existing gconcat(8)
utility, which is going to be removed.
Supported by: Wheel - Open Technologies - http://www.wheel.pl
GEOM classes. It works by loading a shared library via dlopen(3) mechanism
with class-specific code, it is also responsible for communicating with
GEOM via libgeom(3).
Per-class shared libraries are going to be stored in /lib/geom/ directory.
It provides also few standard commands like 'list', 'load' and 'unload'
for existing classes which aren't aware of geom(8).
More info will be send on freebsd-current@ mailing list.
Supported by: Wheel - Open Technologies - http://www.wheel.pl