Add UUIDs for DragonFlyBSD's partition types.
MFC r267356:
Add DragonFlyBSD's Hammer FS types and type names.
MFC r267357:
Add aliases for DragonFlyBSD's partition types.
MFC r267358:
Allow dumping to DragonFlyBSD's swap partition.
MFC r267359:
Add disklabel64 support to GEOM_PART class.
This partitioning scheme is used in DragonFlyBSD. It is similar to
BSD disklabel, but has the following improvements:
* metadata has own dedicated place and isn't accessible through partitions;
* all offsets are 64-bit;
* supports 16 partitions by default (has reserved place for more);
* has reserved place for backup label (but not yet implemented);
* has UUIDs for partitions and partition types;
MFC r267360:
Add disklabel64 support
Relnotes: yes
r265194, r265197
r260522:
Add the manual page for geom_uncompress(4).
r260523:
Build the geom_uncompress(4) module by default.
Fix geom_uncompress(4) module loading. Don't link zlib.c (which is a module
itself) directly.
r261439:
Remove some unnecessary code. The offsets read from the first block are
overwritten a few lines bellow.
r261440:
Fix a logic error. Because of this inflateReset() wasn't being called and
the output buffer wasn't being cleared between the inflate() calls,
producing zeroed output after the first inflate() call.
This fixes the read of mkuzip(8) images with geom_uncompress(4).
r261586:
Fix the build with DEBUG enabled. Where possible, fix style(9) issues.
r264504:
Make sure not to do I/O for more than MAXPHYS bytes. Doing so can cause
problems in our providers, such as a KASSERT in md(4). We can initiate
I/O for more than MAXPHYS bytes if we've been given a BIO for MAXPHYS
bytes, the blocks from which we're reading couldn't be compressed and
we had compression in preceeding blocks resulting in misalignment of
the blocks we're trying to read relative to the sector. We're forced to
round up the I/O length to make it an multiple of the sector size.
When we detect the condition, we'll reduce the block count and perform
a "short" read. In g_uzip_done() we need to consider the original I/O
length and stop early if we're about to deflate a block that we didn't
read. By using bio_completed in the cloned BIO and not bio_length to
check for this, we automatically and gracefully handle short reads that
our providers may be doing on top of the short reads we may initiate
ourselves.
r264769:
Keep geom_uncompress(4) in line with geom_uzip(4), bring in the r264504 fix.
Make sure not to start I/O bigger than MAXPHYS bytes.
r265193:
Some style and whitespace fixes. Reduce the difference between geom_uzip(4)
and geom_uncompress(4). Now, they produce an almost clean diff(1) output.
Remove a duplicated variable from g_uncompress.c and an unnecessary header
from g_uzip.c.
r265194:
Actually the FEATURE() macro is defined on sys/sysctl.h.
r265197:
Fix a leak in g_uzip_taste(). After retrieve all the block offsets from
the uzip image, free the last data read.
GIANT from VFS. This code is particulary broken and fragile and other
in-kernel implementations around, found in other operating systems,
don't really seem clean and solid enough to be imported at all.
If someone wants to reconsider in-kernel NTFS implementation for
inclusion again, a fair effort for completely fixing and cleaning it
up is expected.
In the while NTFS regular users can use FUSE interface and ntfs-3g
port to work with their NTFS partitions.
This is not targeted for MFC.
defined by the SNIA Common RAID Disk Data Format Specification v2.0.
Supports multiple volumes per array and multiple partitions per disk.
Supports standard big-endian and Adaptec's little-endian byte ordering.
Supports all single-layer RAID levels. Dual-layer RAID levels except
RAID10 are not supported now because of GEOM RAID design limitations.
Some work is still to be done, but the present code already manages basic
interoperation with RAID BIOS of the Adaptec 1430SA SATA RAID controller.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
create reasonably large cache for the keys that is filled when
needed. The previous version was problematic for very large providers
(hundreds of terabytes or serval petabytes). Every terabyte of data
needs around 256kB for keys. Make the default cache limit big enough
to fit all the keys needed for 4TB providers, which will eat at most
1MB of memory.
MFC after: 2 weeks
Add new RAID GEOM class, that is going to replace ataraid(4) in supporting
various BIOS-based software RAIDs. Unlike ataraid(4) this implementation
does not depend on legacy ata(4) subsystem and can be used with any disk
drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4)
with `options ATA_CAM`). To make code more readable and extensible, this
implementation follows modular design, including core part and two sets
of modules, implementing support for different metadata formats and RAID
levels.
Support for such popular metadata formats is now implemented:
Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.
Such RAID levels are now supported:
RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT.
For any all of these RAID levels and metadata formats this class supports
full cycle of volume operations: reading, writing, creation, deletion,
disk removal and insertion, rebuilding, dirty shutdown detection
and resynchronization, bad sector recovery, faulty disks tracking,
hot-spare disks. For Intel and Promise formats there is support multiple
volumes per disk set.
Look graid(8) manual page for additional details.
Co-authored by: imp
Sponsored by: Cisco Systems, Inc. and iXsystems, Inc.
in a device independent manner. Also include an example anticipatory
scheduler, gsched_rr, which gives very nice performance improvements
in presence of competing random access patterns.
This is joint work with Fabio Checconi, developed last year
and presented at BSDCan 2009. You can find details in the
README file or at
http://info.iet.unipi.it/~luigi/geom_sched/
Note that due to e.g. write throttling ('wdrain'), it can stall all the disk
I/O instead of just the device it's configured for. Using it for removable
media is therefore not a good idea.
Reviewed by: pjd (earlier version)
The work have been under testing and fixing since then, and it is mature enough
to be put into HEAD for further testing.
A lot have changed in this time, and here are the most important:
- Gvinum now uses one single workerthread instead of one thread for each
volume and each plex. The reason for this is that the previous scheme was
very complex, and was the cause of many of the bugs discovered in gvinum.
Instead, gvinum now uses one worker thread with an event queue, quite
similar to what used in gmirror.
- The rebuild/grow/initialize/parity check routines no longer runs in
separate threads, but are run as regular I/O requests with special flags.
This made it easier to support mounted growing and parity rebuild.
- Support for growing striped and raid5-plexes, meaning that one can extend the
volumes for these plex types in addition to the concat type. Also works while
the volume is mounted.
- Implementation of many of the missing commands from the old vinum:
attach/detach, start (was partially implemented), stop (was partially
implemented), concat, mirror, stripe, raid5 (shortcuts for creating volumes
with one plex of these organizations).
- The parity check and rebuild no longer goes between userland/kernel, meaning
that the gvinum command will not stay and wait forever for the rebuild to
finish. You can instead watch the status with the list command.
- Many problems with gvinum have been reported since 5.x, and some has been hard
to fix due to the complicated architecture. Hopefully, it should be more
stable and better handle edge cases that previously made gvinum crash.
- Failed drives no longer disappears entirely, but now leave behind a dummy
drive that makes sure the original state is not forgotten in case the system
is rebooted between drive failures/swaps.
- Update manpage to reflect new commands and extend it with some examples.
Sponsored by: Google Summer of Code 2007
Mentored by: le
Tested by: Rick C. Petty <rick-freebsd2008 -at- kiwi-computer.com>
found inside extended partitions and used to create logical partitions.
At this time write/modify support is not (yet) present.
The EBR and MBR schemes both check the parent scheme. The MBR will
back-off when nested under another MBR, whereas the EBR only nests
under a MBR.
providers with limited physical storage and add physical storage as
needed.
Submitted by: Ivan Voras
Sponsored by: Google Summer of Code 2006
Approved by: re (kensmith)
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.
The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguishor at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".
The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.
During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.
Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.
There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.
Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
read requests to its consumer. It has been developed to address
the problem of a horrible read performance of a 64k blocksize FS
residing on a RAID3 array with 8 data components, where a single
disk component would only get 8k read requests, thus effectively
killing disk performance under high load. Documentation will be
provided later. I'd like to thank Vsevolod Lobko for his bright
ideas, and Pawel Jakub Dawidek for helping me fix the nasty bug.
use a different mechanism for setting warning flags, and using
WARNS here only has null or negative effects.
Submitted by: bde (I think it means "submitted")
the geom creation to a seperate init function and ignore the tasting.
The config is now parsed only in the vinumdrive geom, which hopefully
fixes the problem, that the drive class tasted before the vinum class
had a chance, for good.
Also restore the behaviour that the module can be loaded at boot time
and on a running system.
Add functions to rename objects and to move a subdisk from one drive
to another.
Obtained from: Chris Jones <chris.jones@ualberta.ca>
Sponsored by: Google Summer of Code 2005
MFC in: 1 week
This way, the VINUMDRIVE class is loaded before the VINUM class,
but since geom does the tasting for newly arrived classes
last-in-first-out, the VINUM class tastes first.
This removes the need to call gv_parse_config() in the drive
taste path.
It creates very huge provider (41PB) /dev/gzero.
On BIO_READ request it zero-fills bio_data and on BIO_WRITE it does nothing.
You can also set kern.geom.zero.clear sysctl to 0 to do nothing even for
BIO_READ.
I'm using it for performance testing where it is very helpful.
MFC after: 3 days
the given providers. Without even one of the configured components there
should be no way to get the secret.
Supported by: WHEEL Sp. z o.o.
http://www.wheel.pl