the underlying drive had been hot-unplugged from the system. Here
is a specific example. Filesystem code had opened /dev/da1s1e.
Subsequently, the drive was hot-unplugged. This (correctly) caused
all of the associated /dev/da1* entries to be deleted. When the
filesystem later realized that the drive was gone it closed the
device, reducing the write-access counts to 0 on the geom providers
for da1s1e, da1s1, and da1. This caused geom to re-taste the
providers, resulting in the devices being created again. When the
drive was hot-plugged back in, it resulted in duplicate /dev entries
for da1s1e, da1s1, and da1.
This fix adds a new disk_gone() function which is called by CAM when a
drive goes away. It orphans all of the providers associated with the
drive, setting an error condition of ENXIO in each one. In addition,
we prevent a re-taste on last close for writing if an error condition
has been set in the provider.
Sponsored by: Isilon Systems
Reviewed by: phk
MFC after: 1 week
verbs. Only the create verb operates on a provider. All other verbs
operate on a GPT geom. Also, the GPT entry oriented verbs require
a non-downgraded GPT.
o Have all verbs take an optional flags parameter. The flags parameter
is a string of single-letter flags. The typical use of these flags
is to enable certain behaviour in support fo the gpt(8) tool.
o Add dummy implementations for the destroy and recover verbs.
This change causes test 2 of the GPT regression test suite to fail.
The presence of a geom parameter is now required even for unknown
verbs.
MD class. Previously only the DISK class was dumped. The only
consumer of this sysctl is libdisk (i.e. sysinstall) and it tests
explicitly for instances of the DISK class. Dumping other classes
is therefore harmless.
By also dumping the MD class regression tests can be written that
use the MD class for operations that would normally be done on the
DISK class. The sysctl can now be used to test if those operations
took an effect. An example is partitioning.
- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.
- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.
- Disambiguate some collisions by adding subsystem prefixes to some
memory types.
- Generally prefer lower case to upper case.
- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.
Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.
memory for request.
I was sure graid3 should handle such situations well, but green@ reported
it is not and we want to fix it before 6.0.
Submitted by: green
up. This make iostat report operations passed down to the device driver
instead of operations passed down to GEOM disk. The transfer size limit
imposed by the device driver is no longer hidden, improving the correlation
between iostat output and device driver workload.
requests. The following features have been added:
1. Extensive checking and validation of both the primary and
secondary headers to protect against corrupted data and to
take advantage of the redundancy to allow the GPT to be
used in the face of recoverable corruption.
2. Dynamic data-structures to avoid hardcoding gratuitous
table limits so as to support the creation of GPT tables
of (as of yet) unspecified size.
3. Only allow kernel dumps to swap partitions to provide the
necessary anti-footshooting measures. Linux swap partitions
are allowed.
4. Complete dump of the GPT configuration, including labels.
5. Supports Byte Order Mark (U+FEFF) handling for big-endian,
little-endian and mixed-endian partition names.
state where sleeping on a sleep queue is not allowed. The facility
doesn't support recursion but uses a simple private per-thread flag
(TDP_NOSLEEPING). The sleepq_add() function will panic if the flag is
set and INVARIANTS is enabled.
- Use this new facility to replace the g_xup and g_xdown mutexes that were
(ab)used to achieve similar behavior.
- Disallow sleeping in interrupt threads when invoking interrupt handlers.
MFC after: 1 week
Reviewed by: phk
it is destroyed in GEOM, in addition to being removed from /dev.
Before this patch, if you applied a new MBR which deleted a slice,
the deleted slice would not be in /dev, but it would still appear
in kern.geom.conftxt and kern.geom.confxml, which would confused
the diskPartitionEditor in sysinstall.
Submitted by: pjd
Tested by: pjd, rodrigc
MFC after: 1 week
waiting for geom events to happen:
Instead of maintaining a count of outstanding events, simply look if
the queue is empty. Make sure to not remove events from the queue
until they are executed in order to not open a new race.
Much work by: pjd
Tested by: kris
MT6: yes, should be.
This way, the VINUMDRIVE class is loaded before the VINUM class,
but since geom does the tasting for newly arrived classes
last-in-first-out, the VINUM class tastes first.
This removes the need to call gv_parse_config() in the drive
taste path.
sizeof(struct g_eli_metadata) will return the exact number of bytes needed
for storing it on the disk.
Without this change GELI was unusable on amd64 (and probably other 64-bit
archs), because sizeof(struct g_eli_metadata) was greater than 512 bytes
and geli(8) was failing on assertion.
Reported by: Michael Reifenberger <mike@Reifenberger.com>
MFC after: 3 days
When a drive is newly created, it's state is initially set to 'down',
so it won't allow saving the config to it (thus it will never know of
itself being created). Work around this by adding a new flag, that's
also checked when saving the config to a drive.
Actually, one cannot setup root file system on RAID3 device, but when
other file system exist in /etc/fstab which are placed on RAID3 device,
boot process will be interrupted when these devices are missing.
MFC after: 3 days
X-MFC-note: MFC only to RELENG_6, as RELENG_5 doesn't have root_mount KPI.
the assumption that performance was more important that beancounter
quality statistics.
As it transpires the microoptimization is not measurable in the
real world and the inconsistent statistics confuse users, so revert
the decision.
MT6 candidate: possibly
MT5 candidate: possibly
It creates very huge provider (41PB) /dev/gzero.
On BIO_READ request it zero-fills bio_data and on BIO_WRITE it does nothing.
You can also set kern.geom.zero.clear sysctl to 0 to do nothing even for
BIO_READ.
I'm using it for performance testing where it is very helpful.
MFC after: 3 days
post an event to the geom event queue that will take care of it,
letting outstanding bios finish, and closing the consumers.
Plus some cosmetic clean ups.
resides on. Fix the special case of the filesystem fragment size not
evenly dividing the size of the provider. Fixing the general case
probably requires better superblock validation (left as an exercise to
the reader).
While we wait for holds to be released, print a list of who holds us
back once per second.
Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().
Use the new KPI also from ata.
With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.
in BSD and MBR classes, ie. if provider below us uses the same metadata,
don't create labels based on the metadata.
This allows to create labels on geoms with rank != 1 without hacks.
Tested by: Chris Elsworth <chris@shagged.org> on sparc64
OK'ed by: phk
MFC after: 2 weeks
completed I/O requests here.
- First allocate all needed bios, so if any of allocations fail, we can
free memory before sending any I/O requests down.
Reported by: Pawel Malachowski
MFC after: 3 days
seem to be necessary anymore, and it prevents tasting a valid drive
when booting with geom_vinum already loaded, since SCSI disks set their
sectorsize not until first opening them.
shared-last-sector problem.
After this change, even if there is more than one provider with the same
last sector, the proper one will be chosen based on its size.
It still doesn't fix the 'c' partition problem (when da0s1 can be confused
with da0s1c) and situation when 'a' partition starts at offset 0
(then da0s1a can be confused with da0s1 and da0s1c). One can use '-h'
option there, when creating device or avoid sharing last sector.
Actually, when providers share the same last sector and their size is equal,
they provide exactly the same data, so the name (da0s1, da0s1a, da0s1c)
isn't important at all.
- Provide backward compatibility.
- Update copyright's year.
MFC after: 1 week
the previous one failed and there are more than one plex in the volume.
This could have led to a flood of error messages on the console and
probably a deadlock in certain situations.
patch from kan@).
Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close. This is not yet a generally safe function, but for this very
specific use it is safe. This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.