Commit Graph

1199 Commits

Author SHA1 Message Date
pjd
8848f7ef86 Fix build breakage by fixing typo.
Reported by:	glebius
2005-12-09 11:38:02 +00:00
pjd
3c3f39be74 - Allow to specify the byte which will be used for filling read buffer.
- Improve sysctl description a bit.

Submitted by:	Ivan Voras <ivoras@gmail.com>
2005-12-08 23:06:59 +00:00
pjd
7ea810fefd Teach NOP GEOM class how to gather the following statistics:
- number of read I/O requests,
- number of write I/O requests,
- number of read bytes,
- number of written bytes.
Add 'reset' subcommand for resetting statistics.
2005-12-08 23:00:31 +00:00
sobomax
44ba92fd8a It is unclear who is wrong and who is right, but when operating on
plain file bsdlabel(8) always writes label at a fixed offset from
its beginning (512 bytes), regardless of the sector size. At the same
time, bsdlabel geom class expects label to be available at the very
beginning of the second sector.

As a result, images prepared in userland for media with sector size
different from 512 bytes (i.e. 2k for cdroms) are not recognized by
the tasting mechanism.

Solve the problem by always looking for the label at 512-byte offset
if we can't find it at the beginning of the second sector and sector
size is not 512 bytes.
2005-11-30 22:54:41 +00:00
sobomax
16e3e466cc Don't pass error value pointer to g_read_data(9) at all if we don't
have any use of it.

Suggested by:	pjd
2005-11-30 22:15:00 +00:00
sobomax
29543921ea Check for g_read_data(9) errors properly:
o The only indication of error condition is NULL value returned by
  the function;

o value pointed to by error argument is undefined in the case when
  operation completes successfully.

Discussed with: phk
2005-11-30 19:24:51 +00:00
sobomax
937b629bd5 Kill leading whilespace. 2005-11-30 19:07:28 +00:00
pjd
fd9fb6b272 We do nothing with returned error value, so just remove it. 2005-11-29 12:07:10 +00:00
sobomax
ca7fa392b3 Check value returned by g_read_data(9), otherwise we can end in panic(9)
if read error happens.

MFC after:	1 week
2005-11-29 03:03:34 +00:00
le
885d3dc97f Add sysctl descriptions. 2005-11-25 10:09:30 +00:00
le
de257b1b7a Since we want a vinum geom created anytime the module loads, move
the geom creation to a seperate init function and ignore the tasting.

The config is now parsed only in the vinumdrive geom, which hopefully
fixes the problem, that the drive class tasted before the vinum class
had a chance, for good.

Also restore the behaviour that the module can be loaded at boot time
and on a running system.
2005-11-24 15:11:41 +00:00
le
00a607320a Whitespace. 2005-11-20 12:14:18 +00:00
le
d875cd85c6 Always declare variables at the start of the function.
Don't allocate potentially large variables on the stack.
Check strsep() return values when the string comes from userland.
Shorten variable names for lucidity's sake.

most of the stuff:
Pointed out by:    njl@
2005-11-20 12:12:31 +00:00
le
f2329c2aa6 Fix whitespace issue.
Pointed out by:   joel@
2005-11-20 10:40:06 +00:00
le
7f74b7e086 Finally bring in what was produced during Google SoC 2005:
Add functions to rename objects and to move a subdisk from one drive
to another.

Obtained from:  Chris Jones <chris.jones@ualberta.ca>
Sponsored by:   Google Summer of Code 2005
MFC in:         1 week
2005-11-19 20:25:18 +00:00
jdp
536960dbba Fix a bug that caused some /dev entries to continue to exist after
the underlying drive had been hot-unplugged from the system.  Here
is a specific example.  Filesystem code had opened /dev/da1s1e.
Subsequently, the drive was hot-unplugged.  This (correctly) caused
all of the associated /dev/da1* entries to be deleted.  When the
filesystem later realized that the drive was gone it closed the
device, reducing the write-access counts to 0 on the geom providers
for da1s1e, da1s1, and da1.  This caused geom to re-taste the
providers, resulting in the devices being created again.  When the
drive was hot-plugged back in, it resulted in duplicate /dev entries
for da1s1e, da1s1, and da1.

This fix adds a new disk_gone() function which is called by CAM when a
drive goes away.  It orphans all of the providers associated with the
drive, setting an error condition of ENXIO in each one.  In addition,
we prevent a re-taste on last close for writing if an error condition
has been set in the provider.

Sponsored by:   Isilon Systems
Reviewed by:    phk
MFC after:      1 week
2005-11-18 02:43:49 +00:00
marcel
59cb038f4a o Slightly refactor the ctlreq code to maximize code sharing between
verbs. Only the create verb operates on a provider. All other verbs
   operate on a GPT geom. Also, the GPT entry oriented verbs require
   a non-downgraded GPT.
o  Have all verbs take an optional flags parameter. The flags parameter
   is a string of single-letter flags. The typical use of these flags
   is to enable certain behaviour in support fo the gpt(8) tool.
o  Add dummy implementations for the destroy and recover verbs.

This change causes test 2 of the GPT regression test suite to fail.
The presence of a geom parameter is now required even for unknown
verbs.
2005-11-13 21:53:55 +00:00
marcel
8c0ef27b70 Make the kern.geom.conftxt sysctl more usable by also dumping the
MD class. Previously only the DISK class was dumped. The only
consumer of this sysctl is libdisk (i.e. sysinstall) and it tests
explicitly for instances of the DISK class. Dumping other classes
is therefore harmless.
By also dumping the MD class regression tests can be written that
use the MD class for operations that would normally be done on the
DISK class. The sysctl can now be used to test if those operations
took an effect. An example is partitioning.
2005-11-12 20:02:02 +00:00
rwatson
be4f357149 Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
pjd
38ae497c6a Fix possible live-lock under heavy load where we can't allocate more
memory for request.
I was sure graid3 should handle such situations well, but green@ reported
it is not and we want to fix it before 6.0.

Submitted by:	green
2005-10-28 20:25:02 +00:00
takawata
3224e8b29a Add checking for File record magic. 2005-10-26 03:24:28 +00:00
marcel
6c3e0035da Rough implementation of the create and add verbs. The verbs cause
in-memory changes only and as such are only useful for prototyping
and regression testing purposes.
2005-10-09 17:10:35 +00:00
tegge
dbcc5e2770 Move some devstat collection to below where large IO operations are chopped
up.  This make iostat report operations passed down to the device driver
instead of operations passed down to GEOM disk.  The transfer size limit
imposed by the device driver is no longer hidden, improving the correlation
between iostat output and device driver workload.
2005-09-30 17:32:08 +00:00
fjoe
20962cb412 - Fix "end_blk out of range" panic when INVARIANTS.
- Do not allow rw access.

Submitted by:	Dario Freni <saturnero at freesbie dot org>
MFC after:	3 days
2005-09-29 22:45:16 +00:00
marcel
c4edd4f5db o Don't cause a panic when the control request lacks a verb.
o  Don't set the error twice when the named class does not exist.
   It causes ioctl(2) to return with error EEXIST.
2005-09-18 23:54:40 +00:00
marcel
615ac2e4d5 Complete rewrite in preparation of adding support for control
requests. The following features have been added:
1. Extensive checking and validation of both the primary and
   secondary headers to protect against corrupted data and to
   take advantage of the redundancy to allow the GPT to be
   used in the face of recoverable corruption.
2. Dynamic data-structures to avoid hardcoding gratuitous
   table limits so as to support the creation of GPT tables
   of (as of yet) unspecified size.
3. Only allow kernel dumps to swap partitions to provide the
   necessary anti-footshooting measures. Linux swap partitions
   are allowed.
4. Complete dump of the GPT configuration, including labels.
5. Supports Byte Order Mark (U+FEFF) handling for big-endian,
   little-endian and mixed-endian partition names.
2005-09-17 07:05:17 +00:00
jhb
e535e11c9f - Add a new simple facility for marking the current thread as being in a
state where sleeping on a sleep queue is not allowed.  The facility
  doesn't support recursion but uses a simple private per-thread flag
  (TDP_NOSLEEPING).  The sleepq_add() function will panic if the flag is
  set and INVARIANTS is enabled.
- Use this new facility to replace the g_xup and g_xdown mutexes that were
  (ab)used to achieve similar behavior.
- Disallow sleeping in interrupt threads when invoking interrupt handlers.

MFC after:	1 week
Reviewed by:	phk
2005-09-15 19:05:37 +00:00
rodrigc
9348f83b49 Fix so that when a slice or a partition is removed through g_slice_config(),
it is destroyed in GEOM, in addition to being removed from /dev.
Before this patch, if you applied a new MBR which deleted a slice,
the deleted slice would not be in /dev, but it would still appear
in kern.geom.conftxt and kern.geom.confxml, which would confused
the diskPartitionEditor in sysinstall.

Submitted by:   pjd
Tested by:      pjd, rodrigc
MFC after:	1 week
2005-09-14 21:38:35 +00:00
pjd
4ca45af625 Fix copy&paste typo.
MFC after:	3 days
2005-09-10 07:46:47 +00:00
pjd
e1cf625a5f Don't forget to initialize crp_etype field.
Reported by:	Nick Evans <nevans@syphen.net>
MFC after:	3 days
2005-09-10 07:45:10 +00:00
le
a5dfb4378d Set the G_PF_WITHER flag on the subdisk provider that is about to
be destroyed.  That way the GEOM system handles all deallocations
and we don't have to do it ourselves.
2005-09-08 20:08:46 +00:00
phk
f24fdd9686 Remove a race condition that could result in processes being stuck
waiting for geom events to happen:

Instead of maintaining a count of outstanding events, simply look if
the queue is empty.  Make sure to not remove events from the queue
until they are executed in order to not open a new race.

Much work by:	pjd
Tested by:	kris
MT6:		yes, should be.
2005-09-04 19:14:19 +00:00
phk
7192488ee7 Typo. 2005-09-03 11:03:10 +00:00
pjd
aa258f8f85 Use KTR to log allocations and destructions of bios.
This should hopefully allow to track down "duplicate free of g_bio" panics.
2005-08-29 11:39:24 +00:00
le
bd2fb8891e Prevent that sync operations can be started when they are already
in progress, and be a bit more user friendly in terms of error
messages returned from the kernel.
2005-08-28 18:16:31 +00:00
pjd
fc694070c0 Verify length of the data to read as well. 2005-08-28 00:14:21 +00:00
le
33712bde29 Shuffle around the order in which the components are compiled.
This way, the VINUMDRIVE class is loaded before the VINUM class,
but since geom does the tasting for newly arrived classes
last-in-first-out, the VINUM class tastes first.

This removes the need to call gv_parse_config() in the drive
taste path.
2005-08-26 14:40:32 +00:00
pjd
eac0371f3b Verify offset before reading.
MFC after:	2 days
2005-08-26 12:50:08 +00:00
takawata
81149099bd Add NTFS labeling function.
Reviewed by:pjd
2005-08-26 11:35:10 +00:00
pjd
2e01dfe9a9 Verify if we can actually read the data at given offset.
Reported by:	Martin <nakal@nurfuerspam.de>
2005-08-23 18:55:38 +00:00
le
ff1cc2947a Correct the check if a plex is accessible in case it is not up.
This makes degraded RAID5 plexes actually work.
2005-08-22 23:24:26 +00:00
pjd
aee0040df6 By default, when doing crypto work in software, start as many threads
as we have active CPUs and bind each thread to its own CPU.

MFC after:	3 days
2005-08-21 18:12:51 +00:00
pjd
af30c99a23 Remove stale comment (we now always start worker thread).
MFC after:	3 days
2005-08-21 18:06:35 +00:00
pjd
114d28e6a7 Back-out the change from revision 1.14 and allow for '/' in labels again.
Convinced by:	green, Gavin Atkinson, dougb, gordon
MFC after:	1 day
2005-08-20 17:05:47 +00:00
pjd
a2f0d0b06b Add a __packed keyword to g_eli_metadata struct definition, so
sizeof(struct g_eli_metadata) will return the exact number of bytes needed
for storing it on the disk.
Without this change GELI was unusable on amd64 (and probably other 64-bit
archs), because sizeof(struct g_eli_metadata) was greater than 512 bytes
and geli(8) was failing on assertion.

Reported by:	Michael Reifenberger <mike@Reifenberger.com>
MFC after:	3 days
2005-08-20 10:43:03 +00:00
pjd
0c33c951a5 Allow to change number of iterations for PKCS#5v2. It can only be used
when there is only one key set.

MFC after:	3 days
2005-08-19 22:19:25 +00:00
pjd
e6d1db2424 - Add a missing period.
- Fix number of spaces.

MFC after:	3 days
2005-08-19 22:16:26 +00:00
pjd
863deb3c00 Avoid code duplication and implement bitcount32() function in systm.h only.
Reviewed by:	cperciva
MFC after:	3 days
2005-08-19 22:10:19 +00:00
pjd
653998193b Always run dedicated kernel thread (even when we have hardware support).
There is no performance impact, but allows to allocate memory with
M_WAITOK flag.
As a side effect this simplify code a bit.

MFC after:	3 days
2005-08-17 15:25:57 +00:00
pjd
768e62bfca We should now return 0. 2005-08-17 15:12:34 +00:00
pjd
b5aaabac19 Even if crypto_dispatch() return an error, request is not canceled and
our callback will still be called, just to tell us that requested
failed...

Reported by:	Mike Tancsa <mike@sentex.net>
MFC after:	3 days
2005-08-17 14:34:52 +00:00
pjd
8e698f8cb4 We don't need to clear allocated memory. This will speed-up things a bit.
MFC after:	3 days
2005-08-17 14:08:50 +00:00
phk
c1c0b1f44e remove stale comments 2005-08-16 20:03:29 +00:00
le
cd8ed397a3 Make it possible to remove stale, left-over subdisks. 2005-08-16 15:12:44 +00:00
le
e3eb852545 Fix a stupid logic bug introduced in geom_vinum_drive.c rev 1.18:
When a drive is newly created, it's state is initially set to 'down',
so it won't allow saving the config to it (thus it will never know of
itself being created).  Work around this by adding a new flag, that's
also checked when saving the config to a drive.
2005-08-15 17:07:47 +00:00
pjd
fdcfb5ee7a Because code paths for I/O requests are quite complex, add comments above
the functions which participate in I/O paths.

MFC after:	1 day
2005-08-13 17:45:37 +00:00
pjd
e0a42961b2 Provide more complete "How to add a new file system to glabel." list.
MFC after:	1 week
2005-08-12 00:34:45 +00:00
pjd
b9935076f6 Add code for Ext2FS and ReiserFS labels recognition.
Submitted by:	Stanislav Sedov <stas@310.ru>
PR:		kern/84638
MFC after:	1 week
2005-08-12 00:27:45 +00:00
pjd
df75611e8a Avoid creating directories in devfs by changing all '/' in labels to '_'.
Idea from:	Stanislav Sedov <stas@310.ru>
MFC after:	3 days
2005-08-12 00:05:09 +00:00
pjd
112012604e GELI doesn't need cryptodev.
MFC after:	3 days
2005-08-11 14:52:27 +00:00
pjd
540e708ef5 Be case-insensitive when dealing with algorithm names.
PR:		kern/84659
Submitted by:	Benjamin Lutz <benlutz@datacomm.ch>
2005-08-08 19:40:38 +00:00
pjd
8ecb9be842 MFp4: Export more informations about encrypted providers.
MFC after:	1 week
2005-07-27 22:31:57 +00:00
pjd
354bcaec75 Reduce default debug level to 0.
MFC after:	1 week
2005-07-27 21:48:47 +00:00
pjd
57922fa5cc Add GEOM_ELI class which provides GEOM providers encryption.
For features list and usage see manual page: geli(8).

Sponsored by:	Wheel Sp. z o.o.
		http://www.wheel.pl
MFC after:	1 week
2005-07-27 21:43:37 +00:00
pjd
eb467446d5 Use root_mount KPI for RAID3 to delay root file system mount.
Actually, one cannot setup root file system on RAID3 device, but when
other file system exist in /etc/fstab which are placed on RAID3 device,
boot process will be interrupted when these devices are missing.

MFC after:	3 days
X-MFC-note:	MFC only to RELENG_6, as RELENG_5 doesn't have root_mount KPI.
2005-07-27 09:03:51 +00:00
phk
388b4d6c8d By design I left a tiny race in updating the I/O statistics based on
the assumption that performance was more important that beancounter
quality statistics.

As it transpires the microoptimization is not measurable in the
real world and the inconsistent statistics confuse users, so revert
the decision.

MT6 candidate:	possibly
MT5 candidate:	possibly
2005-07-25 21:12:54 +00:00
pjd
da0fa3b3e0 Add a very simple and small GEOM class - ZERO.
It creates very huge provider (41PB) /dev/gzero.
On BIO_READ request it zero-fills bio_data and on BIO_WRITE it does nothing.
You can also set kern.geom.zero.clear sysctl to 0 to do nothing even for
BIO_READ.

I'm using it for performance testing where it is very helpful.

MFC after:	3 days
2005-07-25 10:03:16 +00:00
phk
55fedcd384 Comment typo 2005-07-20 18:08:16 +00:00
pjd
e180bbc58e Before calling g_orphan_provider(), add G_PF_WITHER flag, so GEOM will know
to destroy it.

PR:		kern/81758
Submitted by:	trasz <trasz@buziaczek.pl>
MFC after:	3 days
2005-07-17 13:15:02 +00:00
nyan
8d748343c0 Merged from geom_mbr.c revisions 1.62 and 1.66.
- Implement a gctl handler and the verb "write MBR".
2005-07-15 15:29:45 +00:00
le
c23617fec8 *) Implement round-robin reads for multiplex volumes.
*) Plug a possible memory leak. [1]

[1] obtained from: pjd@.
2005-07-15 13:38:06 +00:00
phk
45ce75efff Implement a gctl handler and the verb "write MBR" which can be used to
update metadata and bootcode while the MBR is in use.

MFC candidate
2005-07-15 08:00:44 +00:00
pjd
9ef3d97ebe Add CANCEL command which allows to remove one request from the queue or
all requests from the queue if request number is not given.

Bump version number.

Approved by:	re (scottl)
2005-07-08 21:08:53 +00:00
pjd
b1db94ccb3 After provider creation!! 2005-05-25 15:54:17 +00:00
pjd
f5dbb79246 - Call root_mount_rel() when provider IS created, not earlier.
This should close the race observed by Daniel Eriksson.
- Remove redundant wakeup().
2005-05-25 13:10:04 +00:00
pjd
ae767d0b06 Add some debug code to diagnose root-on-mirror problems with recent -current.
Reported by:	Daniel Eriksson
2005-05-23 13:05:07 +00:00
pjd
0435460c63 Correct typo. 2005-05-18 21:53:08 +00:00
le
86733f6df7 When a drive dies, don't call g_wither_geom() directly, but instead
post an event to the geom event queue that will take care of it,
letting outstanding bios finish, and closing the consumers.

Plus some cosmetic clean ups.
2005-05-17 16:38:30 +00:00
pjd
0a798a236d cp can't be NULL.
Noticed by:	Coverity Prevent analysis tool
2005-05-11 19:36:56 +00:00
pjd
0e95eeadc2 gp can't be NULL.
Noticed by:	Coverity Prevent analysis tool
2005-05-11 19:35:43 +00:00
pjd
08790cba80 Add KASSERT() to be sure there is an active component.
Suggested by:	Coverity Prevent analysis tool
2005-05-11 18:13:51 +00:00
pjd
9d22b74b8f Check return value.
Found by:	Coverity Prevent analysis tool
2005-05-11 18:07:39 +00:00
nyan
123c83cc66 Fix signed vs unsigned warning. 2005-05-01 09:44:50 +00:00
le
3998514c39 Only allow RAID5 plexes to be parity checked.
PR:           kern/80427
Submitty by:  Stijn Hoop <stijn@win.tue.nl>
2005-04-28 13:09:00 +00:00
pjd
93f62be898 Fix provider's size check for 'insert' command.
Before this fix one was able to insert one sector too small provider.

MFC after:	3 days
2005-04-25 10:41:26 +00:00
wollman
bc5273fd8f The size of a filesystem may be less than the size of the provider it
resides on.  Fix the special case of the filesystem fragment size not
evenly dividing the size of the provider.  Fixing the general case
probably requires better superblock validation (left as an exercise to
the reader).
2005-04-19 21:55:28 +00:00
pjd
15eddd96be Remove the hack which allowed to use gmirror for root file system,
use root_mount KPI instead.
2005-04-19 21:47:25 +00:00
phk
ed5a7da798 Call g_waitidle() instead of GEOM using the root_mount_hold() KPI.
GEOM could (and will) get events as a result of drivers coming in
late so a one-shot method is not good enough for GEOM.
2005-04-19 06:23:59 +00:00
phk
b7f29c0fc0 Add a named reference-count KPI to hold off mounting of the root filesystem.
While we wait for holds to be released, print a list of who holds us
back once per second.

Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().

Use the new KPI also from ata.

With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.
2005-04-18 21:21:26 +00:00
pjd
c3333321cc Protect against recursive labels creation in simlar way as it is done
in BSD and MBR classes, ie. if provider below us uses the same metadata,
don't create labels based on the metadata.
This allows to create labels on geoms with rank != 1 without hacks.

Tested by:	Chris Elsworth <chris@shagged.org> on sparc64
OK'ed by:	phk
MFC after:	2 weeks
2005-04-12 08:14:15 +00:00
pjd
a2b6f80684 Fix a long-standing bug. Error string has to be copyied from the user
process context.

Approved by:	phk
MFC after:	3 days
2005-04-08 09:28:08 +00:00
pjd
ae7daf0c96 - Add a missing g_io_deliver() in case of allocation failure - we didn't
completed I/O requests here.
- First allocate all needed bios, so if any of allocations fail, we can
  free memory before sending any I/O requests down.

Reported by:	Pawel Malachowski
MFC after:	3 days
2005-04-03 14:55:49 +00:00
nyan
a855c5b2ce Remove geometry translations here. 2005-03-30 12:59:54 +00:00
joerg
a1f08bc5f7 Support VTOC volume names. This can be useful to distinguish multiple
disks in a system.  Solaris' format(1m) displays the volume names in
the disk overview.

MFC after:	1 month
2005-03-30 09:33:10 +00:00
phk
55543da1eb fix a "modify after free" bug which is practically impossible to
experience.

Found by:	Coverity (id #540 #541)
2005-03-26 21:07:35 +00:00
pjd
9129b5403c If an error occurs, clean up before returning from g_raid3_connect_disk(). 2005-03-26 17:24:19 +00:00
pjd
e13782ca35 Make the code more obvious - when an error occurs in g_mirror_connect_disk(),
detach and destroy consumer before returning.
2005-03-26 17:23:01 +00:00
pjd
90f14e00b7 Check for return values.
Submitted by:	sam
Found by:	Coverity Prevent analysis tool
2005-03-26 16:51:19 +00:00
phk
31aaa9f619 g_read_data() can return NULL, check for it.
Found by:	Coverity (ID#258)
2005-03-18 07:03:56 +00:00
phk
05ad105753 After rejecting the bio request early, return instead of panicing.
Found by:	Coverity (ID#450)
2005-03-18 07:01:31 +00:00
phk
aaa532d778 Avoid null pointer dereference. 2005-03-18 06:57:58 +00:00
pjd
84f16c193c Plug memory leak.
Submitted by:	Ted Unangst
Found by:	Coverity Prevent analysis tool
Approved by:	phk
MFC after:	3 days
2005-03-16 20:48:13 +00:00
phk
f6aaa7c4a4 forward declare struct disk. 2005-03-15 10:47:38 +00:00
phk
ddc0cb6817 Do not attach MBR on top of an MBR. This removes some confusing
slice names on disks with extended partitions.

Spotted on:	Mother-in-laws computer.
2005-03-14 15:22:18 +00:00
ume
73402ef817 stop including rijndael-api-fst.h from rijndael.h.
this is required to integrate opencrypto into crypto.
2005-03-11 15:42:51 +00:00
le
ee457df6ea Remove test for zero sectorsize when tasting. This check doesn't
seem to be necessary anymore, and it prevents tasting a valid drive
when booting with geom_vinum already loaded, since SCSI disks set their
sectorsize not until first opening them.
2005-03-07 19:58:58 +00:00
phk
18d1a64228 Add placeholder mutex argument to new_unrhdr(). 2005-03-07 11:05:47 +00:00
le
af31e4b0b5 Don't allow to synchronize a plex that is already sychronizing.
Reset the 'syncing' flag in case of errors, too.

Some cosmetics.
2005-03-04 16:43:40 +00:00
pjd
668a028670 - Add md_provsize field to metadata, which will help with
shared-last-sector problem.
  After this change, even if there is more than one provider with the same
  last sector, the proper one will be chosen based on its size.
  It still doesn't fix the 'c' partition problem (when da0s1 can be confused
  with da0s1c) and situation when 'a' partition starts at offset 0
  (then da0s1a can be confused with da0s1 and da0s1c). One can use '-h'
  option there, when creating device or avoid sharing last sector.
  Actually, when providers share the same last sector and their size is equal,
  they provide exactly the same data, so the name (da0s1, da0s1a, da0s1c)
  isn't important at all.
- Provide backward compatibility.
- Update copyright's year.

MFC after:	1 week
2005-02-27 23:07:47 +00:00
le
5722d64390 Correctly calculate what to do and how to retry a request to a plex when
the previous one failed and there are more than one plex in the volume.

This could have led to a flood of error messages on the console and
probably a deadlock in certain situations.
2005-02-23 14:59:14 +00:00
phk
66dfd63961 Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close.  This is not yet a generally safe function, but for this very
specific use it is safe.  This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
2005-02-19 11:44:57 +00:00
le
364c3f9996 In case of drive errors, don't close the associated consumer and
detach it, but instead let the geom wither away.

Bump copyright year.
2005-02-17 16:08:36 +00:00
pjd
1e033a07a4 Fix year in copyrights. 2005-02-16 22:26:34 +00:00
pjd
c05fe5379b Update copyright in files changed this year. 2005-02-16 22:14:52 +00:00
pjd
7d8dd193b1 Fix year in copyrights. 2005-02-16 22:13:22 +00:00
pjd
b83e0d6d3f Remove mutex asserion from g_gate_find(). We don't want g_gate_list_mtx
mutex to be held here, because we want speed here.
2005-02-16 16:13:56 +00:00
pjd
a0cee6bbb3 Remove TDP_GEOM flag from thread after ggate device creation.
This flag means "wait for all pending requests before returning to userland".
There are pending events for sure, because we just created new provider and
other classes want to taste it, but we cannot answer on I/O requests until
we're here.
2005-02-16 16:12:28 +00:00
pjd
a8be0605e4 Fix typo. We want to unlock mutex here.
Submitted by:	Andreas Kohn <andreas.kohn@gmail.com>
MFC after:	1 week
2005-02-12 16:19:03 +00:00
phk
67e4aa7d33 Make various random things static 2005-02-10 12:10:35 +00:00
pjd
c154250e86 - Remove g_gate_hold()/g_gate_release() from start/done paths. It saves
4 mutex operations per I/O requests.
- Use only one mutex to protect both (incoming and outgoing) queue.
  As MUTEX_PROFILING(9) shows, there is no big contention for this lock.
- Protect sc_queue_count with queue mutex, instead of doing atomic
  operations on it.
- Remove DROP_GIANT()/PICKUP_GIANT() - ggate is marked as MPSAFE and no
  Giant there.
2005-02-09 08:29:39 +00:00
des
80eee84ab5 merge from geom_vol_ffs.c rev 1.14 (avoid unaligned I/O requests) 2005-02-08 12:34:11 +00:00
des
4edf9621a3 Take care not to issue unaligned I/O requests while tasting a provider. 2005-02-08 08:04:23 +00:00
pjd
e4c6b83cfd - Use bioq_insert_tail()/bioq_insert_head() instead of bioq_disksort().
- Improve mediasize checking.

MFC after:	1 week
2005-02-05 00:30:08 +00:00
phk
1a5ece93a1 When dumping to a unpartitioned disk, make sure to chop the
length of the dump area accordingly.

Run into by:	scottl
2005-01-29 16:49:43 +00:00
jeff
83cae1af10 - If mpsafevfs is off, acquire giant around all calls to bufdone().
Sponsored by:   Isilon Systems, Inc.
2005-01-28 16:04:44 +00:00
phk
bb8d78bf44 Introduce and use g_vfs_close(). 2005-01-25 15:52:04 +00:00
phk
9e3a1e9a23 Create a correctly sized vnode objects for disk devices. 2005-01-24 22:41:21 +00:00
jeff
b033acd674 - Don't acquire giant around calls to bufdone().
Sponsored By:   Isilon Systems, Inc.
2005-01-24 10:47:46 +00:00
le
082eb69990 Only report state changes of subdisks and plexes when there's
really a state change.

Reword the info a bit.
2005-01-21 18:27:23 +00:00
le
f7ba150b68 Don't initialize error with ENXIO as we might end up here when
the plex has no more consumers (e.g. orphaning).
2005-01-21 18:24:20 +00:00
pjd
9e051a0e23 Protect against recursive slices creation in simlar way as it is done
in BSD class, ie. if provider below us uses the same metadata, don't
create slices based on the metadata.
This allows to create slices on geoms with rank != 1 without hacks.

Discussed with:	phk
Approved by:	phk
MFC after:	2 weeks
2005-01-20 22:14:05 +00:00
le
f03cf50f54 Rename synchronization and initialization threads and prefix them
with 'gv_' for consistency.
2005-01-19 14:49:26 +00:00
le
8c1fee0f75 Although an object may already be known in the configuration, it's
worker thread may have been destroyed (e.g. during orphaning).

Make sure that objects get back their worker threads when they get a
new geom.
2005-01-19 14:08:16 +00:00
le
eb12fefc35 Reset object flags after killing off an object's worker thread. 2005-01-19 13:57:09 +00:00
phk
63a9efe25c Discontinue zero-length g_ctl arguments as "just give him this pointer"
transfers.  The necessary context for calling copyin() isn't available
anyway and automatic code-validation chokes on this.
2005-01-17 07:14:24 +00:00
phk
6d1e88efa3 CAM will sometimes remove a disk again even before it finished being
initialized.  We already cancel the pending events but we need to not
dereference the geom pointer which never got set different from NULL.
2005-01-14 21:05:35 +00:00
pjd
8d8363ee39 Introduce a new GEOM class - SHSEC. It provides sharing secret between
the given providers. Without even one of the configured components there
should be no way to get the secret.

Supported by:	WHEEL Sp. z o.o.
		http://www.wheel.pl
2005-01-11 18:06:44 +00:00
phk
5a497775d6 Add BO_SYNC() and add a default which uses the secret vnode pointer
and VOP_FSYNC() for now.
2005-01-11 10:43:08 +00:00
pjd
a063b2627e Increase default synchronization speed.
MFC after: 3 days
2005-01-09 14:43:39 +00:00
imp
872b48591b /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 18:27:30 +00:00
pjd
589a9682ce - Fix 'rebuild' command - it can no longer relay on retaste event
(we ignore it).
- Remove code used for handling spoil events, as spoiling is not possible
  anymore, because we keep consumers open for writing all the time.

MFC after:	4 days
2005-01-04 12:15:21 +00:00
pjd
c1f23c6a62 Spoiling is now not possible, because we keep consumers open for writing
all the time. Remove unused code then.

MFC after:	4 days
2005-01-04 12:11:49 +00:00
pjd
fcf90f45eb Fix 'rebuild' command (we ignore retaste event now, so don't relay on it). 2005-01-03 19:42:37 +00:00
pjd
25d6a6cbc3 Remove unused #include. 2005-01-03 12:53:10 +00:00
jhb
7b611b0cb2 Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.
2004-12-30 20:29:58 +00:00
pjd
1955652326 Remove debug code. 2004-12-28 21:52:45 +00:00
pjd
0bb72c3b00 - Add genid field to the metadata which will allow to improve reliability a bit.
After this change, when component is disconnected because of an I/O error,
  it will not be connected and synchronized automatically, it will be logged
  as broken and skipped. Autosynchronization can occur, when component is
  disconnected (on orphan event) and connected again - there were no I/O
  error, so there is no need to not connected the component, but when there were
  writes while it wasn't connected, it will be synchronized.
  This fix cases, when component is disconnected because of I/O error and can be
  connected again and again.
- Bump version number.
- Implement backward compatibility mechanism. After this change when metadata in
  old version is detected, it is automatically upgraded to the new (current)
  version.
2004-12-25 19:17:47 +00:00
pjd
9efcf672b9 Update disk->d_genid field when increasing sc->sc_genid. 2004-12-23 21:15:15 +00:00
pjd
b58db25ebe - Add genid field to the metadata which will allow to improve reliability a bit.
After this change, when component is disconnected because of an I/O error,
  it will not be connected and synchronized automatically, it will be logged
  as broken and skipped. Autosynchronization can occur, when component is
  disconnected (on orphan event) and connected again - there were no I/O
  error, so there is no need to not connected the component, but when there were
  writes while it wasn't connected, it will be synchronized.
  This fix cases, when component is disconnected because of I/O error and can be
  connected again and again.
- Bump version number.
- Add version change history.
- Implement backward compatibility mechanism. After this change when metadata in
  old version is detected, it is automatically upgraded to the new (current)
  version.
2004-12-22 23:09:32 +00:00
pjd
968f03faa7 Now, when force device destruction is done on shutdown, hide warning,
that device cannot be destroyed immediately, under debug=1.

Suggested by:	simon
2004-12-21 19:50:18 +00:00
pjd
c18c5197dc Improve reliability and clean up code a bit.
For more details check src/sys/geom/mirror/g_mirror.c rev.1.47,1.48,1.49,1.50.
2004-12-21 19:30:59 +00:00
pjd
c5ff5344f3 This should not be permitted, but some GEOM classes held the topology lock
while doing g_(read|write)_data() (e.g. BSD). This can cause a deadlock
in MIRROR class. Not sure if this is safe to drop the topology lock in BSD
class, so change the code in MIRROR class to avoid this deadlock.
2004-12-21 18:42:51 +00:00
pjd
27d828652f Implement g_topology_try_lock().
No objection from:	phk
2004-12-21 18:32:46 +00:00
pjd
d359cbb8ec Remove unused variables. 2004-12-19 23:55:49 +00:00
pjd
5d72d5354a - Argument 'flags' in g_mirror_destroy_consumer() function is unsed -
mark it as such.
- Before closing consumer check if it is open. It can be closed here
  when g_mirror_connect_disk() fails on g_access().
2004-12-19 23:33:59 +00:00
pjd
8c72a601a3 Some major cleanups.
Keeping consumers open when device is closed is very hard. We need to
open consumers sometimes to update metadata, etc.
Many hacks was introduced in the past to made it possible. You cannot
be sure that you can open consumer for writing always, even if you think
it should be allowed. If one of the mirror components is for example da0
and you try to open it, you can get EPERM when da0s1 is opened for reading
(because BSD class opens consumers (da0) with an extra 'e' bit set).
Waiting for the events queue to be empty may do the trick, but it makes
code much uglier (as you cannot always call g_waitidle()), it doesn't
solve all edge cases and it can introduce deadlocks if there are events
in the queue that wait for gmirror.

I removed those hacks. Now all consumers are open r1w1e1 always, even if
device is closed. Maybe it is less clean from GEOM perspective, but simpify
code a lot and make it much more reliable.
The only issue was retaste event which is sent when we close consumers
opened for writing. I ignore retaste event by not detaching consumer
immediately (so retaste event is not send to my class) and sending event
right after it to detach and destroy consumer.
2004-12-19 23:12:00 +00:00
pjd
fccc23ca68 Don't quit on first failure, just skip failures. 2004-12-19 22:58:25 +00:00
brueffer
77a79d8302 Fix typo in a comment.
MFC after:	3 days
2004-12-15 12:18:41 +00:00
pjd
7136589630 bioq_insert_head() function is already in subr_disk.c. 2004-12-13 13:02:06 +00:00
phk
35b3c9fdfb Pass the file->flags down to geom ioctl handlers.
Reject certain ioctls if write permission is not indicated.

Bump geom API version.

Reported by:	Ruben de Groot <mail25@bzerk.org>
2004-12-12 10:09:05 +00:00
pjd
3342c80c8f - Turn off 'fast' mode by default and increase maximum memory to consume
when this mode is used.
- Manual page update.
2004-12-09 12:26:47 +00:00
marcel
b898758464 o Don't limit GPT as a rank 2 provider. Allow it to be connected
anywhere in the DAG. This includes configurations that are not
   allowed by the EFI specification.
o  Reject a GPT partition table if it's not preceeded by a PMBR.
   There's no need to preserve the MBR partitioning anymore as GPT
   is mature and with the first bullet extending the applicability
   of GPT, it's better to be a bit more strict.
2004-12-05 06:02:21 +00:00
pjd
ca88614a1e When initializing device, set d_softc and d_no fields for all components,
because we know it then and we need it when inserting a component which
wasn't destroyed while device was running.

Reported by:	Michael Handler <handler@grendel.net>
MFC after:	1 week
2004-12-04 21:20:59 +00:00
imp
4671aabad6 Add observations of the Linux98 and Grub/98 boot loaders. These
observations lead me to believe that the convetion for pc98 boot
loaders is to have a jump unstruction, followed by a string, followed
by code.  The jump usually doesn't have a nop after it and usually the
string is NUL terminated, but Grub/98 breaks both of these rules.

# I looked for, but failed to find the Minux boot blocks for PC-9801 port.
2004-11-30 09:40:11 +00:00
imp
b06b583ff5 Reject tasting of this provider if the sector size isn't a multiple of
512.  If I had an audio cdrom in my cd player when I booted my system,
I'd get a panic from geom because you can't read 8192 bytes from an
audio cdrom.

Remove XXX comment about IPL1 and replace it with some information
from my soon to be published web page on the pc98 disk layout.  The
IPL1 test was the result of an observation of a disk with FreeBSD's
boot0 program.  It was testing part of an area what appears to be
reserved for a boot loader name, which comes after a jump over this
area.  I don't yet know if it is required to be any specific jump
instruction, or if the destination has to be location 11. [1]

[1] FreeBSD Press No. 13, page 115, poorly translated by myself.  The
picture there shows offset 8 as the destination of the jump, but
FreeBSD's boot0 program has three padding NULs after the IPL1 name and
uses a 16-bit 'jmp' instruction.
2004-11-30 08:00:14 +00:00
phk
b760ac8e38 Fix a long standing bug in geom_mbr which is only now exposed by the
correct open/close behaviour of filesystems:

When an ioctl to modify the MBR arrives, we cannot take for granted that
we have the consumer open.

The symptom is that one cannot run 'boot0cfg -s2 /dev/ad0' in single-user
mode because / is the only open partition in only open r1w0e1.

If it is not, we attempt to increase the write count by one and
decrease it again afterwards.

Presumably most if not all other slices suffer from the same problem.
2004-11-28 20:57:25 +00:00
le
c2bd948f68 Implement 'setstate' to allow setting the state of drives and subdisks
for debugging and emergency purposes.
2004-11-26 12:31:36 +00:00
le
e257308b07 Implement checkparity/rebuildparity. 2004-11-26 12:01:00 +00:00
pjd
bac3bee98c - Add missing Giant drop before acquiring the topology lock.
- Move DROP_GIANT()/PICKUP_GIANT() to g_gate_ioctl().
2004-11-23 11:18:26 +00:00
fjoe
89f1126514 Use M_ZERO to not panic in mtx_init when INVARIANTS enabled.
Submitted by:	simokawa
MFC after:	1 week
2004-11-20 13:10:04 +00:00
le
255694f7fc Move RAID5 offset calculation into a separate function to avoid
code duplication.
2004-11-15 13:04:55 +00:00
le
c174c57d9d Share gv_roughlength() between kernel and userland, as we will need it
there later.
2004-11-15 12:30:59 +00:00
pjd
64bf081bc3 Before trying to update metadata (so open consumer for writing), be sure
that the events queue is empty. In other case we're able to hit the race
where for example da0s1 is tasted by some other class, which means that
da0 is open with exclusive bit set, which means that we can't open da0
for writing if it is our component.

Reported by:	Attila Nagy <bra@fsn.hu> (and somebody else sometime ago,
		                          but I cannot find who it was)
2004-11-09 23:27:21 +00:00
pjd
412bff1f39 Introduce g_waitidlelock() function which is simlar to g_waitidle(),
but should be called with the topology lock held and returns with the
topology lock held and empty event queue.

Approved by:	phk (sometime ago)
2004-11-09 23:20:50 +00:00
pjd
a2d5ae235d Don't rely on DIRTY flag to be sure that consumer if open, because
DIRTY flag can be removed in idle process. Use consumer's acw field
instead to avoid opening consumer twice.
2004-11-09 23:15:40 +00:00
pjd
5cac522046 For BIO_READ check if provider is open for reading and for BIO_WRITE,
check if provider is open for writing.
This fixes panic when device is open only for writing and we send write
request.
2004-11-09 23:04:45 +00:00
pjd
592adb99c2 Drop Giant lock before grabbing the topology lock. 2004-11-09 00:35:08 +00:00
pjd
b9bb54bcb8 If device is marked as beeing destroyed, deny all access requests. 2004-11-08 20:23:53 +00:00
pjd
d62be2e0d6 Don't forget to make sure that there are no not-finished requests before
marking components as clean.

Pointed out by:	scottl
2004-11-05 17:18:39 +00:00
pjd
b004592010 - Mark all raid3 components as clean after kern.geom.raid3.idletime seconds.
- Make kern.geom.raid3.timeout variable tunable.
2004-11-05 13:12:58 +00:00
pjd
f229109eb7 Mark raid3 devices as clean on shutdown (after all file systems are
unmounted).

Suggested by:	scottl
2004-11-05 13:01:25 +00:00
pjd
270f218c1d - Use ->index consumer's field to track number of in-flight requests.
- Remove unused #include.
2004-11-05 12:42:16 +00:00
pjd
268c658b69 Use shutdown hooks to mark mirrors as clean after all file systems are
unmounted.

Suggested by:	scottl
2004-11-05 12:35:21 +00:00
pjd
de4e5b4e88 Remove unused #include. 2004-11-05 12:31:32 +00:00
pjd
0bd7b4d36a - Add a sysctl kern.geom.mirror.idletime, so one can specify after how many
seconds of idling, DRITY flags are removed.
- If mirror is in idle state or is not open for writing, sleep without
  timeout when waiting for I/O requests.
- Don't use atomic operations, for now sysctls are protected by Giant.
- Update debugs.
2004-11-05 10:55:04 +00:00
pjd
d0890a743f MFp4:
- Fix for good (I hope) force-stopping mirrors and some filure cases
  (e.g. the last good component dies when synchronization is in progress).
  Don't use ->nstart/->nend consumer's fields, as this could be racy,
  because those fields are used in g_down/g_up, use ->index consumer's
  field instead for tracking number of not finished requests.

  Reported by:	marcel

- After 5 seconds of idle time (this should be configurable) mark all
  dirty providers as clean, so when mirror is not used in 5 seconds
  and there will be power failure, no synchronization on boot is needed.

  Idea from:	sorry, I can't find who suggested this

- When there are no ACTIVE components and no NEW components destroy whole
  mirror, not only provider.

- Fix one debug to show information about I/O request, before we change
  its command.
2004-11-05 09:05:15 +00:00
phk
95abc5d4fd Finish cut&paste adjustments.
Spotted by:	tegge
2004-11-04 07:17:08 +00:00
phk
f31ec96a3e Stop dumping the MBR entries under bootverbose 2004-11-03 09:08:33 +00:00
phk
cc263b9934 Stop wasting a bootverbose line on all geom slices. 2004-11-03 09:08:10 +00:00
phk
6358ee90d6 Don't set si_bsize_phys, nobody cares. 2004-10-29 11:11:44 +00:00
phk
f0dd76e153 Add GEOM class "VFS" for filesystems and other buffer cache users
of GEOM devices.

There is nothing magic about this, it just gives a bufobj interface
to GEOM.
2004-10-29 09:56:56 +00:00
phk
269b7a298a Add g_wither_geom_close() function. 2004-10-29 09:19:03 +00:00
phk
86cc21c765 Give dev_strategy() an explict cdev argument in preparation for removing
buf->b-dev.

Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.

Assert copyright on vfs_bio.c and update copyright message to canonical
text.  There is no legal difference between John Dysons two-clause
abbreviated BSD license and the canonical text.
2004-10-29 07:16:37 +00:00
le
d0679d2fde Give each plex a separate queue where held back bios are put on.
This lowers the CPU usage of the worker thread and prevents a
possible live lock on non-SMP machines.

MFC candidate.
2004-10-26 21:01:42 +00:00
phk
fc5bad60e3 Use unit number allocation functions for GEOM minor numbers. 2004-10-25 12:28:28 +00:00
phk
5065a91152 Retire si_stripesize and si_stripeoffset they will not be needed in cdev
in the future.
2004-10-25 07:40:54 +00:00
phk
80dc3e1bc2 Don't call g_waitidle(), it happens automagically now. 2004-10-23 20:52:15 +00:00
phk
93429f2778 Add a new per-thread private flag: TDP_GEOM.
This flag gets set whenever the thread posts an event on the GEOM
event queue, and if the flag is set when the thread is prepared
to return to userland from the kernel, g_waitidle() will be called
to make sure that the posted events have completed.

This can replace an insufficient number of g_waitidle() calls in
various other places, and has the advantage of being failsafe:  Any
system call which does a VOP_OPEN()/VOP_CLOSE will now correctly
wait for any geom events it posted as part of spoils or tastes.

Assert that topology and Giant is not held in g_waitidle().
2004-10-23 20:49:17 +00:00
phk
00ae1b0f02 Move the prototype for g_waitidle() to a more visible place. 2004-10-23 20:22:02 +00:00
arr
b510c7b809 - Turn KASSERT()s into warning printf()'s in the g_class_load() routine.
This removes a panic that will occur if you build with GENERIC and
  attempt to kldload a GEOM module that is already in the kernel.

Reviewed by: phk
2004-10-22 22:16:24 +00:00