Commit Graph

1474 Commits

Author SHA1 Message Date
lulf
a8c357db8a - Fix spelling errors.
Approved by:    kib (mentor)
PR:             kern/124788
Submitted by:   Hywel Mallett <Hywel -at- hmallett.co.uk>
2008-06-20 19:48:18 +00:00
marcel
1fee86b4dc Add the set and unset verbs used to set and clear attributes for
partition entries. Implement the setunset method for the MBR
scheme to control the active flag.
2008-06-18 01:13:34 +00:00
marcel
2f592aa3d4 Finish the support for partition labels and add it to the XML. 2008-06-12 19:34:07 +00:00
marcel
1053568d9d Add the raw partition type to the XML. 2008-06-12 06:34:14 +00:00
marcel
a38655dc5c Add the raw partition type to the XML. 2008-06-12 06:26:36 +00:00
marcel
d62c85e04f Add the raw partition type to the XML. 2008-06-12 05:56:03 +00:00
marcel
d210ed3b77 Add the raw partiton type to the XML. 2008-06-12 05:28:47 +00:00
marcel
169adaca5e Add the raw partition type to the XML. 2008-06-12 05:27:23 +00:00
marcel
964e8e52e8 Add the partition label and the raw partition type to the XML. 2008-06-12 04:43:34 +00:00
ed
5de6a45e07 Remove the distinction between device minor and unit numbers.
Even though we got rid of device major numbers some time ago, device
drivers still need to provide unique device minor numbers to make_dev().
These numbers are only used inside the kernel. They are not related to
device major and minor numbers which are visible in devfs. These are
actually based on the inode number of the device.

It would eventually be nice to remove minor numbers entirely, but we
don't want to be too agressive here.

Because the 8-15 bits of the device number field (si_drv0) are still
reserved for the major number, there is no 1:1 mapping of the device
minor and unit numbers. Because this is now unused, remove the
restrictions on these numbers.

The MAXMAJOR definition was actually used for two purposes. It was used
to convert both the userspace and kernelspace device numbers to their
major/minor pair, which is why it is now named UMINORMASK.

minor2unit() and unit2minor() have now become useless. Both minor() and
dev2unit() now serve the same purpose. We should eventually remove some
of them, at least turning them into macro's. If devfs would become
completely minor number unaware, we could consider using si_drv0 directly,
just like si_drv1 and si_drv2.

Approved by:	philip (mentor)
2008-05-29 12:50:46 +00:00
lulf
4cc0ad7f58 - Recognize the 'volume' parameter when creating a plex.
PR:		kern/75632
Approved by:	pjd (mentor)
MFC after:	1 day
2008-05-22 10:27:03 +00:00
pjd
e225ca8138 - Assert that we don't send new provider event for a provider which has
G_PF_WITHER flag set.
- Fix typo in assertion condition (sorry, but I forgot who report that).
2008-05-18 22:50:50 +00:00
pjd
21a0be64c7 Play nice with DDB pager.
Educated by:	jhb's BSDCan presentation
2008-05-18 21:13:10 +00:00
marcel
38e8adf720 Implement the G_PART_DUMPCONF method for all 6 schemes. Also call
the method for the (indent == NULL) case (i.e. the kern.geom.conftxt
sysctl). The purpose is to extend the conftxt output with scheme-
specific fields which can be used by libdisk. In particular, have
the schemes dump the xs and xt fields, which contain the backward
compatible values for class type and partition type. This allows
libdisk to work with the legacy slicers as well as with gpart and
helps/promotes migration.
2008-04-23 20:13:05 +00:00
marcel
ecdc3b6d8c Add the bootcode verb for installing boot code. Boot code
is supported for the MBR, GPT and PC98 schemes, where GPT
installs boot code into the PMBR.
2008-04-13 19:54:54 +00:00
marcel
5b21e055a6 Change the order from SI_ORDER_FIRST to SI_ORDER_ANY (within
SI_SUB_DRIVERS) to avoid loading schemes before all the GEOM
classes have been loaded and initialized. Otherwise we may
end up using mutexes that haven't been initialized (due to
g_retaste() posting an event).
2008-03-29 17:33:29 +00:00
marcel
4545b45cdc Add support for PC-9800 partition tables. 2008-03-28 17:58:55 +00:00
marcel
dd866faa70 When retasting, wither any existing GEOMs of the same class. This
allows the class to create a different GEOM for the same provider
as well as avoid that we end up with multiple GEOMs of the same
class with the same name.

For example, when a disk contains a PC98 partition table but
only MBR is supported, then the partition table can be treated
as a MBR. If support for PC98 is later loaded as a module, the
MBR scheme is pre-empted for the PC98 scheme as expected.
2008-03-28 06:31:12 +00:00
marcel
c184f6ced2 Redefine G_PART_SCHEME_DECLARE() from populating a private linker set
to declaring a proper module. The module event handler is part of the
gpart core and will add the scheme to an internal list on module load
and will remove the scheme from the internal list on module unload.
This makes it possible to dynamically load and unload partitioning
schemes.
2008-03-23 01:31:59 +00:00
marcel
31a163ef06 Add g_retaste(), which given a class will present all non-open providers
to it for tasting. This is useful when the class, through means outside
the scope of GEOM, can claim providers previously unclaimed.

The g_retaste() function posts an event which is handled by the
g_retaste_event().

Event suggested by: phk
2008-03-23 01:23:35 +00:00
lulf
8a5c25a52b - Fix a memory leak when re-discovering a gvinum configuration.
Approved by:	pjd (mentor)
MFC after:	1 week
2008-03-18 08:48:51 +00:00
marcel
dd3a7906ab Add support for VTOC8 labels (aka sun disk labels). When a label does
not have VTOC information about the partitions, it will be created.
This is because the VTOC information is used for the partition type
and FreeBSD's sunlabel(8) does not create nor use VTOC information.
For this purpose, new tags have been added to support FreeBSD's
partition types.
2008-03-02 00:52:49 +00:00
marcel
1c97028d69 Follow-up improvements to the handling of false positives: If the
partition table is empty, check to see if we have something that
looks sufficiently like a BPB. On non-i386 machines, the boot
sector typically doesn't contain boot code; the end of the boot
sector is all zeroes. This is also where the partition table is
for MBRs.
We only check the sector size and cluster size, as that seems to
be the most reliable across implementations, BPB versions and
platforms.
2008-02-29 22:41:36 +00:00
marcel
aa08e756e2 Better handle false positives. The MBR differs from the boot sector
only because there's a partition table where the boot sector has
boot code. Boot sectors without boot code look like a MBR for all
practical purposes. This change adds a check for the partition table
and fails the probe when it's obvously invalid. The assumption being
that the sector contains a boot sector and not a MBR.
More checks are needed to distinguish a boot secto without boot code
from a (empty) MBR.
2008-02-28 22:30:41 +00:00
thompsa
b6bbd7f540 geom_lvm(4) is now known as geom_linux_lvm(4). 2008-02-20 07:52:43 +00:00
thompsa
5443a03210 Add a geom class to map Linux LVM logical volumes.
The logical disks will appear as /dev/lvm/<vol group>-<logical vol>, for
instance /dev/lvm/vg0-home. G_LINUX_LVM currently supports linear stripes with
segments on multiple physical disks. The metadata is read only, logical
volumes can not be allocated or resized.

Reviewed by:	Ivan Voras

Previously known as geom_lvm(4), rename requested by des, phk.
2008-02-20 07:45:36 +00:00
scottl
7163c9c1fc Teach the dump and minidump code to respect the maxioszie attribute of
the disk; the hard-coded assumption of 64K doesn't work in all cases.
2008-02-15 06:26:25 +00:00
thompsa
25bc946ffc Unbreak build, size_t is larger on 64bit platforms. 2008-02-11 09:20:01 +00:00
thompsa
1d945b74cc Add a geom class to map Linux LVM logical volumes.
The logical disks will appear as /dev/lvm/<vol group>-<logical vol>, for
instance /dev/lvm/vg0-home. GLVM currently supports linear stripes with
segments on multiple physical disks. The metadata is read only, logical
volumes can not be allocated or resized.

Reviewed by:	Ivan Voras
2008-02-11 03:05:11 +00:00
marcel
5817fa6edc Various fixes:
o  BSD disklabels have relative offsets. Even for the BSD in MBR slice
   setup, except when the mbroffset ioctl is supported. Since we don't
   support that ioctl, bsdlabel(8) expects relative offsets. So, when
   reading an existing disklabel, correct for disklabels that mistakenly
   have the mbroffset offsets.
o  Don't take the geometry seriously, because it's untrustworthy. We do
   expect the numbers to be within range. This means that the secperunit
   field will not be computed from secpercyl and ncyls, but simply is
   the mediasize in sectors.
o  Don't enforce partitions to be aligned to track boundaries. The
   default label, constructed by bsdlabel(8), puts partition a at offset
   BBSIZE bytes, which commonly means sector 16.
2007-12-24 01:01:59 +00:00
phk
f0debf860a Chop DIOCGDELETE from userland up in 1024 sector chunks to give geom_disk
or any other bio chopping geom a reasonable size of work.

Check for delivered signals between chunks, because the request size
and service time is unbounded.
2007-12-16 19:38:26 +00:00
phk
e8d782d36d Don't limit BIO_DELETE requests to MAXPHYS, they perform no data
transfers, so they are not subject to the VM system limitation.
2007-12-16 18:03:31 +00:00
marcel
2f8a94b8fe Decode as many or as few partition entries as the label claims there
are. We have already checked it against the caller provided maxpart.
2007-12-09 22:44:22 +00:00
marcel
3d56eaad5f Fix a bug in the add verb, where we failed to keep the list
of partitions in index-order. This is assumed by the APM, MBR
and BSD partitioning schemes.
2007-12-09 22:26:42 +00:00
marcel
534092d0a2 Internal partitions can not be deleted or modified. 2007-12-08 23:08:42 +00:00
marcel
fd024cb2be Skip internal partitions in the check for (user) partitions for
the destroy command. Previously a freshly created BSD disklabel
could not be destroyed because of the internal partition.
2007-12-08 22:06:17 +00:00
marcel
7827a1496a Add support for FS_ZFS. 2007-12-08 07:01:10 +00:00
jhb
82ec005a0a Only attach to a GPT partition if it has the GPT_ENT_TYPE_FREEBSD type.
XXX: This only works currently with GEOM_GPT which only exists in 6.x.
XXX: I didn't add 'mbroffset' support for a GPT partition holding a BSD
label as I'm not sure if they use relative or absolute offsets.

MFC after:	3 days
2007-12-06 09:20:27 +00:00
marcel
5f073f2789 Add a BSD disklabel backend to g_part:
o  Disklabels can have between 8 and 20 partitions (inclusive).
o  No device special file is created for the raw partition.
o  Switch ia64 to use this backend.
o  No support for boot code yet.
2007-12-06 02:32:42 +00:00
jb
1785efbd75 On some arches, openssl is built with OPENSSL_NO_CAMELLIA, so the
code here needs to depend on that too.
2007-11-19 08:59:32 +00:00
maxim
ba7248e777 o s/resiserfs_sb/reiserfs_sb/.
Submitted by:	Ighighi
2007-11-16 19:43:26 +00:00
pjd
4f3fb64c4b Save stack only when KTR_GEOM is both compiled into the kernel and enabled
in debug.ktr.mask. Because saving stack is very expensive, it's better only
to do it when one really wants to.

Reported by:	Dan Nelson
2007-10-26 06:55:00 +00:00
jhb
2f8a906c36 First cut at support for booting a GPT labeled disk via the BIOS bootstrap
on i386 and amd64 machines.  The overall process is that /boot/pmbr lives
in the PMBR (similar to /boot/mbr for MBR disks) and is responsible for
locating and loading /boot/gptboot.  /boot/gptboot is similar to /boot/boot
except that it groks GPT rather than MBR + bsdlabel.  Unlike /boot/boot,
/boot/gptboot lives in its own dedicated GPT partition with a new
"FreeBSD boot" type.  This partition does not have a fixed size in that
/boot/pmbr will load the entire partition into the lower 640k.  However,
it is limited in that it can only be 545k.  That's still a lot better than
the current 7.5k limit for boot2 on MBR.  gptboot mostly acts just like
boot2 in that it reads /boot.config and loads up /boot/loader.  Some more
details:
- Include uuid_equal() and uuid_is_nil() in libstand.
- Add a new 'boot' command to gpt(8) which makes a GPT disk bootable using
  /boot/pmbr and /boot/gptboot.  Note that the disk must have some free
  space for the boot partition.
  - This required exposing the backend of the 'add' function as a
    gpt_add_part() function to the rest of gpt(8).  'boot' uses this to
    create a boot partition if needed.
- Don't cripple cgbase() in the UFS boot code for /boot/gptboot so that
  it can handle a filesystem > 1.5 TB.
- /boot/gptboot has a simple loader (gptldr) that doesn't do any I/O
  unlike boot1 since /boot/pmbr loads all of gptboot up front.  The
  C portion of gptboot (gptboot.c) has been repocopied from boot2.c.
  The primary changes are to parse the GPT to find a root filesystem
  and to use 64-bit disk addresses.  Currently gptboot assumes that the
  first UFS partition on the disk is the / filesystem, but this algorithm
  will likely be improved in the future.
- Teach the biosdisk driver in /boot/loader to understand GPT tables.
  GPT partitions are identified as 'disk0pX:' (e.g. disk0p2:) which is
  similar to the /dev names the kernel uses (e.g. /dev/ad0p2).
- Add a new "freebsd-boot" alias to g_part() for the new boot UUID.

MFC after:	1 month
Discussed with:	marcel (some things might still change, but am committing
			what I have so far)
2007-10-24 21:33:00 +00:00
marcel
96e4f348f4 Add the freebsd-zfs alias. Both APM and GPT have ZFS partition
types.
2007-10-21 20:02:57 +00:00
julian
51d643caa6 Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
pjd
e429b3be95 When orphaning a provider, cancel events related to it.
Without this change the following situation was possible:

1. Provider is orphaned from within class' access() method on last write
   close - orphan provider event is send.
2. GEOM detects last write close on a provider and sends new provider event.
3. g_orphan_register() is called, and calls all orphan methods of attached
   consumers.
4. New provider event is executed on orphaned provider, all classes can
   taste already orphaned provider, and some may attach consumers to it.
   Those consumers will never go away, because the g_orphan_register()
   was already called.

We end up with a zombie provider.

With this change, at step 3, we will cancel new provider event.

How to repeat this problem:

	# mdconfig -a -t malloc -s 10m
	# geli init -i 0 md0
	# geli attach md0
	# newfs -L test /dev/md0.eli
	# mount /dev/ufs/test /mnt/tmp
	# geli detach -l md0.eli
	# umount /mnt/tmp
	# glabel status
            Name  Status  Components
        ufs/test  N/A     N/A

Reviewed by:	phk
Approved by:	re (kensmith)
2007-09-27 20:18:34 +00:00
pjd
581e534e82 LINT compiled just fine for me, but it seems it breaks tinerbox way of
compiling LINT.

Approved by:	re (implicitly)
2007-09-23 15:10:48 +00:00
pjd
27bd800e61 Bring in the GEOM Virtualisation class, which allows to create huge GEOM
providers with limited physical storage and add physical storage as
needed.

Submitted by:	Ivan Voras
Sponsored by:	Google Summer of Code 2006
Approved by:	re (kensmith)
2007-09-23 07:34:23 +00:00
pjd
9afb74d049 Add support for Camellia encryption algorithm.
PR:		kern/113790
Submitted by:	Yoshisato YANAGISAWA <yanagisawa@csg.is.titech.ac.jp>
Approved by:	re (bmah)
2007-09-01 06:33:02 +00:00
marcel
3455d229da Have gpart synthesize a disk geometry if the underlying provider
don't have it. Some partitioning schemes, as well as file systems,
operate on the geometry and without it such schemes (e.g. MBR)
and file systems (e.g. FAT) can't be created. This is useful for
memory disks.
2007-06-17 22:19:19 +00:00
marcel
3f70795dda Add the MBR partitioning scheme to g_part. This does not yet
support the ability to install boot code.
2007-06-13 04:27:36 +00:00
marcel
ed1819a480 Prefix unknown (i.e. un-aliased) partition types with '!'. This is
how they had to be given with ctlreq.
2007-06-06 05:06:14 +00:00
marcel
1094a6916b Call sbuf_finish() before sbuf_data() and sbuf_len(). 2007-06-06 05:01:41 +00:00
jeff
91d1501790 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
dwmalone
771efb08f5 Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidently cut and pasting these examples.

In a few places this was sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported.  In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
2007-06-04 18:25:08 +00:00
marcel
57822449b0 Fix a dereference in KASSERT. 2007-05-15 23:29:57 +00:00
marcel
8ea441aed5 o Implement automatic commit. It's enabled when the flags parameter
exists and contains the 'C' flag.
o  The partition label can be the empty string. It's how labels are
   cleared.
o  When an action fails, lower permissions when they were raised
   in order to allow the action. A failed action will not result
   in any uncommitted changes.
o  Allow the flags paremeter to be present but empty. It's the
   equivalent of not being present.
2007-05-15 20:14:55 +00:00
marcel
095436c238 Write the output parameter (if present) for the add, create, delete
destroy and modify verbs.
2007-05-09 05:37:53 +00:00
marcel
f02e42b61a When reverting the creation of a partitioning scheme on a provider,
the failure to probe an existing partitioning scheme means that no
previous partitioning scheme existed. Don't error. Just destroy the
geom.
2007-05-09 01:46:42 +00:00
marcel
3b37bd02b4 MFp4:
119373:	o  Remove the query verb, along with the request and response
	   parameters.
	o  Add the version and output parameters.
119390: [APM,GPT] Properly clear deleted entries.
119394:	o  Make the alias the standard and use the '!' to prefix
	   literal partition types.
	o  Treat schemes and partition types as case insensitive.
119462: [GPT] Fix a page fault caused when modifying a partition entry
	without a new partition type.
2007-05-08 20:18:17 +00:00
pjd
52b222af91 When deleting key, flush write cache after each overwrite, so we don't
overwrite data N times in cache and only once on disk.
2007-05-06 14:56:03 +00:00
pjd
5326cfc8d7 Allow to use ':' in d_ident, which is quite handy character. 2007-05-05 18:09:17 +00:00
pjd
592f466b1b Handle GEOM::ident attribute by attaching 'sX' string at the end of ident
received from the underlying provider, where X is pp->index value.

OK'ed by:	phk
2007-05-05 17:52:22 +00:00
pjd
9409284b5b Because there are many strange hardware out there, allow to use only
[a-zA-Z0-9-_@#%.] characters in d_ident field.
2007-05-05 17:47:20 +00:00
pjd
4e8b8cd34e - Extend disk structure to allow to store disk's serial number, which can be
retrieved via GEOM::ident attribute.
- Bump disk(9) ABI version.

OK'ed by:	phk
2007-05-05 17:12:15 +00:00
pjd
adc7ddd991 Implement three new ioctls that can be used with GEOM provider:
DIOCGFLUSH - Flush write cache (sends BIO_FLUSH).

	DIOCGDELETE - Delete data (mark as unused) (sends BIO_DELETE).

	DIOCGIDENT - Get provider's uniqe and fixed identifier (asks for
		GEOM::ident attribute).

First two are self-explanatory, but the last one might not be. Here are
properties of provider's ident:

- ident value is preserved between reboots,
- provider can be detached/attached and ident is preserved,
- provider's name can change - ident can't,
- ident value should not be based on on-disk metadata; in other words
  copying whole data from one disk to another should not yield the same
  ident for the other disk,
- there could be more than one provider with the same ident, but only if
  they point at exactly the same physical storage, this is the case for
  multipathing for example,
- GEOM classes that consumes single providers and provide single providers,
  like geli, gbde, should just attach class name to the ident of the
  underlying provider,
- ident is an ASCII string (is printable),
- ident is optional and applications can't relay on its presence.

The main purpose for this is that application and remember provider's ident
and once it tries to open provider by its name again, it may compare idents
to be sure this is the right provider. If it is not (idents don't match),
then it can open provider by its ident.

OK'ed by:	phk
2007-05-05 17:02:19 +00:00
pjd
835266e088 Implement g_delete_data() similar to g_read_data() and g_write_data().
OK'ed by:	phk
2007-05-05 16:35:22 +00:00
pjd
ddfa2416f5 - Implement helper g_handleattr_str() function for string attributes
handling.
- Extend g_handleattr() to treat attribute as string when len=0.

OK'ed by:	phk
2007-05-05 16:33:44 +00:00
marcel
377294ae02 Put the scheme (APM, GPT, etc) in the XML. 2007-04-27 05:58:10 +00:00
simokawa
172f73729f If compressed length is zero, return a zero-filled block.
MFC after: 1 week
2007-04-24 06:30:06 +00:00
le
5b070780c0 -) Correct sdcount for a plex when removing or adding subdisks.
-) Set correct sizes for plexes and volumes a subdisk has been removed.

Submitted by:   Ulf Lilleengen <lulf_AT_freebsd.org>
2007-04-12 17:54:35 +00:00
le
1652a41e4b Avoid infinite loop if the device string given for a drive
only consists of "/".

Submitted by:  Ulf Lilleengen <lulf_AT_freebsd.org>
2007-04-12 17:40:44 +00:00
pjd
f0a2e6d38c Use root_mounted(). 2007-04-08 23:54:23 +00:00
simokawa
c52b092310 Fix a bug for over 4GB media.
MFC after: 3 days
2007-04-07 02:52:13 +00:00
pjd
1b48438fa6 Sysctl description is not a format string, so one % is enough. 2007-04-06 12:53:54 +00:00
delphij
29a66510eb - Be more verbose when saying "foo" not found.
- In gctl_get_geom(), don't issue error when we were not
   provided with an parameter, like gctl_get_provider() did.

Reviewed by:	pjd
2007-03-30 16:32:08 +00:00
kris
21b5ddcd2e make_dev(9) can be (and is) called without Giant, so there is no need to
drop the topology lock and acquire Giant around this call.

Reviewed by:	phk
2007-03-26 21:47:03 +00:00
pjd
fe8d58a251 Add missing \n. 2007-03-22 15:42:13 +00:00
sam
f96ba7ffda Overhaul driver/subsystem api's:
o make all crypto drivers have a device_t; pseudo drivers like the s/w
  crypto driver synthesize one
o change the api between the crypto subsystem and drivers to use kobj;
  cryptodev_if.m defines this api
o use the fact that all crypto drivers now have a device_t to add support
  for specifying which of several potential devices to use when doing
  crypto operations
o add new ioctls that allow user apps to select a specific crypto device
  to use (previous ioctls maintained for compatibility)
o overhaul crypto subsystem code to eliminate lots of cruft and hide
  implementation details from drivers
o bring in numerous fixes from Michale Richardson/hifn; mostly for
  795x parts
o add an optional mechanism for mmap'ing the hifn 795x public key h/w
  to user space for use by openssl (not enabled by default)
o update crypto test tools to use new ioctl's and add cmd line options
  to specify a device to use for tests

These changes will also enable much future work on improving the core
crypto subsystem; including proper load balancing and interposing code
between the core and drivers to dispatch small operations to the s/w
driver as appropriate.

These changes were instigated by the work of Michael Richardson.

Reviewed by:	pjd
Approved by:	re
2007-03-21 03:42:51 +00:00
pjd
b23a2a2ffb Warn when user use sectorsize bigger than the page size, which will lead
to problems when the geli device is used with file system or as a swap.

Hopefully will prevent problems like kern/98742 in the future.

MFC after:	1 week
2007-03-05 12:41:44 +00:00
pjd
38868f2cec Fix geli after last commit for UP systems that are running SMP kernel.
Submitted by:	Hyo geol, Lee <hyogeollee@gmail.com>
MFC after:	1 week
2007-03-02 09:38:16 +00:00
jhb
9081d44243 Use pause() rather than tsleep() on stack variables and function pointers. 2007-02-27 17:23:29 +00:00
mjacob
05b92097cb First cut at GEOM based multipath. This is an active/passive{/passive...}
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.

The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguishor at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".

The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.

During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.

Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.

There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it,  but it's been functional enough
for a while that it deserves a broader test base.

Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
2007-02-27 04:01:58 +00:00
jhb
61da08318e Use tsleep() rather than msleep() with a NULL mtx parameter. 2007-02-23 23:06:10 +00:00
n_hibma
3d196e1a91 Reduce the noise when plugging in (USB) mass storage devices, like a 4 port
flash card reader.
Also remove an 'Opened da0 -> <random number>' which is not needed on a daily
basis (available through bootverbose).

Reviewed by:	phk, ken
MFC after:	1 week
2007-02-21 07:45:02 +00:00
rodrigc
4b93723aab #include <sys/systm.h> before <sys/geom.h> to get KASSERT(), and fix LINT build. 2007-02-08 04:02:56 +00:00
marcel
0245423ad8 Evolve the ctlreq interface added to geom_gpt into a generic
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE anf GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).

The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.
2007-02-07 18:55:31 +00:00
pjd
cb51d8d011 We expect 'bio_data != NULL' for BIO_{READ,WRITE,GETATTR}, but for
BIO_{DELETE,FLUSH} we expect 'bio_data == NULL'.

Reviewed by:	phk
2007-01-28 23:36:07 +00:00
pjd
4e4fa80cab It is possible that GEOM taste provider before SMP is started.
We can't bind to a CPU which is not yet on-line, so add code that wait for
CPUs to go on-line before binding to them.

Reported by:	Alin-Adrian Anton <aanton@spintech.ro>
MFC after:	2 weeks
2007-01-28 20:29:12 +00:00
kib
fdd50404d1 Cylinder group bitmaps and blocks containing inode for a snapshot
file are after snaplock, while other ffs device buffers are before
snaplock in global lock order. By itself, this could cause deadlock
when bdwrite() tries to flush dirty buffers on snapshotted ffs. If,
during the flush, COW activity for snapshot needs to allocate block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then kernel would panic due to recursive buffer lock
acquision.

Avoid dealing with buffers in bdwrite() that are from other side of
snaplock divisor in the lock order then the buffer being written. Add
new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in
the bdwrite(). Default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, specialized implementation is
used.

Reviewed by:	tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by:	Peter Holm
X-MFC after:	3 weeks (if ever: it changes ABI)
2007-01-23 10:01:19 +00:00
pjd
c3fbfd0542 Softc may be NULL in g_journal_orphan(), so don't be surprised. 2006-12-02 09:10:29 +00:00
pjd
fa40850739 Fix ia64 build breakage. 2006-11-02 16:24:18 +00:00
pjd
ba0f348261 - Use g_duplicate_bio() instead of g_clone_bio(), so there memory is
allocated with M_WAITOK flag.
- Check 'buf' instead of 'error' so Prevent is not confused.

CID:		1562, 1563
Found by:	Coverity Prevent analysis tool
2006-11-02 09:14:18 +00:00
pjd
9fbd00e878 I want CPU number here.
Noticed by:	ru
2006-11-02 09:01:34 +00:00
pjd
bd92877da9 Grr, fix one more build breakage. 2006-11-02 00:37:39 +00:00
pjd
b34fb80d83 Now, that we have gjournal in the tree add possibility to configure
gmirror and graid3 in a way that it is not resynchronized after a
power failure or system crash.
It is safe when gjournal is running on top of gmirror/graid3.
2006-11-01 22:51:49 +00:00
pjd
cf33008f77 Change spaces to tabs where needed. 2006-11-01 22:16:53 +00:00
pjd
d55b9d74e2 Skip disabled CPU, because after we sched_bind() to a disabled CPU,
we won't be able to exit from the thread.

Function g_eli_cpu_is_disabled() stoled from kern_pmc.c.

PR:		104669
Reported by:	Nikolay Mirin <nik@optim.com.ru>
MFC after:	1 week
2006-11-01 16:05:06 +00:00
pjd
64d3eaa81e Forgot to remove this line.
Reported by:	maxim
2006-11-01 14:09:59 +00:00
pjd
fe4b3f9ce1 Add BIO_FLUSH support to GSHSEC class. 2006-11-01 12:30:51 +00:00
pjd
457f8e4c90 Add BIO_FLUSH support to GPT class. 2006-11-01 12:29:49 +00:00
pjd
d639eb8d4b Update the code to the current sync(2) version:
- Do not modify mnt_flag without mount interlock held.
- Do not touch MNT_ASYNC flag, as this can lead to a race with nmount(2).

Pointed out by:	tegge
Reviewed by:	tegge
2006-11-01 09:37:11 +00:00
pjd
a9b80b86ae Remove debugging code I accidentally committed. 2006-11-01 01:19:13 +00:00
pjd
66101a4314 Add gjournal GEOM class (kernel side), which implements block level
journaling and can be tought about marking file system as clean before
doing journal switch, which easly allows to add journaling to file
systems that don't have this feature.

Sponsored by:	home.pl
2006-10-31 21:31:00 +00:00
pjd
c33849dc41 Implement BIO_FLUSH handling by simply passing it down to the components.
Sponsored by:	home.pl
2006-10-31 21:23:51 +00:00
pjd
67c00d09c1 Add a new disk flag - DISKFLAG_CANFLUSHCACHE, which indicates that the disk
can handle BIO_FLUSH requests.

Sponsored by:	home.pl
2006-10-31 21:12:43 +00:00
pjd
d5cc909451 Add a new I/O request - BIO_FLUSH, which basically tells providers below to
flush their caches. For now will mostly be used by disks to flush their
write cache.

Sponsored by:	home.pl
2006-10-31 21:11:21 +00:00
pjd
e4e060fa9e Guard against invalid metadata.
MFC after:	1 week
2006-10-10 15:01:47 +00:00
ru
f53bc81fe1 A GEOM cache can speed up read performance by sending fixed size
read requests to its consumer.  It has been developed to address
the problem of a horrible read performance of a 64k blocksize FS
residing on a RAID3 array with 8 data components, where a single
disk component would only get 8k read requests, thus effectively
killing disk performance under high load.  Documentation will be
provided later.  I'd like to thank Vsevolod Lobko for his bright
ideas, and Pawel Jakub Dawidek for helping me fix the nasty bug.
2006-10-06 08:27:07 +00:00
pjd
d60aaf3441 One more white space fix. 2006-09-30 08:23:06 +00:00
pjd
cb0554d5cc Remove trailing spaces. 2006-09-30 08:16:49 +00:00
pjd
8cff3b898f Remove trailing spaces. 2006-09-30 08:01:11 +00:00
pjd
5b67d8da02 Fix detecting of UFS1 label when mediasize%fragsize != 0.
Submitted by:	Stanislav Sedov
PR:		kern/84637
MFC after:	1 week
2006-09-16 11:24:41 +00:00
pjd
2e387b9b85 Add 'configure' subcommand which for now only allows setting and removing
of the BOOT flag. It can be performed on both attached and detached
providers.

Requested by:	Matthias Lederhofer <matled@gmx.net>
MFC after:	1 week
2006-09-16 10:43:17 +00:00
pjd
f5e129df20 Add __printflike() to gctl_error().
Approved by:	phk
MFC after:	1 week
2006-09-16 10:39:07 +00:00
pjd
4f982725d1 Small fixes after adding __printflike() to gctl_error().
Approved by:	phk
MFC after:	3 days
2006-09-16 09:48:29 +00:00
pjd
5d795537ae Remove extra arguments.
MFC after:	3 days
2006-09-16 07:47:57 +00:00
pjd
556424a17a Add 'show geom [addr]' ddb(4) command, which prints entire GEOM topology if
no additional argument is given or details about the given GEOM object
(class, geom, provider or consumer).

Approved by:	phk
2006-09-15 16:36:45 +00:00
pjd
610c4b7a06 Fix synchronization in gmirror and graid3 which I broken. Synchronization
request can still have bio_to set to sc_provider (this is READ part of a
synchronization request) and in this case g_{mirror,raid3}_sync() wasn't
called as it should be.

MFC after:	1 week
2006-09-13 15:46:49 +00:00
pjd
d06bfaa1a9 Delay an orphan event if provider has still in-flight I/O requests.
This way GEOM classes can safely detach from provider when an orphan
event is received. This fixes 'detach with active requests' panic for
gstripe/gconcat under load.

PR:		kern/102766
Submitted by:	mjacob
OK'ed by:	phk
MFC after:	1 week
2006-09-10 09:11:54 +00:00
jmg
ecd9e77d3e move created/detected/activated under debug level 1 to quiet the common case..
add count of active and total components to the launched line so you can
see at a glance if your mirror/raid3 is complete...

now:
GEOM_MIRROR: Device mirror/sam launched (2/2).

Reviewed by:	pjd
2006-09-09 21:45:37 +00:00
pjd
dd3e975df2 Fix format character.
Reported by:	andre
2006-09-08 13:46:18 +00:00
pjd
454c903c07 Bump copyright year. 2006-09-08 10:20:44 +00:00
pjd
bc38d5de48 Use __FBSDID in .c files. 2006-09-08 10:19:24 +00:00
pjd
40cda51553 - Split failure probability configuration into read failure probability and
write failure probability.
- Allow to specify an error number to return of failure.

MFC after:	3 days
2006-09-08 09:21:21 +00:00
pjd
5c567602d8 Fix problems with destroy and forcible destroy functionality:
- hold/release device in start/done routines, this will probably slow
  down things a bit, but previous code was racy;
- only release device if g_gate_destroy() failed - if it succeeded device
  is dead and there is nothing to release;
- various other changes which makes forcible destruction reliable.

MFC after:	3 days
2006-09-05 21:56:00 +00:00
imp
db85f415fa while (0); -> while (0) in multi-line macros 2006-08-17 22:50:33 +00:00
pjd
c0667f3ce5 Handle MSDOS file systems properly. Before the change file systems
created on Windows XP (and others maybe) were not detected.
We detected only those created with newfs_msdos(8).

Submitted by:		Tobias Reifenberger <treif@mayn.de>
style(9)ified by:	pjd
2006-08-12 15:34:15 +00:00
pjd
f615f3af6a Verify if a label doesn't point to the parent directory. 2006-08-12 15:30:24 +00:00
pjd
3a923dc027 Before using byte offset for IV creation, covert it to little endian.
This way one will be able to use provider encrypted on eg. i386 on
eg. sparc64. This doesn't really buy us much today, because UFS isn't
endian agnostic.

We retain backward compatibility by setting G_ELI_FLAG_NATIVE_BYTE_ORDER
flag on devices with version number less than 2 and not converting the
offset.
2006-08-11 19:09:12 +00:00
pjd
d9810ee8e2 Forgot to bump version number after G_ELI_FLAG_READONLY flag addition. 2006-08-11 18:39:58 +00:00
marcel
52f0123d8d Strengthen the check for a PMBR:
o PMBR partitions count to the number of partitions on the disk, which
  means that if a PMBR entry is invalid we will not treat the MBR as a
  PMBR by virtue of it not describing any partitions.
  Previously the checks were inconsistent in that an invalid PMBR entry
  would be harmless when no other partitions exist (we would treat the
  MBR as a PMBR by virtue of it being empty), but it would be fatal when
  there is at least one other partition.
o The partition size of a PMBR partition is one less than the media size
  because the GPT starts at the second sector (LBA 1) and extends to
  the end of the media. For backward bug-compatibility we accept a size
  that's exactly the media size (FreeBSD bug).
  Also, when the partition size can not be represented in a 32-bit
  integral, the partition size in the MBR is to be set to 0xFFFFFFFF.
  Accept this as a valid size, even if the size can be represented.
2006-08-09 20:53:01 +00:00
pjd
b2ae936be5 Allow geli to operate on read-only providers.
Initial patch from:	vd
MFC after:		2 weeks
2006-08-09 18:11:14 +00:00
pjd
b539048d28 Not only a request from us can be passed to g_{mirror,raid3}_worker()
function, but also a request to us, in which case checking bio_cflags
is wrong, because the class above us is controling it, not we.

MFC after:	1 week
2006-08-09 09:41:53 +00:00
marcel
e4aeb824ed Fix a phase-ordering bug: check the mediasize and sectorsize after
we obtained access. It is possible that GPT gets to taste a disk
first, which means the disk has not been opened before and it will
not get opened until after we checked the mediasize and sectorsize.
However, since the mediasize and sectorsize are determined at open
and that happens when access is optained, checking the mediasize
and sectorsize before obtaining access may result in GPT rejecting
the disk.
2006-08-08 21:33:26 +00:00
yar
209e4786e7 Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR:		misc/101245
Submitted by:	Darren Pilgrim <darren pilgrim bitfreak org>
Tested by:	md5(1)
MFC after:	1 week
2006-08-04 07:56:35 +00:00
pjd
38bff79a10 Don't use f-word in comments. We are gentlemans.
Pointed out by:	Maciej Sobczak
2006-08-01 23:17:33 +00:00
yar
99e7c62f6f Fix what looks like a typo: MODULE_DEPEND() takes module names,
not KLD file names; and GELI module's name is g_eli, not geom_eli.

Approved by:	pjd (silence)
MFC after:	5 days
2006-07-27 11:52:12 +00:00
pjd
603264d10f Don't forget to initialize crp_olen field, which is used to calculate
bio_completed value.
2006-07-22 10:05:55 +00:00
pjd
27c2ca3212 Always allow to specify components with /dev/ prefix.
MFC after:	3 days
2006-07-13 20:37:59 +00:00
pjd
c146aa7e54 Only check if we're freeing a valid object if we hold the topology lock.
This prevents panic under heavy load with DIAGNOSTIC compiled in.
2006-07-12 15:44:00 +00:00
pjd
ee41eea403 Use proper defines instead of magic values.
MFC after:	1 week
2006-07-10 21:18:00 +00:00
pjd
dfcf2677fc When kern.geom.raid3.use_malloc tunnable is set to 1, malloc(9) instead of
uma(9) will be used for memory allocation.
In case of problems or tracking bugs, there are more useful tools for malloc(9)
debugging than for uma(9) debugging, like memguard(9) and redzone(9).

MFC after:	1 week
2006-07-09 12:25:56 +00:00
pjd
706381c9ed Remove bogus assertion.
Reported by:	Bradley W. Dutton <brad-fbsd-stable@duttonbros.com>
MFC after:	3 days
2006-07-07 14:32:27 +00:00
pjd
444d196b29 Allow to close access even if device is already destroyed.
Reported by:	Ulrich Spoerlein <uspoerlein@gmail.com>
PR:		kern/98093
MFC after:	1 week
2006-07-03 10:32:38 +00:00
sobomax
c410e97673 Improve check for protective MBR. Instead of assiming that protective
MBR should have only one entry of type 0xEE, consider protective MBR
to be one, that has at least one entry of type 0xEE covering the whole
unit. This makes GEOM_GPT compatible with disks partitioned by the
Apple's BootCamp.

Approved in principle by:       marcel
MFC After:			1 month
2006-06-26 00:32:54 +00:00
simon
9d0350bdbe In g_dev_strategy(), when failing an IO request with EINVAL due to
offset or request size which is not a multiple of the sector size, make
sure that the bio is set to indicate that no data has actually been
transferred.

The result of this is that the file offset is no longer incremented for
these requests.  The fact that the file offset was incremented broke
fdisk(8)'s probing of sector size for non-512 byte sector sizes.

Reviewed by:	phk, cperciva
Submitted by:	mdodd
MFC after:	2 weeks
2006-06-18 22:01:15 +00:00
pjd
ec70ef58cb Allow to use the old -a option to specify an encryption algorithm to use
(for backward compatibility), but print a warning to inform about the
change.
2006-06-06 22:06:24 +00:00
pjd
dfb8f689dd - Unbreak the build when geli is compiled into the kernel (on as module),
by silencing unfounded compiler warning.

Reported by:
2006-06-06 14:48:19 +00:00
pjd
3af66839d0 Implement data integrity verification (data authentication) for geli(8).
Supported by:	Wheel Sp. z o.o. (http://www.wheel.pl)
2006-06-05 21:38:54 +00:00
pjd
c7f4418287 Make kern.geom.eli.overwrites sysctl a tunable as well. 2006-06-05 21:25:19 +00:00
pjd
280370a7da Add g_duplicate_bio() function which does the same thing what g_clone_bio()
is doing, but g_duplicate_bio() allocates new bio with M_WAITOK flag.
2006-06-05 21:13:22 +00:00
marcel
00649b1143 Fix unaligned memory accesses on Alpha and possible other platforms.
By using a pointer to struct dos_partition, we implicitly tell the
compiler that the pointer is 4-bytes aligned, even though we know
that's not the case. The fact that we only dereference the pointer
to access a byte-wide field (field dp_ptyp) is not a guarantee that
the compiler will in fact use a byte-wide load. On some platforms
it's more efficient to use long word or quad word loads and use
bit-shifting and bit-masking to get the intended byte. On those
platforms an misaligned load will be the result.
The fix is to use byte-wide pointer arithmetic based on sizeof() and
offsetof() to avoid invalid casts which avoids that the compiler
makes invalid assumptions.

Backtrace provided by: wilko@
MFC after: 1 week
2006-06-04 20:26:13 +00:00
ceri
f0549dc49b Remove the trailing half of a sentence which was clearly superceded
by the preceding one some time during editing.
2006-05-24 11:02:32 +00:00
pjd
488dfb1ea0 Use G_RAID3_FOREACH_SAFE_BIO() macro instead of G_RAID3_FOREACH_BIO() in
two places where g_io_request() is called. g_io_request() can free bio
structure so we can't reference it after and G_RAID3_FOREACH_BIO() macro
was doing this.

Found by:	Coverity Prevent analysis tool (with my new models)
MFC after:	1 day
2006-05-04 13:01:16 +00:00
pjd
3c465c60bb We shouldn't lock the topology here - we will panic on assertion inside
g_raid3_bump_syncid().

Reported by:	Bradley W. Dutton <brad-fbsd-stable@duttonbros.com>
MFC after:	1 day
2006-04-30 22:14:17 +00:00
pjd
b084a5ad26 - Don't hold the device sx lock when going to sleep.
- Prevent possible live-lock in case of memory problems by freeing
  already completed requests first.

Reported and tested by:	markus, Bradley W. Dutton <brad-fbsd-stable@duttonbros.com>
MFC after:		1 day
2006-04-28 12:18:03 +00:00
pjd
c930d9ab2f - Remove dead code.
- Comment possible event miss, which isn't critical, but probably can be
  fixed by replacing the event lock usage with the queue lock.

MFC after:	2 weeks
2006-04-28 12:13:49 +00:00
pjd
4b38e5bbca Be sure to not destroy device twice. This is not possible in theory, but
with this change there is even no theoretical race.

MFC after:	2 weeks
2006-04-28 11:52:45 +00:00
pjd
f430b234fb Be sure to not destroy device twice. This is not possible in theory, but
with this change there is even no theoretical race.

MFC after:	2 weeks
2006-04-28 11:47:28 +00:00
pjd
7df97ee4bb geli(8) provides keys on newsession time, so remove CRD_F_KEY_EXPLICIT flag
as HW crypto drivers don't support it.
2006-04-20 06:33:46 +00:00
pjd
7a0948ca69 Fix storing offset of already synchronized data. Offset in entire array was
stored in metadata instead of an offset in single disk.
After reboot/crash synchronization process started from a wrong offset
skipping (not synchronizing) part of the component which can lead to data
corrutpion (when synchronization process was interrupted on initial
synchronization) or other strange situations like 'graid3 status' showing
value more than 100%.

Reported, reviewed and tested by:	ru
Reported by:	Dmitry Morozovsky <marck@rinet.ru>
MFC after:	1 day
2006-04-18 13:52:11 +00:00
pjd
9727721d25 Correct debug: we are sending child bio here, not parent bio.
MFC after:	1 week
2006-04-15 18:30:42 +00:00
cracauer
452517900b Make CCD be able to read and write Linux software raids.
Supported for raid-0 with <n> disks, raid-1 with 2 disks.

Manpages have examples, warnings etc.

Test scripts on
http://www.cons.org/cracauer/ccdconfig-linux/
Reviewed by:	alfred
2006-04-13 20:35:31 +00:00
pjd
dd51e2368b Pass BIO_GETATTR requests down.
MFC after:	1 week
2006-04-12 12:18:44 +00:00
pjd
d7eb5b2fe9 Introduce and use delayed-destruction functionality from a pre-sync hook,
which means that devices will be destroyed on last close.

This fixes destruction order problems when, eg. RAID3 array is build on
top of RAID1 arrays.

Requested, reviewed and tested by:	ru
MFC after:	2 weeks
2006-04-10 10:32:22 +00:00
marcel
74d3377eaa MFp4:
o  Implement the remove verb to remove a partition entry.
o  Improve error reporting by first checking that the verb is valid.
o  Add an entry parameter to the add verb. this parameter can be
   both read-only as welll as read-write and specifies the entry
   number of the newly added partition.
o  Make sure that the provider is alive when passed to us. It may
   be withering away.
o  When adding a new partition entry, test for overlaps with existing
   partitions.
2006-04-10 04:03:14 +00:00
marcel
c168f9530e Add g_wither_provider() to abstract the details of destroying a
particular provider. Use this function where g_orphan_provider()
is being called so that the flags are updated correctly and
g_orphan_provider() is called only when allowed.
2006-04-10 03:55:13 +00:00
marcel
a5bd277da0 Change gctl_set_param() to return an error instead of setting an
error on the request.  Add a wrapper, gctl_set_param_err(), that
sets the error on the request from the error returned by
gctl_set_param() and update current callers of gctl_set_param()
to call gctl_set_param_err() instead.
This makes gctl_set_param() much more usable in situations where
the caller knows better what to do with certain (apparent) error
conditions and setting an error on the request is not one of the
things that need to be done.
2006-04-07 16:19:48 +00:00
pjd
c94d951bcd Typos. 2006-04-05 22:07:31 +00:00
pjd
f0667561aa Revert previous change, as I fixed MD5(9). 2006-03-30 18:50:00 +00:00
pjd
2a7268cfd4 md_hash field in g_eli_metadata structure is not 4 byte aligned, which
case panic on sparc64.

The problem is in MD5(9) implementation. The Encode() function takes
'unsigned char *output' as its first argument, which is then assigned to
'u_int32_t *op'. If the 'output' argument is not 4 byte aligned (and in
geli(8) case it is not), sparc64 machine will panic.

I don't know how to fix MD5(9) in a clean way, so I'm implementing a
work-around in geli(8).

Reported by:	brueffer
MFC after:	3 days
2006-03-30 14:41:13 +00:00
le
148e4e97f6 Protect from creating striped and RAID5 plexes with unequally sized
subdisks.
2006-03-30 14:01:25 +00:00
pjd
568ba3bc0f - 'ndisks' variable is not boolean, so compare it with a value.
- Keep conditions order consistent with the comment above.

MFC after:	3 days
2006-03-30 12:15:41 +00:00
pjd
46a2a98421 Preserve previous behaviour of kern.geom.raid3.n{64,16,4}k tunables were 0
means unlimited.

Reported by:	ru
MFC after:	3 days
2006-03-28 18:34:36 +00:00
pjd
2f146bc4fd Increase debug level for "Thread exiting." message. It's not that important
and is 0 by accident.

MFC after:	3 days
2006-03-25 23:30:36 +00:00
le
9a2fc25611 Fix whitespace. 2006-03-23 20:01:13 +00:00
le
80efd8a6c8 Implement the 'resetconfig' command.
PR:            kern/94835
Submitted by:  Ulf Lilleengen <lulf@stud.ntnu.no>
2006-03-23 19:58:43 +00:00
pjd
ba3414666e Update copyright for 2006. 2006-03-19 12:55:51 +00:00
pjd
5990508a15 kern.geom.raid3.sync_requests=2 seems to be a better default - it still
keeps disks very busy, but makes system much more responsive.

While here, kill extra space.
2006-03-19 11:18:33 +00:00
pjd
fadb519311 kern.geom.mirror.sync_requests=2 seems to be a better default - it still
keeps disks very busy, but makes system much more responsive.

While here, kill extra space.
2006-03-19 10:49:05 +00:00
ru
740bc18a1b Fix a typo. 2006-03-13 14:59:57 +00:00
ru
9348187bf1 Fix build on 64-bit platforms. 2006-03-13 14:48:45 +00:00
pjd
349adc9b52 - Reimplement I/O data allocation to prevent deadlocks.
Submitted by:	green

- Speed up synchronization process by using configurable number of I/O
  requests in parallel.
  + Add kern.geom.raid3.sync_requests tunable which defines how many parallel
    I/O requests should be used.
  + Retire kern.geom.raid3.reqs_per_sync and kern.geom.raid3.syncs_per_sec
    sysctls.
- Fix race between regular and synchronization requests.
- Reimplement raid3's data synchronization - do not use the topology lock
  for this purpose, as it may case deadlocks.
- Stop synchronization from pre-sync hook.
- Fix some other minor issues.

Tested by:	Mike Tancsa <mike@sentex.net>
MFC after:	3 days
2006-03-13 01:03:18 +00:00
pjd
11cbb2f275 - Speed up synchronization process by using configurable number of I/O
requests in parallel.
  + Add kern.geom.mirror.sync_requests tunable which defines how many parallel
    I/O requests should be used.
  + Retire kern.geom.mirror.reqs_per_sync and kern.geom.mirror.syncs_per_sec
    sysctls.
- Fix race between regular and synchronization requests.
- Reimplement mirror's data synchronization - do not use the topology lock
  for this purpose, as it may case deadlocks.
- Stop synchronization from pre-sync hook.
- Fix some other minor issues.

MFC after:	3 days
2006-03-13 00:58:41 +00:00
pjd
f0925fcaf9 When inserting a new component md_provsize metadata field wasn't set, which
means that old problem was triggered (when two providers end at the same
offset, eg. ad0 and ad0s1 and the wrong was is picked up by gmirror/graid3).

Reported by:	Michal Suszko <dry@dry.pl>
MFC after:	3 days
2006-03-10 07:41:31 +00:00
pjd
1c595687a8 Allow to dump kernel to gmirror providers.
Some conditions have to be met to make it work properly. This will be
described in the manual page.

MFC after:	3 days
2006-03-08 08:27:33 +00:00
pjd
ef0f2742d9 We need to check if file system size is equal to provider's size, because
sysinstall(8) still bogusly puts first partition at offset 0 instead of 16,
so glabel/ufs will find file system on slice instead of partition.

Before sysinstall is fixed, we must keep this code, which means that we
wont't be able to detect UFS file systems created with 'newfs -s ...'.

PS. bsdlabel(8) creates partitions properly.

MFC after:	3 days
2006-03-04 19:41:54 +00:00
jeff
8e6862e21e - Lock Giant if needed around the call to vnode_create_vobject(). This is
only important if devfs is not mpsafe.

Sponsored by:	Isilon Systems, Inc.
Found by:	kris
2006-03-02 05:37:44 +00:00
pjd
b4b6876e6e Assert proper use of bio_caller1, bio_caller2, bio_cflags, bio_driver1,
bio_driver2 and bio_pflags fields.

Reviewed by:	phk
2006-03-01 19:01:58 +00:00
pjd
46e57ae3d3 Do not use bio structure after g_io_deliver(), it may not longer by valid.
Found and fixed by:	Vsevolod Lobko <seva@ip.net.ua>
MFC after:		3 days
2006-02-22 10:21:05 +00:00
pjd
5729ae2a57 Inform when label disappears.
MFC after:	3 days
2006-02-18 11:24:00 +00:00
pjd
c2f8ebb2b6 Allow to use g_slice_orphan() from outside.
MFC after:	3 days
2006-02-18 11:21:17 +00:00
pjd
55a384575f - Do not depend on fact that file system covers entire provider.
It won't work for file systems created with -s option.
  Use better file system verfication.
- Add myself to the copyright.

MFC after:	3 days
2006-02-18 10:59:47 +00:00
pjd
fe67232768 This function returns nothing. 2006-02-18 03:04:26 +00:00
pjd
14c2a07e91 If provider's sector size prevents reading SBLOCKSIZE bytes return
immediatelly.
2006-02-18 03:00:49 +00:00
pjd
dded50a417 On component state change to ACTIVE don't forget to update metadata.
MFC after:	3 days
2006-02-12 17:38:09 +00:00
pjd
a9a29a4821 Use time_uptime instead of time_second, as the latter may go backwards.
Suggested by:	ru
MFC after:	3 days
2006-02-12 17:36:09 +00:00
pjd
9357beb7f2 Allow to set kern.geom.raid3.disconnect_on_failure from loader.conf.
MFC after:	3 days
2006-02-12 02:01:38 +00:00
pjd
beaa5fcb4d - Add kern.geom.raid3.disconnect_on_failure sysctl/tunnable (default to 1
to preserve currect behaviour). When set to 0, components are not
  disconnected - graid3 will try to still use them (only first error will
  be logged). This is helpful when we have two broken components, but in
  different places, so actually all data is available.
  Such buggy component will be visible in 'graid3 list' output with flag
  BROKEN.
- Never disconnect the last valid component. If we detect errors there we
  will just pass them up. This wasn't reasonable to deny access to the
  whole provider because of one broken sector.

Prodded by:	ru
MFC after:	3 days
2006-02-11 17:42:31 +00:00
pjd
392d25e4bc - Add kern.geom.mirror.disconnect_on_failure sysctl/tunnable (default to 1
to preserve currect behaviour). When set to 0, components are not
  disconnected - gmirror will try to still use them (only first error will
  be logged). This is helpful when we have two broken components, but in
  different places, so actually all data is available.
  Such buggy component will be visible in 'gmirror list' output with flag
  BROKEN.
- Never disconnect the last valid component. If we detect errors there we
  will just pass them up. This wasn't reasonable to deny access to the
  whole provider because of one broken sector.

Prodded by:	ru
MFC after:	3 days
2006-02-11 17:39:29 +00:00
pjd
1aa881eae6 Correct typo. 'fbp' is NULL here so this will result in a panic.
MFC after:	3 days
2006-02-11 17:29:06 +00:00
pjd
26f9aeb047 Mark array as CLEAN when there are no write requests in
kern.geom.raid3.idletime seconds. Write, not any requests.
Mark array as clean immediatelly on last write close.

Prodded by:	ru
MFC after:	3 days
2006-02-11 14:42:58 +00:00
pjd
ef80617741 Mark array as CLEAN when there are no write requests in
kern.geom.mirror.idletime seconds. Write, not any requests.
Mark array as clean immediatelly on last write close.

Prodded by:	ru
MFC after:	3 days
2006-02-11 14:42:23 +00:00
pjd
204d3235ab Teach geli how to load keyfiles before root file system is mounted.
An example entries for loader.conf to make it possible:

geli_da0_keyfile0_load="YES"
geli_da0_keyfile0_type="da0:geli_keyfile0"
geli_da0_keyfile0_name="/boot/keys/da0.key0"
geli_da0_keyfile1_load="YES"
geli_da0_keyfile1_type="da0:geli_keyfile1"
geli_da0_keyfile1_name="/boot/keys/da0.key1"
geli_da0_keyfile2_load="YES"
geli_da0_keyfile2_type="da0:geli_keyfile2"
geli_da0_keyfile2_name="/boot/keys/da0.key2"

geli_da1s3a_keyfile0_load="YES"
geli_da1s3a_keyfile0_type="da1s3a:geli_keyfile0"
geli_da1s3a_keyfile0_name="/boot/keys/da1s3a.key"

Thanks for jhb and kan who showed me the right direction.

MFC after:	3 days
2006-02-11 13:08:24 +00:00
pjd
f9926daa99 Check rootvnode variable to see if we still want to ask for passphrase on
boot. Other methods just don't work properly.

MFC after:	3 days
2006-02-11 12:45:01 +00:00
le
40531c331d Catch the case when a subdisk has no provider or no consumer
attached to it.
2006-02-08 21:32:45 +00:00
brueffer
1620f68fa6 Clean up some sysctl descriptions, debug messages etc.
Approved by:	pjd
MFC after:	3 days
2006-02-07 17:23:22 +00:00
pjd
6f074b6d64 Remove trailing spaces. 2006-02-01 12:06:01 +00:00
pjd
74978a10e1 Allow to specify only one disk. This is helpful when we want to extend
our concatenated device later.

MFC after:	1 week
2006-01-30 22:47:07 +00:00
pjd
1fe4753153 Fix typo which cased that 64kB elements limit was not set properly and
16kB elements limit wasn't set at all.

Submitted by:	Vsevolod Lobko <seva@ip.net.ua>
MFC after:	3 days
2006-01-30 22:45:43 +00:00
fjoe
c7ef8009e0 Rename geom_uzip class to g_uzip in order to be consistent with the naming
of other GEOM modules.

PR:		89998
2006-01-22 15:35:14 +00:00
pjd
9f8d658cf4 Fix bio leak in case of malloc(9) failure.
Found by:	Coverity Prevent(tm)
Coverity ID:	CID794
MFC after:	3 days
2006-01-18 21:44:57 +00:00
pjd
fb2c7cfc24 Remove dead code.
Found by:	Coverity Prevent(tm)
Coverity ID:	CID105
MFC after:	3 days
2006-01-18 21:43:27 +00:00
pjd
82bba3d6cd Remove dead code.
Found by:	Coverity Prevent(tm)
Coverity ID:	CID104
MFC after:	3 days
2006-01-18 21:42:19 +00:00
pjd
aae482441a Style cleanups.
X-MFC-after:	Already MFCed to RELENG_6 by accident.
2006-01-18 11:03:20 +00:00
pjd
0918b5cf25 Move $FreeBSD$ from comment to __FBSDID(). 2006-01-17 11:48:16 +00:00
pjd
bc614c3af5 - Use better types.
- Log problems at level 0 when killing providers.

MFC after:	3 days
2006-01-17 07:32:43 +00:00
pjd
df676bfd16 Check return value.
Found by:	Coverity Prevent(tm)
MFC after:	3 days
2006-01-17 07:30:34 +00:00
pjd
ad2b246949 Remove dead code.
Found by:	Coverity Prevent(tm)
MFC after:	3 days
2006-01-17 07:27:46 +00:00
pjd
1911dbbb9c Remove unused value.
Found by:	Coverity Prevent(tm)
MFC after:	3 days
2006-01-17 07:26:48 +00:00
pjd
be5920e329 Log situation when EIO is returned. 2006-01-17 07:23:36 +00:00
pjd
a80ec2a9b9 Remove bio leak when EIO error is emulated.
Found by:	Coverity Prevent(tm)
MFC after:	3 days
2006-01-17 07:22:44 +00:00
le
c29a43240c Get rid of the gv_bioq hack in most parts of the I/O path and
use the standard bioq structures.
2006-01-06 18:03:17 +00:00
pjd
684d795581 MFp4: Typo fix (without it the XML GEOM tree wasn't consistent).
Reported by:	Eric Anderson <anderson@centtech.com>
2005-12-19 06:05:40 +00:00
pjd
8848f7ef86 Fix build breakage by fixing typo.
Reported by:	glebius
2005-12-09 11:38:02 +00:00
pjd
3c3f39be74 - Allow to specify the byte which will be used for filling read buffer.
- Improve sysctl description a bit.

Submitted by:	Ivan Voras <ivoras@gmail.com>
2005-12-08 23:06:59 +00:00
pjd
7ea810fefd Teach NOP GEOM class how to gather the following statistics:
- number of read I/O requests,
- number of write I/O requests,
- number of read bytes,
- number of written bytes.
Add 'reset' subcommand for resetting statistics.
2005-12-08 23:00:31 +00:00
sobomax
44ba92fd8a It is unclear who is wrong and who is right, but when operating on
plain file bsdlabel(8) always writes label at a fixed offset from
its beginning (512 bytes), regardless of the sector size. At the same
time, bsdlabel geom class expects label to be available at the very
beginning of the second sector.

As a result, images prepared in userland for media with sector size
different from 512 bytes (i.e. 2k for cdroms) are not recognized by
the tasting mechanism.

Solve the problem by always looking for the label at 512-byte offset
if we can't find it at the beginning of the second sector and sector
size is not 512 bytes.
2005-11-30 22:54:41 +00:00
sobomax
16e3e466cc Don't pass error value pointer to g_read_data(9) at all if we don't
have any use of it.

Suggested by:	pjd
2005-11-30 22:15:00 +00:00
sobomax
29543921ea Check for g_read_data(9) errors properly:
o The only indication of error condition is NULL value returned by
  the function;

o value pointed to by error argument is undefined in the case when
  operation completes successfully.

Discussed with: phk
2005-11-30 19:24:51 +00:00
sobomax
937b629bd5 Kill leading whilespace. 2005-11-30 19:07:28 +00:00
pjd
fd9fb6b272 We do nothing with returned error value, so just remove it. 2005-11-29 12:07:10 +00:00
sobomax
ca7fa392b3 Check value returned by g_read_data(9), otherwise we can end in panic(9)
if read error happens.

MFC after:	1 week
2005-11-29 03:03:34 +00:00
le
885d3dc97f Add sysctl descriptions. 2005-11-25 10:09:30 +00:00
le
de257b1b7a Since we want a vinum geom created anytime the module loads, move
the geom creation to a seperate init function and ignore the tasting.

The config is now parsed only in the vinumdrive geom, which hopefully
fixes the problem, that the drive class tasted before the vinum class
had a chance, for good.

Also restore the behaviour that the module can be loaded at boot time
and on a running system.
2005-11-24 15:11:41 +00:00
le
00a607320a Whitespace. 2005-11-20 12:14:18 +00:00
le
d875cd85c6 Always declare variables at the start of the function.
Don't allocate potentially large variables on the stack.
Check strsep() return values when the string comes from userland.
Shorten variable names for lucidity's sake.

most of the stuff:
Pointed out by:    njl@
2005-11-20 12:12:31 +00:00
le
f2329c2aa6 Fix whitespace issue.
Pointed out by:   joel@
2005-11-20 10:40:06 +00:00
le
7f74b7e086 Finally bring in what was produced during Google SoC 2005:
Add functions to rename objects and to move a subdisk from one drive
to another.

Obtained from:  Chris Jones <chris.jones@ualberta.ca>
Sponsored by:   Google Summer of Code 2005
MFC in:         1 week
2005-11-19 20:25:18 +00:00
jdp
536960dbba Fix a bug that caused some /dev entries to continue to exist after
the underlying drive had been hot-unplugged from the system.  Here
is a specific example.  Filesystem code had opened /dev/da1s1e.
Subsequently, the drive was hot-unplugged.  This (correctly) caused
all of the associated /dev/da1* entries to be deleted.  When the
filesystem later realized that the drive was gone it closed the
device, reducing the write-access counts to 0 on the geom providers
for da1s1e, da1s1, and da1.  This caused geom to re-taste the
providers, resulting in the devices being created again.  When the
drive was hot-plugged back in, it resulted in duplicate /dev entries
for da1s1e, da1s1, and da1.

This fix adds a new disk_gone() function which is called by CAM when a
drive goes away.  It orphans all of the providers associated with the
drive, setting an error condition of ENXIO in each one.  In addition,
we prevent a re-taste on last close for writing if an error condition
has been set in the provider.

Sponsored by:   Isilon Systems
Reviewed by:    phk
MFC after:      1 week
2005-11-18 02:43:49 +00:00
marcel
59cb038f4a o Slightly refactor the ctlreq code to maximize code sharing between
verbs. Only the create verb operates on a provider. All other verbs
   operate on a GPT geom. Also, the GPT entry oriented verbs require
   a non-downgraded GPT.
o  Have all verbs take an optional flags parameter. The flags parameter
   is a string of single-letter flags. The typical use of these flags
   is to enable certain behaviour in support fo the gpt(8) tool.
o  Add dummy implementations for the destroy and recover verbs.

This change causes test 2 of the GPT regression test suite to fail.
The presence of a geom parameter is now required even for unknown
verbs.
2005-11-13 21:53:55 +00:00
marcel
8c0ef27b70 Make the kern.geom.conftxt sysctl more usable by also dumping the
MD class. Previously only the DISK class was dumped. The only
consumer of this sysctl is libdisk (i.e. sysinstall) and it tests
explicitly for instances of the DISK class. Dumping other classes
is therefore harmless.
By also dumping the MD class regression tests can be written that
use the MD class for operations that would normally be done on the
DISK class. The sysctl can now be used to test if those operations
took an effect. An example is partitioning.
2005-11-12 20:02:02 +00:00
rwatson
be4f357149 Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
pjd
38ae497c6a Fix possible live-lock under heavy load where we can't allocate more
memory for request.
I was sure graid3 should handle such situations well, but green@ reported
it is not and we want to fix it before 6.0.

Submitted by:	green
2005-10-28 20:25:02 +00:00
takawata
3224e8b29a Add checking for File record magic. 2005-10-26 03:24:28 +00:00
marcel
6c3e0035da Rough implementation of the create and add verbs. The verbs cause
in-memory changes only and as such are only useful for prototyping
and regression testing purposes.
2005-10-09 17:10:35 +00:00
tegge
dbcc5e2770 Move some devstat collection to below where large IO operations are chopped
up.  This make iostat report operations passed down to the device driver
instead of operations passed down to GEOM disk.  The transfer size limit
imposed by the device driver is no longer hidden, improving the correlation
between iostat output and device driver workload.
2005-09-30 17:32:08 +00:00
fjoe
20962cb412 - Fix "end_blk out of range" panic when INVARIANTS.
- Do not allow rw access.

Submitted by:	Dario Freni <saturnero at freesbie dot org>
MFC after:	3 days
2005-09-29 22:45:16 +00:00
marcel
c4edd4f5db o Don't cause a panic when the control request lacks a verb.
o  Don't set the error twice when the named class does not exist.
   It causes ioctl(2) to return with error EEXIST.
2005-09-18 23:54:40 +00:00