Commit Graph

2114 Commits

Author SHA1 Message Date
Kenneth D. Merry
c3fb2891f0 Fix a bug which causes a panic in daopen(). The panic is caused by
a da(4) instance going away while GEOM is still probing it.

In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.

While that event is queued, the da(4) instance goes away.  When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.

The solution is to add a callback to the GEOM disk code that is
called when all of its resources are cleaned up.  This is
implemented inside GEOM by adding an optional callback that is
called when all consumers have detached from a provider, and the
provider is about to be deleted.

scsi_cd.c,
scsi_da.c:	In the register routine for the cd(4) and da(4)
		routines, acquire a reference to the CAM peripheral
		instance just before we call disk_create().

		Use the new GEOM disk d_gone() callback to register
		a callback (dadiskgonecb()/cddiskgonecb()) that
		decrements the peripheral reference count once GEOM
		has finished cleaning up its resources.

		In the cd(4) driver, clean up open and close
		behavior slightly.  GEOM makes sure we only get one
		open() and one close call, so there is no need to
		set an open flag and decrement the reference count
		if we are not the first open.

		In the cd(4) driver, use cam_periph_release_locked()
		in a couple of error scenarios to avoid extra mutex
		calls.

geom.h:		Add a new, optional, providergone callback that
		is called when a provider is about to be deleted.

geom_disk.h:	Add a new d_gone() callback to the GEOM disk
		interface.

		Bump the DISK_VERSION to version 2.  This probably
		should have been done after a couple of previous
		changes, especially the addition of the d_getattr()
		callback.

geom_disk.c:	Add a providergone callback for the disk class,
		g_disk_providergone(), that calls the user's
		d_gone() callback if it exists.

		Bump the DISK_VERSION to 2.

geom_subr.c:	In g_destroy_provider(), call the providergone
		callback if it has been provided.

		In g_new_geomf(), propagate the class's
		providergone callback to the new geom instance.

blkfront.c:	Callers of disk_create() are supposed to pass in
		DISK_VERSION, not an explicit disk API version
		number.  Update the blkfront driver to do that.

disk.9:		Update the disk(9) man page to include information
		on the new d_gone() callback, as well as the
		previously added d_getattr() callback, d_descr
		field, and HBA PCI ID fields.

MFC after:	5 days
2012-06-24 04:29:03 +00:00
Andrey V. Elsukov
d4746e107f Always reconstruct partition entries in the PMBR when Boot Camp is
disabled. This helps to easily recover from situations when PMBR is
damaged and contains no entries.

MFC after:	1 week
2012-06-14 11:17:54 +00:00
Alexander Motin
a839e33278 Add missing newlines into XML output.
MFC after:	3 days
Sponsored by:	iXsystems, Inc.
2012-06-05 16:46:34 +00:00
Marcel Moolenaar
f24a8224b2 Add a partition type for nandfs to the apm, bsd, gpt and vtoc8 schemes.
The gpart alias for these partition types is "freebsd-nandfs".
2012-05-25 20:33:34 +00:00
Edward Tomasz Napierala
d87e55886e Revert r235918 for now and add comment explaining the reason for the
size check.
2012-05-25 10:08:48 +00:00
Edward Tomasz Napierala
202f0f2a02 Make g_label(4) ignore provider size when looking for UFS labels.
Without it, it fails to create labels for filesystems resized by
growfs(8).

PR:		kern/165962
Submitted by:	Olivier Cochard-Labbe <olivier at cochard dot me>
2012-05-24 16:48:33 +00:00
Xin LI
8287ee1bbe - Correct signedness for casts;
- Wrap long line while I'm there.

Noticed by:	pjd, avg
2012-05-23 20:51:21 +00:00
Xin LI
dc89cfa691 Use %ju to match uintmax_t usage 2012-05-23 18:17:02 +00:00
Xin LI
2920997423 Use %j and cast off_t to intmax_t for now to fix build.
Noticed by:	bz
2012-05-23 17:49:59 +00:00
Grzegorz Bernacki
4ffd4dfe17 Add a new geom class which allows to divide NAND Flash chip
into partitions.

Partitions are created based on data in dts file which are
extracted and interpreted by slicer.

Obtained from: Semihalf
Supported by:  FreeBSD Foundation, Juniper Networks
2012-05-22 08:33:14 +00:00
Andrey V. Elsukov
f931cd70af Prevent removing of the last active component from a mirror.
PR:		kern/154860
Reviewed by:	pjd
MFC after:	1 week
2012-05-18 09:22:21 +00:00
Andrey V. Elsukov
1ee0138d2f Introduce new device flag G_MIRROR_DEVICE_FLAG_TASTING. It should
protect geom from destroying while it is tasting.

PR:		kern/154860
Reviewed by:	pjd
MFC after:	1 week
2012-05-18 09:19:07 +00:00
Eitan Adler
615a3e398d Add missing period at the end of the error message
Submitted by:	pjd
Approved by:	cperciva (implicit)
MFC after:	3 days
X-MFC-With:	r235201
2012-05-13 23:27:06 +00:00
Alexander Motin
ef844ef76f - Prevent error status leak if write to some of the RAID1/1E volume disks
failed while write to some other succeeded. Instead mark disk as failed.
- Make RAID1E less aggressive in failing disks to avoid volume breakage.

MFC after:	2 weeks
2012-05-11 13:20:17 +00:00
Eitan Adler
af23b88b5c Clarify error that geli generates
when it finds corrupt data.

PR:		kern/165695
Submitted by:	Robert Simmons <rsimmons0@gmail.com>
Reviewed by:	pjd
Approved by:	cperciva
MFC after:	1 week
2012-05-09 17:26:52 +00:00
Alexander Motin
14f9f25ba0 Remove some hardcoded constants from code. 2012-05-06 16:41:27 +00:00
Alexander Motin
eb3b1cd0de Plug small memory leaks. 2012-05-06 12:55:20 +00:00
Alexander Motin
8f12ca2ee1 Add support for RAID5R. Slightly improve support for RAIDMDF. 2012-05-06 11:32:36 +00:00
Alexander Motin
c0b1ef6661 Fix gmultipath configure for big-endian machines.
MFC after:	1 week
2012-05-06 05:49:23 +00:00
Alexander Motin
86b0366909 Fix bug causing memory corruption and panics with big-endian metadata. 2012-05-04 08:59:19 +00:00
Alexander Motin
4b97ff6137 Implement read-only support for volumes in optimal state (without using
redundancy) for the following RAID levels: RAID4/5E/5EE/6/MDF.
2012-05-04 07:32:57 +00:00
Alexander Motin
8df8e26adc Add optional -o argument to the graid label to specify some metadata
format options. Use it for specifying byte order for the DDF metadata:
big-endian defined by specification and little-endian used by Adaptec.
2012-05-03 05:32:56 +00:00
Alexander Motin
d525d87560 Improve spare disks support. Unluckily, for some reason Adaptec 1430SA
RAID BIOS doesn't want to understand spare disks created by graid. But
at least spares created by BIOS are working fine now.
2012-05-01 18:00:31 +00:00
Alexander Motin
2b9c925ff0 Implement volume deletion if disk has more then one partition. 2012-05-01 09:21:21 +00:00
Alexander Motin
47e980965c Improve DDF metadata writing. 2012-05-01 08:19:29 +00:00
Alexander Motin
00f32ecbd0 Add to GEOM RAID class module, supporting the DDF metadata format, as
defined by the SNIA Common RAID Disk Data Format Specification v2.0.

Supports multiple volumes per array and multiple partitions per disk.
Supports standard big-endian and Adaptec's little-endian byte ordering.
Supports all single-layer RAID levels. Dual-layer RAID levels except
RAID10 are not supported now because of GEOM RAID design limitations.

Some work is still to be done, but the present code already manages basic
interoperation with RAID BIOS of the Adaptec 1430SA SATA RAID controller.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2012-04-30 17:53:02 +00:00
Alexander Motin
c9f545e5f9 s/gmirror/graid/ 2012-04-29 19:40:50 +00:00
Alexander Motin
7b2a8d7823 Fix RAID5 level names changed at r234603. 2012-04-27 08:49:15 +00:00
Alexander Motin
bafd0b5b0a Fix copy-paste typo in r234603.
Submitted by:	kan
2012-04-23 16:35:19 +00:00
Alexander Motin
dbb2e75504 Add names for all primary RAID levels defined by DDF 2.0 specification. 2012-04-23 13:04:02 +00:00
Alexander Motin
e26083ca69 Add sos@ copyrights to RAID metadata modules, respecting his efforts in
decoding metadata formats in ataraid(4) code.
2012-04-23 09:39:39 +00:00
Alexander Motin
fc1de96060 Add to GEOM RAID class module for reading non-degraded RAID5 volumes and
some environment to differentiate 4 possible RAID5 on-disk layouts.

Tested with Intel and AMD RAID BIOSes.

MFC after:	2 weeks
2012-04-19 12:30:12 +00:00
Dmitry Morozovsky
b20e4de387 VMware environments are not unusual now. Add VMware partitions recognition
(both MBR for ESXi <= 4.1 and GPT for ESXi 5) to g_part.

Reviewed by:	ae
Approved by:	ae
MFC after:	2 weeks
2012-04-18 11:59:03 +00:00
Alexander Motin
63297dfd4a Some improvements to GEOM MULTIPATH:
- Implement "configure" command to allow switching operation mode of
running device on-fly without destroying and recreation.
 - Implement Active/Read mode as hybrid of Active/Active and Active/Passive.
In this mode all paths not marked FAIL may handle reads same time,
but unlike Active/Active only one path handles write requests at any
point in time. It allows to closer follow original write request order
if above layers need it for data consistency (not waiting for requisite
write completion before sending dependent write).
 - Hide duplicate messages about device status change.
 - Remove periodic thread wake up with 10Hz rate.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2012-04-18 09:42:14 +00:00
Kirk McKusick
85121b0979 Expand locking around identification of filesystem mount point when
accounting for I/O counts at completion of I/O operation. Also switch
from using global devmtx to vnode mutex to reduce contention.

Suggested and reviewed by: kib
2012-04-08 06:20:21 +00:00
Andrey V. Elsukov
ba289b84b0 VMDB offset should be greater than logical volume size only for MBR. 2012-03-29 07:29:27 +00:00
Andrey V. Elsukov
1c45872b03 Do proper cleanup for the GPT case when an error occurs. 2012-03-29 06:37:02 +00:00
Kirk McKusick
1faacf5d09 Keep track of the mount point associated with a special device
to enable the collection of counts of synchronous and asynchronous
reads and writes for its associated filesystem. The counts are
displayed using `mount -v'.

Ensure that buffers used for paging indicate the vnode from
which they are operating so that counts of paging I/O operations
from the filesystem are collected.

This checkin only adds the setting of the mount point for the
UFS/FFS filesystem, but it would be trivial to add the setting
and clearing of the mount point at filesystem mount/unmount
time for other filesystems too.

Reviewed by: kib
2012-03-28 20:49:11 +00:00
Andrey V. Elsukov
472794bb9f Check that scheme is not already registered. This may happens when a
KLD is preloaded with loader(8) and leads to infinity loops.

Also do not return EEXIST error code from MOD_LOAD handler, because
we have undocumented(?) ability replace kernel's module with preloaded one.
And if we have so, then preloaded module will be initialized first.
Thus error in MOD_LOAD handler will be triggered for the kernel.

PR:		kern/165573
MFC after:	3 weeks
2012-03-23 07:26:17 +00:00
Andrey V. Elsukov
f1104f7190 Add CTLFLAG_TUN to sysctls.
MFC after:	1 month
2012-03-19 13:21:10 +00:00
Andrey V. Elsukov
37d1a121d9 Add new GEOM_PART_LDM module that implements the Logical Disk Manager
scheme. The LDM is a logical volume manager for MS Windows NT and it
is also known as dynamic volumes. It supports about 2000 partitions
and also provides the capability for software RAID implementations.

This version implements only partitioning scheme capability and based
on the linux-ntfs project documentation and several publications across
the Web. NOTE: JBOD, RAID0 and RAID5 volumes aren't supported.

An access to the LDM metadata is read-only. When LDM is on the disk
partitioned with MBR we can also destroy metadata. For the GPT
partitioned disks destroy action is not supported.

Reviewed by:	ivoras (previous version)
MFC after:	1 month
2012-03-19 13:14:44 +00:00
Andrey V. Elsukov
422783e365 Make kern.geom.part node not static. Also add CTLFLAG_TUN to the
check_integrity sysctl.

MFC after:	1 month
2012-03-19 12:57:52 +00:00
Andrey V. Elsukov
5284aff594 Add MODULE_DEPEND() to geom_part modules.
MFC after:	2 weeks
2012-03-15 08:39:10 +00:00
Ed Maste
972f6945b8 Remove unactionable message about label geometry
It's not clear to a user what they should do after seeing the "geometry
does not match label" kernel message, and it does not appear to present
a problem in practice.  Thus, just remove the messages.

Approved by:	marcel
2012-03-08 01:48:44 +00:00
Andrey V. Elsukov
5357f27569 If nested scheme allows dump kernel to its partition, we may allow
dump for the parent partition too.

MFC after:	2 weeks
2012-02-20 06:35:52 +00:00
Andrey V. Elsukov
c3f9f306d2 Add alias for the partition type 0x0f. Now "ebr" name is used for both
types 0x05 and 0x0f, but 0x05 is preferred and used when partition is
created with "gpart add -t ebr ...".
This should keep EBR partitions accessible after r231754 for those,
who have EBR on the partition with type 0x0f.
2012-02-20 05:48:57 +00:00
Andrey V. Elsukov
3bcf7d7191 Add additional check to EBR probe and create methods:
don't try probe and create  EBR scheme when parent partition type
is not "ebr". This fixes error messages about corrupted EBR for
some partitions where is actually another partition scheme.

NOTE: if you have EBR on the partition with different than "ebr"
(0x05) type, then you will lost access to partitions until it will be
changed.

MFC after:	2 weeks
2012-02-15 10:33:29 +00:00
Andrey V. Elsukov
0d8bc07eba Add PART::type attribute handler. It returns partition type as string.
MFC after:	2 weeks
2012-02-15 10:02:19 +00:00
Andrey V. Elsukov
48ef46e55a Add alias for the partition with type 0x42 to the MBR scheme.
MFC after:	1 week
2012-02-10 09:55:18 +00:00
Andrey V. Elsukov
f44d97bd0c Let's be more realistic and limit maximum number of partition to 4k.
MFC after:	1 week
2012-02-10 06:44:30 +00:00
Konstantin Belousov
c480f781ea Current implementations of sync(2) and syncer vnode fsync() VOP uses
mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return.  Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.

Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.

Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.

Reviewed by:	mckusick
Tested by:	scottl
MFC after:	2 weeks
2012-02-06 11:04:36 +00:00
Ed Maste
23f6856fff Correct typo in comment (numbver) 2012-02-04 18:14:39 +00:00
Andrey V. Elsukov
7b540236bb The scheme code may not know about some inconsistency in the metadata.
So, add an integrity check after recovery attempt.

MFC after:	1 week
2012-02-01 09:28:16 +00:00
Attilio Rao
5d7380f8e3 Avoid to check the same cache line/variable from all the locking
primitives by breaking stop_scheduler into a per-thread variable.
Also, store the new td_stopsched very close to td_*locks members as
they will be accessed mostly in the same codepaths as td_stopsched and
this results in avoiding a further cache-line pollution, possibly.

STOP_SCHEDULER() was pondered to use a new 'thread' argument, in order to
take advantage of already cached curthread, but in the end there should
not really be a performance benefit, while introducing a KPI breakage.

In collabouration with:	flo
Reviewed by:	avg
MFC after:	3 months (or never)
X-MFC:		r228424
2012-01-28 14:00:21 +00:00
Nathan Whitehorn
090dd24636 Experimental support for booting CHRP-type PowerPC systems from hard disks. 2012-01-25 03:37:39 +00:00
Don Lewis
b5bad28182 Allow an MBR primary or extended Linux swap partition to be specified
as the system dump device.  This was already allowed for GPT.  The Linux
swap metadata at the beginning of the partition should not be disturbed
because the crash dump is written at the end.

Reviewed by:	alfred, pjd, marcel
MFC after:	2 weeks
2012-01-13 18:32:56 +00:00
Jim Harris
c1ad3fcf6a Add support for >2TB disks in GEOM RAID for Intel metadata format.
Reviewed by: mav
Approved by: scottl
MFC after: 1 week
2012-01-09 23:01:42 +00:00
Aleksandr Rybalko
ce96bb7942 GEOM_UNCOMPRESS module, can be used with uzip images and with new ulzma images.
Approved by:	adrian (mentor)
2012-01-04 23:39:11 +00:00
Andriy Gapon
f6ce353e58 replace uses of libkern gets with cngets
MFC after:	2 months
2011-12-17 15:26:34 +00:00
Alexander Motin
a2fa37fe67 Close race between geom destruction on g_vfs_close() when softc destroyed
and g_vfs_orphan() call that tries to access softc, intruced at r227015.

PR:		kern/162997
2011-12-02 17:09:48 +00:00
Andrey V. Elsukov
a85a0d469e Add an ability to increase number of allocated APM entries when we
have reserved free space in the APM area.
Also instead of one write request per each APM entry, use MAXPHY
sized writes when we are updating APM.

MFC after:	1 month
2011-11-28 16:07:26 +00:00
Andrey V. Elsukov
64c4a83782 The size of APM could be bigger than number of already allocated entries.
And the first usable sector should not start from the inside of APM area.

MFC after:	1 month
2011-11-28 12:38:24 +00:00
Alexander Motin
107c1508fa Temporary revert r227009 to fix freeze on UP systems without PREEMPTION.
Before r215687, if some withered geom or provider could not be destroyed,
g_event thread went to sleep for 0.1s before retrying. After that change
it is just restarting immediately. r227009 made orphaned (withered) provider
to not detach immediately, but only after context switch. That made loop
inside g_event thread infinite on UP systems without PREEMPTION.

To address original problem with possible dead lock addressed by r227009
we have to fix r215687 change first, that needs some time to think and test.
2011-11-14 19:32:05 +00:00
Alexander Motin
0c883cef45 Major GEOM MULTIPATH class rewrite:
- Improved locking and destruction process to fix crashes.
 - Improved "automatic" configuration method to make it consistent and safe
by reading metadata back from all specified paths after writing to one.
 - Added provider size check to reduce chance of ordering conflict with
other GEOM classes.
 - Added "manual" configuration method without using on-disk metadata.
 - Added "add" and "remove" commands to allow manage paths manually.
 - Failed paths are no longer dropped from geom, but only marked as FAIL
and excluded from I/O operations.
 - Automatically restore failed paths when all others paths are marked
as failed, for example, because of device-caused (not transport) errors.
 - Added "fail" and "restore" commands to manually control FAIL flag.
 - geom is now destroyed on last path disconnection.
 - Added optional Active/Active mode support. Unlike Active/Passive
mode, load evenly distributed between all working paths. If supported by
the device, it allows to significantly improve performance, utilizing
bandwidth of all paths. It is controlled by -A option during creation.
Disabled by default now.
 - Improved `status` and `list` commands output.

Sponsored by:	iXsystems, inc.
MFC after:	1 month
2011-11-12 09:52:27 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
Alexander Motin
ea5791d7ab Add mutex and two flags to make orphan() call properly asynchronous:
- delay consumer closing and detaching on orphan() until all I/Os complete;
 - prevent new I/Os submission after orphan() called.
Previous implementation could destroy consumers still having active
requests and worked only because of global workaround made on GEOM level.
2011-11-02 09:24:59 +00:00
Alexander Motin
755d1ea5b5 Make orphan() method in geom_dev asynchronous using destroy_dev_sched_cb()
instead of destroy_dev(). It moves device destruction waiting out of the
topology lock and so fixes dead lock between orphanization and closing.
Real provider and geom destruction called from swi context after device
destroyed as callback of the destroy_dev_sched_cb().
2011-11-01 23:12:22 +00:00
Alexander Motin
df96fd6e14 Refactor disk disconnection and geom destruction handling sequences.
Do not close/destroy opened consumer directly in case of disconnect. Instead
keep it existing until it will be closed in regular way in response to
upstream provider destruction. Delay geom destruction in the same way.
Previous implementation could destroy consumers still having active
requests and worked only because of global workaround made on GEOM level.
2011-11-01 20:56:19 +00:00
Alexander Motin
0849a53fc0 Refactor disk disconnection and geom destruction handling sequences.
Do not close/destroy opened consumer directly in case of disconnect. Instead
keep it existing until it will be closed in regular way in response to
upstream provider destruction. Delay geom destruction in the same way.
Previous implementation could destroy consumers still having active
requests and worked only because of global workaround made on GEOM level.
2011-11-01 17:04:42 +00:00
Alexander Motin
20a5d5dc60 Workaround the problem introduced by combination of r162200 and r215687.
r162200 delays provider orphanization until all running requests complete,
to workaround broken orphan() method implementation in some classes.
r215687 removes persistent periodic (10Hz) event thread wake ups.
Together these changes can indefinitely delay orphanization until some
other event wake up the event thread. One consequence of this is inability
of CAM to destroy device disconnected when busy and, as consequence, create
new one after reconnection.

While the best solution would be to revert r162200, it is not easy, as
some classes still look broken in that way. Instead conditionally wake up
event thread if there are some providers waiting for orphanization.

MFC after:	1 week
2011-11-01 08:57:49 +00:00
Andrey V. Elsukov
aea26bc05a Our geom withering function could take some time before geom with its
providers and consumers will be destroyed.  Before take some actions
with a geom, check that it is not destroyed at the moment.

Tested by:	nwhitehorn
MFC after:	1 week
2011-10-28 11:45:24 +00:00
Pawel Jakub Dawidek
0c879bd990 Before this change when GELI detected hardware crypto acceleration it will
start only one worker thread. For software crypto it will start by default
N worker threads where N is the number of available CPUs.

This is not optimal if hardware crypto is AES-NI, which uses CPU for AES
calculations.

Change that to always start one worker thread for every available CPU.
Number of worker threads per GELI provider can be easly reduced with
kern.geom.eli.threads sysctl/tunable and even for software crypto it
should be reduced when using more providers.

While here, when number of threads exceeds number of CPUs avilable don't
reduce this number, assume the user knows what he is doing.

Reported by:	Yuri Karaban <dev@dev97.com>
MFC after:	3 days
2011-10-27 16:12:25 +00:00
Alexander Motin
733a1f3f52 Clarify disks/volumes above 2TiB support in geom_raid:
- add support for volumes above 2TiB with Promise metadata format;
 - enforse and document other limitations:
   - Intel and Promise metadata formats do not support disks above 2TiB;
   - NVIDIA metadata format does not support volumes above 2TiB.

Sponsored by:	iXsystems, Inc.
MFC after:	2 weeks
2011-10-26 21:50:10 +00:00
Pawel Jakub Dawidek
92f84a9fae Allow upper layers to discover than BIO_DELETE and/or BIO_FLUSH is not
supported by returning EOPNOTSUPP instead of 0 or ENODEV.

MFC after:	3 days
2011-10-25 14:07:17 +00:00
Pawel Jakub Dawidek
37f0f0a75e Improve style a bit.
MFC after:	3 days
2011-10-25 14:05:39 +00:00
Pawel Jakub Dawidek
9495476273 Simplify disk_alloc().
MFC after:	3 days
2011-10-25 14:04:59 +00:00
Pawel Jakub Dawidek
1f8c92e6fa Add support for creating GELI devices with older metadata version for use
with older FreeBSD versions:
- Add -V option to 'geli init' to specify version number. If no -V is given
  the most recent version is used.
- If -V is given don't allow to use features not supported by this version.
- Print version in 'geli list' output.
- Update manual page and add table describing which GELI version is
  supported by which FreeBSD version, so one can use it when preparing GELI
  device for older FreeBSD version.

Inspired by:	Garrett Cooper <yanegomi@gmail.com>
MFC after:	3 days
2011-10-25 13:57:50 +00:00
Pawel Jakub Dawidek
effb9912c7 When decoding metadata, check magic string, so we know this is not GELI device
before we check its version. We don't want to report that some garbage is
unsupported version if this is not even GELI provider.

MFC after:	3 days
2011-10-25 13:44:23 +00:00
Pawel Jakub Dawidek
0e236b6c47 Prefer G_ELI_VERSION_* defines for version numbers over plain digits.
MFC after:	3 days
2011-10-25 13:09:22 +00:00
Pawel Jakub Dawidek
038c55adcc Fit lines into 80 chars.
MFC after:	3 days
2011-10-25 13:08:03 +00:00
Pawel Jakub Dawidek
e880ff0062 When metadata is at newer version than the highest supported, return
EOPNOTSUPP when decoding.

MFC after:	3 days
2011-10-25 07:48:53 +00:00
Marcel Moolenaar
369fe59de8 Add support for Boot Camp. The support is defined as follows:
o   Detect when Boot Camp is enabled (i.e. the MBR mirrors the GPT).
o   When Boot Camp is enabled, update the MBR whenever we write the GPT.
o   Creation of a Boot Camp enabled GPT is not supported.
o   Automatically disable Boot Camp when the GPT has been changed so that
    there's either no EFI partition or no HFS+ partition.
o   The first 4 partitions (by index) get mirrored in the MBR.

Requested by, discussed with and tested by: kris@pcbsd.org
MFC after: 1 week
2011-10-23 02:51:23 +00:00
Marius Strobl
479a4ef021 Allow to dump on Solaris swap partitions.
PR:		161764
Submitted by:	Peter Jeremy
2011-10-18 20:16:02 +00:00
Pawel Jakub Dawidek
8d680f2cc9 Add some spare fields to the g_class and g_geom structures needed to implement
direct I/O handling and provider's property changes handling.
2011-07-17 20:35:30 +00:00
Andrey V. Elsukov
0857ee8cb8 Remove include of sys/sbuf.h from geom/geom.h.
sbuf support is not always required for geom/geom.h users, and no need to
depend from it.

PR:		kern/158398
2011-07-11 10:02:27 +00:00
Andrey V. Elsukov
5d807a0e1a Include sys/sbuf.h directly.
Reviewed by:	pjd
2011-07-11 05:22:31 +00:00
Kirk McKusick
8795189c98 Allow disk partitions associated with UFS read-only mounted
filesystems to be opened for writing. This functionality used to
be special-cased for just the root filesystem, but with this change
is now available for all UFS filesystems. This change is needed for
journaled soft updates recovery.

Discussed with: Jeff Roberson
2011-07-10 00:41:31 +00:00
Andrey V. Elsukov
2b9be05588 Initialize elements of state array when creating the GPT table.
This fixes the problem, when the secondary GPT header is not erased when
partition table destroyed. Move equal operations from g_part_gpt_create
and g_part_gpt_recover to the separate function g_gpt_set_defaults.

Reported by:	dwhite
MFC after:	1 week
2011-06-29 05:41:14 +00:00
Andrey V. Elsukov
671dfdbf11 EBR could contain an early stage of boot code. But we do not support it.
Remove message about non empty bootcode, we can not break something
while GEOM_PART_EBR_COMPAT is defined.

But without GEOM_PART_EBR_COMPAT any changes in EBR are allowed and we
can accidentally wipe the boot code. To do not break anything save
the first EBR chunk and keep it untouched each time when we are
changing EBR. Note that we are still not support boot code for EBR.

PR:		kern/141235
MFC after:	1 month
2011-06-27 12:42:48 +00:00
Andrey V. Elsukov
61162e857a MS Windows NT+ uses 4 bytes at offset 0x1b8 in the MBR to identify
disk drive. The boot0cfg(8) utility preserves these 4 bytes when is
writing bootcode to keep a multiboot ability.
Change gpart's bootcode method to keep DSN if it is not zero. Also
do not allow writing bootcode with size not equal to MBRSIZE.

PR:		kern/157819
Tested by:	Eir Nym
MFC after:	1 month
2011-06-27 10:42:06 +00:00
Andrey V. Elsukov
503e6682cd Change the way how we update bootcode for BSD scheme.
Since the only parameter that we check is size of bootcode, then
allow only two sizes: size of boot1 and size of /boot/boot.
This partially protects users from losing ability to boot if incorrect
bootcode is specified.

Requested by:	ru
2011-06-20 12:22:30 +00:00
Justin T. Gibbs
416494d7c9 Plumb device physical path reporting from CAM devices, through GEOM and
DEVFS, and make it accessible via the diskinfo utility.

Extend GEOM's generic attribute query mechanism into generic disk consumers.
sys/geom/geom_disk.c:
sys/geom/geom_disk.h:
sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
	- Allow disk providers to implement a new method which can override
	  the default BIO_GETATTR response, d_getattr(struct bio *).  This
	  function returns -1 if not handled, otherwise it returns 0 or an
	  errno to be passed to g_io_deliver().

sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
	- Don't copy the serial number to dp->d_ident anymore, as the CAM XPT
	  is now responsible for returning this information via
	  d_getattr()->(a)dagetattr()->xpt_getatr().

sys/geom/geom_dev.c:
	- Implement a new ioctl, DIOCGPHYSPATH, which returns the GEOM
	  attribute "GEOM::physpath", if possible.  If the attribute request
	  returns a zero-length string, ENOENT is returned.

usr.sbin/diskinfo/diskinfo.c:
	- If the DIOCGPHYSPATH ioctl is successful, report physical path
	  data when diskinfo is executed with the '-v' option.

Submitted by:	will
Reviewed by:	gibbs
Sponsored by:	Spectra Logic Corporation

Add generic attribute change notification support to GEOM.

sys/sys/geom/geom.h:
	Add a new attrchanged method field to both g_class
	and g_geom.

sys/sys/geom/geom.h:
sys/geom/geom_event.c:
	- Provide the g_attr_changed() function that providers
	  can use to advertise attribute changes.
	- Perform delivery of attribute change notifications
	  from a thread context via the standard GEOM event
	  mechanism.

sys/geom/geom_subr.c:
	Inherit the attrchanged method from class to geom (class instance).

sys/geom/geom_disk.c:
	Provide disk_attr_changed() to provide g_attr_changed() access
	to consumers of the disk API.

sys/cam/scsi/scsi_pass.c:
sys/cam/scsi/scsi_da.c:
sys/geom/geom_dev.c:
sys/geom/geom_disk.c:
	Use attribute changed events to track updates to physical path
	information.

sys/cam/scsi/scsi_da.c:
	Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
	events for this driver.  When this event occurs, and
	the updated buffer type references our physical path
	attribute, emit a GEOM attribute changed event via the
	disk_attr_changed() API.

sys/cam/scsi/scsi_pass.c:
	Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
	events for this driver.  When this event occurs, update
	the physical patch devfs alias for this pass instance.

Submitted by:	gibbs
Sponsored by:	Spectra Logic Corporation
2011-06-14 17:10:32 +00:00
Attilio Rao
d7073a2b3b MFC 2011-06-03 17:09:15 +00:00
Alexander Motin
0330cb3bf7 Update disk's stripesize and stripeoffset parameters on provider open.
They are media-dependent and may change in run-time, same as sectorsize
and/or mediasize.

SCSI devices return physical sector size and offset via READ CAPACITY(16)
command and so can not report it until media inserted or at least until
probe sequence completed. UNMAP support is also reported there.
2011-06-03 13:49:18 +00:00
Andrey V. Elsukov
38c64884ff Add diagnostic message about not aligned partitions.
Idea from:	ivoras
2011-06-03 06:58:24 +00:00
Attilio Rao
3bf1ec3a9a MFC 2011-06-02 14:09:30 +00:00
Andrey V. Elsukov
d15033b3f8 Do not hide stripeoffset from libgeom(3), it may be useful even when
stripesize is zero.

MFC after:	1 week
2011-06-02 12:49:45 +00:00
Attilio Rao
9cb46334ee MFC 2011-05-27 16:09:10 +00:00
Andrey V. Elsukov
9854b4eeee Some partitioning tools may have a different opinion about disk
geometry and partitions may start from withing the first track.
If we found such partitions, then do not reserve space of the
first track, only first sector.
2011-05-27 06:37:42 +00:00
Attilio Rao
7fcdc9a26f MFC 2011-05-26 17:38:00 +00:00
Andrey V. Elsukov
ceef8f2477 Prevent non-aligned reading from provider while tasting. Reject
providers with unsupported sectorsize.

Reported by:	Joerg Wunsch
MFC after:	1 week
2011-05-25 11:14:26 +00:00
Andrey V. Elsukov
6fd1e2e013 Do not truncate available disk space to the closest track boundary. 2011-05-25 09:45:13 +00:00
Andrey V. Elsukov
23a3490034 Do not truncate available disk space to the closest track boundary. 2011-05-25 09:38:12 +00:00
Andrey V. Elsukov
db48d4a92e Do not truncate available disk space to the closest track boundary. 2011-05-25 09:32:19 +00:00
Andrey V. Elsukov
49d12fd5be Remove unused variable.
MFC after:	1 week
2011-05-24 06:46:07 +00:00
Andrey V. Elsukov
e471361279 Remove unused variable.
MFC after:	1 week
2011-05-24 06:44:16 +00:00
Attilio Rao
3ac3f6002b MFC 2011-05-23 23:58:02 +00:00
Pawel Jakub Dawidek
204a4e196a Recognize BIO_FLUSH requests and pass them to userland.
MFC after:	1 week
2011-05-23 21:00:37 +00:00
Attilio Rao
7e7a34e520 MFC 2011-05-16 16:34:03 +00:00
Andrey V. Elsukov
d0c8ecb812 Make diagnostic messages more specific. With bootverbose print out
all inconsistencies of integrity in the partition table, not first
found only.

Requested by:	kib
2011-05-16 15:59:50 +00:00
Andrey V. Elsukov
b6c4978f6f Add diagnostic messages for integrity checks. 2011-05-16 12:00:32 +00:00
Andrey V. Elsukov
6e81b75a3c Add a sysctl kern.geom.part.check_integrity for those who has corrupt
partition tables and lost an ability to boot after r221788.
Also unhide an error message from bootverbose, this would help to
easier determine the problem.
2011-05-15 20:03:54 +00:00
Attilio Rao
447274a88b MFC 2011-05-15 15:47:16 +00:00
Mikolaj Golub
76cc7f6dd6 Fix a memory leak possible in g_eli_key_allocate() if the key with the
same keyno is added while we aren't holding the lock.

Approved by:	pjd (mentor)
MFC after:	1 week
2011-05-15 12:39:30 +00:00
Attilio Rao
ef607a6aa3 MFC 2011-05-12 14:01:40 +00:00
Andrew Thompson
b2901e999b Move the three geom kprocs as threads under a single pid.
Reviewed by:	julian
2011-05-11 21:47:30 +00:00
Andrey V. Elsukov
c63e8fe201 Add basic metadata integrity check. In case when partition table was
probed and read successfull, but it contains invalid values (e.g.
overlapped partitions, offset or size is out of bounds), then table
will be rejected.

MFC after:	1 month
2011-05-11 19:59:43 +00:00
Attilio Rao
521bd6b433 MFC 2011-05-08 14:56:02 +00:00
Andrey V. Elsukov
f30b6bcb60 Limit number of sectors that can be addressed.
MFC after:	1 week
2011-05-08 12:28:13 +00:00
Andrey V. Elsukov
284a82d0bb Limit number of sectors that can be addressed.
MFC after:	1 week
2011-05-08 12:20:30 +00:00
Andrey V. Elsukov
6017ae3fdd Limit number of sectors that can be addressed.
Reject table if blkcount from metadata is greater than provider.
2011-05-08 12:16:39 +00:00
Andrey V. Elsukov
2920db1713 Limit number of sectors that can be addressed.
MFC after:	1 week
2011-05-08 12:11:16 +00:00
Andrey V. Elsukov
4675b2b65f Replace UINT_MAX to UINT32_MAX.
Pointed out by:	kib
MFC after:	1 week
2011-05-08 11:42:51 +00:00
Andrey V. Elsukov
ab0ffb4c88 Limit number of sectors that can be addressed.
MFC after:	1 week
2011-05-08 11:20:27 +00:00
Andrey V. Elsukov
cfbdf6c3c5 Limit number of sectors that can be addressed.
MFC after:	1 week
2011-05-08 11:16:17 +00:00
Pawel Jakub Dawidek
a1f4a8c447 Export GELI class version via sysctl kern.geom.eli.version.
MFC after:	1 week
2011-05-08 09:29:21 +00:00
Pawel Jakub Dawidek
731adc8682 Version 6 is compatible with version 5 when it comes to control commands.
MFC after:	1 week
2011-05-08 09:25:54 +00:00
Pawel Jakub Dawidek
964d172cbe Detect and handle metadata of version 6.
MFC after:	1 week
2011-05-08 09:25:16 +00:00
Pawel Jakub Dawidek
ad0a523639 When support for multiple encryption keys was committed, GELI integrity mode
was not updated to pass CRD_F_KEY_EXPLICIT flag to opencrypto. This resulted in
always using first key.

We need to support providers created with this bug, so set special
G_ELI_FLAG_FIRST_KEY flag for GELI provider in integrity mode with version
smaller than 6 and pass the CRD_F_KEY_EXPLICIT flag to opencrypto only if
G_ELI_FLAG_FIRST_KEY doesn't exist.

Reported by:	Anton Yuzhaninov <citrin@citrin.ru>
MFC after:	1 week
2011-05-08 09:17:56 +00:00
Pawel Jakub Dawidek
9d644a4032 Remove prototype for a function that no longer exist.
MFC after:	1 week
2011-05-08 09:11:04 +00:00
Pawel Jakub Dawidek
937959f0a7 Drop proper key.
MFC after:	1 week
2011-05-08 09:09:49 +00:00
Pawel Jakub Dawidek
9104c920b4 Add magic field to the g_eli_key structure to detect if we are really
operating on proper structures.

MFC after:	1 week
2011-05-08 09:08:50 +00:00
Attilio Rao
aa8b9e0706 MFC 2011-05-06 22:45:33 +00:00
Adrian Chadd
c60fd25d34 Updates to geom_map from the author.
The major update here is to support 64 bit size/offsets.
There's also style related changes.

Submitted by: 	ray@dlink.ua
2011-05-05 14:43:09 +00:00
Attilio Rao
71a19bdc64 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
Andrey V. Elsukov
9a7defbda0 Remove unneeded code.
MFC after:	1 week
2011-05-04 18:41:26 +00:00
Andrey V. Elsukov
eb8e9abe72 Remove unneeded code.
MFC after:	1 week
2011-05-04 18:26:45 +00:00
Andrey V. Elsukov
ceb1c69a84 Remove unneeded code.
MFC after:	1 week
2011-05-04 18:17:21 +00:00
Andrey V. Elsukov
2fbefe4829 Removed KASSERT, g_new_providerf() can not fail.
MFC after:	1 week
2011-05-04 18:06:40 +00:00
Andrey V. Elsukov
c211af0352 Remove "for a moment" assignment. struct g_geom zeroed when allocated.
MFC after:	1 week
2011-05-04 17:56:53 +00:00
Andrey V. Elsukov
e62dffbf5d Remove unneeded checks, g_new_xxx functions can not fail.
MFC after:	1 week
2011-05-04 17:37:37 +00:00
Andrey V. Elsukov
370efd743a When checking existence of providers skip those which are orphaned.
PR:		kern/132273
MFC after:	2 week
2011-05-04 12:59:11 +00:00
Alexander Motin
bd5c368604 Use make_dev_alias_p() added in r221397 to create alias dev entry.
It removes panic in case if alias name is already busy for some reason.
2011-05-03 19:12:42 +00:00
Alexander Motin
90f2be2430 Implement relaxed comparision for hardcoded provider names to make it
ignore adX/adaY difference in both directions to simplify migration to
the CAM-based ATA or back.
2011-04-27 00:10:26 +00:00
Alexander Motin
0d307e0905 - Add shim to simplify migration to the CAM-based ATA. For each new adaX
device in /dev/ create symbolic link with adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
 - To know what behavior to mimic, restore ATA_STATIC_ID option in cases
where it was present before.
 - Add some more details to UPDATING.
2011-04-26 17:01:49 +00:00
Pawel Jakub Dawidek
16a174b5c5 One key is expected from providers smaller than or equal to (2^20)*sectorsize
bytes. Remove bogus assertion and while here remove another too obvious
assertion.

Reported by:	Fabian Keil <freebsd-listen@fabiankeil.de>
MFC after:	2 weeks
2011-04-24 10:41:13 +00:00
Pawel Jakub Dawidek
5bd8adc750 If number of keys for the given provider doesn't exceed the limit,
allocate all of them at attach time. This allows to avoid moving
keys around in the most-recently-used queue and needs no mutex
synchronization nor refcounting.

MFC after:	2 weeks
2011-04-21 13:35:20 +00:00
Pawel Jakub Dawidek
1e09ff3dc3 Instead of allocating memory for all the keys at device attach,
create reasonably large cache for the keys that is filled when
needed. The previous version was problematic for very large providers
(hundreds of terabytes or serval petabytes). Every terabyte of data
needs around 256kB for keys. Make the default cache limit big enough
to fit all the keys needed for 4TB providers, which will eat at most
1MB of memory.

MFC after:	2 weeks
2011-04-21 13:31:43 +00:00
Alexander Motin
fe51d6c1d1 Reduce geom_raid log verbosity. 2011-04-18 16:15:59 +00:00
Gavin Atkinson
47bae5fd09 Remove an incorrect be16toh() that prevented geom_part_apm from working on
little-endian machines.

Reviewed by:	marcel
MFC after:	2 weeks
2011-04-15 12:32:52 +00:00
Adrian Chadd
27afdbaa51 Introduce geom_map, a GEOM provider designed for use by
embedded flash stores.

Some devices - notably those with uboot - don't have an
explicit partition table (eg like Redboot's FIS.)
geom_map thus provides an easy way to export the hard-coded
flash layout as geom providers for use by filesystems and
other tools.

It also includes a "search" function which allows for
dynamic creation of partition layouts where the device only
has a single hard-coded partition. For example, if
there is a "kernel+rootfs" partition, a single image can
be created which appends the rootfs after the kernel with
an appropriate search string. geom_map can be told to
search for said search string and create a partition
beginning after it.

Submitted by:	Aleksandr Rybalko <ray@dlink.ua>
2011-04-12 08:10:25 +00:00
Mikolaj Golub
90574b0a79 In g_eli_read_done() and g_eli_write_done(), for a bio with
bio_children > 1, g_destroy_bio() is never called and the bio
leaks. Fix this by calling g_destroy_bio() earlier, before the check.

Submitted by:	Victor Balada Diaz <victor@bsdes.net> (initial version)
Approved by:	pjd (mentor)
MFC after:	1 week
2011-04-03 17:38:12 +00:00
Pawel Jakub Dawidek
63a6c5c12b GEOM has an internal mechanism to deal with ENOMEM errors returned via
g_io_deliver(). In such case it increases 'pace' counter on each ENOMEM and
reschedules the request. The 'pace' counter is decreased for each request going
down, but until 'pace' is greater than zero, GEOM will handle at most 10
requests per second. For GEOM GATE users that are proxy to local GEOM providers
(like ggatel(8) and HAST) we can end up with almost permanent slow down of GEOM
down queue. This is because once we reach GEOM GATE queue limit, we return
ENOMEM to the GEOM. This means that we have, eg. 1024 I/O requests in the GEOM
GATE queue. To make room in the queue and stop returning ENOMEM we need to
proceed the requests of course, but those requests are handled by userland
daemons that handle them by reading/writing also from/to local GEOM providers.
For example with HAST, a new requests comes to /dev/hast/data, which is GEOM
GATE provider. GEOM GATE passes the request to hastd(8) and hastd(8)
reads/writes from/to /dev/da0. Once we reach GEOM GATE queue limit, to free up
a slot in GEOM GATE queue, hastd(8) has to read/write from/to /dev/da0, but
this request will also be very slow, because GEOM now slows down all the
requests. We end up with full queue that we can unload at the speed of 10
requests per second. This simply looks like a deadlock.

Fix it by allowing userland daemons that work with both GEOM GATE and local
GEOM providers to specify unlimited queue size, so GEOM GATE will never return
ENOMEM to the GEOM.

MFC after:	1 week
2011-04-02 06:56:06 +00:00
Alexander Motin
14e2cd0a00 Bunch of small bugfixes and cleanups.
Found with:	Clang Static Analyzer
2011-03-31 16:19:53 +00:00
Alexander Motin
636076752a Bunch of small bugfixes and cleanups.
Found with:     Coverity Prevent(tm)
CID:            9656, 9658, 9693, 9705, 9706, 9707, 9808, 9809, 9810,
		9711, 9712, 9713, 9714
2011-03-31 16:14:35 +00:00
Andrey V. Elsukov
53ff3d1e9c Remove unneeded checks, g_new_xxx functions can not return NULL.
Reviewed by:	pjd
MFC after:	1 week
2011-03-31 06:30:59 +00:00
Mikolaj Golub
bd119384c7 Increase debug level on g_gate device destruction and add message on
device creation.

Suggested by:	danger
Approved by:	pjd (mentor)
MFC after:	3 days
2011-03-30 21:40:14 +00:00
Mikolaj Golub
baf63f65ae In g_gate_create() there is a window between when g_gate_softc is
registered in g_gate_units array and when its sc_provider field is
filled. If during this period g_gate_units is accessed by another
thread that is checking for provider name collision the crash is
possible.

Fix this by adding sc_name field to struct g_gate_softc. In
g_gate_create() when g_gate_softc is created but sc_provider is still
not sc_name points to provider name stored in the local array.

Approved by:	pjd (mentor)
Reported by:	Freddie Cash <fjwcash@gmail.com>
MFC after:	1 week
2011-03-27 19:56:55 +00:00
Alexander Motin
89b172238a MFgraid/head:
Add new RAID GEOM class, that is going to replace ataraid(4) in supporting
various BIOS-based software RAIDs. Unlike ataraid(4) this implementation
does not depend on legacy ata(4) subsystem and can be used with any disk
drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4)
with `options ATA_CAM`). To make code more readable and extensible, this
implementation follows modular design, including core part and two sets
of modules, implementing support for different metadata formats and RAID
levels.

Support for such popular metadata formats is now implemented:
Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.

Such RAID levels are now supported:
RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT.

For any all of these RAID levels and metadata formats this class supports
full cycle of volume operations: reading, writing, creation, deletion,
disk removal and insertion, rebuilding, dirty shutdown detection
and resynchronization, bad sector recovery, faulty disks tracking,
hot-spare disks. For Intel and Promise formats there is support multiple
volumes per disk set.

Look graid(8) manual page for additional details.

Co-authored by:	imp
Sponsored by:	Cisco Systems, Inc. and iXsystems, Inc.
2011-03-24 21:31:32 +00:00
Alexander Motin
c6d4ed3a32 MFgraid/head r218212, r218257:
Introduce new type of BIO_GETATTR -- GEOM::setstate, used to inform lower
GEOM about state of it's providers from the point of upper layers.
Make geom_disk use led(4) subsystem to illuminate states in such fashion:
FAILED - "1" (on), REBUILD - "f5" (slow blink), RESYNC - "f1" (fast blink),
ACTIVE - "0" (off).
LED name should be set for each disk via kern.geom.disk.%s.led sysctl.
Later disk API could be extended to allow disk driver to report this info
in custom way via it's own facilities.
2011-03-24 19:23:42 +00:00
Alexander Motin
06f4c96d39 MFgraid/head r217827:
Change BIO_GETATTR("GEOM::kerneldump") API to make set_dumper() called by
consumer (geom_dev) instead of provider (geom_disk). This allows any geom
insert it's code into the dump call chain, implementing more sophisticated
functionality then just disk partitioning.
2011-03-24 08:37:48 +00:00
Maxim Sobolev
20cc2dc42e Some linux distros put mount point into the ext2fs labels, such as '/', or
'/boot', which confuses the devfs code and can cause userland programs to
fail reading /dev/ext2fs directory with weird error code, such as any
program that uses pwlib.

Strip any leading slashes before feeding the label to the geom_label code.

Sponsored by:	Sippy Software, Inc.

MFC after:	1 week
2011-03-08 17:00:31 +00:00
Nathan Whitehorn
65cb6238bd Add the disk ident and a human-meaningful description (here, the disk model
string) to the geom_disk config XML so that they are easily accessible from
userland.

MFC after:	1 week
2011-02-26 14:58:54 +00:00
Alexander Leidinger
cb08c2cc83 Add some FEATURE macros for various GEOM classes.
No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availablility if
needed.

Sponsored by:	Google Summer of Code 2010
Submitted by:	kibab
Reviewed by:	silence on geom@ during 2 weeks
X-MFC after:	to be determined in last commit with code from this project
2011-02-25 10:24:35 +00:00
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Yoshihiro Takahashi
9f0f6d5fd7 Add support to set a slice name. 2011-02-19 11:09:38 +00:00
Luigi Rizzo
67c1af9d00 Correct a subtle bug in the 'gsched_rr' disk scheduler.
The algorithm is supposed to work as follows:
in order to prevent starvation, when a new client starts being served we
record the start time and reset the counter of bytes served.
We then switch to a new client after a certain amount of time or bytes,
even if the current one still has pending requests.
To avoid charging a new client the time of the first seek,
we start counting time when the first request is served.

Unfortunately a bug in the previous version of the code failed
to set the start time in certain cases, resulting in some processes
exceeding their timeslice.

The fix (in this patch) is trivial, though it took a while to find
out and replicate the bug.
Thanks to Tommaso Caprai for investigating and fixing the problem.

Submitted by:	Tommaso Caprai
MFC after:	1 week
2011-02-14 08:09:02 +00:00
Marcel Moolenaar
1e189c0839 Use the preload_fetch_addr() and preload_fetch_size() convenience
functions to obtain the address and size of the preloaded key files.

Sponsored by: Juniper Networks.
2011-02-13 19:34:48 +00:00
Yoshihiro Takahashi
5d627bb558 Add support to write boot menu. 2011-02-11 13:18:00 +00:00
Andrey V. Elsukov
88007f6102 Add new user-friendly aliases for partition types for the MBR and
EBR schemes: fat32, ebr, linux-data, linux-raid, linux-swap and
linux-lvm. Add bios-boot GUID and alias for the GPT scheme. It used by
GRUB 2 loader. Also do sorting definitions of types in diskmbr.h
and in g_part.c.

PR:		bin/120990, kern/147664
MFC after:	2 weeks
2011-01-28 11:13:01 +00:00
Andrey V. Elsukov
1313160649 While inspecting the disklabel check that start offset of partition is
within provider's bounds. If not then reject this disklabel.
Mark bbarea as NULL to do not free it again in destroy method.

MFC after:	1 week
2011-01-27 08:02:26 +00:00
Matthew D Fleming
73d6f8516d Remove the CTLFLAG_NOLOCK as it seems to be both unused and
unfunctional.  Wiring the user buffer has only been done explicitly
since r101422.

Mark the kern.disks sysctl as MPSAFE since it is and it seems to have
been mis-using the NOLOCK flag.

Partially break the KPI (but not the KBI) for the sysctl_req 'lock'
field since this member should be private and the "REQ_LOCKED" state
seems meaningless now.
2011-01-26 22:48:09 +00:00
Konstantin Belousov
0ea2e01412 Treat async buffer writes from the gjournal switcher thread the same as
from syncer. We shall not sleep on running buffer space when suspending.

Reproduced and tested by:	pho
PR:	kern/154228
MFC after:	1 week
2011-01-26 10:34:21 +00:00
Andrey V. Elsukov
799eac8c3d Limit maximum number of GPT entries to 4k. It is most realistic value
and can prevent kernel memory exhausting when big value is specified
from command line.

Split reading and writing operation to several iteration to do not
trigger KASSERT when data length is greater than MAXPHYS.

PR:             kern/144962, kern/147851
MFC after:      2 weeks
2011-01-18 09:52:53 +00:00
Matthew D Fleming
0c2b0e03f7 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the geom piece.
2011-01-12 19:54:07 +00:00
Andrey V. Elsukov
95959703e1 Sector size can not be greater than MAXPHYS. Since GRAID3 calculates
sector size from user-specified block size, report to user about
big blocksize.

PR:		kern/147851
MFC after:	1 week
2011-01-12 13:55:01 +00:00
Andrey V. Elsukov
e76dc5129a Sector size can not be greater than MAXPHYS.
MFC after:	1 week
2011-01-12 12:26:10 +00:00
Andrey V. Elsukov
eaaef50811 Remove redundant check.
MFC after:	1 week
2011-01-11 13:22:20 +00:00
Andrey V. Elsukov
f2b3e9e870 Round GNOP provider's mediasize to its sectorsize. This prevents KASSERT
in g_io_request when geom classes doing tasting.

PR:		kern/147852
MFC after:	1 week
2011-01-11 11:42:22 +00:00
Matthew D Fleming
ed7beddc48 Fix a memory overflow where the input length to g_gpt_utf8_to_utf16()
was specified incorrectly, causing the bzero to run past the end of a
malloc(9)'d object.

Submitted by:	Eric Youngblut < eyoungblut AT isilon DOT com >
MFC after:	3 days
2011-01-07 16:46:20 +00:00
Nathan Whitehorn
e76b061420 Add an entry to the gpart XML to determine if the geom has pending changes
that need to be committed (or undone).

MFC after:	2 weeks
2011-01-06 03:36:04 +00:00
Konstantin Belousov
23b70c1ae2 Finish r210923, 210926. Mark some devices as eternal.
MFC after:	2 weeks
2011-01-04 10:59:38 +00:00
Konstantin Belousov
d91e813c7b Add reporting of GEOM::candelete BIO_GETATTR for md(4) and geom_disk(4).
Non-zero value of attribute means that device supports BIO_DELETE.

Suggested and reviewed by:	pjd
Tested by:	pho
MFC after:	1 week
2010-12-29 12:11:07 +00:00
Andrey V. Elsukov
f25481193e Allow destroying EBR in COMPAT (default) mode.
MFC after:	2 week
2010-12-28 08:42:12 +00:00
Andrey V. Elsukov
d3507dff37 Make EBR probe method less strictly to be able detect EBRs with
small non fatal inconsistency. EBR may contain boot loader and sometimes
it just has some garbage data. Now this does not prevent FreeBSD to use
extended partitions. But since we do not support bootcode for EBR we mark
tables which have non empty boot area as corrupt. This does make them
readonly and we can not damage this data.

PR:		kern/141235
MFC after:	1 month
2010-12-28 08:36:44 +00:00
Rebecca Cran
fa5f3816c4 Don't warn if a partition appears not to be aligned on a track boundary.
Modern disks use LBA and create a fake CHS geometry that doesn't have any
relation to the on-disk layout of data.
2010-12-07 20:46:11 +00:00
Ivan Voras
e5c723f123 Add a note about the magic number 20. Actually, 22.75 entries fit in
a 512 byte sector but when choosing magic numbers, 20 looks nicer.

Discussed with:	marcel
2010-12-02 19:47:27 +00:00
Jaakko Heinonen
e5a2338118 - Report an error when a label with invalid name is attempted to be
created with glabel(8).
- Fix a typo in an error message.
- Fix comment typos.

Approved by:	pjd
2010-12-01 19:24:07 +00:00
Jaakko Heinonen
f7842e00f5 Use g_eventlock to protect against losing wakeups in the g_event process
and replace tsleep(9) with msleep(9) which doesn't use a timeout. The
previously used timeout caused the event process to wake up ten times
per second on an idle system.

one_event() is now called with the topology lock held and it returns
with both the topology and event locks held when there are no more
events in the queue.

Reported by:	mav, Marius Nünnerich
Reviewed by:	freebsd-geom
2010-11-22 16:47:53 +00:00
Ed Schouten
eb4c31fd41 Add support for asterisk characters when filling in the GELI password
during boot.

Change the last argument of gets() to indicate a visibility flag and add
definitions for the numerical constants. Except for the value 2, gets()
will behave exactly the same, so existing consumers shouldn't break. We
only use it in two places, though.

Submitted by:	lme (older version)
2010-11-14 14:12:43 +00:00
Andrey V. Elsukov
55514bdfc0 Fix regression introduced in r215088: gpart(8) reports
"arg0 'provider': Invalid argument" after creating new partition
table.
Move code for search of existing geom into g_part_find_geom
function and use this function instead of g_part_parm_geom
in g_part_ctl_create.

Approved by:	kib (mentor)
2010-11-11 12:13:41 +00:00
Andrey V. Elsukov
7085c3bc98 In r212554 name of G_PART_PARM_GEOM and G_PART_PARM_PROVIDER
ctlreq parameters was changed to "arg0". Fix the last place where
it is used.

Approved by:	kib (mentor)
2010-11-10 14:38:51 +00:00
Jaakko Heinonen
9d142a6ee6 Extend the g_eventlock mutex coverage in one_event() to include setting
of the EV_DONE flag and use the mutex to protect against losing wakeups
in g_waitfor_event().

Reported by:	davidxu
Tested by:	davidxu
Discussed on:	freebsd-current
2010-11-03 16:19:35 +00:00
Andrey V. Elsukov
e7926a3703 Reimplemented "gpart destroy -F". Now it does all work in kernel.
This was needed for recover implementation.

Implement the recover command for GPT. Now GPT will marked as
corrupt when any of three types of corruption will be detected:
1. Damaged primary GPT header or table
2. Damaged secondary GPT header or table
3. Secondary header is not located in the last LBA
Marked GPT becomes read-only. Any changes with corrupt table
are prohibited. Only "destroy" and "recover" commands are allowed.

Discussed with:	geom@ (mostly silence)
Tested by:	Ilya A. Arhipov
Approved by:	mav (mentor)
MFC after:	2 weeks
2010-10-25 16:23:35 +00:00
Pawel Jakub Dawidek
0d2f5a4eaa - Improve error messages, so instead of 'Not fully done', the user will get
information that device is already suspended or that device is using
  one-time key and suspend is not supported.
- 'geli suspend -a' silently skips devices that use one-time key, this is fine,
  but because we log which device were suspended on the console, log also which
  devices were skipped.
2010-10-22 22:58:00 +00:00
Pawel Jakub Dawidek
2f2d7830b5 Close a race between checking if device is already suspended and suspending it. 2010-10-22 22:54:26 +00:00
Pawel Jakub Dawidek
d8d61ef8fc Add State tag, so 'geli status' will report active/suspended status, eg:
# geli status
	   Name     Status  Components
	da0.eli  SUSPENDED  da0
	da1.eli     ACTIVE  da1
2010-10-22 22:45:26 +00:00
Pawel Jakub Dawidek
4f294e1289 Encryption keys array might be NULL if device is suspended. Check for this, so
we don't panic when we detach suspended device.
2010-10-22 22:44:09 +00:00
Pawel Jakub Dawidek
1d0214411e Move sc_akeyctx and sc_ivctx initialization to the g_eli_mkey_propagate()
function which eliminates code duplication and will ensure proper order
of operation.
2010-10-22 22:13:11 +00:00
Pawel Jakub Dawidek
3ac01bc2ae Free opencrypto sessions on suspend, as they also might keep encryption keys. 2010-10-21 19:44:28 +00:00
Pawel Jakub Dawidek
738ffa9780 Fix a bug introduced in r213067 where we use authentication key before
initializing it.
2010-10-21 12:58:26 +00:00
Pawel Jakub Dawidek
5ad4a7c74a Bring in geli suspend/resume functionality (finally).
Before this change if you wanted to suspend your laptop and be sure that your
encryption keys are safe, you had to stop all processes that use file system
stored on encrypted device, unmount the file system and detach geli provider.

This isn't very handy. If you are a lucky user of a laptop where suspend/resume
actually works with FreeBSD (I'm not!) you most likely want to suspend your
laptop, because you don't want to start everything over again when you turn
your laptop back on.

And this is where geli suspend/resume steps in. When you execute:

	# geli suspend -a

geli will wait for all in-flight I/O requests, suspend new I/O requests, remove
all geli sensitive data from the kernel memory (like encryption keys) and will
wait for either 'geli resume' or 'geli detach'.

Now with no keys in memory you can suspend your laptop without stopping any
processes or unmounting any file systems.

When you resume your laptop you have to resume geli devices using 'geli resume'
command. You need to provide your passphrase, etc. again so the keys can be
restored and suspended I/O requests released.

Of course you need to remember that 'geli suspend' won't clear file system
cache and other places where data from your geli-encrypted file system might be
present. But to get rid of those stopping processes and unmounting file system
won't help either - you have to turn your laptop off. Be warned.

Also note, that suspending geli device which contains file system with geli
utility (or anything used by 'geli resume') is not very good idea, as you won't
be able to resume it - when you execute geli(8), the kernel will try to read it
and this read I/O request will be suspended.
2010-10-20 20:50:55 +00:00
Pawel Jakub Dawidek
056638c469 - Add missing comments.
- Make a comment consistent with others.
2010-10-20 20:01:45 +00:00
Jaakko Heinonen
bc2589f5b7 Use make_dev_p(9) with the MAKEDEV_CHECKNAME flag instead of make_dev(9)
and print a diagnostic if the call fails.

This avoids a panic when a device with an invalid name is attempted to
be registered. For example the label class gets device names from
untrusted input.

Reviewed by:	freebsd-geom
2010-10-19 16:48:49 +00:00
Rui Paulo
42a783c16a The canonical way to print __func__ when using KASSERT() is to write
("%s", __func__). This avoids clang's -Wformat-string warnings.
2010-10-13 11:35:59 +00:00
Andrey V. Elsukov
21bf062e7e Replace strlen(_PATH_DEV) with sizeof(_PATH_DEV) - 1.
Suggested by:	kib
Approved by:	kib (mentor)
MFC after:	5 days
2010-10-09 20:20:27 +00:00
Ulf Lilleengen
de02b15928 - Check flag with the bitwise operator, not the logical operator.
Submitted by:	arundel
MFC after:	1 week
2010-10-01 06:12:13 +00:00
Andrey V. Elsukov
b1da166ef1 Some schemes can allocate memory for internal purposes but when
GEOM does withering this memory doesn't freed. Add G_PART_DESTROY
call to g_part_wither. Also add missed g_free() call to G_PART_READ
method for MBR and PC98 schemes.

Submitted by:	jh (previous version)
Reviewed by:	pjd
Approved by:	kib (mentor)
2010-09-25 18:27:29 +00:00
Pawel Jakub Dawidek
f95168e08d Change g_eli_debug to int, so one can turn off any GELI output by setting
kern.geom.eli.debug sysctl to -1.

MFC after:	2 weeks
2010-09-25 10:32:04 +00:00
Pawel Jakub Dawidek
350e8df8de Ignore errors from BIO_FLUSH. It might confuse users that provider wasn't
really killed. What we really care about are write errors only.

MFC after:	2 weeks
2010-09-25 10:31:05 +00:00
Pawel Jakub Dawidek
cec283baf4 Allow to configure GPT attributes. It shouldn't be allowed to set bootfailed
attribute (it should be allowed only to unset it), but for test purposes it
might be useful, so the current code allows it.

Reviewed by:	arch@ (Message-ID: <20100917234542.GE1902@garage.freebsd.pl>)
MFC after:	2 weeks
2010-09-24 19:33:47 +00:00
Pawel Jakub Dawidek
9839c97b4d Update copyright years.
MFC after:	1 week
2010-09-23 12:02:08 +00:00
Pawel Jakub Dawidek
9a5a1d1e1e Add support for AES-XTS. This will be the default now.
MFC after:	1 week
2010-09-23 11:58:36 +00:00
Pawel Jakub Dawidek
c6a26d4c88 Implement switching of data encryption key every 2^20 blocks.
This ensures the same encryption key won't be used for more than
2^20 blocks (sectors). This will be the default now.

MFC after:	1 week
2010-09-23 11:49:47 +00:00
Pawel Jakub Dawidek
1f0fb66f30 Make the code similar to the code in g_eli_integrity.c.
MFC after:	1 week
2010-09-23 11:23:10 +00:00
Pawel Jakub Dawidek
b35bfe7e10 Define default overwrite count, so that userland can use it.
MFC after:	1 week
2010-09-23 11:19:48 +00:00
Pawel Jakub Dawidek
5e6dce4bf0 When trashing metadata, flush after each write.
MFC after:	1 week
2010-09-23 10:43:37 +00:00
Brian Somers
0f81f3046d Support attaching version 4 metadata
Reviewed by:	pjd
2010-09-19 10:45:53 +00:00
Alexander Motin
659f684ea0 Add support for dumping kernel to gconcat.
Dumping goes to the component, where dump partition begins.
2010-09-16 17:24:25 +00:00
Pawel Jakub Dawidek
2738b715ea Change message when setting or unsetting attribute less confusing.
Before:

	ada0 has <attrib> set

After:

	<attrib> set on ada0

MFC after:	2 weeks
2010-09-15 21:15:00 +00:00
Pawel Jakub Dawidek
2f4e9a099b Make the message that informs about bootcode being written to disk less
confusing.

Note there is still no information about 'partcode' being written to disk
(gpart bootcode -p <partcode> <disk>).

Maybe in the future all the messages printed by gpart(8) on success could be
hidden under -v?

PR:		bin/150239
Reported by:	Roddi <roddi@me.com>
Submitted by:	arundel
MFC after:	2 weeks
2010-09-15 20:59:13 +00:00
Pawel Jakub Dawidek
8107ecf892 - Change all places where G_TYPE_ASCNUM is used to G_TYPE_NUMBER.
It turns out the new type wasn't really needed.
- Reorganize code a little bit.
2010-09-14 16:21:13 +00:00
Pawel Jakub Dawidek
b312136354 Simplify the code a bit. 2010-09-14 11:42:07 +00:00
Pawel Jakub Dawidek
946e2f3595 - Remove gc_argname field. It was introduced for gpart(8), but if I
understand everything correctly, we don't really need it.
- Provide default numeric value as strings. This allows to simplify
  a lot of code.
- Bump version number.
2010-09-13 13:48:18 +00:00
Pawel Jakub Dawidek
a478ea7490 - Allow to specify value as const pointers.
- Make optional string values always an empty string.
2010-09-13 08:56:07 +00:00
Justin T. Gibbs
f03f7a0ca3 Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

The barrier semantics of bioq_insert_tail() were broken in two ways:

 o In bioq_disksort(), an added bio could be inserted at the head of
   the queue, even when a barrier was present, if the sort key for
   the new entry was less than that of the last queued barrier bio.

 o The last_offset used to generate the sort key for newly queued bios
   did not stay at the position of the barrier until either the
   barrier was de-queued, or a new barrier (which updates last_offset)
   was queued.  When a barrier is in effect, we know that the disk
   will pass through the barrier position just before the
   "blocked bios" are released, so using the barrier's offset for
   last_offset is the optimal choice.

sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
	o Update last_offset in bioq_insert_tail().

	o Only update last_offset in bioq_remove() if the removed bio is
	  at the head of the queue (typically due to a call via
	  bioq_takefirst()) and no barrier is active.

	o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
	  set prev to the barrier and cur to it's next element.  Now that
	  last_offset is kept at the barrier position, this change isn't
	  strictly necessary, but since we have to take a decision branch
	  anyway, it does avoid one, no-op, loop iteration in the while
	  loop that immediately follows.

	o In bioq_disksort(), bypass the normal sort for bios with the
	  BIO_ORDERED attribute and instead insert them into the queue
	  with bioq_insert_tail().  bioq_insert_tail() not only gives
	  the desired command order during insertion, but also provides
	  barrier semantics so that commands disksorted in the future
	  cannot pass the just enqueued transaction.

sys/sys/bio.h:
	Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
	Use an ordered command for SCSI/ATA-NCQ commands issued in
	response to bios with the BIO_ORDERED flag set.

sys/cam/scsi/scsi_da.c
	Use an ordered tag when issuing a synchronize cache command.

	Wrap some lines to 80 columns.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
	Mark bios with the BIO_FLUSH command as BIO_ORDERED.

Sponsored by:	Spectra Logic Corporation
MFC after:	1 month
2010-09-02 19:40:28 +00:00
Pawel Jakub Dawidek
efb46508ce Correct offset conversion to little endian. It was implemented in version 2,
but because of a bug it was a no-op, so we were still using offsets in native
byte order for the host. Do it properly this time, bump version to 4 and set
the G_ELI_FLAG_NATIVE_BYTE_ORDER flag when version is under 4.

MFC after:	2 weeks
2010-08-28 08:30:20 +00:00
Alexander Motin
3d7cfb15f5 Remove bintime_cmp() function, unused since r200086.
MFC after:	1 week
2010-08-18 15:38:10 +00:00
Andrey V. Elsukov
d02dc4cd41 Check that gsp is not NULL before access. It can be NULL
for some cases.

Approved by:	kib (mentor)
MFC after:	1 week
2010-08-03 11:21:17 +00:00
Andrey V. Elsukov
a45f4c6e2c Check that table is not NULL before access, it can be NULL
for some cases.

Approved by:	mav (mentor)
MFC after:	2 weeks
2010-08-03 09:10:48 +00:00
Andrey V. Elsukov
a80f05bb73 Forward ioctl requests to original geom.
PR:		148540
Silence from:	luigi
Reviewed by:	pjd
Approved by:	mav (mentor)
MFC after:	2 weeks
2010-08-02 10:30:49 +00:00
Andrey V. Elsukov
b6d4028166 Release access for consumers that are opened, but will be destroyed
indirectly by orphan method.

PR:		148688
Silence from:	marcel
Approved by:	mav (mentor)
MFC after: 	2 weeks
2010-08-02 10:26:15 +00:00
Alexander Motin
8edcf69406 Export PCI IDs of ATA/SATA controllers through CAM and ata(4) layers to
GEOM. This information needed for proper soft-RAID's on-disk metadata
reading and writing.
2010-07-25 15:43:52 +00:00
Andrey V. Elsukov
733a9e2783 Prevent access after free to table entry in case when
user deletes partition that not yet created (changes doesn't
committed to disk).

PR:		148687
Approved by:	mav (mentor)
MFC after:	7 days
2010-07-23 06:30:01 +00:00
Ruslan Ermilov
cf1457e4fd Fixed cache size decoding read from a label.
PR:		kern/144732
Submitted by:	Eugene Grosbein
MFC after:	3 days
2010-07-14 08:22:00 +00:00
Rui Paulo
c6b2b6fce6 Add NTFS partition type to GEOM_MBR. 2010-06-26 13:20:40 +00:00
Pawel Jakub Dawidek
2aa15ffdab 'unit' can be negative, so use signed type for it.
Found by:	Coverity Prevent
CID:		3731
MFC after:	3 days
2010-06-14 21:58:55 +00:00
Pawel Jakub Dawidek
15725379d0 BIO_DELETE contains range we want to delete and doesn't provide any useful
data, so there is no need to copy it to userland.

MFC after:	3 days
2010-06-14 21:56:24 +00:00
Andriy Gapon
1bdfff2252 fix a few cases where a string is passed via format argument instead of
via %s

Most of the cases looked harmless, but this is done for the sake of
correctness.  In one case it even allowed to drop an intermediate buffer.

Found by:	clang
MFC after:	2 week
2010-06-11 19:27:21 +00:00
Edward Tomasz Napierala
7ce513a52a Untangle g_print_bio(), silencing Coverity.
Found with:	Coverity Prevent
CID:		3566, 3567
2010-06-10 17:49:36 +00:00
Matt Jacob
59ccfe8176 Try and narrow the gap in which you act on an event that has been canceled.
Obtained from:	Jaako Heinonen
MFC after:	1 month
2010-06-08 22:40:02 +00:00
Edward Tomasz Napierala
c01eb2f36b Make sure not to pass NULL to g_orphan_provider().
Found with:	Coverity Prevent
CID:		3411
2010-06-05 08:00:52 +00:00
Marius Strobl
36066952e5 Don't leak memory on destruction.
Reviewed by:	marcel
MFC after:	3 days
2010-06-02 17:17:11 +00:00
Andriy Gapon
56b3acd001 g_label: fix possible NULL pointer dereference
in case glabel debug level is >= 1 and gp->provider list is empty
for some reason

Found by:	clang static analyzer
MFC after:	4 days
2010-05-31 09:10:39 +00:00
Marius Strobl
785c3f7ea4 Fix some whitespace nits. 2010-05-24 17:33:02 +00:00
Nathan Whitehorn
0532c3a5a5 Teach gpart about bootcode on APM. 2010-05-16 22:21:33 +00:00
Matt Jacob
87e7f7be89 Yet another potential dereference of a dead provider.
Sponsored by:   Panasas
MFC after:	1 week
2010-05-14 21:27:39 +00:00
Matt Jacob
1371a457d9 Make sure to check that the active provider pointer points to something before
dereferencing the pointer.

Sponsored by:   Pansas
MFC after:	1 week
2010-05-14 16:56:18 +00:00
Jaakko Heinonen
3535526b15 - Don't return EAGAIN from gv_unload(). It was used to work around the
deadlock fixed in r207671.
- Wait for worker process to exit at class unload. The worker process
  was not guaranteed to exit before the linker unloaded the module.
- Use 0 as the worker process exit status instead of ENXIO and style
  the NOTREACHED comment.

Reviewed by:	lulf
X-MFC after:	r207671
2010-05-10 19:12:23 +00:00
Jaakko Heinonen
5a279fc5fc In g_zero_destroy_geom(), return 0 instead of EBUSY in the success case.
EBUSY was probably used as a workaround for the deadlock fixed in r207671.

Approved by:	pjd
X-MFC after:	r207671
2010-05-10 19:08:53 +00:00
Ulf Lilleengen
42a9ad6697 - Remove obsolete flags.
MFC after:	1 week
2010-05-08 16:19:17 +00:00
Jaakko Heinonen
9061251f9a Fix deadlock between GEOM class unloading and withering. Withering can't
proceed while g_unload_class() blocks the event thread. Fix this by not
running g_unload_class() as a GEOM event and dropping the topology lock
when withering needs to proceed.

PR:		kern/139847
Silence on:	freebsd-geom
2010-05-05 18:53:24 +00:00
Marcel Moolenaar
c74f160cb0 Re-calculate a geometry when reprobing as well.
PR:		kern/145452
Reported by:	"Andrey V. Elsukov" <bu7cher@yandex.ru>
2010-04-25 01:56:39 +00:00
Marcel Moolenaar
6f702278e6 Fix undo for schemes that have internal partitions. Internal partitions
do not constitute user-visible or active partitions and as such should
not prevent undoing pending operations.

While here, initialize the last usable sector for the placeholder geom
based on the null scheme, created to allow undoing the destruction of
a scheme. This gives consistent output with "gpart show".

Based on a patch from:	"Andrey V. Elsukov" <bu7cher@yandex.ru>
2010-04-25 00:54:11 +00:00
Marcel Moolenaar
3f71c319f4 Implement the resize verb and add support for resizing partitions
for all schemes but EBR. Quality work by Andrey!

Submitted by:	"Andrey V. Elsukov" <bu7cher@yandex.ru>
2010-04-23 03:11:39 +00:00
Jaakko Heinonen
002d1d1c38 Fix ddb(4) "show geom addr" command when INVARIANTS is enabled. Don't
assert that the topology lock is held when g_valid_obj() is called from
debugger.

MFC after:	1 week
2010-04-19 20:07:35 +00:00
Pawel Jakub Dawidek
31c4cef715 Use lower priority for GELI worker threads. This improves system
responsiveness under heavy GELI load.

MFC after:	3 days
2010-04-15 16:34:06 +00:00
Andriy Gapon
2a842317eb g_io_check: respond to zero pp->mediasize with ENXIO
Previsouly this condition was reported with EIO by bio_offset > mediasize
check.
Perhaps that check should be extended to bio_offset+bio_length > mediasize.

MFC after:	1 week
2010-04-15 08:39:56 +00:00
Luigi Rizzo
83f8218814 fix copyright format, as requested by Joel Dahl 2010-04-13 09:56:17 +00:00
Luigi Rizzo
c36cf6fbbc make code compile with KTR 2010-04-13 09:53:08 +00:00
Luigi Rizzo
1831a90ac5 Bring in geom_sched, support for scheduling disk I/O requests
in a device independent manner. Also include an example anticipatory
scheduler, gsched_rr, which gives very nice performance improvements
in presence of competing random access patterns.

This is joint work with Fabio Checconi, developed last year
and presented at BSDCan 2009. You can find details in the
README file or at

http://info.iet.unipi.it/~luigi/geom_sched/
2010-04-12 16:37:45 +00:00
Andriy Gapon
8f128ff559 g_vfs_open: allow only one mount per device vnode
In other words, deny multiple read-only mounts of the same device.
Shared read-only mounts should theoretically be possible, but,
unfortunately, can not be implemented correctly using current
buffer cache code/interface and results in an eventual system crash.
Also, using nullfs seems to be a more efficient way to achieve the same
goal.

This gets us back to where we were before GEOM and where other BSDs are.

Submitted by:	pjd (idea for checking for shared mounting)
Discussed with:	phk, pjd
Silence from:	fs@, geom@
MFC after:	2 weeks
2010-04-03 08:53:53 +00:00
Andriy Gapon
1b4bc5f851 bo_bsize: revert r205860 and take an alternative approch in getblk
In r205860 I missed the fact that there is code that strongly assumes
that devvp bo_bsize is equal to underlying provider's sectorsize.
In those places it is hard to obtain the sectorsize in an alternative
way if devvp bo_bsize is set to something else.
So, I am reverting bo_bsize assigment in g_vfs_open.
Instead, in getblk I use DEV_BSIZE block size for b_offset calculation
if vp is a disk vp as reported by vn_isdisk.  This should coinside with
vp being a devvp.

Reported by:	Mykola Dzham <i@levsha.me>
Tested by:	Mykola Dzham <i@levsha.me>
Pointyhat to:	avg
MFC after:	2 weeks
X-ToDo:		convert bread(devvp) in all fs to use bo_bsize-d blocks
2010-04-02 15:12:31 +00:00
Andriy Gapon
0c04f06072 g_vfs_open: correctly set devvp.v_bufobj.bo_bsize to DEV_BSIZE
Because of how breadn -> bufstrategy -> g_vfs_strategy are currently
implemented, bread on devvp always expects DEV_BSIZE block size.
Thus, devvp bo_bsize must always be DEV_BSIZE irrespective of media
properties or filesystem implementation details.

Reviewed by:	mckusick
MFC after:	2 weeks
2010-03-29 20:34:25 +00:00
Matt Jacob
2b4969ff9e Change how multipath labels are created and managed. This makes it easier
to support various storage boxes which really aren't active-active.

We only write the label on the *first* provider. For all other providers
we just "add" the disk. This also allows for an "add" verb.

A usage implication is that you should specificy the currently active
storage path as the first provider.

Note that this does not add RDAC-like functionality, but better allows for
autovolumefailover configurations (additional checkins elsewhere will support
this).

Sponsored by:	Panasas
MFC after:	1 month
2010-03-29 18:04:06 +00:00
Alexander Motin
a5be8eb530 Do not fetch precise time of request start when stats collection disabled.
Reviewed by:	pjd, phk
2010-03-24 18:04:25 +00:00
Matt Jacob
b5dce617d8 Add 'rotate' and 'getactive' verbs to provide some control and information
about what the currently active path is.

Sponsored by:	Panasas
MFC after:	1 month
2010-03-21 15:02:47 +00:00
Jaakko Heinonen
a41aa4a789 Escape characters unsafe for XML output in GEOM class, instance and
provider names.

- Characters in range 0x01-0x1f except '\t', '\n', and '\r' are replaced
  with '?'. Those characters are disallowed in XML.
- '&', '<', '>', '\'', '"' and characters in range 0x7f-0xff are
  replaced with XML numeric character reference.

If the kern.geom.confxml sysctl provides invalid XML, libgeom
geom_xml2tree() fails and utilities using it do not work. Unsafe
characters are common in msdosfs and cd9660 labels.

PR:		kern/104389
Submitted by:	Doug Steinwand (original version)
Reviewed by:	pjd
Discussed on:	freebsd-geom
MFC after:	3 weeks
2010-03-20 16:16:13 +00:00
Pawel Jakub Dawidek
b0990a1dae Simplify loops. 2010-03-18 13:11:43 +00:00
Ulf Lilleengen
77d2a01ea8 - Set missing flag when initiating a plex rebuild with the rebuildparity
command.
- Check if plex is already syncing or rebuilding before initiating a parity
  rebuild or check.
2010-03-08 21:16:28 +00:00
Pawel Jakub Dawidek
32115b105a Please welcome HAST - Highly Avalable Storage.
HAST allows to transparently store data on two physically separated machines
connected over the TCP/IP network. HAST works in Primary-Secondary
(Master-Backup, Master-Slave) configuration, which means that only one of the
cluster nodes can be active at any given time. Only Primary node is able to
handle I/O requests to HAST-managed devices. Currently HAST is limited to two
cluster nodes in total.

HAST operates on block level - it provides disk-like devices in /dev/hast/
directory for use by file systems and/or applications. Working on block level
makes it transparent for file systems and applications. There in no difference
between using HAST-provided device and raw disk, partition, etc. All of them
are just regular GEOM providers in FreeBSD.

For more information please consult hastd(8), hastctl(8) and hast.conf(5)
manual pages, as well as http://wiki.FreeBSD.org/HAST.

Sponsored by:	FreeBSD Foundation
Sponsored by:	OMCnet Internet Service GmbH
Sponsored by:	TransIP BV
2010-02-18 23:16:19 +00:00
Pawel Jakub Dawidek
12f35a615a - Style fixes.
- Prefer strlcpy() over strncpy().
2010-02-18 22:29:35 +00:00
Pawel Jakub Dawidek
f24bf7522d Correct comment. 2010-02-18 22:28:12 +00:00
Pawel Jakub Dawidek
e5131ab452 Log attach just like we log detach. 2010-02-18 22:27:38 +00:00
Oleksandr Tymoshenko
45a7687f90 - Give geom_redboot taste of flash/spi. Now there is another provider
of redboot partitions. This patch was missed during merge from
    projects/mips.
2010-02-03 01:12:19 +00:00
Xin LI
38907b4cc7 Prevent NULL deference by checking return value of
gctl_get_asciiparam.

MFC after:	2 weeks
2010-02-02 22:25:22 +00:00
Marcel Moolenaar
cd18ad8347 Export the UUID of the partition in the XML. The partition UUID is used
by EFI's device path to identify a partition. In order for FreeBSD to
add EFI boot options, proper device paths need to be constructed.
2010-01-30 23:13:19 +00:00
Ivan Voras
49e232f2c9 Go through with write_metadata() non-error-handling and make it return "void".
This is mostly to avoid dead variable assignment warning by LLVM.
No functional change.

Pointed out by:	trasz
Approved by:	gnn (mentor)
2010-01-25 20:51:40 +00:00
Edward Tomasz Napierala
fdf64c5752 Remove unneeded variables.
Found with:	clang
2010-01-25 17:00:21 +00:00
Edward Tomasz Napierala
1373012510 Remove pointless assignment.
Found with:	clang
2010-01-25 16:58:58 +00:00
Edward Tomasz Napierala
dc9098605e Remove some pointless variable assignments.
Found with:	clang
2010-01-25 16:55:30 +00:00
Edward Tomasz Napierala
0a36cb97a8 Remove unused variable.
Found with:	clang
2010-01-25 16:10:22 +00:00
Xin LI
35daa28f30 Expose stripe offset and stripe size through libgeom and geom(8) userland
utilities.

Reviewed by:	pjd, mav (earlier version)
2010-01-17 06:20:30 +00:00
Edward Tomasz Napierala
b3f9d8c804 Add gmountver, disk mount verification GEOM class.
Note that due to e.g. write throttling ('wdrain'), it can stall all the disk
I/O instead of just the device it's configured for.  Using it for removable
media is therefore not a good idea.

Reviewed by:	pjd (earlier version)
2010-01-16 09:52:49 +00:00
Alexander Motin
0c8fd0c8ac Change the way in which zero stripesize is handled. Instead of reporting
zero stripeoffset in such case (as if device has no stripes), report offset
from the beginning of the media (as if device has single infinite stripe).

This gives partitioning tools information, required to guess better
partition alignment, in case if hardware doesn't report it's stripe size.
For example, it should give disklabel info about odd offset made by fdisk.
2010-01-06 13:14:37 +00:00
Alexander Motin
8de5811320 Move wakeup() out of mutex to reduce contention. 2010-01-05 10:52:21 +00:00
Alexander Motin
86de0ca52c Move wakeup() out of mutex to reduce contention. 2010-01-05 10:30:56 +00:00
Alexander Motin
06b215fd3a Slightly optimize XOR calculation. 2010-01-05 02:06:05 +00:00
Marcel Moolenaar
665bb830e2 Properly return the UUID represented by the alias.
PR:		142174
Submitted by:	Przemyslaw Laczynski <torindel@gmail.com>
Pointy hat to:	rpaulo
2010-01-02 01:02:59 +00:00
Alexander Motin
0d883b11e3 Call wakeup() only for the first request on the queue. 2009-12-30 17:23:27 +00:00
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Alexander Motin
1c80ec0a6b Add BIO_DELETE support to ada(4):
- For SSDs use TRIM feature of DATA SET MANAGEMENT command, as defined by
ACS-2 specification working draft.
- For CompactFlash use CFA ERASE command, same as ad(4) does.

With this patch, `newfs -E /dev/ada1` was able to restore write speed of
my heavily weared OCZ Vertex SSD (firmware 1.4) up to the initial level
for the most part of it's capacity. Previous 1.3 firmware, even reportiong
TRIM capabilty bit set, was not working, reporting ABORT error for every
DSM command.

I have no idea whether it is normal, but for some reason it takes 200ms
to handle any TRIM command on this drive, that was making delete extremely
slow. But TRIM command is able to accept long list of LBAs and the length of
that list seems doesn't affect it's execution time. Implemented request
clusting algorithm allowed me to rise delete rate up to reasonable numbers,
when many parallel DELETE requests running.
2009-12-28 20:08:01 +00:00
Alexander Motin
5f9b1143ac Make geom_concat to passthrough stripe parameters of the first component,
hoping that rest will fit.
2009-12-24 14:32:21 +00:00
Alexander Motin
113d8e5046 As soon as geom_raid3 reports it's own stripe as sector size, report largest
underlying provider's stripe, multiplied by number of data disks in array,
due to transformation done, as array stripe.
2009-12-24 13:38:02 +00:00
Alexander Motin
92f60381d9 As soon as mirror has no own stripes, report largest stripe of unrerlying
components, hoping others fit, if they are not equal.
2009-12-24 12:17:22 +00:00
Alexander Motin
8b30323843 Add two disk ioctls, giving user-level tools information about disk/array
stripe (optimal access block) size and offset.
2009-12-24 11:05:23 +00:00
Alexander Motin
f00919d2fc Make geom_stripe report it's stripe size to upper layers. 2009-12-24 10:43:44 +00:00
Alexander Motin
d4060fa67d Make graid3 fallback to malloc() when component request size is bigger
then maximal prepared UMA zone size. This fixes crash with MAXPHYS > 128K.
2009-12-21 23:31:03 +00:00
Rui Paulo
33f7a4124d Add Microsoft and NetBSD partition types handling. 2009-12-14 20:26:27 +00:00
Rui Paulo
f13174303d Simplify partition type parsing by using a data-oriented model.
While there add more Apple and Linux partition types.
2009-12-14 20:04:06 +00:00
Alexander Motin
891852cc12 Change 'load' balancing mode algorithm:
- Instead of measuring last request execution time for each drive and
choosing one with smallest time, use averaged number of requests, running
on each drive. This information is more accurate and timely. It allows to
distribute load between drives in more even and predictable way.
- For each drive track offset of the last submitted request. If new request
offset matches previous one or close for some drive, prefer that drive.
It allows to significantly speedup simultaneous sequential reads.

PR:		kern/113885
Reviewed by:	sobomax
2009-12-03 21:47:51 +00:00
Edward Tomasz Napierala
3ce9ca8947 Provide a set of sysctls and tunables to disable device node creation
for specific "kinds" of disk labels - for example, GPT UUIDs.  Reason
for this is that sometimes, other GEOM classes attach to these device
nodes instead of the proper ones - e.g. they attach to /dev/gptid/XXX
instead of /dev/ada0p2, which is annoying.

Reviewed by:	pjd (earlier version)
MFC after:	1 month
2009-11-28 11:57:43 +00:00
Rui Paulo
f9d551f7df Add a missing check for Apple HFS partitions.
MFC after:	1 week
2009-11-12 19:30:49 +00:00
Robert Noland
a59a131093 We need to allocate space for the header in the create path also.
This fixes a null pointer dereference with "gpart create -s GPT" after
the previous commit.

Reported by:	Yuri Pankov
Pointyhat to:	me
MFC after:	1 week
2009-11-12 16:28:39 +00:00
Robert Noland
1c2dee3cc9 Fix handling of GPT headers when size is > 92 bytes.
It is valid for an on-disk GPT header to report a header size which is
greater than 92 bytes.  Previously, we would read in the sector and copy
only the 92 bytes that we know how to deal with before calculating the
checksum for comparison.  This meant that when we did the checksum, we
overshot the buffer and took in random memory, so the checksum would fail.

We now determine the size of the header and allocate enough space to
preserve the entire on-disk contents.  This allows us to be correctly
calculate the checksum and be able to modify and write the header back
to the disk, while preserving data that we might not understand.

Reported by:	Kris Weston
Approved by:	marcel@
MFC after:	2 weeks
2009-11-07 17:29:03 +00:00
Robert Noland
e80d42dda2 Set the active flag in the PMBR when we install bootcode on a GPT
partitioned disk.  Some BIOS require this to be set before they will
boot the device.

Approved by:	marcel
MFC after:	2 weeks
2009-10-14 19:24:01 +00:00
Pawel Jakub Dawidek
f8727e71d7 If provider is open for writing when we taste it, skip it for classes that
depend on on-disk metadata. This was we won't attach to providers that are used
by other classes. For example we don't want to configure partitions on da0 if
it is part of gmirror, what we really want is partitions on mirror/foo.

During regular work it works like this: if provider is open for writing a class
receives the spoiled event from GEOM and detaches, once provider is closed the
taste event is send again and class can rediscover its metadata if it is still
there.  This doesn't work that way when new class arrives, because GEOM gives
all existing providers for it to taste, also those open for writing. Classes
have to decided on their own if they want to deal with such providers (eg.
geom_dev) or not (classes modified by this commit).

Reported by:	des, Oliver Lehmann <lehmann@ans-netz.de>
Tested by:	des, Oliver Lehmann <lehmann@ans-netz.de>
Discussed with:	phk, marcel
Reviewed by:	marcel
MFC after:	3 days
2009-10-09 09:42:22 +00:00
Ulf Lilleengen
a8a3cd7d9d - Improve error message consistency and wording. 2009-10-05 08:44:31 +00:00
Marcel Moolenaar
b61808630d The first 96 bytes may not be zeroes. It can contain trivial boot
code that merely emits an error and waits for a key press before
rebooting. The error being that extended partitions are not
bootable. The origin is presumed to be Windows 2000; Windows XP
does not do this...

For now, ignore the first 96 bytes when checking that the EBR is
(for the most part) all zeroes.

Tested by:	Mario Lobo <mlobo@digiart.art.br>
MFC after:	1 week
2009-09-28 23:52:47 +00:00
Marcel Moolenaar
87f4470620 Don't create more partitions than can fit in the table by checking
that the index is within bounds.
2009-09-24 06:00:49 +00:00
Edward Tomasz Napierala
bb3fd7ff4f Remove unused variable. 2009-09-08 17:20:17 +00:00
Alexander Motin
18e42503ed Do not check proper request alignment here in geom_dev in production.
It will be checked any way later by g_io_check() in g_io_schedule_down().
It is only needed here to not trigger panic from additional check, when
INVARIANTS enabled. So cover it with #ifdef INVARIANTS. It saves two
64bit divisions per request.
2009-09-08 05:46:38 +00:00
Alexander Motin
7fc019af65 MFp4:
Remove msleep() timeout from g_io_schedule_up/down(). It works fine
without it, saving few percents of CPU on high request rates without
need to rearm callout twice per request.
2009-09-06 19:33:13 +00:00
Pawel Jakub Dawidek
b740e905a4 Add support for changing providers priority.
Submitted by:	Mel Flynn
2009-09-06 06:52:06 +00:00
Alexander Motin
af582ea7af Remove artificial MAX_IO_SIZE constant, equal to DFLTPHYS * 2. Use MAXPHYS
instead. It is NULL change for GENERIC kernel, but allows 'fast' mode to
work on systems with increased MAXPHYS.
2009-09-04 19:20:46 +00:00
Pawel Jakub Dawidek
e93f5e4d25 Simplify g_disk_ident_adjust() function and allow any printable character
in serial number.

Discussed with:	trasz
Obtained from:	Wheel Sp. z o.o. (http://www.wheel.pl)
2009-09-04 09:39:06 +00:00
Pawel Jakub Dawidek
07a93e6b3c There's no need for checking result of M_WAITOK allocation. 2009-08-27 08:40:51 +00:00
Pawel Jakub Dawidek
c16ce31b31 Fix an obvious topology lock leak.
MFC after:	3 days
2009-08-27 08:28:34 +00:00
Marcel Moolenaar
8530137252 The start of the EFI GPT partition in the PMBR can always be represented
by CHS addressing. Don't define these fields as 0xff, but rather define
them correctly. This prevents boot problems on PCs where GPT is being
used.

PR:		115406
Submitted by:	Kent Hauser <kent@khauser.net>
Approved by:	re (kib)
2009-08-17 16:16:46 +00:00
Ulf Lilleengen
b79cac0f92 - Fix the issue with read access count modification on RAID-5 plexes properly.
If the access counts were not increased and decreased in equal numbers by
  gvinum consumers, the read access count would be inconsistent with the write
  access count. Instead, modify the read access count with the write access
  count directly to prevent any inconsistencies.

Approved by:	re (kib)
2009-07-18 11:12:48 +00:00
Marcel Moolenaar
f43b57e32a Revert revisions 188839 and 188868. Use of the ioctl in geom_dev.c
is invalid because the ioctl happens without prior open. The ioctl
got introduced to provide backward compatibility for extended
partitions, but it ended up not being used because it didn't work
as expected. Since there are no consumers of the ioctl and the
implementation is broken, the best fix is to remove the code
entirely.

Spotted by:	phk
Approved by:	re (kensmith)
2009-07-08 05:56:14 +00:00
Edward Tomasz Napierala
8edfe76ab5 Fix a panic which (reportedly) can happen when unmounting a filesystem
with I/O requests in flight on kernels compiled with "options INVARIANTS".
Also, make it obvious it's not right to call g_valid_obj() (and macros
using it, e.g. G_VALID_CONSUMER()) without topology lock held.

Approved by:	re (kib)
Reported by:	pho
2009-07-01 20:16:29 +00:00
Edward Tomasz Napierala
fb231f3627 Make gjournal work with kernel compiled with "options DIAGNOSTIC".
Previously, it would panic immediately.

Reviewed by:	pjd
Approved by:	re (kib)
2009-06-30 14:34:06 +00:00
Ulf Lilleengen
ac2a008e69 - Apply the same naming rules of LVM names as done in the LVM code itself.
PR:		kern/135874
2009-06-24 22:09:30 +00:00
John Hay
65a4957806 Do not stop the loop when an empty or deleted directory entry is found.
Rather just skip over it.
2009-06-24 06:42:13 +00:00
Ivan Voras
63f4d880e0 Fix tabs, slightly improve comments.
Approved by:	gnn (mentor) (original)
Noticed by:	stas
2009-06-18 11:12:11 +00:00
Ivan Voras
452f657cb9 Add support for labels derived from GPT metadata.
Approved by:	gnn (mentor)
Reviewed by:	pjd
PR:		128398
Submitted by:	Marius Nuennerich < marius at nuenneri.ch >
2009-06-13 00:27:03 +00:00
Luigi Rizzo
6231f75bcf As discussed in the devsummit, introduce two fields in the
struct bio to store classification information, and a hook
for classifier functions that can be called by g_io_request().

This code is from Fabio Checconi as part of his GSOC work.
2009-06-11 09:55:26 +00:00
Pawel Jakub Dawidek
cb9b72ce4a Simplify. 2009-06-05 23:35:43 +00:00
Doug Barton
8b3bfb0509 Crank the debug level necessary to display the "Label foo is removed"
and "Label for provider ..." messages up from 0 to 1.
2009-05-30 22:31:52 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Ulf Lilleengen
4147dd02cd - Unbreak 64 bit platforms by casting off_t to intmax. 2009-05-26 14:15:06 +00:00
Ulf Lilleengen
6d66da20b7 - Fix wrong print on BIO_DONE.
- Use db_printf instead of printf. While here, apply this to other ddb commands
  as well.

Pointed out by:		pjd
2009-05-26 10:03:44 +00:00
Ulf Lilleengen
bf7d2c1797 - Add 'show bio' DDB command.
MFC after:	3 weeks
2009-05-26 07:29:17 +00:00
Edward Tomasz Napierala
916cd41c47 Check return value of gctl_get_asciiparam().
Found with:	Coverity Prevent(tm)
CID:		1118
2009-05-12 16:59:50 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00
Ulf Lilleengen
d8d015cddc - Split up the BIO queue into a queue for new and one for completed requests.
This is necessary for two reasons:
  1) In order to avoid collisions with the use of a BIOs flags set by a consumer
     or a provider
  2) Because GV_BIO_DONE was used to mark a BIO as done, not enough flags was
     available, so the consumer flags of a BIO had to be misused in order to
     support enough flags. The new queue makes it possible to recycle the
     GV_BIO_DONE flag into GV_BIO_GROW.
  As a consequence, gvinum will now work with any other GEOM class under it or
  on top of it.

- Use bio_pflags for storing internal flags on downgoing BIOs, as the requests
  appear to come from a consumer of a gvinum volume. Use bio_cflags only for
  cloned BIOs.
- Move gv_post_bio to be used internally for maintenance requests.
- Remove some cases where flags where set without need.

PR:		kern/133604
2009-05-06 19:34:32 +00:00
Ulf Lilleengen
41944888fe - Fix a case where a RAID5 volume would think that it is supposed to grow a new
subdisk after a parity rebuild.
2009-05-06 19:18:19 +00:00
Ulf Lilleengen
11c4adc49e - Check if any plexes are doing internal maintenance before removing them. 2009-05-06 19:06:28 +00:00
Ulf Lilleengen
5a0fa8531c - Add forgotten KASSERT. 2009-05-06 18:37:32 +00:00
Ulf Lilleengen
1d8dfc60f4 - Fix a bug where the bio_data field of the wrong BIO is freed if an error
occurs when doing a RAID5 request.
2009-05-06 18:27:28 +00:00
Ulf Lilleengen
451b95f489 - GV_BIO_RETRY is not used, and it is actually impossible with more than 8
values for bio_cflags/bio_pflags.
2009-05-06 18:24:56 +00:00
Ulf Lilleengen
040272465d - Split the queue mutex into one for the event queue and one for the BIO queue,
as they do not really relate and to prepare for an additional queue to be
  covered by the BIO queue mutex.
- Implement wrappers for fetching the next element from the event queue as well
  as for putting a new element into the BIO queue.
2009-05-06 18:21:48 +00:00
Ulf Lilleengen
ad75dd77e0 - Make the gvinum softc invisible to userland, as it is not needed. 2009-05-04 17:30:20 +00:00
Ulf Lilleengen
697ab8be86 - Remove assertion of topology lock remaining from 7.x gvinum. It is not needed,
as the renaming only changes internal gvinum names and will not alter the geom
  topology.
- The topology lock was not held when calling g_wither_geom after renaming.
2009-04-18 16:36:27 +00:00
Marcel Moolenaar
cce94b6583 Precision '*' expects an int and strlen() returns a size_t.
Compensate.
2009-04-16 05:52:47 +00:00
Marcel Moolenaar
6ad9a99f21 Add a compat option to the EBR scheme that controls the
naming of the partitions (GEOM_PART_EBR_COMPAT).  When
compatibility is enabled, changes to the partitioning are
disallowed.

Remove the device name aliasing added previously to provide
backward compatibility, but which in practice doesn't give
us anything.

Enable compatibility on amd64 and i386.
2009-04-15 22:38:22 +00:00
Ulf Lilleengen
1de45ea74d - Move out allocation part of different gvinum objects into its own routine and
make use of it in the gvinum userland code.
2009-04-10 08:50:14 +00:00
Andrew Thompson
853a10a581 Revert r190676,190677
The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by:	scottl
2009-04-10 04:08:34 +00:00
Marcel Moolenaar
94fe30d0ca Don't use hexadecimal in the EBR partition names, because 'a'..'f'
are more commonly known as BSD partition names.

Discussed with: ivoras@
2009-04-08 16:18:16 +00:00
Andrew Thompson
31da42bab2 Add interleaving root hold tokens from the CAM probe to disk_create and geom
provider tasting. This is needed for disk attachments that happen after threads
are running in the boot process.

Tested by:	rnoland
2009-04-03 19:49:33 +00:00
Andrew Thompson
626fc9fe3d Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called
in situations where sleeping isnt allowed.
2009-04-03 19:46:12 +00:00
Marcel Moolenaar
daca55549f The 9 bytes immediately prior to the partition table can contain
signatures or disk serial numbers. Don't assume those to be zero
in all cases. This fixes a false negative.

Tested by: avatar@mmlab.cse.yzu.edu.tw
2009-04-03 05:54:49 +00:00
Marcel Moolenaar
c146965cd1 Sharpen the saw:
o  PC98 uses 32-bit block numbers. Limit the scheme to 2^32-1
   blocks when the media is larger. The 32-bit block numbers
   are implicit (16-bit cylinder * 8-bit head * 8-bit sector).
2009-03-30 01:03:58 +00:00
Marcel Moolenaar
6154e492ec Sharpen the saw:
o  MBR uses 32-bit block numbers. Limit the scheme to 2^32-1
   blocks when the media is larger.
2009-03-30 00:53:46 +00:00
Marcel Moolenaar
f5f875ed84 Sharpen the saw:
o  EBR uses 32-bit block numbers. Limit the scheme to 2^32-1
   blocks when the media is larger.
o  Calculate the number of entries based on the rounded media
   size, rather than the raw media size.
2009-03-30 00:48:42 +00:00
Marcel Moolenaar
2a1c00ff2f Sharpen the saw:
o  Don't create a GPT scheme underneath another scheme when
   the probe doesn't allow it.
2009-03-30 00:33:43 +00:00
Ulf Lilleengen
bd9337ce80 - Add files that should have been added in r190507. 2009-03-28 21:06:59 +00:00
Ulf Lilleengen
c0b9797aa8 Import the gvinum work that have been done during and after Summer of Code 2007.
The work have been under testing and fixing since then, and it is mature enough
to be put into HEAD for further testing.

A lot have changed in this time, and here are the most important:
- Gvinum now uses one single workerthread instead of one thread for each
  volume and each plex. The reason for this is that the previous scheme was
  very complex, and was the cause of many of the bugs discovered in gvinum.
  Instead, gvinum now uses one worker thread with an event queue, quite
  similar to what used in gmirror.
- The rebuild/grow/initialize/parity check routines no longer runs in
  separate threads, but are run as regular I/O requests with special flags.
  This made it easier to support mounted growing and parity rebuild.
- Support for growing striped and raid5-plexes, meaning that one can extend the
  volumes for these plex types in addition to the concat type. Also works while
  the volume is mounted.
- Implementation of many of the missing commands from the old vinum:
  attach/detach, start (was partially implemented), stop (was partially
  implemented), concat, mirror, stripe, raid5 (shortcuts for creating volumes
  with one plex of these organizations).
- The parity check and rebuild no longer goes between userland/kernel, meaning
  that the gvinum command will not stay and wait forever for the rebuild to
  finish. You can instead watch the status with the list command.
- Many problems with gvinum have been reported since 5.x, and some has been hard
  to fix due to the complicated architecture. Hopefully, it should be more
  stable and better handle edge cases that previously made gvinum crash.
- Failed drives no longer disappears entirely, but now leave behind a dummy
  drive that makes sure the original state is not forgotten in case the system
  is rebooted between drive failures/swaps.
- Update manpage to reflect new commands and extend it with some examples.

Sponsored by:   Google Summer of Code 2007
Mentored by:    le
Tested by:      Rick C. Petty <rick-freebsd2008 -at- kiwi-computer.com>
2009-03-28 17:20:08 +00:00
Marcel Moolenaar
ee94c7ef01 Sharpen the saw:
o  BSD uses 32-bit block numbers. Limit the scheme to 2^32-1
   blocks when the media is larger.
2009-03-27 05:48:42 +00:00
Marcel Moolenaar
d01a198be7 Sharpen the saw:
o  Don't create an APM scheme underneath another scheme when
   the probe doesn't allow it.
o  APM uses 32-bit block numbers. Limit the scheme to 2^32-1
   blocks when the media is larger.
2009-03-27 05:35:12 +00:00
Marcel Moolenaar
f3548c023e Change the priority from high to normal. This makes sure that
the BSD or GPT schemes can take precedence as appropriate.
2009-03-26 16:42:24 +00:00
Ivan Voras
f7b16839ba Create GEOM labels from UFS IDs, e.g. /dev/ufsid/49c97b1faa2adc43. UFS IDs
are always present and can be used to identify file systems (useful if
hardware devices move often).

Actually-by:	pjd
Approved by:	gnn (mentor)
2009-03-25 20:38:57 +00:00
Ivan Voras
15c48b9a20 Be more explicit and complain if kernel dumps are perfomed on unsupported
partition types. This is to help users used to the old behaviour.

Reviewed by:	marcel
Approved by:	gnn (mentor)
2009-03-22 00:29:48 +00:00
Ivan Voras
94dd506d54 Make GEOM provider names starting with "/dev/" acceptable as well as their
"raw" names. While there, change the formatting of extended MSDOS partitions
so that the dot (".") is not used to separate two numbers (which kind of
looks like the whole is a decimal number). Use "+" instead, which also
hints that the second part of the name is the offset from the start of
the partition in the first part of the name. Also change the offset from
decimal to hexadecimal notation, simply for aesthetic reasons and future
compatibility.

GEOM_PART is the default in 8-CURRENT but not yet in 7-STABLE so this
changeset can be MFC-ed without causing major problems from the second
part.

Reviewed by:	marcel
Approved by:	gnn (mentor)
MFC after:	2 weeks
2009-03-19 14:23:17 +00:00
Pawel Jakub Dawidek
c5d387d010 Detach GELI providers on shutdown/reboot, which will allow providers underneath
to close properly.

Reported, reviewed and tested by:	guido
MFC after:	1 week
2009-03-16 19:31:08 +00:00
Guido van Rooij
921eec2694 Backout this commit whil a better solution is developed 2009-03-13 08:13:51 +00:00
Yoshihiro Takahashi
8753b93d95 Move the PC98_[MS]ID_* defines from g_part_pc98.c to diskpc98.h.
Reviewed by:	marcel
2009-03-11 13:15:42 +00:00
Sam Leffler
664c0b48bf o disallow write to RedBoot and FIS directory partitions; these are painful
to resurrect (maybe honor foot shooting bit in kern.geom_debugflags)
o fix match macro so we now recognize we want to merge FIS dir with RedBoot
  config parameters even if we don't actually do it
2009-03-11 01:12:52 +00:00
Guido van Rooij
c5f79858ff When attaching a geli on boot make sure that it is detached
upon last close. (needed for a gmirror to properly shutdown
upon reboot when a geli is on top the gmirror)
2009-03-10 15:23:43 +00:00
Yoshihiro Takahashi
9541ada9a0 Restore the return statement. It was accidentally removed by rev 188429. 2009-03-10 11:14:03 +00:00
Sam Leffler
443f1e7991 add geom_redboot, a geom module that exports RedBoot FIS partitions as named
slices in dev/redboot/*
2009-03-09 23:18:36 +00:00
Marcel Moolenaar
232c8bf888 o When creating the EBR scheme, set the number of entries
properly. Otherwise the minimum of 1 is used and you can
   only insert a single partition/slice and only at sector
   0 (index 1).
o  When adding a partition/slice, recalculate the index after
   the start and size of the partition/slice are adjusted to
   make them a multiple of the track size. Since the precheck
   method sets the index based on the start of the partition
   as provided by the user, we know that we're off by at most
   1 and adjusting the index is safe.
2009-02-21 19:25:13 +00:00
Marcel Moolenaar
4dedfc44e7 Add bootcode handling. 2009-02-21 07:01:21 +00:00
Marcel Moolenaar
59c532c500 Provide compatibility symlink for logical partitions:
1.  Extend geom_dev by having it create the symlink (i.e. call
    make_dev_alias) based on the DIOCGPROVIDERALIAS ioctl.
    In this way the functionaility is generic and thus usable
    by any geom/provider.
2.  Have g_part handle said ioctl through the devalias method,
    so that it's under control of the scheme itself. By design
    the alias will not be created for newly added partitions.
2009-02-20 04:48:40 +00:00
Marcel Moolenaar
507a0d4a6c Fix an infinite loop created when the last logical partition is
removed.
2009-02-20 04:10:31 +00:00
Marcel Moolenaar
09a278e14d Add a default implementation for pre-check. It should
always succeed if not implemented.

Pointy hat: marcel
2009-02-17 18:24:58 +00:00
Marcel Moolenaar
832cdc2ca7 Remove gpt_offset and related code. It was introduced for use
by the BSD scheme, ended up not to be needed. Remove to avoid
abuse and to keep the bloat to a minimum.
2009-02-17 04:12:10 +00:00
Marcel Moolenaar
e24c8a3fb8 Add support to add, delete and modify logical partitions, as well
as to create and destroy the extended partitioning scheme. In
other words: full support.
2009-02-16 03:54:28 +00:00
Marcel Moolenaar
7ca4fa83ec Add method precheck to the g_part interface. The precheck
method allows schemes to reject the ctl request, pre-check
the parameters and/or modify/set parameters. There are 2
use cases that triggered the addition:
1.  When implementing a R/O scheme, deletes will still
    happen to the in-memory representation. The scheme is
    not involved in that operation. The pre-check method
    can be used to fail the delete up-front. Without this
    the write to disk will typically fail, but at that
    time the delete already happened.
2.  The EBR scheme uses a linked list to record slices.
    There's no index. The EBR scheme defines the index
    as a function of the start LBA of the partition. The
    add verb picks an index for the range and then invokes
    the add method of the scheme to fill in the blanks. It
    is too late for the add method to change the index.
    The pre-check is used to set the index up-front. This
    also (silently) overrides/nullifies any (pointless)
    user-specified index value.
2009-02-15 22:18:16 +00:00
Ulf Lilleengen
af2c6a1332 - Use the correct argument when determining the buffer size.
PR:		kern/131575
MFC after:	2 days
2009-02-11 18:13:20 +00:00
Warner Losh
51f53a08e0 Fix g_part_dumpconf and g_part_name prototpyes.
Submitted by:	marcel@
2009-02-10 02:43:07 +00:00
Marcel Moolenaar
5d68db5bc8 Add the EBR scheme. The EBR scheme supports the Extended Boot Records
found inside extended partitions and used to create logical partitions.
At this time write/modify support is not (yet) present.
The EBR and MBR schemes both check the parent scheme. The MBR will
back-off when nested under another MBR, whereas the EBR only nests
under a MBR.
2009-02-08 23:51:44 +00:00
Marcel Moolenaar
665557a4ec Allow gpe_offset to be set by the scheme. When gpe_offset is zero,
or invalid, initialize it to the start of the partition. Adjust
the mediasize when the offset lies somewhere inside the partition.
2009-02-08 23:39:30 +00:00
Marcel Moolenaar
165651a553 o Add the "PART::scheme" attribute that returns the name of the
underlying partitioning scheme.
o  Put the start and end of the partition in the XML configuration.
   The start and end are the LBAs of the first and last sector
   (resp.) of the partition. They are currently identical to the
   offset and size attributes, which describe the partition as an
   offset and size in bytes, but may not in the future. The start
   and end will be used for the logical partition boundaries and
   may include metadata. The offset and size will always represent
   the useful storage space within the partition. Typically these
   two notions are the same, but for logical partitions in an
   extended partition, the EBR is more naturally treated as being
   part of the partition.
2009-02-08 20:15:08 +00:00
Warner Losh
f4fddf53c7 Fix g_part_*dumpconf to return void to match kobj definition.
Fix g_part_*name to return a const char * rather than a char *.
2009-02-08 07:05:23 +00:00
Marcel Moolenaar
da3ee90988 In g_handleattr(), set bp->bio_completed also for the case
where len is 0. Otherwise g_getattr() will never succeed
when it is handled by g_handleattr_str().
2009-02-03 07:07:13 +00:00
Marcel Moolenaar
709a626613 Constify val in g_handleattr() and str in g_handleattr_str().
This allows passing string constants to g_handleattr_str().
2009-02-01 01:50:09 +00:00
Ed Schouten
739b705c7e Remove unused unrhdr from GEOM character device module.
Now that make_dev() doesn't require unit numbers to be unique, there is
no need to use an unrhdr here to generate the numbers. Remove the entire
init-routine, because it is optional.
2009-01-24 18:23:19 +00:00
Edward Tomasz Napierala
38153e80f7 Prevent a panic that happens on SMP machines when removing a disk with
many writes queued up.

Reviewed by:	phk, scottl
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2009-01-11 13:51:04 +00:00
Marius Strobl
c825397790 - Don't enforce an upper-bound to the number of sectors or heads,
allowing the full 16-bit width of the corresponding fields in the
  VTOC8 label to be used. The removed limits basically only held
  true for providers labeled using the synthetic geometry provided
  by cam_calc_geometry(9) but neither SCSI disks labeled with Solaris
  nor sufficiently large ATA disks.
- Given that providers (originally) labeled with Solaris typically
  use the native geometry as reported by the target while FreeBSD
  typically uses a synthetic one put the message complaining about
  mismatching geometries between what the label indicates and what
  GEOM thinks the provider has, which we generally can't help,
  under bootverbose in order to not unnecessarily scare users.
- For informational purposes add the non-matching values to the
  message complaining about them, similar to what r186501 did for
  g_part_bsd_read() except also indicating the origin of the
  values.
- Make it clear that the messages emitted by this code refer to
  the VTOC8 support rather than to another existing scheme or to
  VTOC32.
2009-01-06 14:10:10 +00:00
Marcel Moolenaar
3d556594df Don't enforce an upper-bound to the number of sectors or heads
that that the provider has. The limits we imposed were PC BIOS
specific and not always applicable.
2009-01-06 06:47:53 +00:00
Marcel Moolenaar
90886ebcc2 Improve probing.
o   Don't check the dummy fields.
o   The entry is unused if either dp_mid is 0 or dp_sid is 0.
o   The start or end cylinder cannot be 0.
o   The start CHS cannot be equal to the end CHS.

Submitted by:	nyan
2009-01-04 07:32:06 +00:00
Ulf Lilleengen
2155338ac1 - Fix an issue with access permissions to underlying disks used by a gvinum
plex. If the plex is a raid5 plex, and is being written to, parity data might
  have to be read from the underlying disks, requiring them to be opened for
  reading as well as writing.

MFC after:	1 week
2008-12-27 14:32:39 +00:00
David E. O'Brien
62a353c0bd When the geometry does not match the label, print out the values. 2008-12-26 20:27:32 +00:00
Edward Tomasz Napierala
ce8be7b8b0 Implement g_vfs_orphan(). Without it, the filesystem never closes
the device, which means refcount on periph drivers never drops,
which means cam_sim_free() never returns, which results in umass
sleeping there ad infinitum.

Submitted by:	pjd
Reviewed by:	scottl, pjd
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2008-12-16 17:04:52 +00:00
Ulf Lilleengen
fa13e9bb0b - Add missing word in comment. 2008-12-08 17:09:02 +00:00
Edward Tomasz Napierala
d27a975f72 Make it possible to use gjournal for the root filesystem. Previously,
an unclean shutdown would make it impossible to mount rootfs at boot.

PR:		kern/128529
Reviewed by:	pjd
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2008-12-06 11:33:10 +00:00
Ivan Voras
499c86ebd5 Trivial patch to show on which geom has the error been detected.
Submitted by:	Rick C. Petty
Approved by:	gnn (mentor)
MFC after:	1 month
2008-12-01 15:02:00 +00:00
Marcel Moolenaar
6647711279 Allow boot code to be smaller than what the scheme expects.
This effectively changes the boot code size to be an upper
bound and makes the interface more flexible.
2008-12-01 00:07:17 +00:00
Marcel Moolenaar
95fc269897 Allow dumpon to a partition of type FS_UNUSED as well. 2008-11-26 05:18:27 +00:00
Ulf Lilleengen
251048a1ab - Fix a potential NULL pointer reference. Note that this should not happen in
practice, but it is a good programming practice and allows the kernel to not
  depend on userland correctness.
- While there, make sizeof usage match the rest of the code.

Found with:	Coverity Prevent(tm)
CID:		660, 662
2008-11-25 20:28:33 +00:00
Ulf Lilleengen
7e11b694f4 - Fix a potential NULL pointer reference. Note that this cannot happen in
practice, but it is a good programming practice nontheless and it allows the
  kernel to not depend on userland correctness.

Found with:   Coverity Prevent(tm)
CID:          655-659, 664-667
2008-11-25 19:13:58 +00:00
Marcel Moolenaar
1a696559de Partition type FS_UNUSED does not mean the partition entry
is unused. Unused partition entries have a partition size
of zero. Therefore, partitions can have type FS_UNUSED.

MFC after:	3 days
2008-11-18 05:55:58 +00:00
Marcel Moolenaar
dd0db05da2 Fix a panic caused by a corrupted table when the header is
still valid. We were checking the state of the header and
not the table.

PR:		119868
Based on a patch from:	Jaakko Heinonen <jh@saunalahti.fi>
MFC after: 1 week
2008-11-06 16:51:33 +00:00
Attilio Rao
83b3bdbc8a Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
  mnt_lock and using instead a refcount in order to keep track of lock
  requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
  useless.
- Due to the change above, vfs_busy() is no more linked to a lockmgr.
  Change so its KPI by removing the interlock argument and defining 2 new
  flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
  old version (which was unlinked from the lockmgr alredy) and
  MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
  once the mnt interlock is held (ability still desired by most consumers).
- The stub used into vfs_mount_destroy(), that allows to override the
  mnt_ref if running for more than 3 seconds, make it totally useless.
  Remove it as it was thought to work into older versions.
  If a problem of "refcount held never going away" should appear, we will
  need to fix properly instead than trust on such hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
  leaving the MNTK_MWAIT flag on even if it the waiters were actually
  woken up. Just a place in vfs_mount_destroy() is left because it is
  going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with:	kib
Tested by:	pho
2008-11-02 10:15:42 +00:00
Warner Losh
0952268ecd Add support for reading Tivo Series 1 partitioning. This likely needs
a little refinement, but is good enough to commit as is.

# Should look to see if I should move swab(3) into the kernel or just
# provide the unoptimized routine here.

Reviewed by:	marcel@
2008-11-02 03:02:56 +00:00
Konstantin Belousov
f5dfdb519f Revert r184136. Instead, push the check for crashdumpmap overflow into the
MD i386 and amd64 dump code.

Requested by:	jhb
Retested by:	pho
MFC after:	3 days (+ 176304 + 184136)
2008-10-31 10:11:35 +00:00
Ulf Lilleengen
86b3c6f5bc - Import macros used in gmirror for printing gvinum debug messages and making
the output more standardized.
- Add a sysctl to set the verbosity of the debug messages.
- While there, fixup typos and wording in the messages.
2008-10-26 17:20:37 +00:00
Marcel Moolenaar
38a2db2eb0 Invalid BSD disklabels have been created by sysinstall and
are possibly still being created. The d_secperunit field
contains the number of sectors of the disk and not of the
slice/partition to which the disklabel applies.
Rather than reject the disklabel, we now silently adjust
the field. Existing code, like bslabel(8), does not seem
to check the label that extensively and seems to adjust
fields as a side-effect as well.
In other words, it's not that important apparently, so
gpart should not be too strict about it.

Reported by: nyan@
Reported by: Andriy Gapon <avg@icyb.net.ua>
2008-10-25 17:21:46 +00:00
Marcel Moolenaar
66fedfca7b Allow dumps to partitions with a tag of 0. The legacy
sunlabel implementation in FreeBSD does not use VTOC
information and as such as no partition types.
2008-10-22 02:08:54 +00:00
Konstantin Belousov
7a882637fe Do not overflow crashdumpmap.
Reported and tested by:	pho
Reviewed by:	jhb
MFC after:	1 week
2008-10-21 18:52:38 +00:00
Marcel Moolenaar
e1d7111a53 The active and bootable flags are not part of the type.
Export the active and bootable flags as attributes in
the configuration XML and allow them to be manipulated
with the set/unset commands.

Since libdisk treats the flags as part of the partition
type, preserve behavior by keeping them included in the
configuration text.
2008-10-20 04:50:47 +00:00
Attilio Rao
0d7935fd01 Remove the struct thread unuseful argument from bufobj interface.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP

Reviewed by:	kib
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-10-10 21:23:50 +00:00
Ulf Lilleengen
64d62b289e - Use the new gv_write_header function to write out the header when removing a
drive to make sure that the header is in the correct format.
2008-10-02 10:01:05 +00:00
Ulf Lilleengen
d4b28d5b0f - Remove unneeded macro since the config_length field in the header was changed
to 64 bit in the new format.
2008-10-02 09:35:47 +00:00
Ulf Lilleengen
46ceb66ad3 - Make gvinum header on-disk structure consistent on all platforms by storing
the gvinum header in fields of fixed size and in a big endian byte order
  rather than the size and byte order of the actual platform.

Note that the change is backwards compatible with the old gvinum configuration
format, but will save the configuration in the new format when the 'saveconfig'
command is executed.

Submitted by:	Rick C. Petty <rick-freebsd -at- kiwi-computer.com>
2008-10-01 14:50:36 +00:00
Marcel Moolenaar
a87faebb8c Return G_PART_PROBE_PRI_HIGH instead of G_PART_PROBE_PRI_NORM
if the probe succeeds. This guarantees that the BSD scheme
wins over the MBR scheme when MBR gets to probe first. Build-
or link-time conditions can cause schemes to end up in the
linker set in a different order. Normally BSD is before MBR
in the linker set and as such get to probe first. But typically
when the kernel gets rebuild or relinked, this can change.
2008-09-29 02:48:22 +00:00
Marcel Moolenaar
1e2fbcfa44 Insert the null scheme at the head. This does not change any
functionality, but creates an invariant: the first element
on the list is always the null scheme.
2008-09-29 02:39:02 +00:00
Marcel Moolenaar
4bf0267894 Export the partition name in the conftxt and confxml output.
The conftxt output is used by libdisk, and the confxml
output is used by gpart itself (gpart show -l).

Submitted by:	nyan@
2008-09-27 19:58:11 +00:00
Marcel Moolenaar
a455de093b Hold the root mount while we're tasting. It is possible
that a nested partition (typically the BSD disklabel)
is not done tasting while the root file system is being
mounted. While this is rare, it's still possible.
2008-09-27 19:29:52 +00:00
Marcel Moolenaar
404cfb5e20 Allow 255 sectors/track for the BSD disklabel. The previous limit
of 63 sectors/track is too PC BIOS specific. On pc98, where the
BSD disklabel is used as well, 255 sectors/track is not uncommon.

Submitted by:	nyan@
2008-09-27 15:28:15 +00:00
Ed Schouten
d3ce832719 Remove unit2minor() use from kernel code.
When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.

We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by:	kib
2008-09-26 14:19:52 +00:00
Sean Bruno
c4901b6798 Just a fixup for a KTRACE message I stumbled upon many moons ago.
Reviewed by:	Scott Long
MFC after:	2 days
2008-09-18 15:02:19 +00:00
Ulf Lilleengen
f805f204b6 - Add a new ioctl for getting the provider name of a geom provider.
- Add a routine for looking up a device and checking if it is a valid geom
  provider given a partial or full path to its device node.

Reviewed by:	phk
Approved by:	pjd (mentor)
2008-09-07 13:54:57 +00:00
Rui Paulo
89ea496520 Fix build. 2008-09-05 18:11:18 +00:00
Rui Paulo
87662ab391 Keep entries sorted. 2008-09-05 18:09:49 +00:00
Rui Paulo
35fa2f1bcd Include the vendor in the partition name. 2008-09-05 16:54:07 +00:00
Rui Paulo
d7255ff42e Detect Apple HFS GPT slices. 2008-09-05 12:49:14 +00:00
Attilio Rao
59d4932531 Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.
Manpages are updated accordingly.

Tested by:	Diego Sardina <siarodx at gmail dot com>
2008-08-31 14:26:08 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Pawel Jakub Dawidek
ed6c3e478f Style(9). 2008-08-12 20:19:08 +00:00
Dag-Erling Smørgrav
2616144e43 Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from:	Varnish
MFC after:	2 weeks
2008-08-09 11:14:05 +00:00
Peter Wemm
fbbc785240 Trivial commit to attempt to diagnose a svn problem. Add
comment that Tivo disks are APM, but do not have a DDR record.
2008-07-22 18:05:50 +00:00
Pawel Jakub Dawidek
5527ecd9a5 Clear passphrase buffer after use.
Submitted by:	Fabian Keil <fk@fabiankeil.de> (a bit different version)
2008-07-20 19:56:13 +00:00
Ulf Lilleengen
14e96b45e8 - When renaming a drive, also set the drive name in the gvinum header.
PR:		kern/125632
Approved by:	pjd (mentor)
MFC after:	3 days
2008-07-19 13:53:11 +00:00
Ulf Lilleengen
56af4c6141 - Fix a logic error when updating plex configuration.
Approved by:	pjd (mentor)
2008-07-11 16:46:29 +00:00
Robert Watson
4f7d1876d5 Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables.  Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout().  A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after:	3 weeks
2008-07-05 13:10:10 +00:00
Xin LI
6c97c325ff Avoid NULL deference.
Reviewed by:	ivoras
2008-06-30 15:21:42 +00:00
Ulf Lilleengen
7e7a4e1d18 - Fix spelling errors.
Approved by:    kib (mentor)
PR:             kern/124788
Submitted by:   Hywel Mallett <Hywel -at- hmallett.co.uk>
2008-06-20 19:48:18 +00:00
Marcel Moolenaar
f6aa3fccce Add the set and unset verbs used to set and clear attributes for
partition entries. Implement the setunset method for the MBR
scheme to control the active flag.
2008-06-18 01:13:34 +00:00
Marcel Moolenaar
d3532631de Finish the support for partition labels and add it to the XML. 2008-06-12 19:34:07 +00:00
Marcel Moolenaar
9a764aac3f Add the raw partition type to the XML. 2008-06-12 06:34:14 +00:00
Marcel Moolenaar
eab484f822 Add the raw partition type to the XML. 2008-06-12 06:26:36 +00:00
Marcel Moolenaar
a3354bb4a7 Add the raw partition type to the XML. 2008-06-12 05:56:03 +00:00
Marcel Moolenaar
0c132595dd Add the raw partiton type to the XML. 2008-06-12 05:28:47 +00:00
Marcel Moolenaar
40b075d366 Add the raw partition type to the XML. 2008-06-12 05:27:23 +00:00
Marcel Moolenaar
ab1e8f04c8 Add the partition label and the raw partition type to the XML. 2008-06-12 04:43:34 +00:00
Ed Schouten
06d425f92e Remove the distinction between device minor and unit numbers.
Even though we got rid of device major numbers some time ago, device
drivers still need to provide unique device minor numbers to make_dev().
These numbers are only used inside the kernel. They are not related to
device major and minor numbers which are visible in devfs. These are
actually based on the inode number of the device.

It would eventually be nice to remove minor numbers entirely, but we
don't want to be too agressive here.

Because the 8-15 bits of the device number field (si_drv0) are still
reserved for the major number, there is no 1:1 mapping of the device
minor and unit numbers. Because this is now unused, remove the
restrictions on these numbers.

The MAXMAJOR definition was actually used for two purposes. It was used
to convert both the userspace and kernelspace device numbers to their
major/minor pair, which is why it is now named UMINORMASK.

minor2unit() and unit2minor() have now become useless. Both minor() and
dev2unit() now serve the same purpose. We should eventually remove some
of them, at least turning them into macro's. If devfs would become
completely minor number unaware, we could consider using si_drv0 directly,
just like si_drv1 and si_drv2.

Approved by:	philip (mentor)
2008-05-29 12:50:46 +00:00