Commit Graph

1857 Commits

Author SHA1 Message Date
Edward Tomasz Napierala
16fac6c92a Make it possible to submit FLUSH bios through geom_dev strategy. This
is required for CTL to work with device-backed LUNs.

Reviewed by:	mav
2013-04-06 10:32:06 +00:00
Alexander Motin
0fb832fdf0 Following r241022, replace iteration over the provider list on media events
by taking the first one and asserting that there are no others.

MFC after:	1 week
2013-04-05 13:11:28 +00:00
Alexander Motin
7868ec506b geom_slice.c and its consumers like GEOM_LABEL are not touching the data
unless hotspots are used.  Pass G_PF_ACCEPT_UNMAPPED flag through except
such rare cases (obsolete GEOM_SUNLABEL and GEOM_BSD).
2013-03-26 07:55:24 +00:00
Alexander Motin
6c6e13b6e1 GEOM NOP does not touch the data, so pass G_PF_ACCEPT_UNMAPPED flag through.
2013-03-26 05:58:49 +00:00
Alexander Motin
a93c0ed463 Remove extra bio_data and bio_length copying to child request after calling
g_clone_bio(), that already copied them.
2013-03-26 05:42:12 +00:00
Alexander Kabaev
31932fae1e Do not pass unmapped buffers to drivers that cannot handle them
In physio, check if device can handle unmapped IO and pass an
appropriately mapped buffer to the driver strategy routine. The
only driver in the tree that can handle unmapped buffers is one
exposed by GEOM, so mark it as such with the new flag in the
driver cdevsw structure.

This fixes insta-panics on hosts, running dconschat, as /dev/fwmem
is an example of the driver that makes use of physio routine, but
bypasses the g_down thread, where the buffer gets mapped normally.

Discussed with: kib (earlier version)
2013-03-26 01:17:06 +00:00
Alexander Motin
f4673017b3 Make GEOM MULTIPATH report unmapped bio support if the underlying paths report
it.  GEOM MULTIPATH itself never touches the data and so is transparent.
2013-03-25 07:24:58 +00:00
Alexander Motin
30ba747160 In GEOM DISK:
- Replace single done mutex with per-disk ones.  On systems with several
disks on several HBAs that removes small, but measurable lock congestion.
- Modify disk destruction process to not destroy the mutex prematurely.
- Remove some extra pointer dereferences.
2013-03-25 05:45:24 +00:00
Alexander Motin
3c330aff3f Fix long known deadlock between geom dev destruction and d_close() call.
Use destroy_dev_sched_cb() to not wait for device destruction while holding
GEOM topology lock (that actually caused deadlock).  Use request counting
protected by mutex to properly wait for outstanding requests completion in
cases of device closing and geom destruction.  Unlike r227009, this code
does not block taskqueue thread for indefinite time, waiting for completion.
2013-03-24 10:14:25 +00:00
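The request counting that 3c330aff3f describes (a counter protected by a mutex, with close and destruction waiting for outstanding requests to complete) can be sketched in userland Python. This is an illustration only: the kernel code is C, and the class and method names below are hypothetical.

```python
import threading


class RequestCounter:
    """Hypothetical model: count in-flight requests under a mutex."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._outstanding = 0

    def start(self):
        """Account for a request that has been issued."""
        with self._cond:
            self._outstanding += 1

    def done(self):
        """Retire a request and wake waiters once none remain."""
        with self._cond:
            self._outstanding -= 1
            if self._outstanding == 0:
                self._cond.notify_all()

    def wait_idle(self, timeout=None):
        """Block until no requests are outstanding; False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._outstanding == 0, timeout)
```

A close or destroy path would call `wait_idle()` after it stops issuing new requests; how the kernel bounds or schedules that wait is not modelled here.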
Alexander Motin
50199fa0d0 Make g_wither_washer() not loop by itself, but only when there was some
more topology change done that may require its attention.  Add few missing
g_do_wither() calls in respective places to signal it.

This fixes potential infinite loop here when some provider is withered, but
still opened or connected for some reason and so can not be destroyed.  For
example, see r227009 and r227510.
2013-03-24 03:15:20 +00:00
Konstantin Belousov
e808788c05 Correct the page count when excess length is trimmed from the bio.
Reported and tested by:	Ivan Klymenko <fidaj@ukr.net>
2013-03-21 22:36:43 +00:00
Konstantin Belousov
6c83fce371 Assert that transient mapping of the bio is only done when unmapped
buffers are allowed.

Sponsored by:	The FreeBSD Foundation
2013-03-21 07:26:33 +00:00
Konstantin Belousov
db7bfaa8ce The geom_part provider supports unmapped bio iff the underlying
provider does so, since geom_part never inspects the bio_data.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:50:24 +00:00
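The rule in db7bfaa8ce (geom_part supports unmapped bios if and only if the underlying provider does) amounts to copying one flag bit from the lower provider. A minimal Python sketch follows, assuming the flag is the G_PF_ACCEPT_UNMAPPED provider flag named in later commits; the bit value and the function name are illustrative, not taken from the commit.

```python
G_PF_ACCEPT_UNMAPPED = 0x8  # illustrative bit value (assumption)


def inherit_unmapped(own_flags: int, underlying_flags: int) -> int:
    """Return own_flags with the unmapped bit copied from the lower provider."""
    cleared = own_flags & ~G_PF_ACCEPT_UNMAPPED
    return cleared | (underlying_flags & G_PF_ACCEPT_UNMAPPED)
```

Only the one bit is inherited; every other flag of the upper provider is left as it was.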
Konstantin Belousov
f8c19ba466 A flag for the geom disk driver to indicate that it accepts the
unmapped i/o requests.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:49:15 +00:00
Konstantin Belousov
ee75e7de7b Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA.  The use of the
unmapped buffers eliminates the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitly requested by the GB_UNMAPPED
flag by the consumer.  For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bios carry a pointer to the vm_page_t array, offset and length
instead of the data pointer.  The provider which processes the bio
should explicitly specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings.  Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags.  Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is no longer subject to the maxbufspace
limit. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, are
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is enforced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not work, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by:	The FreeBSD Foundation
Discussed with:	jeff (previous version)
Tested by:	pho, scottl (previous version), jhb, bf
MFC after:	2 weeks
2013-03-19 14:13:12 +00:00
Pawel Jakub Dawidek
c4d2d401f8 We don't need buffer to handle BIO_DELETE, so don't check buffer size for it.
This fixes handling BIO_DELETE larger than MAXPHYS.
2013-03-14 23:07:01 +00:00
Sean Bruno
bd9fba0cfe Add legacy support to geom raid to create a /dev/arX device for support
of upgrading older machines using ataraid(4) to newer releases.

This optional parameter is controlled via kern.geom.raid.legacy_aliases
and will create a /dev/ar0 device that will point at /dev/raid/r0 for
example.

Tested on Dell SC 1425 DDF-1 format software raid controllers installing from
stable/7 and upgrading to stable/9 without having to adjust /etc/fstab

Reviewed by:	mav
Obtained from:	Yahoo!
MFC after:	2 Weeks
2013-03-08 20:07:32 +00:00
Jean-Sébastien Pédron
f5c1ef84f9 g_label_ntfs_taste: Abort taste if recsize == 0
This will avoid a 0-byte read (in g_read_data()) leading to a panic, if
previously read data are erroneous.

Suggested by:	John-Mark Gurney <jmg@funkthat.com>
2013-03-08 18:07:43 +00:00
Gavin Atkinson
10f29053d2 Support the FAT16 partition type in gpart(8)
PR:		kern/174714
Submitted by:	4721 at hushmail dot com
MFC after:	1 week
2013-03-07 22:32:41 +00:00
Alexander Motin
34d3281c57 Fix panic when Secondary_Element_Count == 1 and Secondary_Element_Seq
is not set (255).

Reported by:	sbruno
MFC after:	1 week
2013-03-07 18:55:37 +00:00
Jean-Sébastien Pédron
5943eed4b9 g_label_ntfs.c: Mark structures as __packed
Without this, read data is mis-interpreted. This could trigger a panic,
as was the case on one computer where computed "recsize" was zero,
leading to a call to g_read_page() asking for 0 bytes.
2013-03-05 11:02:05 +00:00
Attilio Rao
0f90e981cb Remove ntfs headers dependency for g_label_ntfs.c by redefining the
used structs and values.

This patch is not targeted for MFC.
2013-03-02 18:23:59 +00:00
Kirk McKusick
2bc1a1fe5c Add barrier write capability to the VFS buffer interface. A barrier
write is a disk write request that tells the disk that the buffer
being written must be committed to the media along with any writes
that preceded it before any future blocks may be written to the drive.

Barrier writes are provided by adding the functions bbarrierwrite
(bwrite with barrier) and babarrierwrite (bawrite with barrier).

Following a bbarrierwrite the client knows that the requested buffer
is on the media. It does not ensure that buffers written before that
buffer are on the media. It only ensures that buffers written before
that buffer will get to the media before any buffers written after
that buffer. A flush command must be sent to the disk to ensure that
all earlier written buffers are on the media.

Reviewed by: kib
Tested by:   Peter Holm
2013-02-16 14:51:30 +00:00
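The ordering guarantee in 2bc1a1fe5c can be modelled as a check on the order in which writes reach the media: writes up to and including a barrier may land in any order among themselves, but none issued after the barrier may land before them. The Python sketch below is a hypothetical model of that rule only; the function name and data layout are assumptions, not the VFS interface.

```python
def respects_barriers(submitted, completed):
    """Check a media completion order against barrier semantics.

    submitted: list of (write_id, is_barrier) in issue order.
    completed: list of write_ids in the order they reached the media.
    """
    epoch = {}
    current = 0
    for write_id, is_barrier in submitted:
        epoch[write_id] = current
        if is_barrier:
            current += 1  # writes issued later belong to the next epoch
    seen = [epoch[w] for w in completed]
    # Epochs must never go backwards; order inside an epoch is free.
    return all(a <= b for a, b in zip(seen, seen[1:]))
```

In this model the barrier write itself may land before an earlier write of the same epoch, which matches the statement that a barrier does not by itself prove earlier buffers are on the media.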
Andriy Gapon
1f1088b843 g_mirror: g_getattr() failure should not be fatal
This allows gmirror to be used e.g. on top of ZVOLs.

PR:		kern/175323
Submitted by:	Alexei.Volkov@softlynx.ru, mav
Reported by:	Alexei.Volkov@softlynx.ru
Tested by:	Alexei.Volkov@softlynx.ru
Reviewed by:	ae, mav, pjd
MFC after:	1 week
2013-01-26 10:50:04 +00:00
Alexander Motin
c3ec009a97 - Fix rebuild position broken at r245522.
- Identify one more metadata field.
2013-01-17 03:27:08 +00:00
Alexander Motin
821a0f639e For Promise/AMD metadata add support for disks with capacity above 2TiB
and for volumes with sector size above 512 bytes.
2013-01-17 00:50:25 +00:00
Alexander Motin
ed8180e665 Recalculate volume size only for real CONCATs. For SINGLE trust volume
size given by metadata, as it should be correct and in some cases can be
smaller than subdisk size.
2013-01-17 00:09:50 +00:00
Alexander Motin
2c6a273750 Allow inserting a new component into geom_raid3 without specifying its number.
PR:		kern/160562
MFC after:	2 weeks
2013-01-15 10:06:35 +00:00
Alexander Motin
f62c1a47d6 Similarly to r242314 for GRAID, make GRAID3 more aggressive in marking volumes
as clean on shutdown and move that action from shutdown_pre_sync stage to
shutdown_post_sync to avoid extra flapping.

ZFS tends to not close devices on shutdown, which doesn't allow GEOM RAID
to shut down gracefully.  To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.

MFC after:	2 weeks
2013-01-15 01:27:04 +00:00
Alexander Motin
cbab616174 Similarly to r242314 for GRAID, make GMIRROR more aggressive in marking volumes
as clean on shutdown and move that action from shutdown_pre_sync stage to
shutdown_post_sync to avoid extra flapping.

ZFS tends to not close devices on shutdown, which doesn't allow GEOM RAID
to shut down gracefully.  To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.

PR:		kern/113957
MFC after:	2 weeks
2013-01-15 01:13:55 +00:00
Alexander Motin
4c10c25e33 Keep value of orig_config_id metadata field. Windows driver writes there
previous value of config_id when it is changed in some cases.  I guess it
may be used to avoid some split-brain conditions.
2013-01-14 20:31:45 +00:00
Alexander Motin
eb84fc957c Small cosmetic tuning of the IRRT status constants.
2013-01-14 16:38:43 +00:00
Alexander Motin
511c69d9ce Print some more metadata fields.
2013-01-14 13:06:35 +00:00
Alexander Motin
898a4b74f4 Windows driver writes relative volume IDs to metadata field. Use that value
as a hint for raid/rX device number to make it persistent across reboots.
2013-01-14 00:38:51 +00:00
Alexander Motin
f9462b9bbe - Add checks for Intel metadata version and attributes. Ignore disks with
unsupported metadata types like Intel Smart Response to not corrupt them.
- Improve setting of these things during metadata writing to protect from
incapable BIOSes and other implementations.
2013-01-13 23:00:40 +00:00
Alexander Motin
b99586c25f Improve support for disabled disks. If disabled disk disconnected and then
reconnected back, leave it as disconnected. If new disk inserted instead of
disabled, rebuild it and leave as enabled.
2013-01-13 14:30:37 +00:00
Alexander Motin
865aea63c3 Windows handles INIT and VERIFY as array-wide and it doesn't specify which
disks should be rebuilt. Our rebuild code is at the same time disk-centric.  To
handle this situation properly, check all disks for RBLD flags, and if no
disk is specified, try to rebuild/resync all of them except newly inserted ones.
2013-01-12 21:51:49 +00:00
Alexander Motin
4c95a24141 Implement migration from single disk to RAID1/IRRT for Intel metadata.
Windows driver uses such migration when it creates new arrays.  While GEOM
RAID has no mechanism to implement migration in general case, this specific
case still can be handled easily via degraded RAID1 creation followed by
regular rebuild.
2013-01-12 18:25:48 +00:00
Alexander Motin
26c538bc0b Add basic support for Intel Rapid Recover Technology (Intel RRT).
It is similar to RAID1, but with dedicated master and recovery disks and
manual control over synchronization.  It allows the recovery
disk to be used as a snapshot of the master disk from the time of the last sync.

This implementation is not functionally complete compared to Windows,
but it is better than silent conversion to RAID1 on first boot.
2013-01-12 09:35:44 +00:00
Konstantin Belousov
ddd6b3fc33 Add flags argument to vfs_write_resume() and remove
vfs_write_resume_flags().

Sponsored by:	The FreeBSD Foundation
2013-01-11 06:08:32 +00:00
Pawel Jakub Dawidek
6011443800 Reset provider-specific fields when resending I/O request in low memory
conditions. This fixes assertion which checks those fields when kernel is
compiled with DIAGNOSTIC.

Reported by:	kib, pho
MFC after:	1 week
2012-12-26 20:07:47 +00:00
Jaakko Heinonen
efec959c2c Mangle label names containing spaces, non-printable characters, '%' or
'"'.  Mangling is only done for label names read from file system
metadata. Encoding resembles URL encoding. For example, the space
character becomes %20.

Help by:	kib
Discussed with:	imp, kib, pjd
2012-12-22 13:43:12 +00:00
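The mangling described in efec959c2c can be illustrated with a short Python sketch. It assumes ASCII rules for what counts as printable and uppercase hexadecimal digits in the escape; neither detail is stated in the commit message, and the function name is hypothetical.

```python
def mangle_label(label: bytes) -> str:
    """Percent-encode spaces, non-printable bytes, '%' and '"' in a label."""
    out = []
    for b in label:
        ch = chr(b)
        printable = 0x20 <= b <= 0x7E  # ASCII printable range (assumption)
        if not printable or ch.isspace() or ch in '%"':
            out.append("%%%02X" % b)  # e.g. space -> %20
        else:
            out.append(ch)
    return "".join(out)
```

Escaping '%' itself keeps the encoding unambiguous, since a literal "%20" in a label would otherwise be indistinguishable from an encoded space.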
Jaakko Heinonen
02c62349c9 - Don't pass geom and provider names as format strings.
- Add __printflike() attributes.
- Remove an extra argument for the g_new_geomf() call in swapongeom_ev().

Reviewed by:	pjd
2012-11-20 12:32:18 +00:00
Alfred Perlstein
bad7e7f3dd Provide a device name in the sysctl tree for programs to query the
state of crashdump target devices.

This will be used to add a "-l" (ell) flag to dumpon(8) to list the
currently configured dumpdev.

Reviewed by:	phk
2012-11-01 17:01:05 +00:00
Edward Tomasz Napierala
549f62fa42 Fix problem with geom_label(4) not recognizing UFS labels on filesystems
extended using growfs(8).  The problem here is that geom_label checks if
the filesystem size recorded in UFS superblock is equal to the provider
(i.e. device) size.  This check cannot be removed due to backward
compatibility.  On the other hand, in most cases growfs(8) cannot set
fs_size in the superblock to match the provider size, because, differently
from newfs(8), it cannot recompute cylinder group sizes.

To fix this problem, add another superblock field, fs_providersize, used
only for this purpose.  The geom_label(4) will attach if either fs_size
(filesystem created with newfs(8)) or fs_providersize (filesystem expanded
using growfs(8)) matches the device size.

PR:		kern/165962
Reviewed by:	mckusick
Sponsored by:	FreeBSD Foundation
2012-10-30 21:32:10 +00:00
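The acceptance rule from 549f62fa42 reduces to a two-way comparison. The Python sketch below assumes all three sizes have already been converted to the same unit; the function and parameter names are hypothetical and the real check lives in the kernel's C code.

```python
def ufs_label_matches(provider_size, fs_size, fs_providersize):
    """True when either recorded size equals the provider (device) size.

    fs_size matches for filesystems created with newfs(8);
    fs_providersize matches for filesystems expanded with growfs(8).
    """
    return provider_size == fs_size or provider_size == fs_providersize
```

A filesystem grown by growfs(8) typically fails the first comparison and passes the second, which is the case the new superblock field was added for.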
Alexander Motin
650e245ebf Minor addition to r242323:
As with BIO_WRITE, report success if at least one subdisk succeeded with
BIO_DELETE.  But unlike BIO_WRITE don't fail disk on BIO_DELETE error.

Sponsored by:	iXsystems, Inc.
MFC after:	1 month
2012-10-29 21:08:06 +00:00
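The completion policy in 650e245ebf can be sketched as follows: a request succeeds if any subdisk succeeded, and only BIO_WRITE errors cause the erroring subdisks to be failed. This is a hypothetical Python model; the function name, the result dictionary and the errno values used in any call are illustrative.

```python
def finish_request(cmd, results):
    """Combine per-subdisk results into (error, subdisks_to_fail).

    results maps subdisk name to an errno value, 0 meaning success.
    """
    errors = {disk: err for disk, err in results.items() if err != 0}
    if len(errors) < len(results):
        error = 0  # at least one subdisk succeeded
    else:
        error = next(iter(errors.values()), 0)
    # Unlike BIO_WRITE, a BIO_DELETE error never fails the disk.
    to_fail = sorted(errors) if cmd == "BIO_WRITE" else []
    return error, to_fail
```

For example, a BIO_DELETE that one subdisk rejects as unsupported still completes successfully and leaves that subdisk in the array.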
Alexander Motin
609a74746a Add basic BIO_DELETE support to GEOM RAID class for all RAID levels.
If at least one subdisk in the volume supports it, BIO_DELETE requests
will be propagated down.  Unfortunately, for RAID levels with redundancy
unmapped blocks will be mapped back during first rebuild/resync process.

Sponsored by:	iXsystems, Inc.
MFC after:	1 month
2012-10-29 18:04:38 +00:00
Edward Tomasz Napierala
1af2d09b49 Fix locking problem in disk_resize(); previously it would run without
topology lock, resulting in assertion when running with DIAGNOSTIC.

Reviewed by:	mav (earlier version)
2012-10-29 17:52:43 +00:00
Alexander Motin
a479c51be3 Make GEOM RAID more aggressive in marking volumes as clean on shutdown
and move that action from shutdown_pre_sync to shutdown_post_sync stage
to avoid extra flapping.

ZFS tends to not close devices on shutdown, which doesn't allow GEOM RAID
to shut down gracefully.  To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.

MFC after:	2 weeks
2012-10-29 14:18:54 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Attilio Rao
682ee99e7a It seems that it is preferable to keep support for glabel also for
filesystems that we don't support natively.
Revert part of r241636 to do so.

This patch is not targeted for MFC.

Requested by:	gleb, jhb
2012-10-18 22:18:11 +00:00
Attilio Rao
a42ac676f5 Disconnect non-MPSAFE NTFS from the build in preparation for dropping
GIANT from VFS. This code is particularly broken and fragile and other
in-kernel implementations around, found in other operating systems,
don't really seem clean and solid enough to be imported at all.
If someone wants to reconsider in-kernel NTFS implementation for
inclusion again, a fair effort for completely fixing and cleaning it
up is expected.

In the meantime, regular NTFS users can use the FUSE interface and the ntfs-3g
port to work with their NTFS partitions.

This is not targeted for MFC.
2012-10-17 11:30:00 +00:00
Alexander Motin
c6f0cd57e3 NULL-ify last previously used pointer instead of last possible pointer.
This should be only a cosmetic change.

Found by:	Clang Static Analyzer
2012-10-10 20:41:37 +00:00
Alexander Motin
6871a543f9 Make graid command line a bit more friendly by allowing volume name or
provider name to be specified instead of geom name (first argument in all
subcommands except label).  In most cases there is only one array used
anyway, so it is not really useful to make the user type ugly geom names like
Intel-f0bdf223 or SiI-732c2b9448cf.  Though they can be used in some cases.

Sponsored by:	iXsystems, Inc.
MFC after:	1 month
2012-10-07 19:30:16 +00:00
Andriy Gapon
a90c9dfeab g_part_taste: directly destroy consumer and geom here, no need for withering
Besides, withered but still alive consumers may interfere with
re-tasting.

MFC after:	16 days
2012-10-06 19:52:50 +00:00
Pawel Jakub Dawidek
5d8a6a1078 Remove the topology lock from disk_gone(), it might be called with regular
mutexes held and the topology lock is an sx lock.

The topology lock was there to protect traversing through the list of providers
of disk's geom, but it seems that disk's geom has always exactly one provider.

Change the code to call g_wither_provider() for this one provider, which is
safe to do without holding the topology lock and assert that there is indeed
only one provider.

Discussed with:	ken
MFC after:	1 week
2012-09-28 08:22:51 +00:00
Pawel Jakub Dawidek
171f6b3a34 Use the topology lock to protect list of providers while withering them.
It is possible that provider is destroyed while we are iterating over the
list.

Reported by:	Brian Parkison <parkison@panzura.com>
Discussed with:	phk
MFC after:	1 week
2012-09-22 12:41:49 +00:00
Andriy Gapon
85f5b9aa70 g_disk_flushcache definitely should not be traced under G_T_TOPOLOGY
... use G_T_BIO instead

MFC after:	1 week
2012-09-18 07:57:34 +00:00
Alexander Motin
c89d2fbe18 Add global and per-module sysctls/tunables to enable/disable metadata taste.
That should help to handle some cases when disk has some RAID metadata that
should be ignored, especially during boot.

MFC after:	3 days
2012-09-13 13:27:09 +00:00
Gleb Smirnoff
4a7f7b10b5 When synchronizing, include in the config dump the number of
bytes synchronized.
The rationale behind this is the following: for large disks the
percent synchronisation counter ticks too seldom, and monitoring
software (as well as a human operator) can't tell whether
synchronisation goes on or one of the disks got stuck. On an idle
server one can look into gstat and see whether synchronisation goes
on or not, but on a busy server that won't work. Also, the new value
can be differentiated to obtain the synchronisation speed
quite precisely.

Submitted by:	Konstantin Kukushkin <dark ramtel.ru>
Reviewed by:	pjd
2012-09-11 20:20:13 +00:00
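Differentiating the byte counter from 4a7f7b10b5 means taking two samples and dividing the byte delta by the time delta. A minimal Python sketch, assuming a monitoring tool has already extracted (timestamp, bytes synchronized) pairs from the config dump; the function name and sample format are hypothetical.

```python
def sync_rate(prev, curr):
    """Return bytes per second between two (seconds, bytes_synced) samples.

    Returns None when the samples cannot be differenced.
    """
    (t0, b0), (t1, b1) = prev, curr
    if t1 <= t0:
        return None
    return (b1 - b0) / (t1 - t0)
```

A rate of zero across several samples is the "stuck disk" signal that the coarse percentage counter cannot show on a large disk.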
Pawel Jakub Dawidek
769afdc71e Allow passing providers with the /dev/ prefix to g_provider_by_name().
MFC after:	3 days
2012-09-01 10:52:19 +00:00
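The behaviour added in 769afdc71e can be pictured as stripping an optional prefix before the name lookup. The Python sketch below uses a plain dictionary in place of the GEOM provider list; the function name and the dictionary are stand-ins, not the kernel API.

```python
DEV_PREFIX = "/dev/"


def provider_by_name(providers, name):
    """Look up a provider, accepting names with or without the /dev/ prefix."""
    if name.startswith(DEV_PREFIX):
        name = name[len(DEV_PREFIX):]
    return providers.get(name)
```

With this, "ada0" and "/dev/ada0" resolve to the same provider.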
Ed Schouten
24d1105dde Remove unneeded G_PF_CANDELETE flag.
This flag is only used by GEOM so it can be propagated to the character
device's SI_CANDELETE. Unfortunately, SI_CANDELETE seems to do nothing.
2012-08-28 19:28:31 +00:00
Thomas Quinot
8fb378d6b1 (g_multipath_rotate): Fix algorithm so that it does rotate over all good
providers, not just the last two.

PR:		kern/170379
Reviewed by:	mav
MFC after:	2 weeks
2012-08-25 10:36:31 +00:00
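The intent of 8fb378d6b1, rotating over every good provider rather than only the last two, corresponds to a round-robin scan that starts just after the active path and wraps around. This is a hypothetical Python sketch, not the g_multipath_rotate() code; paths are modelled as (name, is_good) tuples.

```python
def next_active(paths, active):
    """Return the index of the next good path after `active`, or None.

    paths: list of (name, is_good) tuples; active: current index.
    """
    n = len(paths)
    for step in range(1, n + 1):
        i = (active + step) % n
        if paths[i][1]:
            return i
    return None
```

Calling it repeatedly visits each good path in turn; if the active path is the only good one, it is returned again.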
Pawel Jakub Dawidek
9d18043979 Always initialize sc_ekey, because as of r238116 it is always used.
If GELI provider was created on FreeBSD HEAD r238116 or later (but before this
change), it is using very weak keys and the data is not protected.
The bug was introduced on 4th July 2012.

One can verify whether a provider was created with weak keys by running:

	# geli dump <provider> | grep version

If the version is 7 and the system didn't include this fix when provider was
initialized, then the data has to be backed up, underlying provider overwritten
with random data, system upgraded and provider recreated.

Reported by:	Fabian Keil <fk@fabiankeil.de>
Tested by:	Fabian Keil <fk@fabiankeil.de>
Discussed with:	so
MFC after:	3 days
2012-08-10 18:43:29 +00:00
Alexander Motin
d9d6849693 Add missing FAILED event to g_raid_subdisk_event2str() to print it properly
in debug messages.

Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
2012-08-10 13:36:33 +00:00
Jim Harris
82a6ae1009 Clone BIO_ORDERED flag, for disk drivers (namely CAM) that try to
consume it.

Sponsored by: Intel
Discussed with: gibbs, scottl
2012-08-07 20:16:10 +00:00
Mikolaj Golub
1d9db37c77 In g_gate_dumpconf() always check the result of g_gate_hold().
This fixes "Negative sc_ref" panic possible when sysctl_kern_geom_confxml()
is run simultaneously with destroying GATE device.

Reviewed by:	pjd
MFC after:	3 days
2012-08-07 18:50:33 +00:00
Jim Harris
c1d00eabe8 In virstor_ctl_stop(), check for a valid softc before trying to update
metadata.

Sponsored by:		Intel
Reported and tested by:	Marcelo Gondim <gondim at bsdinfo dot com dot br>
PR:			kern/170199
MFC after:		3 days
2012-08-03 20:24:16 +00:00
Thomas Quinot
71ee4ef0d9 New command "gmultipath prefer" to force selection of a specified
provider in an Active/Passive configuration.

Reviewed by:	mav
MFC after:	4 weeks
2012-08-03 14:55:35 +00:00
Alexander Motin
e521fb0558 Partially revert r238886 in part of GEOM_VFS spoiling.
This change triggered interesting foot shooting condition in GEOM when
RW access to root partition by fsck spoils VFS geom there, which has it
opened RO at the same time.  Seems spoiling concept needs some rework.
2012-07-29 20:04:09 +00:00
Alexander Motin
3631c6382f Implement media change notification for DA and CD removable media devices.
It includes three parts:
 1) Modifications to CAM to detect media changes and report them to
disk(9) layer. For modern SATA (and potentially UAS) devices it utilizes
Asynchronous Notification mechanism to receive events from hardware.
Active polling with TEST UNIT READY commands with 3 seconds period is used
for incapable hardware. After that both CD and DA drivers work the same way,
detecting two conditions: "NOT READY: Medium not present" after medium was
detected previously, and "UNIT ATTENTION: Not ready to ready change, medium
may have changed". First one reported to disk(9) as media removal, second
as media insert/change. To reliably receive second event new
AC_UNIT_ATTENTION async added to make UAs broadcasted to all periphs by
generic error handling code in cam_periph_error().
 2) Modifications to GEOM core to handle media remove and change events.
Media removal handled by spoiling all consumers attached to the provider.
Media change event also schedules provider retaste after spoiling to probe
new media. New flag G_CF_ORPHAN was added to consumers to reflect that
consumer is in process of destruction. It allows retaste to create new
geom instance of the same class, while previous one is still dying.
 3) Modifications to some GEOM classes: DEV -- to report media change
events to devd; VFS -- to handle spoiling the same as orphan to prevent
accessing replaced media. PART class already handles spoiling like
orphan.

Reviewed by:	silence on geom@ and scsi@
Tested by:	avg
Sponsored by:	iXsystems, Inc. / PC-BSD
MFC after:	2 months
2012-07-29 11:51:48 +00:00
Mikolaj Golub
a277f47bd2 Reorder things in g_gate_create() so at the moment when g_new_geomf()
is called name is properly initialized.

Discussed with:	pjd
MFC after:	2 weeks
2012-07-28 16:30:50 +00:00
Edward Tomasz Napierala
a1cf7f75a6 Make it possible to resize opened partitions.
Sponsored by:	FreeBSD Foundation
2012-07-20 17:51:20 +00:00
Edward Tomasz Napierala
3a3ef28e15 Add missing free.
2012-07-18 07:26:20 +00:00
Kenneth D. Merry
edad9799e8 Add back spare fields consumed in r237545. It seems that these should only
be consumed to maintain backward compatibility in stable, but should not be
consumed in head.

Submitted by:	trasz, attilio (indirectly)
2012-07-17 22:16:10 +00:00
Edward Tomasz Napierala
9e9d445ed1 The resize GEOM event has no references, thus cannot be canceled.
2012-07-16 17:41:38 +00:00
Edward Tomasz Napierala
8fe7677998 Add back spare fields reused in r238213. According to Attilio, the rule
is to reuse spares only when MFC-ing, not in CURRENT.
2012-07-16 16:50:28 +00:00
Edward Tomasz Napierala
7027e4dac4 Add trivial resize handling to gnop(8).
Reviewed by:	mav
Sponsored by:	FreeBSD Foundation
2012-07-07 22:22:13 +00:00
Edward Tomasz Napierala
74badfa6ba Add trivial resize handling to gmountver(8).
Reviewed by:	mav
Sponsored by:	FreeBSD Foundation
2012-07-07 22:20:47 +00:00
Edward Tomasz Napierala
bc97ce36f7 Add disk_resize(), to make it possible for the disk drivers such as da(4)
to notify GEOM about LUN size change.

Reviewed by:	mav (earlier version)
Sponsored by:	FreeBSD Foundation
2012-07-07 21:28:31 +00:00
Edward Tomasz Napierala
245899cc97 Add a new GEOM method, resize(), which is called after provider size changes.
Add a new routine, g_resize_provider(), to use to notify GEOM about provider
change.

Reviewed by:	mav
Sponsored by:	FreeBSD Foundation
2012-07-07 20:13:40 +00:00
Edward Tomasz Napierala
ad624005b3 Fix orphan() methods of several GEOM classes to not assume that there
is an error set on the provider.  With GEOM resizing, class can become
orphaned when it doesn't implement resize() method and the provider size
decreases.

Reviewed by:	mav
Sponsored by:	FreeBSD Foundation
2012-07-07 17:09:44 +00:00
Edward Tomasz Napierala
aaaf515fde Fix typo in the comment.
2012-07-06 15:46:38 +00:00
Pawel Jakub Dawidek
e08ec03778 Extend GEOM Gate class to handle read I/O requests directly within the kernel.
This will allow HAST to read directly from the local component without
even communicating with the userland daemon.

Sponsored by:	Panzura, http://www.panzura.com
MFC after:	1 month
2012-07-04 20:16:28 +00:00
Pawel Jakub Dawidek
457bbc4f3a Use correct part of the Master-Key for generating encryption keys.
Before this change the IV-Key was used to generate encryption keys,
which was incorrect, but safe - for the XTS mode this key was unused
anyway and for CBC mode it was used differently to generate IV
vectors, so there is no risk that IV vector collides with encryption
key somehow.

Bump version number and keep compatibility for older versions.

MFC after:	2 weeks
2012-07-04 17:54:17 +00:00
Pawel Jakub Dawidek
3d47ea3324 Correct comment.
MFC after:	3 days
2012-07-04 17:44:39 +00:00
Pawel Jakub Dawidek
ec58140a27 Correct a comment and correct style of a flag check.
MFC after:	3 days
2012-07-04 17:43:25 +00:00
Gleb Smirnoff
d89862ac87 Make geom_mirror more friendly to SSDs. To properly support TRIM,
we need to pass BIO_DELETE requests down to providers that support
it. Also, we need to announce our support for BIO_DELETE to upper
consumer. This requires:

- In g_mirror_start() return true for "GEOM::candelete" request.
- In g_mirror_init_disk() probe below provider for "GEOM::candelete"
  attribute, and mark disk with a flag if it does support BIO_DELETE.
- In g_mirror_register_request() distribute BIO_DELETE requests only
  to those disks, that do support it.

Note that we announce "GEOM::candelete" as true regardless of
whether we have TRIM-capable media down below or not. This is done
intentionally, because the upper consumer (usually UFS) requests the
attribute only once at mount time. And if the user ever migrates his
mirror from HDDs to SSDs, then he/she would get TRIM working without
remounting the filesystem.

Reviewed by:	pjd
2012-07-01 15:43:52 +00:00
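The three steps listed in d89862ac87 can be condensed into a small Python model: the mirror always answers the capability query with true, and deletes are sent only to components that were flagged as capable when they were probed. The function names and the dictionary of per-disk flags are hypothetical.

```python
def candelete_attr():
    """The mirror announces GEOM::candelete as true unconditionally."""
    return True


def delete_targets(disks):
    """Return the disks that should receive a BIO_DELETE.

    disks maps disk name to whether it reported BIO_DELETE support.
    """
    return sorted(name for name, supported in disks.items() if supported)
```

In a mixed mirror the delete reaches only the capable component, and an all-HDD mirror simply has no targets even though the attribute still reads true.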
Gleb Smirnoff
b0ae63ca25 In g_mirror_regular_request() upon successful delivery treat
BIO_DELETE requests same way as BIO_WRITE removing them from
queue. This fixes panic with BIO_DELETE operations on geom_mirror.

Reviewed by:	pjd
2012-07-01 15:30:43 +00:00
Warner Losh
a920522660 Use %j to match intmax_t.
2012-07-01 05:22:13 +00:00
Brooks Davis
9e81f117f9 MFP4 #212266
Fix compile on MIPS64.

Sponsored by:	DARPA, AFRL
2012-06-29 20:15:00 +00:00
Kenneth D. Merry
c76a6fe732 In g_disk_providergone(), don't continue if the softc is NULL. This may be
the case if we've already gone through g_disk_destroy().

Reported by:	Michael Butler <imb@protected-networks.net>
MFC after:	3 days
2012-06-27 16:05:09 +00:00
Kenneth D. Merry
365e076ed2 Consume spare fields for the providergone pointers added to the g_class and
g_geom structures in change 237518.  The original change would have broken
the ABI.

Suggested by:	ae
MFC after:	4 days
2012-06-25 04:26:10 +00:00
Kenneth D. Merry
c3fb2891f0 Fix a bug which causes a panic in daopen(). The panic is caused by
a da(4) instance going away while GEOM is still probing it.

In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.

While that event is queued, the da(4) instance goes away.  When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.

The solution is to add a callback to the GEOM disk code that is
called when all of its resources are cleaned up.  This is
implemented inside GEOM by adding an optional callback that is
called when all consumers have detached from a provider, and the
provider is about to be deleted.

scsi_cd.c,
scsi_da.c:	In the register routine for the cd(4) and da(4)
		routines, acquire a reference to the CAM peripheral
		instance just before we call disk_create().

		Use the new GEOM disk d_gone() callback to register
		a callback (dadiskgonecb()/cddiskgonecb()) that
		decrements the peripheral reference count once GEOM
		has finished cleaning up its resources.

		In the cd(4) driver, clean up open and close
		behavior slightly.  GEOM makes sure we only get one
		open() and one close call, so there is no need to
		set an open flag and decrement the reference count
		if we are not the first open.

		In the cd(4) driver, use cam_periph_release_locked()
		in a couple of error scenarios to avoid extra mutex
		calls.

geom.h:		Add a new, optional, providergone callback that
		is called when a provider is about to be deleted.

geom_disk.h:	Add a new d_gone() callback to the GEOM disk
		interface.

		Bump the DISK_VERSION to version 2.  This probably
		should have been done after a couple of previous
		changes, especially the addition of the d_getattr()
		callback.

geom_disk.c:	Add a providergone callback for the disk class,
		g_disk_providergone(), that calls the user's
		d_gone() callback if it exists.

		Bump the DISK_VERSION to 2.

geom_subr.c:	In g_destroy_provider(), call the providergone
		callback if it has been provided.

		In g_new_geomf(), propagate the class's
		providergone callback to the new geom instance.

blkfront.c:	Callers of disk_create() are supposed to pass in
		DISK_VERSION, not an explicit disk API version
		number.  Update the blkfront driver to do that.

disk.9:		Update the disk(9) man page to include information
		on the new d_gone() callback, as well as the
		previously added d_getattr() callback, d_descr
		field, and HBA PCI ID fields.

MFC after:	5 days
2012-06-24 04:29:03 +00:00
Andrey V. Elsukov
d4746e107f Always reconstruct partition entries in the PMBR when Boot Camp is
disabled. This helps to easily recover from situations when PMBR is
damaged and contains no entries.

MFC after:	1 week
2012-06-14 11:17:54 +00:00
Alexander Motin
a839e33278 Add missing newlines into XML output.
MFC after:	3 days
Sponsored by:	iXsystems, Inc.
2012-06-05 16:46:34 +00:00
Marcel Moolenaar
f24a8224b2 Add a partition type for nandfs to the apm, bsd, gpt and vtoc8 schemes.
The gpart alias for these partition types is "freebsd-nandfs".
2012-05-25 20:33:34 +00:00
Edward Tomasz Napierala
d87e55886e Revert r235918 for now and add comment explaining the reason for the
size check.
2012-05-25 10:08:48 +00:00
Edward Tomasz Napierala
202f0f2a02 Make g_label(4) ignore provider size when looking for UFS labels.
Without it, it fails to create labels for filesystems resized by
growfs(8).

PR:		kern/165962
Submitted by:	Olivier Cochard-Labbe <olivier at cochard dot me>
2012-05-24 16:48:33 +00:00
Xin LI
8287ee1bbe - Correct signedness for casts;
- Wrap long line while I'm there.

Noticed by:	pjd, avg
2012-05-23 20:51:21 +00:00