Commit Graph

124 Commits

Author SHA1 Message Date
mav
4505b89481 When searching for provider by name, prefer non-withered one.
MFC after:	2 weeks
2015-03-26 11:02:29 +00:00
ae
26ea220350 Add a topology trace to the g_spoil_event.
MFC after:	1 week
2014-05-19 16:08:15 +00:00
bdrewery
4d8489ab7f Make g_access() KASSERT() more useful.
Sponsored by:	EMC / Isilon Storage Division
Obtained from:	Isilon OneFS
MFC after:	2 weeks
2014-04-15 14:41:41 +00:00
smh
5c7a6f5d92 Improve ZFS N-way mirror read performance by using load and locality
information.

The existing algorithm selects a preferred leaf vdev based on offset of the zio
request modulo the number of members in the mirror. It assumes the devices are
of equal performance and that spreading the requests randomly over both drives
will be sufficient to saturate them. In practice this results in the leaf vdevs
being under utilized.

The new algorithm takes into the following additional factors:
* Load of the vdevs (number outstanding I/O requests)
* The locality of last queued I/O vs the new I/O request.

Within the locality calculation additional knowledge about the underlying vdev
is considered such as; is the device backing the vdev a rotating media device.

This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.

The following are results from a setup with 3 Way Mirror with 2 x HD's and
1 x SSD from a basic test running multiple parrallel dd's.

With pre-fetch disabled (vfs.zfs.prefetch_disable=1):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s

With pre-fetch enabled (vfs.zfs.prefetch_disable=0):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s

In addition to the performance changes the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdevs loads are only calculated from the set of valid candidates.

The following additional sysctls where added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc

These changes where based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487

Reviewed by:	gibbs, mav, will
MFC after:	2 weeks
Sponsored by:	Multiplay
2013-10-23 09:54:58 +00:00
des
140807754c Introduce a kern.geom.notaste sysctl that can be used to temporarily
disable GEOM tasting to avoid the "bouncing GEOM" problem where, when
you shut down the consumer of a provider which can be viewed in multiple
ways (typically a mirror whose members are labeled partitions), GEOM
will immediately taste that provider's alter ego and reattach the
consumer.

Approved by:	re (glebius)
2013-09-24 20:05:16 +00:00
mav
2ebdf6d693 Make g_wither_washer() to not loop by itself, but only when there was some
more topology change done that may require its attention.  Add few missing
g_do_wither() calls in respective places to signal it.

This fixes potential infinite loop here when some provider is withered, but
still opened or connected for some reason and so can not be destroyed.  For
example, see r227009 and r227510.
2013-03-24 03:15:20 +00:00
pjd
12303b5db5 Allow to pass providers with /dev/ prefix to g_provider_by_name().
MFC after:	3 days
2012-09-01 10:52:19 +00:00
ed
099a431e7f Remove unneeded G_PF_CANDELETE flag.
This flag is only used by GEOM so it can be propagated to the character
device's SI_CANDELETE. Unfortunately, SI_CANDELETE seems to do nothing.
2012-08-28 19:28:31 +00:00
mav
24017b5387 Implement media change notification for DA and CD removable media devices.
It includes three parts:
 1) Modifications to CAM to detect media media changes and report them to
disk(9) layer. For modern SATA (and potentially UAS) devices it utilizes
Asynchronous Notification mechanism to receive events from hardware.
Active polling with TEST UNIT READY commands with 3 seconds period is used
for incapable hardware. After that both CD and DA drivers work the same way,
detecting two conditions: "NOT READY: Medium not present" after medium was
detected previously, and "UNIT ATTENTION: Not ready to ready change, medium
may have changed". First one reported to disk(9) as media removal, second
as media insert/change. To reliably receive second event new
AC_UNIT_ATTENTION async added to make UAs broadcasted to all periphs by
generic error handling code in cam_periph_error().
 2) Modifications to GEOM core to handle media remove and change events.
Media removal handled by spoiling all consumers attached to the provider.
Media change event also schedules provider retaste after spoiling to probe
new media. New flag G_CF_ORPHAN was added to consumers to reflect that
consumer is in process of destruction. It allows retaste to create new
geom instance of the same class, while previous one is still dying.
 3) Modifications to some GEOM classes: DEV -- to report media change
events to devd; VFS -- to handle spoiling same as orphan to prevent
accessing replaced media. PART class already handles spoiling alike to
orphan.

Reviewed by:	silence on geom@ and scsi@
Tested by:	avg
Sponsored by:	iXsystems, Inc. / PC-BSD
MFC after:	2 months
2012-07-29 11:51:48 +00:00
trasz
de6625db1c Add missing free. 2012-07-18 07:26:20 +00:00
trasz
7686baf110 The resize GEOM event has no references, thus cannot be canceled. 2012-07-16 17:41:38 +00:00
trasz
f2e3e9e073 Add a new GEOM method, resize(), which is called after provider size changes.
Add a new routine, g_resize_provider(), to use to notify GEOM about provider
change.

Reviewed by:	mav
Sponsored by:	FreeBSD Foundation
2012-07-07 20:13:40 +00:00
ken
be54b17782 Fix a bug which causes a panic in daopen(). The panic is caused by
a da(4) instance going away while GEOM is still probing it.

In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.

While that event is queued, the da(4) instance goes away.  When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.

The solution is to add a callback to the GEOM disk code that is
called when all of its resources are cleaned up.  This is
implemented inside GEOM by adding an optional callback that is
called when all consumers have detached from a provider, and the
provider is about to be deleted.

scsi_cd.c,
scsi_da.c:	In the register routine for the cd(4) and da(4)
		routines, acquire a reference to the CAM peripheral
		instance just before we call disk_create().

		Use the new GEOM disk d_gone() callback to register
		a callback (dadiskgonecb()/cddiskgonecb()) that
		decrements the peripheral reference count once GEOM
		has finished cleaning up its resources.

		In the cd(4) driver, clean up open and close
		behavior slightly.  GEOM makes sure we only get one
		open() and one close call, so there is no need to
		set an open flag and decrement the reference count
		if we are not the first open.

		In the cd(4) driver, use cam_periph_release_locked()
		in a couple of error scenarios to avoid extra mutex
		calls.

geom.h:		Add a new, optional, providergone callback that
		is called when a provider is about to be deleted.

geom_disk.h:	Add a new d_gone() callback to the GEOM disk
		interface.

		Bump the DISK_VERSION to version 2.  This probably
		should have been done after a couple of previous
		changes, especially the addition of the d_getattr()
		callback.

geom_disk.c:	Add a providergone callback for the disk class,
		g_disk_providergone(), that calls the user's
		d_gone() callback if it exists.

		Bump the DISK_VERSION to 2.

geom_subr.c:	In g_destroy_provider(), call the providergone
		callback if it has been provided.

		In g_new_geomf(), propagate the class's
		providergone callback to the new geom instance.

blkfront.c:	Callers of disk_create() are supposed to pass in
		DISK_VERSION, not an explicit disk API version
		number.  Update the blkfront driver to do that.

disk.9:		Update the disk(9) man page to include information
		on the new d_gone() callback, as well as the
		previously added d_getattr() callback, d_descr
		field, and HBA PCI ID fields.

MFC after:	5 days
2012-06-24 04:29:03 +00:00
gibbs
094b6aca1d Plumb device physical path reporting from CAM devices, through GEOM and
DEVFS, and make it accessible via the diskinfo utility.

Extend GEOM's generic attribute query mechanism into generic disk consumers.
sys/geom/geom_disk.c:
sys/geom/geom_disk.h:
sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
	- Allow disk providers to implement a new method which can override
	  the default BIO_GETATTR response, d_getattr(struct bio *).  This
	  function returns -1 if not handled, otherwise it returns 0 or an
	  errno to be passed to g_io_deliver().

sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
	- Don't copy the serial number to dp->d_ident anymore, as the CAM XPT
	  is now responsible for returning this information via
	  d_getattr()->(a)dagetattr()->xpt_getatr().

sys/geom/geom_dev.c:
	- Implement a new ioctl, DIOCGPHYSPATH, which returns the GEOM
	  attribute "GEOM::physpath", if possible.  If the attribute request
	  returns a zero-length string, ENOENT is returned.

usr.sbin/diskinfo/diskinfo.c:
	- If the DIOCGPHYSPATH ioctl is successful, report physical path
	  data when diskinfo is executed with the '-v' option.

Submitted by:	will
Reviewed by:	gibbs
Sponsored by:	Spectra Logic Corporation

Add generic attribute change notification support to GEOM.

sys/sys/geom/geom.h:
	Add a new attrchanged method field to both g_class
	and g_geom.

sys/sys/geom/geom.h:
sys/geom/geom_event.c:
	- Provide the g_attr_changed() function that providers
	  can use to advertise attribute changes.
	- Perform delivery of attribute change notifications
	  from a thread context via the standard GEOM event
	  mechanism.

sys/geom/geom_subr.c:
	Inherit the attrchanged method from class to geom (class instance).

sys/geom/geom_disk.c:
	Provide disk_attr_changed() to provide g_attr_changed() access
	to consumers of the disk API.

sys/cam/scsi/scsi_pass.c:
sys/cam/scsi/scsi_da.c:
sys/geom/geom_dev.c:
sys/geom/geom_disk.c:
	Use attribute changed events to track updates to physical path
	information.

sys/cam/scsi/scsi_da.c:
	Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
	events for this driver.  When this event occurs, and
	the updated buffer type references our physical path
	attribute, emit a GEOM attribute changed event via the
	disk_attr_changed() API.

sys/cam/scsi/scsi_pass.c:
	Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
	events for this driver.  When this event occurs, update
	the physical patch devfs alias for this pass instance.

Submitted by:	gibbs
Sponsored by:	Spectra Logic Corporation
2011-06-14 17:10:32 +00:00
mav
79493720f3 Implement relaxed comparision for hardcoded provider names to make it
ignore adX/adaY difference in both directions to simplify migration to
the CAM-based ATA or back.
2011-04-27 00:10:26 +00:00
jh
70139e5fa6 Fix deadlock between GEOM class unloading and withering. Withering can't
proceed while g_unload_class() blocks the event thread. Fix this by not
running g_unload_class() as a GEOM event and dropping the topology lock
when withering needs to proceed.

PR:		kern/139847
Silence on:	freebsd-geom
2010-05-05 18:53:24 +00:00
jh
b6e7ef0d61 Fix ddb(4) "show geom addr" command when INVARIANTS is enabled. Don't
assert that the topology lock is held when g_valid_obj() is called from
debugger.

MFC after:	1 week
2010-04-19 20:07:35 +00:00
pjd
bcd34167f7 Log attach just like we log detach. 2010-02-18 22:27:38 +00:00
trasz
bde8460ebe Fix a panic which (reportedly) can happen when unmounting a filesystem
with I/O requests in flight on kernels compiled with "options INVARIANTS".
Also, make it obvious it's not right to call g_valid_obj() (and macros
using it, e.g. G_VALID_CONSUMER()) without topology lock held.

Approved by:	re (kib)
Reported by:	pho
2009-07-01 20:16:29 +00:00
pjd
6796cdb325 Simplify. 2009-06-05 23:35:43 +00:00
lulf
8765020747 - Unbreak 64 bit platforms by casting off_t to intmax. 2009-05-26 14:15:06 +00:00
lulf
66e14dfc33 - Fix wrong print on BIO_DONE.
- Use db_printf instead of printf. While here, apply this to other ddb commands
  as well.

Pointed out by:		pjd
2009-05-26 10:03:44 +00:00
lulf
6ffe643641 - Add 'show bio' DDB command.
MFC after:	3 weeks
2009-05-26 07:29:17 +00:00
thompsa
39714cb212 Revert r190676,190677
The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by:	scottl
2009-04-10 04:08:34 +00:00
thompsa
2d53d4304d Add interleaving root hold tokens from the CAM probe to disk_create and geom
provider tasting. This is needed for disk attachments that happen after threads
are running in the boot process.

Tested by:	rnoland
2009-04-03 19:49:33 +00:00
marcel
976d8b4239 In g_handleattr(), set bp->bio_completed also for the case
where len is 0. Otherwise g_getattr() will never succeed
when it is handled by g_handleattr_str().
2009-02-03 07:07:13 +00:00
marcel
41ccd088c6 Constify val in g_handleattr() and str in g_handleattr_str().
This allows passing string constants to g_handleattr_str().
2009-02-01 01:50:09 +00:00
lulf
6e104c3f3b - Add missing word in comment. 2008-12-08 17:09:02 +00:00
des
c2c1c946ae Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from:	Varnish
MFC after:	2 weeks
2008-08-09 11:14:05 +00:00
pjd
e225ca8138 - Assert that we don't send new provider event for a provider which has
G_PF_WITHER flag set.
- Fix typo in assertion condition (sorry, but I forgot who report that).
2008-05-18 22:50:50 +00:00
pjd
21a0be64c7 Play nice with DDB pager.
Educated by:	jhb's BSDCan presentation
2008-05-18 21:13:10 +00:00
marcel
dd866faa70 When retasting, wither any existing GEOMs of the same class. This
allows the class to create a different GEOM for the same provider
as well as avoid that we end up with multiple GEOMs of the same
class with the same name.

For example, when a disk contains a PC98 partition table but
only MBR is supported, then the partition table can be treated
as a MBR. If support for PC98 is later loaded as a module, the
MBR scheme is pre-empted for the PC98 scheme as expected.
2008-03-28 06:31:12 +00:00
marcel
31a163ef06 Add g_retaste(), which given a class will present all non-open providers
to it for tasting. This is useful when the class, through means outside
the scope of GEOM, can claim providers previously unclaimed.

The g_retaste() function posts an event which is handled by the
g_retaste_event().

Event suggested by: phk
2008-03-23 01:23:35 +00:00
pjd
ddfa2416f5 - Implement helper g_handleattr_str() function for string attributes
handling.
- Extend g_handleattr() to treat attribute as string when len=0.

OK'ed by:	phk
2007-05-05 16:33:44 +00:00
pjd
556424a17a Add 'show geom [addr]' ddb(4) command, which prints entire GEOM topology if
no additional argument is given or details about the given GEOM object
(class, geom, provider or consumer).

Approved by:	phk
2006-09-15 16:36:45 +00:00
marcel
c168f9530e Add g_wither_provider() to abstract the details of destroying a
particular provider. Use this function where g_orphan_provider()
is being called so that the flags are updated correctly and
g_orphan_provider() is called only when allowed.
2006-04-10 03:55:13 +00:00
jdp
536960dbba Fix a bug that caused some /dev entries to continue to exist after
the underlying drive had been hot-unplugged from the system.  Here
is a specific example.  Filesystem code had opened /dev/da1s1e.
Subsequently, the drive was hot-unplugged.  This (correctly) caused
all of the associated /dev/da1* entries to be deleted.  When the
filesystem later realized that the drive was gone it closed the
device, reducing the write-access counts to 0 on the geom providers
for da1s1e, da1s1, and da1.  This caused geom to re-taste the
providers, resulting in the devices being created again.  When the
drive was hot-plugged back in, it resulted in duplicate /dev entries
for da1s1e, da1s1, and da1.

This fix adds a new disk_gone() function which is called by CAM when a
drive goes away.  It orphans all of the providers associated with the
drive, setting an error condition of ENXIO in each one.  In addition,
we prevent a re-taste on last close for writing if an error condition
has been set in the provider.

Sponsored by:   Isilon Systems
Reviewed by:    phk
MFC after:      1 week
2005-11-18 02:43:49 +00:00
phk
55543da1eb fix a "modify after free" bug which is practically impossible to
experience.

Found by:	Coverity (id #540 #541)
2005-03-26 21:07:35 +00:00
phk
269b7a298a Add g_wither_geom_close() function. 2004-10-29 09:19:03 +00:00
phk
80dc3e1bc2 Don't call g_waitidle(), it happens automagically now. 2004-10-23 20:52:15 +00:00
arr
b510c7b809 - Turn KASSERT()s into warning printf()'s in the g_class_load() routine.
This removes a panic that will occur if you build with GENERIC and
  attempt to kldload a GEOM module that is already in the kernel.

Reviewed by: phk
2004-10-22 22:16:24 +00:00
green
c125751bac When loading GEOM modules, we expect the actual load process to be done
by the time that kldload(8) returns.  Satisfy that by making the GEOM
module load event -- only when the kernel is !cold -- wait until the
GEOM module init function has finished instead of returning immediately.

This is the other half of fixing md(8) (actually, "mfs" in fstab(5))
that is similar to r1.128 of src/sys/dev/md/md.c.  This bug would be
why RAM disks would often fail on boot and the first call to mdconfig(8)
would probably fail.

pjd has ideas for not requiring kldload(8) to work synchronously for
control devices that could make this obsolete.

Silence on:	-arch
2004-10-12 04:44:54 +00:00
phk
38ae102f61 For removable devices without media we set a zero mediasize but a non-zero
sectorsize in order to avoid a lot of checks around various divisions etc.

Enforce the sectorsize being > 0 with a KASSERT on successful open.

Fix scsi_cd.c to return 2k sectors when no media inserted.
2004-09-05 21:15:58 +00:00
phk
685cf0583f OK, now check geom class version numbers. 2004-08-08 08:34:46 +00:00
phk
2568a59fae OOps, that check was a bit premature. Allow zero versions as well. 2004-08-08 07:30:47 +00:00
phk
e7ff003728 Give classes a version number and refuse to touch classes which are not
understood.  This makes room for additional binary compatibility in the
future.

Put fields in the class for the geom's methods and initialize the methods
of a new geom from these fields.  This saves some code in all classes.
2004-08-08 06:46:27 +00:00
phk
5be6baa85c Only detach consumers which are attached when we wither stuff away.
Pointed out by:	pjd
2004-07-09 14:06:17 +00:00
phk
042d10f8ef Make withering water tight.
When we orphan/wither a provider, an attached geom+consumer could
end up being withered as a result and it may be in front of us in
the normal object scanning order so we need to do multi-pass.  On
the other hand, there may be withering stuff we can't get rid off
(yet), so we need to keep track of both the existence of withering
stuff and if there is more we can do at this time.
2004-07-08 16:17:14 +00:00
phk
a0a0195602 Fail normally rather than KASSERT if attempt to open a spoiled consumer. 2004-07-08 10:34:09 +00:00
pjd
5fae899223 Move "is consumer attached?" check before G_VALID_PROVIDER() check,
because if consumer is not attached, its provider never will be valid,
so we never reach this check.

Approved by:	phk
2004-03-18 07:17:10 +00:00