Commit Graph

181 Commits

Author SHA1 Message Date
Mark Johnston
7653e6d781 Release the queue lock before restarting the worker loop.
Reported and tested by:	pho
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
2018-01-08 15:41:49 +00:00
Mark Johnston
1787c3feb4 Fix some I/O ordering issues in gmirror.
- BIO_FLUSH requests were dispatched to the disks directly from
  g_mirror_start() rather than going through the mirror's I/O request
  queue, so they could have been reordered with preceding writes.
  Address this by processing such requests from the queue, avoiding
  direct dispatch.
- Handling for collisions with synchronization requests was too
  fine-grained and could cause reordering of writes. In particular,
  BIO_ORDERED was not being honoured. Address this by effectively
  freezing the request queue any time a collision with a synchronization
  request occurs. The queue is unfrozen once the collision with the
  first frozen request is over.
- The above-mentioned collision handling allowed reads to jump ahead
  of writes to the same offset. Address this by freezing all request
  types when a collision occurs, not just BIO_WRITEs and BIO_DELETEs.

Also add some more fail points for use in testing error handling.

Reviewed by:	imp
MFC after:	3 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13559
2018-01-02 18:11:54 +00:00
Mark Johnston
9abe2e7e98 Avoid using bioq_* in gmirror.
gmirror does not perform any sorting of I/O requests, so the bioq API
doesn't provide any advantages over plain TAILQs. The API also does not
provide operations needed by an upcoming change.

No functional change intended. The diff shrinks the geom_mirror.ko
text and the gmirror softc slightly.

Tested by:	pho (part of a larger patch)
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-19 17:13:04 +00:00
Mark Johnston
68eadcec0f Give a couple of predication functions a bool return type.
No functional change intended.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-15 19:14:21 +00:00
Mark Johnston
204d94f161 Typo.
MFC after:	1 week
2017-12-15 19:03:03 +00:00
Mark Johnston
8b93770503 Address a possible lost wakeup for gmirror events.
g_mirror_event_send() acquires the I/O queue lock to deliver a wakeup
to the worker thread, and this is done after enqueuing the event.
So it's sufficient to check the event queue before atomically releasing
the queue lock and going to sleep.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-12 17:29:34 +00:00
Mark Johnston
b634781eac Give g_mirror_event_get() a more accurate name.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-12 17:25:25 +00:00
Mark Johnston
a3584ee355 Decrement sc_writes when BIO_DELETE requests complete.
Otherwise a gmirror that has received a BIO_DELETE request will never be
marked clean (unless sc_writes overflows).

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-12 17:24:30 +00:00
Mark Johnston
2ceafb776e Update gmirror metadata less frequently when synchronizing.
We periodically record synchronization progress in the metadata
block of the disk being synchronized; this allows an interrupted
synchronization to be resumed. However, the frequency of these
updates heavily pessimized synchronization time on some media. This
change modifies gmirror to update metadata based on a time period,
and adds a sysctl to control that period. The default value results
in a much lower update frequency and increases the completion time
for an interrupted rebuild only marginally.

Reported by:	Andre Albsmeier <andre@fbsd.e4m.org>
MFC after:	3 weeks
2017-11-30 20:36:29 +00:00
Pedro F. Giffuni
3728855a0f sys/geom: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:17:37 +00:00
Mark Johnston
0349817103 Allow kern.geom.mirror.debug to be negative.
A negative value can be used to suppress all prints from the gmirror
kernel code, which can be useful when attempting to trigger race
conditions using stress tests.

MFC after:	1 week
2017-11-23 14:07:52 +00:00
Mark Johnston
cef5abd140 Fix a lock leak in g_mirror_destroy().
g_mirror_destroy() is supposed to unlock the softc before indicating
success, but it wasn't doing so if the caller raced with another
thread destroying the mirror.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-10-27 17:05:14 +00:00
Andriy Gapon
7103ac8ad6 gmirror: treat ENXIO as disk disconnect, not media error
In theory, all data access errors mean that a member is out of sync
at most.  But they were treated as more serious errors to avoid the
situation where a flaky disk gets repeatedly disconnected, re-synchronized,
reconnected and then disconnected again.

ENXIO is a special error that means that the member disk disappeared,
so it should get the same handling as the GEOM orphaning event.
There is a better chance that when the disk is reconnected, it will be
a good member again.

When ENXIO happens on a read we use the exisiting G_MIRROR_BUMP_SYNCID
mechanism which means that the mirror's syncid is increased as soon
as there is a write to the mirror.  That's because no data has got out
of sync yet, but the problematic memeber is disconnected, so the future
write will make it stale.

When ENXIO happens on a write we use a new G_MIRROR_BUMP_SYNCID_NOW
mechanism which means that we update the mirror metadata as soon as
possible because the problematic memeber is already behind.

Reviewed by:	markj, imp
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D9463
2017-09-15 13:57:08 +00:00
Mark Johnston
db7c508323 Synchronize unclean mirrors before adding them to a running gmirror.
During gmirror startup, if component mirrors are found to be dirty as is
typical after a system crash, the mirrors are synchronized to the mirror
with highest priority. However if a gmirror starts without all of its
mirrors present, for example because of some transient delays during
tasting, the remaining mirrors must be synchronized before they may become
active.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-05-02 23:29:42 +00:00
Mark Johnston
a7d94fcc3e Rename two gmirror state flags to make their meanings slightly clearer.
No functional change.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-04-14 17:13:57 +00:00
Mark Johnston
1e91412e40 Don't set the mirror GEOM softc to NULL in g_mirror_destroy().
At this point we have not rendezvous'ed with the mirror worker thread, and
I/O may still be in flight. Various I/O completion paths expect to be able
to obtain a reference to the mirror softc from the GEOM, so setting it to
NULL may result in various NULL pointer dereferences if the mirror is
stopped with -f or the kernel is shut down while a mirror is
synchronizing. The worker thread will clear the softc pointer before
exiting.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-04-14 17:08:37 +00:00
Mark Johnston
77011eac86 Check for a provider error before enqueuing mirror I/O.
We are otherwise susceptible to a race with a concurrent teardown of the
mirror provider, causing the I/O to be left uncompleted after the mirror
started withering.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-04-14 17:03:32 +00:00
Mark Johnston
a65d524afc Stop mirror synchronization before draining the I/O queue.
Regular I/O requests may be blocked by concurrent synchronization requests
targeted to the same LBAs, in which case they are moved to a holding queue
until the conflicting I/O completes. We therefore want to stop
synchronization before completing pending I/O in g_mirror_destroy_provider()
since this ensures that blocked I/O requests are completed as well.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-04-14 16:54:50 +00:00
Mark Johnston
a4834289d6 Handle NULL entries in gmirror disk ds_bios arrays.
Entries may be removed and freed if an I/O error occurs during mirror
synchronization, so we cannot assume that all entries of ds_bios are
valid.

Also ensure that a synchronization BIO's array index is preserved after
a successful write.

Reported and tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-04-10 17:15:59 +00:00
Mark Johnston
0d75d0dfbc Avoid sleeping when the mirror I/O queue is non-empty.
A request may be queued while the queue lock is dropped when the mirror is
being destroyed. The corresponding wakeup would be lost, possibly resulting
in an apparent hang of the mirror worker thread.

Tested by:	pho (part of a larger patch)
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-03-29 19:39:07 +00:00
Mark Johnston
c1ab409cba Remove an unneeded g_mirror_destroy_provider() call.
The worker thread will destroy the mirror provider as part of its teardown
sequence. The call made sense in the initial revision of gmirror, but
became unnecessary in r137248.

Tested by:	pho (part of a larger diff)
MFC afteR:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-03-29 19:30:22 +00:00
Mark Johnston
819cd913f4 Refine r301173 a bit.
- Don't execute any of g_mirror_shutdown_post_sync() when panicking. We
  cannot safely idle the mirror or stop synchronization in that state, and
  the current attempts to do so complicate debugging of gmirror itself.
- Check for a non-NULL panicstr instead of using SCHEDULER_STOPPED(). The
  latter was added for use in the locking primitives.

Reviewed by:	mav, pjd
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-03-27 16:25:58 +00:00
Alexander Motin
b6fe583c55 Add gmirror create subcommand, alike to gstripe, gconcat, etc.
It is quite specific mode of operation without storing on-disk metadata.
It can be useful in some cases in combination with some external control
tools handling mirror creation and disks hot-plug.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2016-11-30 09:27:08 +00:00
Alexander Motin
dc399583ba Use providergone method to cover race between destroy and g_access().
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2016-11-13 03:56:26 +00:00
Mark Johnston
5c2ac5cf2a gmirror: Add a subroutine to free synchronization BIOs.
This addresses a memory leak that occurs upon an I/O error during a mirror
synchronization.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2016-10-20 23:08:40 +00:00
Mark Johnston
b450976dc2 gmirror: Release pending regular requests when synchronization stops.
Normally gmirror allows colliding requests to proceed whenever a
synchronization request completes and advances to the next offset. However
if an I/O request collides with one of the final g_mirror_syncreqs, nothing
releases it once synchronization completes, resulting in an apparent I/O
hang. The same problem can occur if synchronization is aborted by an
I/O error. Therefore, be sure to requeue pending requests when
mirror synchronization is stopped for any reason.

While here, remove some dead code from g_mirror_regular_release().

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2016-10-20 23:02:30 +00:00
Alexander Motin
5a236b0ef9 Fix possible geom destruction before final provider close.
Introduce internal counter to track opens.  Using provider's counters is
not very successfull after calling g_wither_provider().

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2016-10-06 15:20:05 +00:00
Mark Johnston
4dea20be45 gmirror: Write an updated syncid before queuing writes.
When a syncid bump is pending, any write to the mirror results in the
updated syncid being written to each component's metadata block. However,
the update was only being performed after the writes to the mirror
componenents were queued. Instead, synchronously update the metadata block
first.

MFC after:	3 weeks
Sponsored by:	Dell EMC Isilon
2016-10-06 00:13:55 +00:00
Mark Johnston
903618cd65 gmirror: Bump the syncid if broken disks are found during startup.
Consider a mirror with two components, m1 and m2. Suppose a hardware error
results in the removal of m2, with m1's genid bumped. Suppose further that
a replacement mirror component m3 is created and synchronized, after which
the system is shut down uncleanly. During a subsequent bootup, if gmirror
tastes m1 and m2 first, m2 will be removed from the mirror because it is
broken, but the mirror will be started without bumping the syncid on m1
because all elements of the mirror are accounted for. Then m3 will be
added to the already-running mirror with the same syncid as m1, so the
components will not be synchronized despite the unclean shutdown.

Handle this scenario by bumping the syncid of healthy components if any
broken mirrors are discovered during mirror startup.

MFC after:	3 weeks
Sponsored by:	Dell EMC Isilon
2016-10-06 00:05:45 +00:00
Mark Johnston
fff048e4bc gmirror: Use bool instead of boolean_t.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2016-10-05 23:55:01 +00:00
Alexander Motin
8b64f3ca6c Use g_wither_provider() where applicable.
It is just a helper function combining G_PF_WITHER setting with
g_orphan_provider().
2016-09-23 21:29:40 +00:00
Mark Johnston
4bfb585351 Don't treat an error from g_mirror_clear_metadata() as fatal.
Such errors can occur as the result of a write error or because the disk
backing the mirror element was removed. They result in a generation ID bump
on all active elements of the mirror, so we can safely disconnect the mirror
component rather than destroy it.

MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D7750
2016-09-06 23:42:59 +00:00
Mark Johnston
40c5032d32 Add some fail points to gmirror.
These are useful for testing changes to I/O error handling, and for
reproducing existing bugs in a controlled manner. The fail points are

    g_mirror_regular_request_read
    g_mirror_regular_request_write
    g_mirror_sync_request_read
    g_mirror_sync_request_write
    g_mirror_metadata_write

They all effectively allow one to inject an error value into the bio_error
field of a corresponding BIO request as it is being completed.

MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-09-06 23:35:48 +00:00
Mark Johnston
7d31c3939a Move some gmirror metadata update messages to a higher debug level.
These can be printed quite frequently from a mostly-idle mirror, cluttering
the console.

MFC after:	1 week
2016-07-14 00:40:24 +00:00
Mark Johnston
be20fc2e90 Do not complete pending gmirror BIOs when tearing down the provider.
This will result in lock recursion and is more generally incorrect since
the completion handlers will just reinsert the BIOs into the queue we're
trying to drain.

Reviewed by:	imp, ngie
Approved by:	re (gjb)
MFC after:	3 weeks
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D6908
2016-06-22 21:00:28 +00:00
Gleb Smirnoff
a7c5163b5f When we are in panic, always go the asynchronous path in g_mirror_destroy(),
otherwise the system will hang.

This is a temporarily least intrusive crutch to get certain panicing systems
dumping. The proper fix should question is g_mirror_destroy() should be called
on a panicing system at all.

Discussed with:	mav
2016-06-01 22:11:54 +00:00
Konstantin Belousov
4e2732b550 Removal of Giant droping wrappers for GEOM classes.
Sponsored by:	The FreeBSD Foundation
2016-05-20 08:25:37 +00:00
Pedro F. Giffuni
e8d5712284 sys/geom: spelling fixes in comments.
No functional change.
2016-04-29 20:56:58 +00:00
Pedro F. Giffuni
b99bce73e2 geom: unsign some types to match their definitions and avoid overflows.
In struct:gctl_req, nargs is unsigned.

In mirror:
g_mirror_syncreqs is unsigned.

In raid:
in struct:g_raid_volume, v_disks_count is unsigned.

In virstor:
in struct:g_virstor_softc, n_components is unsigned.

MFC after:	2 weeks
2016-04-27 15:10:40 +00:00
Warner Losh
9a8fa125c1 Bump bio_cmd and bio_*flags from 8 bits to 16.
Differential Revision: https://reviews.freebsd.org/D5784
2016-04-14 05:10:41 +00:00
Warner Losh
c55f57071a Create an API to reset a struct bio (g_reset_bio). This is mandatory
for all struct bio you get back from g_{new,alloc}_bio. Temporary
bios that you create on the stack or elsewhere should use this before
first use of the bio, and between uses of the bio. At the moment, it
is nothing more than a wrapper around bzero, but that may change in
the future. The wrapper also removes one place where we encode the
size of struct bio in the KBI.
2016-02-17 17:16:02 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Alexander Motin
5d85cd2d11 Remove extra semicolon.
MFC after:	1 week
2015-03-27 12:45:20 +00:00
Alexander Motin
3ab0187add Remove request sorting from GEOM_MIRROR and GEOM_RAID.
When CPU is not busy, those queues are typically empty.  When CPU is busy,
then one more extra sorting is the last thing it needs.  If specific device
(HDD) really needs sorting, then it will be done later by CAM.

This supposed to fix livelock reported for mirror of two SSDs, when UFS
fires zillion of BIO_DELETE requests, that totally blocks I/O subsystem by
pointless sorting of requests and responses under single mutex lock.

MFC after:	2 weeks
2015-03-27 12:44:28 +00:00
Alexander Motin
41fe4ba647 Fix bug on memory allocation error in split method.
While there, use bioq_takefirst() in place where it is convenient.

MFC after:	1 week
2015-03-27 11:14:12 +00:00
Alexander Motin
7715befdf2 Fix couple BIO_DELETE bugs in geom_mirror.
Do not report GEOM::candelete if none of providers support BIO_DELETE.
If consumer still requests BIO_DELETE, report error instead of hanging.

MFC after:	2 weeks
2015-03-12 10:20:53 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Bryan Drewery
09adfca39f Show error code when failing to destroy a mirror on delay
Sponsored by:	EMC / Isilon Storage Division
MFC after:	2 weeks
2014-04-05 03:01:29 +00:00