with more than one plex, the data will be accessed
multiple times. During this time, userland code could
potentially modify the buffer, thus causing data
corruption. In the case of a multi-plexed volume this
might be cosmetic, but in the case of a RAID-[45] plex it
can cause severe data corruption which only becomes
evident after a drive failure. Avoid this situation by
making a copy of the data buffer before using it.
Note that this solution does not guarantee any particular
content of the buffer, just that it remains unchanged for
the duration of the request.
Suggested by: alfred
Use this instead of DEBUG_LASTREQS to decide whether to log lock
requests.
MFS:
vinumlock: Catch a potential race condition where one process is
waiting for a lock, and between the time it is woken and
it retries the lock, another process gets it and places it
in the first entry in the table.
This problem has not been observed, but it's possible, and
it's easy enough to fix.
Submitted by: tegge
vinumunlock: Catch a real bug capable of hanging a system. When
releasing a lock, vinumunlock() called wakeup_one. This
caused wakeups to sometimes get lost. After due
consideration, we think that this is due to the fact that
you can't guarantee that some other process is also
waiting on the same address. This makes wakeup_one a
very dangerous function to use.
Requested by: bde
Add retryerrors keyword.
vinum_scandisk: Print a different message if an inadvertent start
command did not find any additional drives. The previous message "no
drives found" confused and worried many people.
MFS:
vinum_open: Recognize Mylex devices as storage devices.
In case of error, check the VF_RETRYERRORS flag in the subdisk and
don't take the subdisk down if it's set, just retry the I/O.
Requested by: peter
If the buffer has been copied (XFR_COPYBUF), release the copied
buffer when the I/O completes.
Suggested by: alfred
Desired by: bde
This commit is the first of a general cleanup of the header files..
It won't be enough to make bde happy.
Move debug definitions from vinumhdr.h.
Create a new struct rangelockinfo. In revision 1.21 of vinumlock.c,
the plex info was removed from struct rangelock, since it wasn't
needed there. It *is* needed for trace information, however, so use
struct rangelockinfo for that.
userland tool:
Use the vfs.devfs.generation sysctl to test for devfs presense
(thanks phk!) when devfs is active it will not try to create the
device nodes in /dev and therefore will not complain about the
failure to do so.
Revert the change in the #define for VINUM_DIR in the kernel
header so that vinum can find its device nodes.
Replace perror() with vinum_perror() to print file/line when
DEVBUG is defined (not defined by default).
kernel:
Don't use the #define names for the "superdev" creation since
they will be prepended by "/dev/" (based on VINUM_DIR), instead
use string constants.
Create both debug and non-debug "superdev" nodes in the devfs.
Problem noticed and fix tested by: Martin Blapp <mblapp@fuchur.lan.attic.ch>
remove_sd_entry() to:
Simplify (hopefully) it by moving all error returns closer to
the beginning of the function.
Return an error when "Error removing subdisk %s: not found in
plex %s\n" would have been reported, as I doubt that we are "OK"
after printing that error message.
Adding make_dev() and destroy_dev() calls in (hopefully) the right
places.
This is done by calling make_dev() in each object constructor and
caching the dev_t's returned from make_dev() in each struct
'subdisk'(sd), 'plex' and 'volume' such that the 'object'_free()
functioncs can call destroy dev.
This change makes a subset of the old /dev/vinum appear under devfs.
Enough nodes appear such that I'm able to mount my striped volume.
There may be more work needed to get vinum configuration working
properly.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
striped plexes. This prevents various panics introduced in the last
rewrite of the locking code.
Suffered by: "Niels Chr. Bank-Pedersen" <ncbp@bank-pedersen.dk>
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.
Reviewed By: peter
not yet been caught), don't save the config with a null drive
name (which causes the drive to be renamed "plex" on the next
start), put in the text "*invalid*" instead.
This is damage control, not a fix.
Experienced by: peter
Break some long format strings so that they fit in style(9)-sized
lines.
Remove some "outdentation".
Rewrite lockrange and unlockrange. The lock table is now a fixed
size, so there is no possibility for race conditions when expanding.
The current size (256 locked ranges) should be large enough that it
makes no sense to expand it. To do expansion right would require
quiescing the plex (requiring at least 256 I/O completions), and the
performance implications are horrendous.
Add a mutex per plex for accessing the lock table.
Based on analysis by: tegge
This should eliminate one case of foot shooting .
vinum_scandisk: If a drive in the partition table is downed, free it.
This duplicates code for the compatibility partition, which for some
reason was omitted here.
striped plexes.
Submitted by: des
Don't lock buffers before calls to sdio, sdio does it by itself.
Submitted by: tegge
parityops: Use correct casts when returning error information.
Requested by: Bernd Walter <ticso@cicely8.cicely.de>
Cor Bosman <cor@xs4all.net>
Kai Storbeck <kai@xs4all.net>
Joe Greco <jgreco@ns.sol.net>
Add support for Compaq SMART-2 RAID (idad) as storage
device for Vinum subdisks.
Reported by: Aaron Hill <hillaa@hotmail.com>
This makes crash recovery work for stripe sizes that are not multiples of
DEFAULT_REVIVE_BLOCKSIZE (currently 64 kB).
While we're here, fix a few cosmetic nits.
Reviewed by: grog
Sponsored by: Enitel ASA (http://www.enitel.no/)
1. Don't include <sys/conf.h> in userland. It is not used, and including it
without including its prerequisite <sys/time.h> should have broken the
world.
2. Don't include <sys/mount.h>. It is not used, except in -current it
bogusly includes <sys/stat.h> which bogusly includes <sys/time.h> and
thus accidentally provides the prerequisite in (1).
3. Cleaned up nearby include messes.
Not approved by despite 5 weeks notice: MAINTAINER
Add support for AMD RAID controllers as "disks".
Requested-by: Marius Bendiksen <mbendiks@eunet.no>
Remove potential panic when attempting to open non-existent drivers.
init_drive: Return error codes correctly. Previously it would
occasionally return 0. The error was redetected
elsewhere, but this was causing a number of confusing
error messages.
Fix several instances of breakage in RAID-5 revive code.
Tidy up code.
parityops:
Don't attempt to do anything if the plex is degraded or worse.
parityrebuild:
Add comments.
Perform transfers in correct length.