instead of %llx when %j is available).
Changed nearby output formats from %x to %#x so that it is obvious that the
numbers are in hex (vinum mostly uses 0x%x elsewhere).
Didn't fix nearby format printf errors (long lines).
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
DIOCGMEDIASIZE instead.
The partition type check has been XXX'ed out since we cannot expect
that a BSD disklabel with a type field be available on all platforms.
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
apply to this file. The correct message is:
throw_rude_remark: Make sure we're holding the config lock before
proceeding. There's no reason to assume that this
has ever happened, but the alternative might be a
double fault.
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
In file included from ../../../dev/vinum/vinumhdr.h:77,
from ../../../dev/vinum/vinum.c:44:
../../../dev/vinum/vinumext.h:165: warning: redundant redeclaration of `setjmp' in same scope
../../../sys/systm.h:96: warning: previous declaration of `setjmp'
../../../dev/vinum/vinummemory.c:44: warning: redundant redeclaration of `longjmp' in same scope
../../../sys/systm.h:97: warning: previous declaration of `longjmp'
vinumhdr.h:80: warning: redundant redeclaration of `vinum_cdevsw'
vinumext.h:239: warning: previous declaration of `vinum_cdevsw'
in each of the following files:
vinum.c, vinumconfig.c, vinumdaemon.c, vinuminterrupt.c, vinumio.c,
vinumioctl.c, vinumlock.c, vinummemory.c, vinumraid5.c, vinumrequest.c,
vinumrevive.c, vinumstate.c, vinumutil.c
requiring fewer header files for userland programs.
Remove the gross debug device/non-debug device hack used to recognize
whether the kernel module was in sync with the userland module.
compiled with debug support. This can be used by userland programs to
recognize which ioctls the module supports.
As a result, remove the gross debug device/non-debug device hack used
to recognize whether the kernel module was in sync with the userland
module.
Replace explicit references to major/minor numbers of vinum
superdevice with the VINUM_SUPERDEV macro written for that purpose.
gets incremented every time the kernel-userland interface changes.
This enables vinum(8) to check for the correct kernel version and to
produce a useful message if it doesn't match.
Requested by: Too many to count.
Move the definitions of struct drive, sd, plex and volume to
vinumobj.h.
Add a new debug flag, DEBUG_LOCKREQS, which logs only lock requests.
with more than one plex, the data will be accessed
multiple times. During this time, userland code could
potentially modify the buffer, thus causing data
corruption. In the case of a multi-plexed volume this
might be cosmetic, but in the case of a RAID-[45] plex it
can cause severe data corruption which only becomes
evident after a drive failure. Avoid this situation by
making a copy of the data buffer before using it.
Note that this solution does not guarantee any particular
content of the buffer, just that it remains unchanged for
the duration of the request.
Suggested by: alfred
Use this instead of DEBUG_LASTREQS to decide whether to log lock
requests.
MFS:
vinumlock: Catch a potential race condition where one process is
waiting for a lock, and between the time it is woken and
it retries the lock, another process gets it and places it
in the first entry in the table.
This problem has not been observed, but it's possible, and
it's easy enough to fix.
Submitted by: tegge
vinumunlock: Catch a real bug capable of hanging a system. When
releasing a lock, vinumunlock() called wakeup_one. This
caused wakeups to sometimes get lost. After due
consideration, we think that this is due to the fact that
you can't guarantee that some other process is also
waiting on the same address. This makes wakeup_one a
very dangerous function to use.
Requested by: bde
Add retryerrors keyword.
vinum_scandisk: Print a different message if an inadvertent start
command did not find any additional drives. The previous message "no
drives found" confused and worried many people.
MFS:
vinum_open: Recognize Mylex devices as storage devices.
In case of error, check the VF_RETRYERRORS flag in the subdisk and
don't take the subdisk down if it's set, just retry the I/O.
Requested by: peter
If the buffer has been copied (XFR_COPYBUF), release the copied
buffer when the I/O completes.
Suggested by: alfred
Desired by: bde
This commit is the first of a general cleanup of the header files..
It won't be enough to make bde happy.
Move debug definitions from vinumhdr.h.
Create a new struct rangelockinfo. In revision 1.21 of vinumlock.c,
the plex info was removed from struct rangelock, since it wasn't
needed there. It *is* needed for trace information, however, so use
struct rangelockinfo for that.
userland tool:
Use the vfs.devfs.generation sysctl to test for devfs presense
(thanks phk!) when devfs is active it will not try to create the
device nodes in /dev and therefore will not complain about the
failure to do so.
Revert the change in the #define for VINUM_DIR in the kernel
header so that vinum can find its device nodes.
Replace perror() with vinum_perror() to print file/line when
DEVBUG is defined (not defined by default).
kernel:
Don't use the #define names for the "superdev" creation since
they will be prepended by "/dev/" (based on VINUM_DIR), instead
use string constants.
Create both debug and non-debug "superdev" nodes in the devfs.
Problem noticed and fix tested by: Martin Blapp <mblapp@fuchur.lan.attic.ch>
remove_sd_entry() to:
Simplify (hopefully) it by moving all error returns closer to
the beginning of the function.
Return an error when "Error removing subdisk %s: not found in
plex %s\n" would have been reported, as I doubt that we are "OK"
after printing that error message.
Adding make_dev() and destroy_dev() calls in (hopefully) the right
places.
This is done by calling make_dev() in each object constructor and
caching the dev_t's returned from make_dev() in each struct
'subdisk'(sd), 'plex' and 'volume' such that the 'object'_free()
functioncs can call destroy dev.
This change makes a subset of the old /dev/vinum appear under devfs.
Enough nodes appear such that I'm able to mount my striped volume.
There may be more work needed to get vinum configuration working
properly.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
striped plexes. This prevents various panics introduced in the last
rewrite of the locking code.
Suffered by: "Niels Chr. Bank-Pedersen" <ncbp@bank-pedersen.dk>
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.
Reviewed By: peter
not yet been caught), don't save the config with a null drive
name (which causes the drive to be renamed "plex" on the next
start), put in the text "*invalid*" instead.
This is damage control, not a fix.
Experienced by: peter
Break some long format strings so that they fit in style(9)-sized
lines.
Remove some "outdentation".
Rewrite lockrange and unlockrange. The lock table is now a fixed
size, so there is no possibility for race conditions when expanding.
The current size (256 locked ranges) should be large enough that it
makes no sense to expand it. To do expansion right would require
quiescing the plex (requiring at least 256 I/O completions), and the
performance implications are horrendous.
Add a mutex per plex for accessing the lock table.
Based on analysis by: tegge
This should eliminate one case of foot shooting .
vinum_scandisk: If a drive in the partition table is downed, free it.
This duplicates code for the compatibility partition, which for some
reason was omitted here.
striped plexes.
Submitted by: des
Don't lock buffers before calls to sdio, sdio does it by itself.
Submitted by: tegge
parityops: Use correct casts when returning error information.
Requested by: Bernd Walter <ticso@cicely8.cicely.de>
Cor Bosman <cor@xs4all.net>
Kai Storbeck <kai@xs4all.net>
Joe Greco <jgreco@ns.sol.net>
Add support for Compaq SMART-2 RAID (idad) as storage
device for Vinum subdisks.
Reported by: Aaron Hill <hillaa@hotmail.com>
This makes crash recovery work for stripe sizes that are not multiples of
DEFAULT_REVIVE_BLOCKSIZE (currently 64 kB).
While we're here, fix a few cosmetic nits.
Reviewed by: grog
Sponsored by: Enitel ASA (http://www.enitel.no/)
1. Don't include <sys/conf.h> in userland. It is not used, and including it
without including its prerequisite <sys/time.h> should have broken the
world.
2. Don't include <sys/mount.h>. It is not used, except in -current it
bogusly includes <sys/stat.h> which bogusly includes <sys/time.h> and
thus accidentally provides the prerequisite in (1).
3. Cleaned up nearby include messes.
Not approved by despite 5 weeks notice: MAINTAINER
Add support for AMD RAID controllers as "disks".
Requested-by: Marius Bendiksen <mbendiks@eunet.no>
Remove potential panic when attempting to open non-existent drivers.
init_drive: Return error codes correctly. Previously it would
occasionally return 0. The error was redetected
elsewhere, but this was causing a number of confusing
error messages.
Fix several instances of breakage in RAID-5 revive code.
Tidy up code.
parityops:
Don't attempt to do anything if the plex is degraded or worse.
parityrebuild:
Add comments.
Perform transfers in correct length.
not gone yet.
format_config: print correct text when a volume has a preferred plex.
This is still broken, but not quite as badly.
Reported-by: Phil Regnauld <regnauld@ftf.net>
Change a rather silly comment.
revive_block: Correct bug introduced in revision 1.25 which caused
Add fields to vinum_ioctl_msgexcessive concurrent requests followed by
system death.
<sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
which seems to correspond better with what a busy plex needs. This
may also help us avoid race conditions when expanding the table which
may have been contributing to the random corruption, panics and hangs
we've been seeing in RAID-5 plexes, particularly with ata drives.
Eagerly-awaited-by: sos
Get counting volume I/Os right.
launch_requests: Be macho, throw away the safety net and walk the
tightrope with no splbio().
Add some comments explaining the smoke and mirrors.
Remove some redundant braces.
sdio: Set the state of an accessed but down subdisk correctly. This
appears to duplicate an earlier commit that I hadn't seen.
Get counting volume I/Os right.
Count buffer sizes correctly for architectures where ints are not 32 bits.
complete_rqe: Move decrementing active count until after call to
complete_raid5_write, thus possibly avoiding a race condition.
Suggested-by: dillon
Rename user bp to ubp to avoid confusion.
Tidy up comments.
request could be deallocated before the top half had finished
issuing it. The problem seems only to happen with IDE drives
and vn devices, but theoretically it could happen with any
drive. This is the most important part of a possible series
of fixes designed to remove race conditions without locking
out interrupts for longer than absolutely necessary.
Reported-by: sos
Fix-supplied-by: dillon
(Much of this done by script)
Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.
Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.
Add bio_queue field for struct bio aware disksort.
Address a lot of stylistic issues brought up by bde.
set properly in the struct buf with vinum:
Fix locations where B_READ was cleared in the old code but
b.b_iocmd wasn't set to BIO_WRITE
Fix propogation of b_iocmd
Correct comments to reflect reality
Don't compare b_flags with BIO_READ, it's in b_iocmd.
Submitted by: Bernd Walter <ticso@cicely.de>
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)
substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)
This patch is machine generated except for the ccd.c and buf.h parts.
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch were machine generated (Thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.
transferred, do it in complete_rqe instead.
launch_requests: Replace the inadvertently removed splbio() around the
main loop. It may not be necessary, but the biggest
test of this stuff are IDE disks, which I'm not
using.
Remove throttling code, I'm pretty sure it's not
needed any more.
Don't set B_ORDERED, it's not necessary either.
Objected-to-by: alfred
build_rq_buffer: Don't lose the B_ORDERED bit, it still has some
residual meaning. To do this right, Vinum needs to
look at the B_ORDERED bit and order the transfer
across all disks involved. That's an exercise for
another day.
Objected-to-by: alfred
Implicitly-sanctioned-by: jkh
VINUM_BDEV_MAJOR and VINUM_CDEV_MAJOR respectively.
Set DRIVE_MAXACTIVE and VINUM_MAXACTIVE to 30000, effectively
disabling the request limitation code. This code was added as an
attempt to escape from a bug which seems to have gone away, and it's
very likely I'll remove the code Real Soon Now, but I don't want to do
it just yet.
struct drive: Remove references to vnode pointers, including debug
output. Vinum now talks directly to the device driver. Instead, add
a dev_t.
enum plexorg: Add an instance for RAID-4.
Change checks for striped or RAID-5 plexes to a macro 'isstriped',
which now also includes RAID-4.
Change checks for RAID-5 plexes to a macro 'isparity', which now also
includes RAID-4.
Approved-by: jkh
set_sd_state: update the state of a subdisk in a multi-plex volume
more correctly.
update_plex_state: Bring the plex up correctly when the last subdisk
comes up.
checksdstate: Update comments.
vpstate: Don't return an "up" state on a degraded, unattached plex.
start_object: Return a sensible error message when trying to revive a
subdisk whose drive is down. Previously it returned EBUSY.
Approved-by: jkh
data corruption. It's a wonder it worked at all.
Led-on-the-right-path-by: dillon
revive_block: Add treatment for RAID-4.
Add function parityrebuild, called by revive_block and parityops.
Approved-by: jkh
the tsleep call flags.
Submitted-by: Bernd Walter <ticso@cicely.de>
Remove references to vnode pointers, including debug output. Vinum
now talks directly to the device driver.
bre: Add case for RAID-4.
sdio: Don't try to write to a down drive. Set the sd state instead.
Approved-by: jkh
vn_open. This is necessary in order to be able to open drives before
the root file system is mounted. This also involves restructuring the
drive struct, which no longer contains a vnode pointer. Instead,
open_drive sets an open flag. It's a horrible kludge, and I'll gladly
borrow a Danish axe and hack it in little pieces when devfs comes.
read_drive, write_drive, drive_io_done: Replace with driveio. The
function names are now macros.
driveio: Fix horrible, embarrassing breakage which was the reason why
read_drive and write_drive existed in the first place.
Code-torn-to-shreds-by: dillon
format_config: Don't save config of objects in referenced state. They
get rebuilt automatically.
Change checks for striped or RAID-5 plexes to a macro 'isstriped',
which now also includes RAID-4.
Change checks for RAID-5 plexes to a macro 'isparity', which now also
includes RAID-4.
Replace the preprocessor variable names BDEV_MAJOR and CDEV_MAJOR with
VINUM_BDEV_MAJOR and VINUM_CDEV_MAJOR respectively.
vinum_scandisk: Don't free memory twice on error, once is enough.
Approved-by: jkh
to RAID-5. peter claims that it might be faster for sequential
reading, since the drive caches don't trip over the parity blocks. I
have seen no evidence to support this, but it's a trivial change.
Requested-by: peter
Change checks for striped or RAID-5 plexes to a macro 'isstriped',
which now also includes RAID-4.
Change checks for RAID-5 plexes to a macro 'isparity', which now also
includes RAID-4.
atoi(): Remove, nobody was talking to it.
give_sd_to_drive: If no space is available, make the subdisk down,
don't delete it.
Change the manner in which the subdisk count was maintained to avoid
cases where the count was not adjusted correctly.
config_drive: Check if we have subdisks referencing us, and add them
if so. This fixes problems which arose when a drive is replaced in a
running system.
config_sd: Add support for a keyword 'partition', whose meaning will
be revealed in the fullness of time.
Cosmetic: Shorten some console messages.
Approved-by: jkh
to SI_SUB_VINUM, thus making it possible for Vinum to access I/O
devices and start.
Replace the preprocessor variable names BDEV_MAJOR and CDEV_MAJOR with
VINUM_BDEV_MAJOR and VINUM_CDEV_MAJOR respectively.
Style fixes: replace NULL with 0 where appropriate.
Submitted-by: Charlie Root <root@sms-1.follo.net> (yup, that's all I
have to go on).
Approved-by: jkh