Put splbio protection around the main launch loop. We've seen cases where
the bottom half was cutting off the branch on which we're sitting.
Experienced-by: Michael Reifenberger <root@nihil.plaut.de>
maximum of 64 kB.
vinum_conf, struct drive: add fields for the current number of active
requests and the maximum ever active.
struct sd: Add fields for initialize progress.
struct mc: set the size of saved file names to MCFILENAMELEN instead
of the previous explicit constant.
limit the number of outstanding requests on a specific drive and
overall.
Change the way we set the active request count. This enables us to
start the requests without being in splbio for the duration, which
could be very long for IDE drives in PIO mode.
context. Be prepared to fail instead.
MMalloc, FFree: ensure that saved file names are properly terminated.
Use MCFILENAMELEN instead of the previous explicit constant.
and then doing it itself, resulting in a panic downed drives.
Sleuth-work-by: Christopher Masto <chris@netmonger.net>
Remove dummy function initsd, which is now (implemented) in
vinumrevive.c
vinum_scandisk: Check that a drive is up before reading from it. This
is probably excessive paranoia.
- Move intrhook stuff into kernel.h
- Remove all occurrences of #device <device.h>
- Add kernel.h were necessary (nowhere)
- delete device.h
This file contained the structures for cfdata (old style config) and is no
longer used. It was included by most drivers.
It confuses the remote debugger as the definition of 'struct device' in
device.h is found before the one in bus_private.h.
of parity check and rebuild operations. This enables us
to stop the operation and restart at a later time.
enum parityop: Trivial enum to decide what parityops() is going to do.
could stand.
Define the correct return lengths for a number of ioctls.
Add ioctls VINUM_CHECKPARITY and VINUM_RESETPARITY, still to be fully
implemented.
ourselves. This breaks a vicious circle which caused
vinum to dereference a null vp if device nodes were
missing.
Reported-by: Brad Chisholm <sasblc@unx.sas.com>
Alec Wolman <wolman@cs.washington.edu>
check_drive: Don't take a drive down if it's only referenced.
read_drive: Remove unused variable.
get_empty_volume: initialize plexes to -1 (not allocated)
remove_drive_entry:
Remove recurse parameter (there's nothing below a drive in the hierarchy).
Use remove_sd_entry to remove sds, don't do it ourselves.
Log errors, don't throw rude remarks.
remove_plex_entry:
Don't use plex->subdisks as a loop limit, it gets changed in the
loop. This caused some removals to only remove half the subdisks.
Change logging of some "impossible" situations.
remove_volume_entry:
Use remove_plex_entry to remove plexes, don't do it ourselves.
update_sd_config:
Use set_sd_state to do the work.
have been there in the first place. A GENERIC kernel shrinks almost 1k.
Add a slightly different safetybelt under nostop for tty drivers.
Add some missing FreeBSD tags
a quick think and discussion among various people some form of some of
these changes will probably be recommitted.
The reversion requested was requested by dg while discussions proceed.
PHK has indicated that he can live with this, and it has been agreed
that some form of some of these changes may return shortly after further
discussion.
the highly non-recommended option ALLOW_BDEV_ACCESS is used.
(bdev access is evil because you don't get write errors reported.)
Kill si_bsize_best before it kills Matt :-)
Use the specfs routines rather having cloned copies in devfs.
some swapper problems analogous to those experienced with ccd.
This fix is a kludge: since we currently don't track the "sector size"
in a volume label, we guess a worst case (4 kB, as used by vnode
devices). If the concept of sector size is here to stay, I'll make
some changes to track the "sector size" of a volume. This will
probably be the maximum of the sector sizes of all component drives,
but things could get ugly if we start allowing non-standard sector
sizes such as 524 bytes.
Unkludged-version-submitted-by: phk
lockrange: correctly expand rangelock struct, including expanding a
null struct. Previously lockrange would attempt to lock a
NULL pointer under these circumstances.
Reported-by: Ian Freislich <iang@uunet.co.za>
initialized subdisks.
Tidy up some comments.
Eliminate sddownstate(); it wasn't being used any more. Return
REQUEST_DOWN instead.
Add setstate_by_force() to implement the VINUM_SETSTATE_FORCE ioctl
for diddling individual object states. This is a repair tool which
can also be used for panicing the system. Use with utmost care if at
all.
avoids a race condition where multiple RAID-5 subdisks are being
revived at the same time. The locks should also prevent conflicts
with user requests on concatenated and striped plexes, but this needs
more work.
Tidy up some comments.
format_config: code preening.
vinum_scandisk: If we find a partition in the first pass over a drive,
note the fact so we don't grab the compatibility partition as well.
Submitted-by: peter
goes into initialized state, not 'up'. This makes it easier to ensure
consistency in multi-plex volumes.
update_plex_state: redo transitions from empty and initialized
subdisks to up or reviving, depending on the number of plexes.
Reported-by: Bernd Walter <ticso@cicely.de>
Remy Nonnenmacher <remy@synx.com>
Introduce BUF_STRATEGY(struct buf *, int flag) macro, and use it throughout.
please see comment in sys/conf.h about the flag argument.
Remove strategy argument from all the diskslice/label/bad144
implementations, it should be found from the dev_t.
Remove bogus and unused strategy1 routines.
Remove open/close arguments from dssize(). Pick them up from dev_t.
Remove unused and unfinished setgeom support from diskslice/label/bad144 code.
Don't return "can't do it" when the user requests a state change to
the current state. This previously caused silly messages like "Can't
start <foo>: invalid argument", when in fact <foo> was already
started.
set_plex_state: don't set state for non-existent plexes.
update_plex_status: as long as we have initializing subdisks, we're
initializing.
Move the declaration of freerq() to request.h.
logrq: add support for lock events.
vinumstart: solve a problem where removing a plex from an active
volume could cause attempts to access non-existent plexes.
launch_requests: don't set a request group active until we're sure we
can launch it. This caused some hangs under unusual
circumstances.
bre: don't set XFR_BAD_SUBDISK if we're not going to use it.
build_read_request: correct recovery, which caused some hangs under
(other) unusual circumstances.
build_rq_buffer: don't set bp->b_dev if we don't have a dev.
sdio: clean up, remove obsolete code.
deallocrqg: unlock any locks the rqg may have.
bre5:
Shorten some lines.
Desired-by: bde
If we're reading from a short plex, return EOF indication.
Always lock the stripe before starting a transfer. Hopefully the
current version will solve some data integrity problems that have
been reported with degraded RAID-5 plexes.
Reported-by: Bernd Walter <ticso@cicely.de>
Remy Nonnenmacher <remy@synx.com>
solve some data integrity problems that have been reported with
degraded RAID-5 plexes.
Reported-by: Bernd Walter <ticso@cicely.de>
Remy Nonnenmacher <remy@synx.com>
Tidy other comments.
open_drive: don't call set_drive_state if we decide to take it down.
This could help avoid some race conditions with the daemon.
init_drive: don't set the drive down, we'll let close_locked_drive do
that.
close_locked_drive: set drive state to down without calling
set_drive_state. This could help avoid some race conditions with the
daemon.
driveio: remove the function, it wasn't being used.
get_volume_label: remove volume dependencies so that we can return a
label for plexes and subdisks as well. What a kludge.
Remove declarations for freerq and free_rqg.
Remove DEBUG_RESID code.
freerq: check whether the request is holding a lock, free if so.
free_rqg: remove. It wasn't being used any more.
Change the Debugger calls to panics.
checkdiskconfig(): remove. It didn't make any sense to complain about
kernel keywords in user config files; it just made it more difficult
to convert. Now we ignore kernel keywords if we're not in kernel
mode.
get_empty_sd: initialize sectors.
free_drive: don't close if we don't have a vp. Maybe this will help
fix the problem that peter had, but I wouldn't count on it.
config_plex: If the plex is RAID-5, give it a rangelock structure.
start_config: Reset current drive, plex and volume so that a new
'create' command doesn't get long-dead defaults.
struct rqelement, enum rqinfo_type, struct rqinfo, union rqinfou: add
lock requests.
Add declarations for freerq and unlockrange. Since they include
request structures, they can't go in vinumext.h
- %q -> %ll.
Fixed nearby errors not reported by gcc -Wformat on i386's:
- don't assume that the promotion of [u_]int64_t is [u_]quad_t.
- don't use signed formats for unsigned args.
Add Cybernet copyright.
OK'd-by: Chuck Jacobus <chuck@cybernet.com>
update_plex_state:
If any subdisk in the plex is initializing, set the plex to
initializing state. This gets rid of the ugly corrupt/degraded/up
transitions which previously occurred.
Desired-by: Steve Taylor <staylor@cybernet.com>
sddownstate:
Add new function, used by checksdstate.
checksdstate:
Let sddownstate decide what status to return.
Add Cybernet copyright.
OK'd-by: Chuck Jacobus <chuck@cybernet.com>
logrq: save device major and minor numbers to compensate for lost
dev_t.
launch_requests: Don't issue requests which are marked
XFR_BAD_SUBDISK. This may make things easier in bre().
bre:
Rearrange.
- Change some comments
- Recognize holes in plex structure. Formerly this could lead to
incorrect write to the plex. Return REQUEST_DEGRADED on a read
request, but carry on to the bitter end on a write request, and
mark the requests for the inaccessible subdisks with
XFR_BAD_SUBDISK.
- return REQUEST_EOF if the requested transfer goes beyond the end
of the plex. This is not an error, since other plexes may go
further into the volume address space.
build_read_request:
Handle REQUEST_DEGRADED returned from bre().
sdio:
Lock buffer before issuing the requests.
will only accept partitions of type 'vinum'.
format_config: Use the new %q format option in kvprintf, thus getting
rid of some of the filthiest code I've written in a long time. Also
remove the lltoa() function.
With-great-thanks-to: peter
format_config: Accept the fact that a subdisk might not be attached to
a plex, and save the config correctly.
vinum_scandisk: Scan all slices on a drive with a Microsoft partition
table. Only look at the compatibility slice if nothing was found in
the Microsoft slices.
This change removes a frequently employed method of shooting
yourself in the foot: people would decide that the Vinum drives
belonged on their own slice, and they wouldn't be able to start the
subsystem after a reboot. Documentation updates to follow.
initialize subdisks. Probably the plex-related subdisk type will die
a death.
vinumconfig.c:
Accept (and ignore) kernel state information in userland config
files. This saves a lot of error recovery and also makes it possible
to use the output of printconfig to create new configuration.
Remove checkdiskconfig(). It wasn't needed any more.
Start adding support for hot spare drives. You can't put anything on
them (yet).
Change message formats from %lld to %qd.
get_empty_sd: Initialize size to -1. Previously this was done in
config_subdisk, which is the wrong place.
start_config: set current drive, plex and volume to -1, thus stopping
update configurations from taking their defaults from old configs.
the device numbers are now minor number only, so that we can still
compare them after dev_t has turned into a blob.
Broken-by: dev_t changes
Reported-by: Vallo Kallaste <vallo@matti.ee>
"Niels Chr. Bank-Pedersen" <ncbp@bank-pedersen.dk>
Correct race condition between caller and daemon.
Tripped-over-by: Zach Heilig <zach@uffdaonline.net>
Bernd Walter <ticso@cicely.de>
Niels Chr. Bank-Pedersen <ncbp@bank-pedersen.dk>
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
If the drive goes down, queue a close to the daemon. In many cases
this function gets called in process context, so it could do it
directly, but it's more trouble finding out where we came from than
getting the daemon to do it.
Don't bzero the buffer structure, it's been done already by
allocrqg.
sdio:
Build up a correct buffer header, don't steal linkages from system
buffer headers.
Noticed-by: mckusick
fix; it doesn't address the problem of removing the module. If you do
the following:
vinum stop
fsck /dev/vinum/VOLUME
you *will* get a system crash. What we need is a cdevsw_remove
corresponding to cdevsw_add, but that hasn't been written yet.
Submitted-by: phk
Made a new (inline) function devsw(dev_t dev) and substituted it.
Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)
DEVFS will eventually benefit from this change too.
Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline)
function.
Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
to the order of the cmaj/bmaj arguments!)
Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
(ditto!)
(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
If a drive has gone down and has dirty buffers associated with it,
we'll get a panic when we try to vn_close it. Check for this
situation and discard any buffers; they're toast anyway.
Only complain about usage count if DEBUG_WARNINGS is set.
check_drive:
Change parameter name from drivename to devicename.
Get the check for a referenced drive right.
If the partition isn't a vinum drive, set the last error to ENODEV.
vinum_scandisk:
Change parameter name from drivename [] to devicename [].
1:
s/suser/suser_xxx/
2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with
later.
There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
Fix a potential drive deadlock when saving config to a non-existent
drive.
Add debug calls to catch occasional deadlocks on drives. The problem
(above) is probably gone, but the debug checks remain for a while.
is probably gone, but the debug checks remain for a while.
update_plex_config: Catch yet another divide-by-zero problem when
detaching the last subdisk from a striped plex.
Uncovered-by: Michael Reifenberger <root@nihil.plaut.de>
Repeatedly-tripped-over-by: Vallo Kallaste <vallo@matti.ee>
When VINUMDEBUG is set, free any memory found still
allocated.
Only log errors if DEBUG_EXITFREE is set.
free_vinum: Wait for daemon to stop by checking the
vinum_conf.flags & VF_DAEMONOPEN.
vinum_modevent:
When compiled with VINUMDEBUG, check if we have
forgotten to free any memory, and log an error if we
have.
vinumopen: Allow open of an empty subdisk (otherwise we can't
initialize it).
plexes_used and volumes_used. Now these fields are only informative,
and the <object>_allocated count is used for searches, etc. This also
required checking the object state before doing things with the
presumed object.
Problems-reported-by: Kiril Mitev <kiril@ideaglobal.com>
VINUM_<object>CONFIG: return ENXIO rather than EFAULT if an object
doesn't exist.
plexes_used and volumes_used. Now these fields are only informative,
and the <object>_allocated count is used for searches, etc. This also
required checking the object state before doing things with the
presumed object.
Problems-reported-by: Kiril Mitev <kiril@ideaglobal.com>
vinum_scandisk: increment drive use count when we find a good one.
plexes_used and volumes_used. Now these fields are only informative,
and the <object>_allocated count is used for searches, etc. This also
required checking the object state before doing things with the
presumed object.
Problems-reported-by: Kiril Mitev <kiril@ideaglobal.com>
Remove unused (and braindead) functions volume_index, plex_index,
sd_index and drive_index.
Add a flag VF_CREATED for volumes. VF_NEWBORN was being used in two
capacities, and they clashed, my Lord, they clashed.
find_object: restructure the search loop as a result of the change in
variable use.
Decrement object use count in the remove_<object> functions, not in
the free_<object> functions, which are often called with partially
initialized (and uncounted) objects.
plexes_used and volumes_used. Now these fields are only informative,
and the <object>_allocated count is used for searches, etc. This also
required checking the object state before doing things with the
presumed object.
Problems-reported-by: Kiril Mitev <kiril@ideaglobal.com>
longjmp. I suspect that the occasional double panic may be the result
of incorrect parameters to longjmp. This happens, of course, like the
entire file, only with -DVINUMDEBUG.
give_sd_to_plex: Don't set Raid-5 subdisk state here.
config_subdisk: handle the name parameter correctly when the subdisk
was referenced in a previous plex definition. The
name parameter must come first.
Handle autosizing relatively correctly. There is
still a danger of losing drive space if problems
occur with an autosized subdisk.
Set state to empty, not up, when complete. This also
solves a nagging problem about enforcing the need to
initialize RAID-5 plexes.
config_plex: handle the name parameter correctly when the plex
was referenced in a previous volume definition. The
name parameter must come first.
Handle initial state better.
update_plex_config:
Calculate the trim factor for RAID-5 plexes correctly.
Set the number of down subdisks correctly when reading
from disk config.
remove the splbio() around the call to launch read requests.
launch_requests:
Move the splbio() protection outside the entire launch_loop. The
previous location was causing problems with IDE drives, where the
call to the strategy routine often did not complete until after
complete_rqe deallocated the request structure.
Solution-independently-found-by: Russell Neeper <r-neeper@tamu.edu>
Problem-reported-by: Vallo Kallaste <vallo@matti.ee>
John Saunders <john@nlc.net.au>
Bernd Walter <ticso@cicely.de> (maybe)
Check for partition types FS_VINUM and FS_UNUSED. Accept both, but
complain about FS_UNUSED. At a later date, only FS_VINUM will be
accepted.
Threatened-since: over a year
Add a flag `force' (VF_FORCECONFIG) to force name changes of
existing drives.
config_drive:
If the drive already has a vinum label, and name doesn't match the
specified drive, do it anyway if the 'force' flag is specified.
finish_config:
Reset the `force' flag.
Continually-tripped-over-by: Karl Pielorz <kpielorz@tdx.co.uk>
give_sd_to_drive:
If the drive is down, take the subdisk down and don't try to fix
things.
update_plex_config:
Don't try to update the config parameters of a plex which isn't
fully configured (state plex_init or plex_unallocated).
Correctly calculate the amount to trim off a striped or RAID-5 plex
whose size is not a multiple of the stripe size.
compiled with or without debugging support. This enables us to catch
(fatal) mismatches between the kernel and userland.
Coalesce flags VINUM_DISKCONFIG and VINUM_READING_CONFIG. They did
essentially the same thing.
Add VINUM_BIGDRIVE for pretending we have macho hardware.
pretends that each drive is 100 times as large as it really is. Not
for use at home.
Coalesce flags VINUM_DISKCONFIG and VINUM_READING_CONFIG. They did
essentially the same thing.
This solved a problem where 'vinum resetconfig' only reset half
the drives.
Reported-by: Brad Knowles <blk@skynet.be>
Karl Pielorz <kpielorz@tdx.co.uk>
Change the super device. We now have three super devices:
1. The normal superdevice used by vinum(8).
2. The superdevice used by vinum(8) when compiled with debug support.
3. The superdevice used by the daemon.
This method allows vinum(8) to determine debug mismatches. Also check
correctly for the device type. The old code did not check all bits of
the minor number.
Reported-by: a cast of thousands, most recently by Brad Knowles
<blk@skynet.be>.
MMalloc: save the time at which the request was granted, remove more
crud.
FFree: add a circular buffer of the last 64 Free requests if
DEBUG_MEMFREE is set.
after the volume had been fully operational; involves a change in the
use of the VF_NEWBORN flag. Now if you add a plex to a volume which
is up, the plex will be down and the subdisks stale. You need to
explicitly start the subdisks, which copies data from the good
subdisks to the uninitialized ones.
Stumbled-over-by: Ludwig Pummer <ludwigp@bigfoot.com>
give_sd_to_drive:
correct method to give the entire largest chunk of drive to the
subdisk. Now it's enough to specify a length, and vinum will give
you as much as it can. Not to be recommended except for empty
drives.
Correct a bogon which made vinum refuse to give the last sector of
a drive to a subdisk.
Last-reported-by: Ludwig Pummer <ludwigp@bigfoot.com>
Change %q formats to %ll before the former go away. This doesn't make
much difference, since kernel kvprintf currently doesn't support
either, and the messages in question are just error messages.
Change VINUM_SAVECONFIG: it now requires a parameter. 0 means
"configuration updates are finished, please save", and 1 means "please
just save the config". This second meaning is invoked by the new
"saveconfig" command to vinum(8).
Recognize "referenced" drives by the lack of a slash in the device
name, not by a NUL character.
vinum_scandisk: return error indication (ENOENT if we can't find any
vinum drive, otherwise 0).
VINUM_SAVECONFIG: change parameters.
Don't save config while we're reading it from disk.
Change the way we handle the daemon: if we can't communicate with it
for 1 second (which is possible), start a new one. The daemon saves
its pid in daemonpid; on each iteration of the main loop the daemon
checks whether it's still in favour. If not, it silently exits.
Also, when trying to communicate with the daemon, check daemonpid
first. If it's set to 0, don't even try.
Rename the VF_KERNELOP to VF_DISKCONFIG and checkkernel () to
checkdiskconfig (), which better describes their function.
Disable configuration updates if we have an error reading in the
configuration. This stops a "shoot-in-foot" problem where a mistake
can cause the configuration to be obliterated.
Tidy up some messages, which included superfluous \ns.
Recognize RAID-5 configuration information even in the non-RAID-5
version. This fixes shoot-in-foot problems where starting the wrong
version of vinum would kill RAID-5 plexes.
Recognize drives that have been referenced, but for which no physical
location is known. This is part of a modification which will
ultimately allow incrementally reading configurations. Such drives
will have a device name "unknown".
New function return_drive_space () returns space to a drive.
Previously this was part of free_sd ().
give_sd_to_drive: don't do it if the subdisk needs more space than the
drive has available.
config_sd: if reading config from disk, accept plex offset, drive
offset and length specs of -1 to indicate error conditions.
parse_config: return ENOENT if the "read" command doesn't find any
drives.
remove_sd_entry: don't do it, even by force, if it's open.
If the size of a striped or RAID-5 plex is not an integral multiple of
the stripe size, trim the size until it is.
reinstate update_volume_config, which had atrophied, to recalculate
the size of a volume if a plex has shrunk due to stripe size
considerations.
vinumattach: Zero out tables after allocating them
Modify procedure at unload: if a vinum(8) has the superdev open, don't
close down. If only the daemon has it open, send the daemon a stop
request and wait for it to close the superdev, then unload.
In order to do this, create a second superdev which is opened by the
daemon. The open and close routines set a different bit in
vinum_conf.flags; otherwise the treatment is identical.
Remove opencount field in vol structure; replace by a flag bit, since
we can't count the number of opens.
Remove dead LKM grunge.