civilized way which doesn't cause grief.
The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *". Things like ccd, vinum, ata-raid and GEOM
constructs bio's which are not entrails of a struct buf.
Also, curthread may or may not have anything to do with the I/O request
at hand.
The correct solution can either be to tag struct bio's with a
priority derived from the requesting threads nice and have disksort
act on this field, this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.
Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.
The sleep also needs to be in constant timeunits, 1/hz can be practicaly
any sub-second size, at high HZ the current code practically doesn't
do anything.
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
These functions use DEV_STRATEGY() which can easily return a short
count (with no error) for reads near EOF. EOF happens for "disks" too
small to contain a label sector (mainly for empty slices). The functions
didn't understand this at all, and looked for labels in the garbage
in the buffer beyond what DEV_STRATEGY() returned. The recent UMA
changes combined with my local changes and configuration resulted in
the garbage often containing a valid but garbage label left over from
a previous call.
Bugs in EOF handling in -current limited the problem to "disks" with
size precisely LABELSECTOR sectors. LABELSECTOR happens to be a very
unusual "disk" size since it is only 0 for non-i386 arches that don't
usually have disks with DOS MBRs.
When a positively niced process requests a disk I/O, make
it wait for its nice value of ticks before scheduling its
I/O request if there are any other processes with I/O
requests in the disk queue. For all the gory details, see
the ``Running fsck in the Background'' paper in the Usenix
BSDCon 2002 Conference Proceedings, pages 55-64.
dev_t. The dev_depends(dev_t, dev_t) function is for tying them
to each other.
When destroy_dev() is called on a dev_t, all dev_t's depending
on it will also be destroyed (depth first order).
Rewrite the make_dev_alias() to use this dependency facility.
kern/subr_disk.c:
Make the disk mini-layer use dependencies to make sure all
relevant dev_t's are removed when the disk disappears.
Make the disk mini-layer precreate some magic sub devices
which the disk/slice/label code expects to be there.
kern/subr_disklabel.c:
Remove some now unneeded variables.
kern/subr_diskmbr.c:
Remove some ancient, commented out code.
kern/subr_diskslice.c:
Minor cleanup. Use name from dev_t instead of dsname()
<sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.
CCD not converted yet, casts to struct buf (still safe)
atapi-cd casts to struct buf to examine B_PHYS
devstat_end_transaction_bio()
bioq_* versions of bufq_* incl bioqdisksort()
the corresponding "buf" versions will disappear when no longer used.
Move b_offset, b_data and b_bcount to struct bio.
Add BIO_FORMAT as a hack for fd.c etc.
We are now largely ready to start converting drivers to use struct
bio instead of struct buf.
(Much of this done by script)
Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.
Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.
Add bio_queue field for struct bio aware disksort.
Address a lot of stylistic issues brought up by bde.
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)
substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)
This patch is machine generated except for the ccd.c and buf.h parts.
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch were machine generated (Thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.
warnings caused by the arg having the wrong type (not const enough).
The arg was also wrong (a full name instead of a short one) for calls
from from subr_diskmbr.c and pc98/diskslice_machdep.c.
Introduce BUF_STRATEGY(struct buf *, int flag) macro, and use it throughout.
please see comment in sys/conf.h about the flag argument.
Remove strategy argument from all the diskslice/label/bad144
implementations, it should be found from the dev_t.
Remove bogus and unused strategy1 routines.
Remove open/close arguments from dssize(). Pick them up from dev_t.
Remove unused and unfinished setgeom support from diskslice/label/bad144 code.
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
partition that the label ioctl is being done on just because it has
offset 0, since there is no guarantee that such a partition is large
enough to contain the label. Don't use the wrong raw partition (0
instead of RAW_PART).
This fixes problems rewriting bizarre labels (with a nonzero offset
for the 'a' partition) in newfs(8). Such labels shouldn't normally
be used, but creating them was allowed if the ioctl was done on the
raw partition, and sysinstall creates them if the root partition isn't
allocated first.
Note that allowing write access to a partition other than the one that
has been checked for write access doesn't increase security holes
significantly, since write access to any partition already allows
changing the in-core label.
This fix should be in 3.0R. Rev.1.26 of newfs/newfs.c shouldn't be
in 3.0R.
formats and args to match. Fixed old printf format errors (all related;
most were hidden by calling printf indirectly).
This change somehow avoids compiler bugs for 64-bit longs on i386's,
although it increases the number of 64-bit calculations.
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
the sd & od drivers. There is also slight changes to fdisk & newfs
in order to comply with different sectorsizes.
Currently sectors of size 512, 1024 & 2048 are supported, the only
restriction beeing in fdisk, which hunts for the sectorsize of
the device.
This is based on patches to od.c and the other system files by
John Gumb & Barry Scott, minor changes and the sd.c patches by
me.
There also exist some patches for the msdos filesys code, but I
havn't been able to test those (yet).
John Gumb (john@talisker.demon.co.uk)
Barry Scott (barry@scottb.demon.co.uk)
It is needed for implementation details but very little of it is
needed for the interface. Include it in the few places that didn't
already include it.
Include <sys/ioccom.h> in <sys/disklabel.h> (as already in
<sys/diskslice.h>) so that all the disk-related headers are almost
self-sufficient.
and B_READ before writing. This was was fatal. They also broke the
clearing of B_INVAL before doing i/o. This didn't actually matter.
Submitted by: mostly by joerg
bp->b_flags has been broken for many years:
a) they didn't set B_BUSY for doing i/o. This has been fatal since
1995/07/25 when biodone() started checking that B_BUSY is set.
b) they didn't set B_INVAL for releasing the buffer. This at best
just put a useless buffer in the LRU queue for a little while.
Fix a couple of spelling errors and complete a couple of function
pointer declarations.
disksort is called at non-interrupt time and can be actively traversing
the list when that happens, there is a very small window of vulnerability.
Close it by protecting disksort with splbio().