For certain combinations of sectorsize, mediasize and random numbers
(used to define the mapping), a multisector read or write would ignore
some subset of the sectors past the first sector in the request because
those sectors would be mapped past the end of the parent device, and
normal "end of media" truncation would zap that part of the request.
Rev 1.19+1.20 of g_bde_work.c added the check which should have alerted
me to this happening. This commit maps the request correctly and
adds KASSERTS to make sure things stay inside the parent device.
This does not change the on-disk layout of GBDE, so there is no need to
back up and restore.
it wrote the full length. The only case where this should be able
to happen is if we try to read/write past the end and the request
is truncated. We obviously should never try to do that, so this
code should never activate.
our key-sector, we would end up returning the read without an error,
despite the fact that the data was not correctly decrypted.
This would result in data corruption on read, but intact data still
on the media.
Give up the entire bio as soon as we detect a problem.
When we detect a problem, give up the bio by contributing the
remainder with ENOMEM, rather than kicking the bio back right
away.
If we failed on a non-first iteration we previously could end up
modifying fields in the bio after we delivered it. This could
account for memory corruption (none directly reported) on machines
with GBDE.
held.
The only place where we want to not hold topology is when we read
(or write) the label to disk: in the case of a disk error with a
long recovery time, holding topology would prevent open/close of
any disk device.
This means that you can no longer trash your opened partitions by writing to
the sunlabel through another partition. This is similar to the semantics
implemented for BSD labels.
test is built to test GEOM as running in the kernel.
This commit is basically "unifdef -D_KERNEL" to remove the (mainly #include
related) code that supported the userland harness.
to have in your kernel since it indiscriminately attaches to anything
it is offered with a range of bogus partitions.
Stop this from happening by rejecting any label with negative numbers in
it.
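A minimal sketch of such a sanity check, assuming a simplified in-memory
partition entry. The struct part_entry layout and the label_is_sane() name
are hypothetical and are not the real on-disk label format:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified partition entry; not a real label layout. */
struct part_entry {
    int32_t start;  /* first sector of the partition */
    int32_t size;   /* length of the partition in sectors */
};

/* Return true only if no field in any entry is negative. */
static bool
label_is_sane(const struct part_entry *p, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (p[i].start < 0 || p[i].size < 0)
            return (false);
    return (true);
}
```

A taste routine would refuse to attach whenever label_is_sane() returns
false.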
event posting functions varargs to fill these.
Attribute g_call_me() to appropriate g_geom's where necessary.
Add a flag argument to g_call_me() methods which will be used to signal
cancellation of events in the future.
This commit should be a no-op.
KASSERT the race between close and strategy; it is an error in the upper
echelons if this happens.
Add XXX: comment explaining why the ioctl/orphan race is not closed.
Retain the mistake of not updating the devstat API for now.
Spell bioq_disksort() consistently with the remaining bioq_*().
#include <geom/geom_disk.h> where this is more appropriate.
parts of it.
[*] I've been asked what "OAM" means: It's an acronym used in the
telecom industry, "Operations And Maintenance", and there it covers
anything from a single unlabeled LED on the front panel to the full
nightmare of CMIP for SS7.
have to examine the stats structure to tell if we have outstanding I/O
requests.
Making them u_int improves the chance of atomic updates to them,
but risks roll-over. Since the only interesting property is if
they are equal or not, this is not an issue.
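A minimal sketch of why roll-over is harmless here, assuming a hypothetical
struct io_counters with one "started" and one "completed" counter. Unsigned
arithmetic in C wraps modulo 2^N, so the equality test stays valid as long
as fewer than 2^N requests are outstanding at once:

```c
#include <stdbool.h>

/* Hypothetical counter pair; the names are illustrative only. */
struct io_counters {
    unsigned int nstart;    /* requests started */
    unsigned int nend;      /* requests completed */
};

static void
io_start(struct io_counters *c)
{
    c->nstart++;            /* wraps to 0 after UINT_MAX, well defined */
}

static void
io_end(struct io_counters *c)
{
    c->nend++;
}

/* Only equality matters, so wrap-around of either counter is harmless. */
static bool
io_outstanding(const struct io_counters *c)
{
    return (c->nstart != c->nend);
}
```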
outstanding requests to return before we unravel the mesh.
It is very important that the stuff below us plays nice and doesn't
overlook a couple of outstanding bio's, because until they remember
the geom event thread is blocked. At an expense in code here this
could be made more robust, but I actually _want_ a robust failure
in this case so any offending drivers can be fixed.
in geom_disk.c.
As a side effect this makes a lot of #include <sys/devicestat.h>
lines not needed and some biofinish() calls can be reduced to
biodone() again.
memory-allocation purposes. Right now it is also a very good idea
because we hit a Giant assertion in the free(9) processing if we
free something larger than 64k.
Initialize struct cdevsw using C99 sparse initialization and remove
all initializations to default values.
This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.
Approved by: re(scottl)
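A minimal sketch of the initialization style, using a cut-down stand-in
structure. struct my_cdevsw and demo_open() are hypothetical; the real
struct cdevsw has more members and different method signatures:

```c
#include <stddef.h>

/* Hypothetical, simplified method types. */
typedef int d_open_t(int unit);
typedef int d_close_t(int unit);

/* Cut-down stand-in for struct cdevsw. */
struct my_cdevsw {
    d_open_t *d_open;
    d_close_t *d_close;
    const char *d_name;
    int d_flags;
};

static int
demo_open(int unit)
{
    (void)unit;
    return (0);
}

/*
 * C99 designated initializers: members are named, so their order in
 * the struct does not matter, and members left out are zero or NULL.
 */
static const struct my_cdevsw demo_cdevsw = {
    .d_name = "demo",
    .d_open = demo_open,
};
```

Because the members are named, reordering the fields of the structure does
not change what this initializer means, which is what the reverse-order
LINT build exercises.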
- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.
I'm still not sure whether we need a __FreeBSD_version bump for this,
since we didn't guarantee API/ABI stability until 5.1-RELEASE.
Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386
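A minimal sketch of the new calling convention, assuming a simplified
handler signature. demo_mmap(), demo_paddr_t and the DEMO_* constants are
hypothetical; the real d_mmap_t takes additional arguments:

```c
#include <errno.h>
#include <stdint.h>

typedef uintptr_t demo_paddr_t;     /* stand-in for the kernel address type */

#define DEMO_BASE 0x100000u         /* hypothetical physical base address */
#define DEMO_SIZE 0x4000u           /* hypothetical size of the mappable window */

/*
 * New-style handler: return 0 on success and store the physical
 * address through the pointer; any non-zero return is an error and
 * the pointed-to value is left untouched.
 */
static int
demo_mmap(uintptr_t offset, demo_paddr_t *paddr)
{
    if (offset >= DEMO_SIZE)
        return (EINVAL);
    *paddr = DEMO_BASE + offset;
    return (0);
}
```

With the address and the status in separate channels, an errno constant can
no longer be mistaken for an address.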
Retire the "d_dump_t" and use the "dumper_t" type instead.
Dumper_t takes a void * as first arg which is more general than the
dev_t taken by d_dump_t. (Remember: we could have net-dumpers if
somebody wrote us one!)
Define the convention for GEOM controlled disk devices to be that the
first argument to the dumper function is the struct disk pointer.
Change device drivers accordingly.
Change the argument to disk_destroy() to be the same struct disk * as
disk_create() takes.
This enables drivers to ignore the (now) bogus dev_t which disk_create()
returns.
lower extremities.
Setting bit 4 in debugflags (sysctl kern.geom.debugflags=16) will
allow any open to succeed on rank#1 providers. This will generally
correspond to the physical disk devices: ad0, da0, md0 etc.
This fundamentally violates the mechanics of GEOMs autoconfiguration,
and is only provided as a debugging facility, so obviously error
reports on GEOM where this bit is or has been set will not be
accepted.
disk I/O processing.
The intent is that the disk driver in its hardware interrupt
routine will simply schedule the bio on the task queue with
a routine to finish off whatever needs done.
The g_up thread will then schedule this routine, the likely
outcome of which is a biodone() which queues the bio on
g_up's regular queue where it will be picked up and processed.
Compared to using the regular taskqueue, this saves one
context switch.
Change our scheduling of the g_up and g_down queues to be water-tight,
at the cost of breaking the userland regression test-shims.
Input and ideas from: scottl
Cut up requests into smaller bits if they are longer than the driver's
disk->d_maxsize or dev->si_iosize_max.
Properly handle the race condition when g_clone_bio() is used
without having the single-threadedness of g_down/g_up secure locking.
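The cutting up of oversized requests can be sketched as a simple loop. This
is an illustration only: split_request() and chunk_fn are hypothetical
names, and the real code clones bios rather than calling a callback:

```c
#include <stddef.h>

/* Hypothetical callback that receives one piece of the request. */
typedef void chunk_fn(size_t offset, size_t length, void *arg);

/*
 * Split [offset, offset + length) into pieces no longer than maxsize
 * and hand each piece to fn (if non-NULL).  Returns the number of
 * pieces, or 0 if maxsize is 0 or the request is empty.
 */
static size_t
split_request(size_t offset, size_t length, size_t maxsize,
    chunk_fn *fn, void *arg)
{
    size_t n = 0;

    if (maxsize == 0)
        return (0);
    while (length > 0) {
        size_t len = length < maxsize ? length : maxsize;

        if (fn != NULL)
            fn(offset, len, arg);
        offset += len;
        length -= len;
        n++;
    }
    return (n);
}
```

Here maxsize would be the smaller of the two limits that apply to the
request.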
and d_stripesize;
Introduce si_stripesize and si_stripeoffset in struct cdev so we
can make them visible to clustering code.
Add stripesize and stripeoffset to providers.
DTRT with stripesize and stripeoffset in various places in GEOM.
idle time.
Statistics now default to "on" and can be turned off with
sysctl kern.geom.collectstats=0
Performance impact of statistics collection is on the order of
800 nsec per consumer/provider set on a 700MHz Athlon.
Instead of embedding a struct g_stat in consumers and providers, merely
include a pointer.
Remove a couple of <sys/time.h> includes now unneeded.
Add a special allocator for struct g_stat. This allocator will allocate
entire pages and hand out g_stat structures from there. The "id" field
indicates free/used status.
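A minimal userland sketch of such an allocator, assuming a NULL "id" marks a
free slot. All demo_* names and the 4096-byte page size are hypothetical,
and no locking is shown:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

/* Hypothetical stand-in for struct g_stat. */
struct demo_stat {
    void *id;               /* owner; NULL means the slot is free */
    unsigned long nop;
    unsigned long nend;
};

#define DEMO_PER_PAGE (DEMO_PAGE_SIZE / sizeof(struct demo_stat))

struct demo_page {
    struct demo_page *next;
    struct demo_stat *slots;    /* one page worth of slots */
};

static struct demo_page *demo_pages;

/* Hand out a free slot, allocating a whole new page when none is left. */
static struct demo_stat *
demo_stat_alloc(void *id)
{
    struct demo_page *pg;
    size_t i;

    if (id == NULL)
        return (NULL);
    for (pg = demo_pages; pg != NULL; pg = pg->next)
        for (i = 0; i < DEMO_PER_PAGE; i++)
            if (pg->slots[i].id == NULL) {
                pg->slots[i].id = id;
                return (&pg->slots[i]);
            }
    pg = malloc(sizeof(*pg));
    if (pg == NULL)
        return (NULL);
    pg->slots = calloc(DEMO_PER_PAGE, sizeof(struct demo_stat));
    if (pg->slots == NULL) {
        free(pg);
        return (NULL);
    }
    for (i = 0; i < DEMO_PER_PAGE; i++)
        pg->slots[i].id = NULL;     /* mark every slot free */
    pg->next = demo_pages;
    demo_pages = pg;
    pg->slots[0].id = id;
    return (&pg->slots[0]);
}

/* Release a slot: clear the counters and mark it free again. */
static void
demo_stat_free(struct demo_stat *sp)
{
    memset(sp, 0, sizeof(*sp));
    sp->id = NULL;
}
```

Keeping the slots in whole pages is what makes it possible to export them
page by page.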
Add "/dev/geom.stats" device driver which exports the pages from the
allocator to userland with mmap(2) in read-only mode.
This mmap(2) interface should be considered a non-public interface and
the functions in libgeom (not yet committed) should be used to access
the statistics data.
Add debug.sizeof.g_stat sysctl.
Set the id field of the g_stat when we create consumers and providers.
Remove biocount from consumer, we will use the counters in the g_stat
structure instead. Replace one field which will need to be atomically
manipulated with two fields which will not (stat.nop and stat.nend).
Add companion field to bio_children: bio_inbed, for the exact
same reason.
Don't output the biocount in the confdot output.
Fix KASSERT in g_io_request().
Add sysctl kern.geom.collectstats defaulting to off.
Collect the following raw statistics conditioned on this sysctl:
for each consumer and provider {
    total number of operations started.
    total number of operations completed.
    time last operation completed.
    sum of idle-time.
    for each of BIO_READ, BIO_WRITE and BIO_DELETE {
        number of operations completed.
        number of bytes completed.
        number of ENOMEM errors.
        number of other errors.
        sum of transaction time.
    }
}
API for getting hold of these statistics data not included yet.
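The list above could be laid out roughly as follows. This is a sketch under
assumptions: the demo_* names, the nanosecond time unit and the update
functions are hypothetical and are not the real struct g_stat:

```c
#include <errno.h>
#include <stdint.h>

enum demo_op { DEMO_READ, DEMO_WRITE, DEMO_DELETE, DEMO_NOPS };

/* Per-operation-type counters. */
struct demo_opstat {
    uint64_t nop;       /* operations completed */
    uint64_t nbyte;     /* bytes completed */
    uint64_t nmem;      /* ENOMEM errors */
    uint64_t nerr;      /* other errors */
    uint64_t dt_ns;     /* sum of transaction time */
};

/* Per-consumer or per-provider counters. */
struct demo_iostat {
    uint64_t nstart;    /* operations started */
    uint64_t nend;      /* operations completed */
    uint64_t t_last_ns; /* time last operation completed */
    uint64_t idle_ns;   /* sum of idle time; not maintained in this sketch */
    struct demo_opstat ops[DEMO_NOPS];
};

static void
demo_iostat_start(struct demo_iostat *sp)
{
    sp->nstart++;
}

/* Account for one finished operation of the given type. */
static void
demo_iostat_end(struct demo_iostat *sp, enum demo_op op, uint64_t bytes,
    int error, uint64_t start_ns, uint64_t now_ns)
{
    struct demo_opstat *os = &sp->ops[op];

    sp->nend++;
    sp->t_last_ns = now_ns;
    os->nop++;
    os->nbyte += bytes;
    os->dt_ns += now_ns - start_ns;
    if (error == ENOMEM)
        os->nmem++;
    else if (error != 0)
        os->nerr++;
}
```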
We may actually be increasing Giant contention doing so because the
actual stuff we do is very cheap.
Also I am not convinced there is not a tiny window for a race here.
Change the si_name of dev_t's to be a char * and put a private buffer for
holding the name at the end of the struct.
Initialize si_name to point to the private buffer.
Put a KASSERT in geom_disk to prevent overrun on the fake dev_t we still
have to generate for the disk_drivers.
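A minimal sketch of the layout, assuming a hypothetical struct demo_dev with
a 64-byte private buffer. The real structure, buffer size and member names
differ, and the kernel uses a KASSERT where this sketch returns an error:

```c
#include <stdio.h>

#define DEMO_NAMELEN 64     /* hypothetical buffer size */

/* Hypothetical stand-in for the device structure. */
struct demo_dev {
    int unit;
    char *si_name;                  /* normally points at namebuf */
    char namebuf[DEMO_NAMELEN];     /* private buffer at the end */
};

/*
 * Point si_name at the private buffer and copy the name in.
 * Returns 0 on success, -1 if the name would overrun the buffer.
 */
static int
demo_dev_setname(struct demo_dev *dev, const char *name)
{
    int n;

    dev->si_name = dev->namebuf;
    n = snprintf(dev->namebuf, sizeof(dev->namebuf), "%s", name);
    if (n < 0 || (size_t)n >= sizeof(dev->namebuf))
        return (-1);
    return (0);
}
```

Since si_name is a pointer, it can later be pointed at storage other than
the private buffer without changing its users.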
this will cause volume labels to be exposed in /dev/vol/<volname>. Currently,
there is no conflict resolution if more than one FS has the same volume name.
Reviewed by: phk
Make passing the methods in a cdevsw structure optional.
Move "CANFREE" and "NOGIANT" flags into struct disk instead of the
cdevsw which may or may not be there.
Rename CANFREE to CANDELETE to match BIO_DELETE operation.
Add "OPEN" flag so drivers don't have to provide open/close methods
just to maintain such a flag.
Add temporary stopgap include of <sys/conf.h> to <sys/disk.h> until
the files which have them in the other order are fixed.
Add KASSERTS to make sure we don't get fed too many NULL pointers.
Clear our geom's softc pointer before we wither.
labeled disk.
This is complicated by the fact that BBSIZE is greater than the
PAGE_SIZE limit ioctl inflicts on arguments which are automatically
copied in.
As long as we don't need access to userland memory (copyin/out) we
can deal with the ioctl using g_call_me() which executes it from the
GEOM event thread.
Once we need copyin/out, we need to return the bio with EDIRIOCTL
in order to make geom_dev call us back in the original process context
where copyin will work.
Unfortunately, that results in us getting called with Giant, so
we have to DROP_GIANT/PICKUP_GIANT around the code where we diddle
GEOMs internals.
Sometimes you just can't win...
... But it does make geom_bsd.c an almost complete example of the
GEOM beastiarium.
CAUTION:
Previously CCD would be different from all other disks in
the system in that there was no "ccd0" device, only a
"ccd0c" device.
This is no longer so after this commit. If you access a
ccd device through the "/dev/ccd0c" device _and_ have not
actually put a BSD disklabel on the device, you will have
to use the name "/dev/ccd0". If your CCD device contains
a BSD disklabel there should be no difference.
You need to recompile ccdconfig(8) using the changed
src/sys/sys/ccdvar.h for the -g "show me" option to work.
I have run the regression test I created before I started
overhauling CCD and it flags no problems, but this code
is mildly evil, so take care. If you would cry if you lost
what's on CCD, make a backup before you upgrade.
Create separate cdevsw for the /dev/ccd.ctl device.
Remove the cloning function, the disk-minilayer will do all naming
for us.
Remove the ccdunit and ccdpart functions and carry the softc pointer
in the relevant dev_t's and structures.
Release all memory when a CCD device is unconfigured, previously
the softc would linger behind.
Remove all traces of BSD disklabel fiddling code.
Remove ccdpsize, the disk mini-layer does this for us.
Don't allocate memory with M_WAITOK in ccdstrategy().
Remove boundary checks which the disk mini-layer does for us.
Don't allocate space for more than 2 ccdbuf, RAID was never implemented.
NB: I have not tried to address any of the preexisting ailments of CCD.
and can be added back selectively, should anybody take an interest
in the internal workings of ccd.
This commit will make the diffs for the following commits much more
readable.
the three configuration ioctls which need a unit number.
Add a "ccd.ctl" device for config operations.
Implement ioctls on ccd.ctl which rely on the explicitly passed
unit numbers.
Update ccdconfig to use the new ccd.ctl interface.
Add code to the kernel to detect old ccdconfig binaries, and whine
about it.
Add code to ccdconfig to detect old kernels, and whine about it.
These two compatibility measures will be retained only for a limited
period since they are in the way of GEOM'ification of ccd.
ioctls are no reliable indication of the ioctl's "set" or "get" nature or if
such simplistic categories can even be applied.
MFC candidate: boot0cfg issue.
some trick is necessary to prevent further BSD geoms from attaching to
that. Our old trick was to make sure we don't attach to a geom from
the "BSD" class, but this doesn't work if an intermediary geom obscures
this fact. Instead, calculate the MD5 checksum of the label we target
and ask if anybody below us loves that label. If they do we don't.
Coded by: gordon.