have to examine the stats structure to tell if we have outstanding I/O
requests.
Making them u_int improves the chance of atomic updates to them,
but risks roll-over. Since the only interesting property is if
they are equal or not, this is not an issue.
outstanding requests to return before we unravel the mesh.
It is very important that the stuff below us plays nice and don't
overlook a couple of outstanding bio's, because until they remember
the geom event thread is blocked. At an expense in code here this
could be made more robust, but I actually _want_ a robust failure
in this case so any offending drivers can be fixed.
in geom_disk.c.
As a side effect this makes a lot of #include <sys/devicestat.h>
lines not needed and some biofinish() calls can be reduced to
biodone() again.
memory-allocation purposes. Right now it is also a very good idea
because we hit a Giant assertion in the free(9) processing if we
free something larger than 64k.
branches:
Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.
This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.
Approved by: re(scottl)
- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.
I'm still not sure wheter we need a __FreeBSD_version bump for this,
since and we didn't guarantee API/ABI stability until 5.1-RELEASE.
Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386
Retire the "d_dump_t" and use the "dumper_t" type instead.
Dumper_t takes a void * as first arg which is more general than the
dev_t taken by d_dump_t. (Remember: we could have net-dumpers if
somebody wrote us one!)
Define the convention for GEOM controlled disk devices to be that the
first argument to the dumper function is the struct disk pointer.
Change device drivers accordingly.
Change the argument to disk_destroy() to be the same struct disk * as
disk_create() takes.
This enables drivers to ignore the (now) bogus dev_t which disk_create()
returns.
lower extremities.
Setting bit 4 in debugflags (sysctl kern.geom.debugflags=16) will
allow any open to succeed on rank#1 providers. This will generally
correspond to the physical disk devices: ad0, da0, md0 etc.
This fundamentally violates the mechanics of GEOMs autoconfiguration,
and is only provided as a debugging facility, so obviously error
reports on GEOM where this bit is or has been set will not be
accepted.
disk I/O processing.
The intent is that the disk driver in its hardware interrupt
routine will simply schedule the bio on the task queue with
a routine to finish off whatever needs done.
The g_up thread will then schedule this routine, the likely
outcome of which is a biodone() which queues the bio on
g_up's regular queue where it will be picked up and processed.
Compared to the using the regular taskqueue, this saves one
contextswitch.
Change our scheduling of the g_up and g_down queues to be water-tight,
at the cost of breaking the userland regression test-shims.
Input and ideas from: scottl
Cut up requests into smaller bits if they are longer than the drivers
disk->d_maxsize or dev->si_iosize_max.
Properly handle the race condition when using g_clone_bio() is used
without having the single-threadedness of g_down/g_up secure locking.
and d_stripesisze;
Introduce si_stripesize and si_stripeoffset in struct cdev so we
can make the visible to clustering code.
Add stripesize and stripeoffset to providers.
DTRT with stripesize and stripeoffset in various places in GEOM.
idle time.
Statistics now default to "on" and can be turned off with
sysctl kern.geom.collectstats=0
Performance impact of statistics collection is on the order of
800 nsec per consumer/provider set on a 700MHz Athlon.
Insted of embedding a struct g_stat in consumers and providers, merely
include a pointer.
Remove a couple of <sys/time.h> includes now unneeded.
Add a special allocator for struct g_stat. This allocator will allocate
entire pages and hand out g_stat functions from there. The "id" field
indicates free/used status.
Add "/dev/geom.stats" device driver whic exports the pages from the
allocator to userland with mmap(2) in read-only mode.
This mmap(2) interface should be considered a non-public interface and
the functions in libgeom (not yet committed) should be used to access
the statistics data.
Add debug.sizeof.g_stat sysctl.
Set the id field of the g_stat when we create consumers and providers.
Remove biocount from consumer, we will use the counters in the g_stat
structure instead. Replace one field which will need to be atomically
manipulated with two fields which will not (stat.nop and stat.nend).
Change add companion field to bio_children: bio_inbed for the exact
same reason.
Don't output the biocount in the confdot output.
Fix KASSERT in g_io_request().
Add sysctl kern.geom.collectstats defaulting to off.
Collect the following raw statistics conditioned on this sysctl:
for each consumer and provider {
total number of operations started.
total number of operations completed.
time last operation completed.
sum of idle-time.
for each of BIO_READ, BIO_WRITE and BIO_DELETE {
number of operations completed.
number of bytes completed.
number of ENOMEM errors.
number of other errors.
sum of transaction time.
}
}
API for getting hold of these statistics data not included yet.
We may actually be increasing Giant contention doing so because the
actual stuff we do is very cheap.
Also I am not convinced there is not a tiny window for a race here.
Change the si_name of dev_t's to be a char * and put a private buffer for
holding the name at then end of the struct.
Initialize si_name to point to the private buffer.
Put a KASSERT in geom_disk to prevent overrun on the fake dev_t we still
have to generate for the disk_drivers.
this will cause volume labels to be exposed in /dev/vol/<volname>. Currently,
there is no conflict resolution if more than one FS has the same volume name.
Reviewed by: phk
Make passing the methods in a cdevsw structure optional.
Move "CANFREE" and "NOGIANT" flags into struct disk instead of the
cdevsw which may or may not be there.
Rename CANFREE to CANDELETE to match BIO_DELETE operation.
Add "OPEN" flag so drivers don't have to provide open/close methods
just to maintain such a flag.
Add temporary stopgap include of <sys/conf.h> to <sys/disk.h> until
the files which have them in the other order are fixed.
Add KASSERTS to make sure we don't get fed too many NULL pointers.
Clear our geom's softc pointer before we wither.
labeled disk.
This is complicated by the fact that BBSIZE is greater than the
PAGE_SIZE limit ioctl inflicts on arguments which are automatically
copied in.
As long as we don't need access to userland memory (copyin/out) we
can deal with the ioctl using g_callme() which executes it from the
GEOM event thread.
Once we need copyin/out, we need to return the bio with EDIRIOCTL
in order to make geom_dev call us back in the original process context
where copyin will work.
Unfortunately, that results in us getting called with Giant, so
we have to DROP_GIANT/PICKUP_GIANT around the code where we diddle
GEOMs internals.
Sometimes you just can't win...
... But it does make geom_bsd.c an almost complete example of the
GEOM beastiarium.
CAUTION:
Previously CCD would be different from all other disks in
the system in that there were no "ccd0" device, only a
"ccd0c" device.
This is no longer so after this commit. If you access a
ccd device through the "/dev/ccd0c" device _and_ have not
actually put a BSD disklabel on the device, you will have
to use the name "/dev/ccd0". If your CCD device contains
a BSD disklabel there should be no difference.
You need to recompile ccdconfig(8) using the changed
src/sys/sys/ccdvar.h for the -g "show me" option to work.
I have run the regression test I created before I started
overhauling CCD and it flags no problems, but this code
is mildly evil, so take care. If you would cry if you lost
what's on CCD, make a back before you upgrade.
Create separate cdevsw for the /dev/ccd.ctl device.
Remove the cloning function, the disk-minilayer will do all naming
for us.
Remove the ccdunit and ccdpart functions and carry the softc pointer
in the relevant dev_t's and structures.
Release all memory when a CCD device is unconfigured, previously
the softc would linger behind.
Remove all traces of BSD disklabel fiddling code.
Remove ccdpsize, the disk mini-layer does this for us.
Don't allocate memory with M_WAITOK in ccdstrategy().
Remove boundary checks which the disk mini-layer does for us.
Don't allocate space for more than 2 ccdbuf, RAID was never implemented.
NB: I have not tried to address any of the preexisting ailments of CCD.
and can be added back selectively, should anybody start to interest
themselves for the internal workings of ccd.
This commit will make the diffs for the following commits much more
readable.
the three configuration ioctls which need a unit number.
Add a "ccd.ctl" device for config operations.
Implement ioctls on ccd.ctl which rely on the explicityly passed
unit numbers.
Update ccdconfig to use the new ccd.ctl interface.
Add code to the kernel to detect old ccdconfig binaries, and whine
about it.
Add code to ccdconfig to detect old kernels, and whine about it.
These two compatibility measures will be retained only for a limited
period since they are in the way of GEOM'ification of ccd.
ioctls are no reliable indication of the ioctls "set" or "get" nature or if
such simplistic categories can even be applied.
MFC candidate: boot0cfg issue.
some trick is necessary to prevent further BSD geoms from attaching to
that. Our old trick was to make sure we don't attach to a geom from
the "BSD" class, but this doesn't work if an intermediary geom obscures
this fact. Instead, calculate the MD5 checksum of the label we target
and ask if anybody below us loves that label. If they do we don't.
Coded by: gordon.
Make sure sector zero is protected if it contains metadata.
Lower WARNS for gbde to 3 on non-i386 archs. rijndael-fst is evil
but appearntly does the right thing and passes the test-vectors.
MFC Candidate.
for request sizes larger than the sectorsize or for multi-key setups.
See warning mailed to current@ for details of recovery.
Found by: Marcus Reid <marcus@blazingdot.com>
This mostly consists of functionality to serialize accesses to
the two ATA channels (which can also be used to "fix" certain
PCI based controllers).
Add support for Acard controllers.
Enable the ATA driver in PC98 GENERIC, and add device hints.
Update man page with latest support.
The PC98 core team has kindly provided me with a PC98
machine that made this all possible, thanks to all that
contributed to that effort, without that this would
probably newer have been possible..
Approved by: re@
are the output of AES/128/CBC or ARC4RANDOM. Encrypt the random data with which
we wipe when we get a BIO_DELETE to make such an algorithm useful.
Sponsored by: DARPA & NAI Labs
Approved by: re (blanket)
Replace ARC4 with SHA2-512.
Change lock-structure encoding to use random ordering rather for obscurity.
Encrypt lock-structure with AES/256 instead of AES/128.
Change kkey derivation to be MD5 hash based.
Watch for malloc(M_NOWAIT) failures and ditch our cache when they happen.
Remove clause 3 of the license with NAI Labs consent.
Many thanks to "Lucky Green" <shamrock@cypherpunks.to> and "David
Wagner" <daw@cs.berkeley.edu>, for code reading, inputs and
suggestions.
This code has still not been stared at for 10 years by a gang of
hard-core cryptographers. Discretion advised.
NB: These changes result in the on-disk format changing: dump/restore needed.
Sponsored by: DARPA & NAI Labs.
skip those. This handles the Protective MBR (PMBR) which consists
of a single partition of type 0xEE that covers the whole disk and
as such protects the GPT partitioning. We allow other partitions to
be present besides partitions of type 0xEE and as such interpret
partition type 0xEE as a "hands-off" partition only.
While here, fix g_mbrext_dumpconf to test if indent is NULL and
dump the data in a form that libdisk can grok. Change the logic
in g_mbr_dumpconf to match that of g_mbrext_dumpconf. This does
not change the output, but prevents a NULL-pointer dereference
when indent == NULL && pp == NULL.
expected under -current. This is a problem for GEOM because the up/down
threads cannot sleep waiting for memory to become free. The reason they
cannot sleep is that paging things out to disk may be the only way we can
clear up some RAM. Nice catch-22 there.
Implement a rudimentary ENOMEM recovery strategy: If an I/O request
fails with an error code of ENOMEM, schedule it for a retry, and
tell the down-thread to sleep hz/10 to get other parts of the system
a chance to free up some memory, in particular the up-path in GEOM.
All caches should probably start to monitor malloc(9) failures using the new
malloc_last_fail() function, and release when it indicates congestion.
Sponsored by: DARPA & NAI Labs.
WARNING: This is not a published interface, it is a stopgap measure for
WARNING: libdisk so we can get 5.0-R out of the door.
Sponsored by: DARPA & NAI Labs
WARNING: You need to backup and restore the _unencrypted_ contents
WARNING: of your GBDE disks when you take this update!
Sponsored by: DARPA & NAI Labs.
This is not quite the set of information I would want, but the tree where
I have the "correct" version is messed up with conflicts.
Sponsored by: DARPA & NAI Labs.