idle time.
Statistics now default to "on" and can be turned off with
sysctl kern.geom.collectstats=0
Performance impact of statistics collection is on the order of
800 nsec per consumer/provider set on a 700MHz Athlon.
Insted of embedding a struct g_stat in consumers and providers, merely
include a pointer.
Remove a couple of <sys/time.h> includes now unneeded.
Add a special allocator for struct g_stat. This allocator will allocate
entire pages and hand out g_stat functions from there. The "id" field
indicates free/used status.
Add "/dev/geom.stats" device driver whic exports the pages from the
allocator to userland with mmap(2) in read-only mode.
This mmap(2) interface should be considered a non-public interface and
the functions in libgeom (not yet committed) should be used to access
the statistics data.
Add debug.sizeof.g_stat sysctl.
Set the id field of the g_stat when we create consumers and providers.
Remove biocount from consumer, we will use the counters in the g_stat
structure instead. Replace one field which will need to be atomically
manipulated with two fields which will not (stat.nop and stat.nend).
Change add companion field to bio_children: bio_inbed for the exact
same reason.
Don't output the biocount in the confdot output.
Fix KASSERT in g_io_request().
Add sysctl kern.geom.collectstats defaulting to off.
Collect the following raw statistics conditioned on this sysctl:
for each consumer and provider {
total number of operations started.
total number of operations completed.
time last operation completed.
sum of idle-time.
for each of BIO_READ, BIO_WRITE and BIO_DELETE {
number of operations completed.
number of bytes completed.
number of ENOMEM errors.
number of other errors.
sum of transaction time.
}
}
API for getting hold of these statistics data not included yet.
We may actually be increasing Giant contention doing so because the
actual stuff we do is very cheap.
Also I am not convinced there is not a tiny window for a race here.
Change the si_name of dev_t's to be a char * and put a private buffer for
holding the name at then end of the struct.
Initialize si_name to point to the private buffer.
Put a KASSERT in geom_disk to prevent overrun on the fake dev_t we still
have to generate for the disk_drivers.
this will cause volume labels to be exposed in /dev/vol/<volname>. Currently,
there is no conflict resolution if more than one FS has the same volume name.
Reviewed by: phk
Make passing the methods in a cdevsw structure optional.
Move "CANFREE" and "NOGIANT" flags into struct disk instead of the
cdevsw which may or may not be there.
Rename CANFREE to CANDELETE to match BIO_DELETE operation.
Add "OPEN" flag so drivers don't have to provide open/close methods
just to maintain such a flag.
Add temporary stopgap include of <sys/conf.h> to <sys/disk.h> until
the files which have them in the other order are fixed.
Add KASSERTS to make sure we don't get fed too many NULL pointers.
Clear our geom's softc pointer before we wither.
labeled disk.
This is complicated by the fact that BBSIZE is greater than the
PAGE_SIZE limit ioctl inflicts on arguments which are automatically
copied in.
As long as we don't need access to userland memory (copyin/out) we
can deal with the ioctl using g_callme() which executes it from the
GEOM event thread.
Once we need copyin/out, we need to return the bio with EDIRIOCTL
in order to make geom_dev call us back in the original process context
where copyin will work.
Unfortunately, that results in us getting called with Giant, so
we have to DROP_GIANT/PICKUP_GIANT around the code where we diddle
GEOMs internals.
Sometimes you just can't win...
... But it does make geom_bsd.c an almost complete example of the
GEOM beastiarium.
CAUTION:
Previously CCD would be different from all other disks in
the system in that there were no "ccd0" device, only a
"ccd0c" device.
This is no longer so after this commit. If you access a
ccd device through the "/dev/ccd0c" device _and_ have not
actually put a BSD disklabel on the device, you will have
to use the name "/dev/ccd0". If your CCD device contains
a BSD disklabel there should be no difference.
You need to recompile ccdconfig(8) using the changed
src/sys/sys/ccdvar.h for the -g "show me" option to work.
I have run the regression test I created before I started
overhauling CCD and it flags no problems, but this code
is mildly evil, so take care. If you would cry if you lost
what's on CCD, make a back before you upgrade.
Create separate cdevsw for the /dev/ccd.ctl device.
Remove the cloning function, the disk-minilayer will do all naming
for us.
Remove the ccdunit and ccdpart functions and carry the softc pointer
in the relevant dev_t's and structures.
Release all memory when a CCD device is unconfigured, previously
the softc would linger behind.
Remove all traces of BSD disklabel fiddling code.
Remove ccdpsize, the disk mini-layer does this for us.
Don't allocate memory with M_WAITOK in ccdstrategy().
Remove boundary checks which the disk mini-layer does for us.
Don't allocate space for more than 2 ccdbuf, RAID was never implemented.
NB: I have not tried to address any of the preexisting ailments of CCD.
and can be added back selectively, should anybody start to interest
themselves for the internal workings of ccd.
This commit will make the diffs for the following commits much more
readable.
the three configuration ioctls which need a unit number.
Add a "ccd.ctl" device for config operations.
Implement ioctls on ccd.ctl which rely on the explicityly passed
unit numbers.
Update ccdconfig to use the new ccd.ctl interface.
Add code to the kernel to detect old ccdconfig binaries, and whine
about it.
Add code to ccdconfig to detect old kernels, and whine about it.
These two compatibility measures will be retained only for a limited
period since they are in the way of GEOM'ification of ccd.
ioctls are no reliable indication of the ioctls "set" or "get" nature or if
such simplistic categories can even be applied.
MFC candidate: boot0cfg issue.
some trick is necessary to prevent further BSD geoms from attaching to
that. Our old trick was to make sure we don't attach to a geom from
the "BSD" class, but this doesn't work if an intermediary geom obscures
this fact. Instead, calculate the MD5 checksum of the label we target
and ask if anybody below us loves that label. If they do we don't.
Coded by: gordon.
Make sure sector zero is protected if it contains metadata.
Lower WARNS for gbde to 3 on non-i386 archs. rijndael-fst is evil
but appearntly does the right thing and passes the test-vectors.
MFC Candidate.
for request sizes larger than the sectorsize or for multi-key setups.
See warning mailed to current@ for details of recovery.
Found by: Marcus Reid <marcus@blazingdot.com>
This mostly consists of functionality to serialize accesses to
the two ATA channels (which can also be used to "fix" certain
PCI based controllers).
Add support for Acard controllers.
Enable the ATA driver in PC98 GENERIC, and add device hints.
Update man page with latest support.
The PC98 core team has kindly provided me with a PC98
machine that made this all possible, thanks to all that
contributed to that effort, without that this would
probably newer have been possible..
Approved by: re@
are the output of AES/128/CBC or ARC4RANDOM. Encrypt the random data with which
we wipe when we get a BIO_DELETE to make such an algorithm useful.
Sponsored by: DARPA & NAI Labs
Approved by: re (blanket)
Replace ARC4 with SHA2-512.
Change lock-structure encoding to use random ordering rather for obscurity.
Encrypt lock-structure with AES/256 instead of AES/128.
Change kkey derivation to be MD5 hash based.
Watch for malloc(M_NOWAIT) failures and ditch our cache when they happen.
Remove clause 3 of the license with NAI Labs consent.
Many thanks to "Lucky Green" <shamrock@cypherpunks.to> and "David
Wagner" <daw@cs.berkeley.edu>, for code reading, inputs and
suggestions.
This code has still not been stared at for 10 years by a gang of
hard-core cryptographers. Discretion advised.
NB: These changes result in the on-disk format changing: dump/restore needed.
Sponsored by: DARPA & NAI Labs.
skip those. This handles the Protective MBR (PMBR) which consists
of a single partition of type 0xEE that covers the whole disk and
as such protects the GPT partitioning. We allow other partitions to
be present besides partitions of type 0xEE and as such interpret
partition type 0xEE as a "hands-off" partition only.
While here, fix g_mbrext_dumpconf to test if indent is NULL and
dump the data in a form that libdisk can grok. Change the logic
in g_mbr_dumpconf to match that of g_mbrext_dumpconf. This does
not change the output, but prevents a NULL-pointer dereference
when indent == NULL && pp == NULL.
expected under -current. This is a problem for GEOM because the up/down
threads cannot sleep waiting for memory to become free. The reason they
cannot sleep is that paging things out to disk may be the only way we can
clear up some RAM. Nice catch-22 there.
Implement a rudimentary ENOMEM recovery strategy: If an I/O request
fails with an error code of ENOMEM, schedule it for a retry, and
tell the down-thread to sleep hz/10 to get other parts of the system
a chance to free up some memory, in particular the up-path in GEOM.
All caches should probably start to monitor malloc(9) failures using the new
malloc_last_fail() function, and release when it indicates congestion.
Sponsored by: DARPA & NAI Labs.
WARNING: This is not a published interface, it is a stopgap measure for
WARNING: libdisk so we can get 5.0-R out of the door.
Sponsored by: DARPA & NAI Labs
WARNING: You need to backup and restore the _unencrypted_ contents
WARNING: of your GBDE disks when you take this update!
Sponsored by: DARPA & NAI Labs.
This is not quite the set of information I would want, but the tree where
I have the "correct" version is messed up with conflicts.
Sponsored by: DARPA & NAI Labs.
don't take the detour over the I/O path to discover them using getattr(),
we can just pick them out directly.
Do note though, that for now they are only valid after the first open
of the underlying disk device due compatibility with the old disk_create()
API. This will change in the future so they will always be valid.
Sponsored by: DARPA & NAI Labs.
This is an encryption module designed for to secure denial of access
to the contents of "cold disks" with or without destruction activation.
Major features:
* Based on AES, MD5 and ARC4 algorithms.
* Four cryptographic barriers:
1) Pass-phrase encrypts the master key.
2) Pass-phrase + Lock data locates master key.
3) 128 bit key derived from 2048 bit master key protects sector key.
3) 128 bit random single-use sector keys protect data payload.
* Up to four different changeable pass-phrases.
* Blackening feature for provable destruction of master key material.
* Isotropic disk contents offers no information about sector contents.
* Configurable destination sector range allows steganographic deployment.
This commit adds the kernel part, separate commits will follow for the
userland utility and documentation.
This software was developed for the FreeBSD Project by Poul-Henning Kamp and
NAI Labs, the Security Research Division of Network Associates, Inc. under
DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the DARPA CHATS
research program.
Many thanks to Robert Watson, CBOSS Principal Investigator for making this
possible.
Sponsored by: DARPA & NAI Labs.
So do GEOM. Not a pretty sight.
Take all the interesting stuff out of GEOM::disk_create(), and leave just
the creation of the fake dev_t. Schedule the topology munging to happen
in the g_event thread with g_call_me().
This makes disk_create() pretty lock-agnostic, almost lock-atheist.
Tripped over by: peter
Sponsored by: DARPA & NAI Labs
and therefore we need a way for ioctl handlers to run in that thread
in GEOM. Rather than invent a complicated registration system to
recognize which ioctl handler to use for a given ioctl, we still
schedule all ioctls down the tree as bio transactions but add a
special return code that means "call me directly" and have the
geom_dev layer do that.
Use this for all ioctls that make it as far as a diskdriver to
avoid any backwards compatibility problems.
Requested by: scottl
Sponsored by: DARPA & NAI Labs
NB: But it will enable it in all kernels not having options "NO_GEOM"
Put the GEOM related options into the intended order.
Add "options NO_GEOM" to all kernel configs apart from NOTES.
In some order of controlled fashion, the NO_GEOM options will be
removed, architecture by architecture in the coming days.
There are currently three known issues which may force people to
need the NO_GEOM option:
boot0cfg/fdisk:
Tries to update the MBR while it is being used to control
slices. GEOM does not allow this as a direct operation.
SCSI floppy drives:
Appearantly the scsi-da driver return "EBUSY" if no media
is inserted. This is wrong, it should return ENXIO.
PC98:
It is unclear if GEOM correctly recognizes all variants of
PC98 disklabels. (Help Wanted! I have neither docs nor HW)
These issues are all being worked.
Sponsored by: DARPA & NAI Labs.
that this will make people use this for their future copy&paste operations.
Rework the detection of raw-disk offsets in disklabels. This actually
unearthed a number of bugs in the (now) previous version.
Also accept labels which don't have a magic RAW_PART, provided they don't
confuse us too much.
Change the order of our sanity-checks on labels found on disks to be more
robust.
Check against MAXPARTITIONS in our sanity-check and reject disklabels
we cannot cope with.
Create new g_bsd_modify() function to implment disklabel modifying
ioctls.
Implement DIOCSDINFO and DIOCWDINFO with the provision that the latter
still not writes your change back to disk. I didn't have the nerves
for that yet.
In the start routine, use g_call_me() for complex ioctls to prevent
sleeping.
Sponsored by: DARPA & NAI Labs.
with support for trying, doing and forcing.
This will eventually replace g_slice_addslice() which gets changed from
grabbing topology to requing it in this commit as well.
Sponsored by: DARPA & NAI Labs.
work.
This prevents people from sleeping in the UP/DOWN I/O path by mistake
or design (doing so almost invariably result in deadlocks since it
stalls all I/O processing in the given direction.
Sponsored by: DARPA & NAI Labs.
a disklabel modification tries to change an open device, and no
counter-examples exists.
Be less facist about when we can do Setattr, the openmodes of devices
are so loosely managed that the "exclusive" count is almost useless.
Sponsored by: DARPA & NAI Labs.
Add a __unused.
Make the 2byte decoder functions return 16 bits for the benefits
of picky lints.
No need to grab giant around a tsleep() when we have a timeout.
Sponsored by: DARPA & NAI Labs.
to be performed in the event-thread.
To do this, we need to lock the eventlist with g_eventlock (nee g_doorlock),
since g_call_me() being called from the UP/DOWN paths will not be able to
aquire g_topology_lock.
This also means that for now these events are not referenced on any
particular consumer/provider/geom.
For UP/DOWN path use, this will not become a problem since the access()
function will make sure we drain any bio's before we dismantle.
Sponsored by: DARPA & NAI Labs.
and predictable way, and I apologize if I have gotten it wrong anywhere,
getting prior review on a patch like this is not feasible, considering
the number of people involved and hardware availability etc.)
If struct disklabel is the messenger: kill the messenger.
Inside struct disk we had a struct disklabel which disk drivers used to
communicate certain metrics to the disklayer above (GEOM or the disk
mini-layer). This commit changes this communication to use four
explicit fields instead.
Amongst the benefits is that the fields do not get overwritten by
wrong or bogus on-disk disklabels.
Once that is clear, <sys/disk.h> which is included in the drivers
no longer need to pull <sys/disklabel.h> and <sys/diskslice.h> in,
the few places that needs them, have gotten explicit #includes for
them.
The disklabel inside struct disk is now only for internal use in
the disk mini-layer, so instead of embedding it, we malloc it as
we need it.
This concludes (modulus any mistakes) the series of disklabel related
commits.
I belive it all amounts to a NOP for all the rest of you :-)
Sponsored by: DARPA & NAI Labs.
them visible from userland, if need be.
I wish that the C language contained this as part of struct definintions,
but failing that, I would settle for an agreed upon set of functions for
packing/unpacking integers in various sizes from byte-streams which may
have unfriendly alignment.
This really belongs in <sys/endian.h> I guess.
is currently conditional on both the GEOM and GEOM_GPT options to
avoid getting GPT by default and having the MBR and GPT classes
clash.
The correct behaviour of the MBR class would be to back-off (reject)
a MBR if it's a Protective MBR (a MBR with a single partition of type
0xEE that spans the whole disk (as far as the MBR is concerned).
The correct behaviour if the GPT class would be to back-off (reject)
a GPT if there's a MBR that's not a Protective MBR.
At this stage it's inconvenient to destroy a good MBR when working
with GPTs that it's more convenient to have the MBR class back-off
when it detects the GPT signature on disk and have the GPT class
ignore the MBR.
In sys/gpt.h UUIDs (GUIDs) for the following FreeBSD partitions
have been defined:
GPT_ENT_TYPE_FREEBSD
FreeBSD slice with disklabel. This is the equivalent of
the well-known FreeBSD MBR partition type.
GPT_ENT_TYPE_FREEBSD_{SWAP|UFS|UFS2|VINUM}
FreeBSD partitions in the context of disklabel. This is
speculating on the idea to use the GPT to hold partitions
instead if slices and removing the fixed (and low) limits
we have on the number of partitions.
This commit lacks a GPT image for the regression suite.
"The only hard problem in cryptography is key-management."
All sectors are encrypted with AES in CBC mode using a constant key,
currently compiled in and all zero.
To activate this module, write the magic header on the partition:
echo "<<FreeBSD-GEOM-AES>>" | dd conv=sync of=/dev/md98
The encrypted device will be one sector shorter and have ".aes"
appended to its name.
Sponsored by: DARPA & NAI Labs.
Printing daddr_t's using %d format was always an error, but gcc's
warning about it was ignored for supported 64-bit arches and not printed
for supported 32-bit arches. Hundreds if not thousands thousands of
previously "fixed" daddr_t printings are now broken on 32-bit machines
by casting daddr_t's to longs. daddr_t's should be printed using %jd
format, but this fix uses %lld since %j is not implemented in the
kernel yet.
Fixed some nearby format printf errors (style bugs).
the relevant classes.
Some methods may implement various "magic spaces", this is reserved
or magic areas on the disk, set a side for various and sundry purposes.
A good example is the BSD disklabel and boot code on i386 which occupies
a total of four magic spaces: boot1, the disklabel, the padding behind
the disklabel and boot2. The reason we don't simply tell people to
write the appropriate stuff on the underlying device is that (some of)
the magic spaces might be real-time modifiable. It is for instance
possible to change a disklabel while partitions are open, provided
the open partitions do not get trampled in the process.
Sponsored by: DARPA & NAI Labs.
Notice that if the device on which the dump is set is destroyed for
any reason, the dump setting is lost. This in particular will
happen in the case of spoilage. For instance if you set dump on
ad0s1b and open ad0 for writing, ad0s* will be spoilt and the dump
setting lost. See geom(4) for more about spoiling.
Sponsored by: DARPA & NAI Labs.
3.The only thing worse than generalizing from one example
is generalizing from no examples at all.
Remove the fwcylinders attribute before anybody gets the idea that we
alone have squared the circle.
Sponsored by: DARPA & NAI Labs.
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
Caveats:
The new savecore program is not complete in the sense that it emulates
enough of the old savecores features to do the job, but implements none
of the options yet.
I would appreciate if a userland hacker could help me out getting savecore
to do what we want it to do from a users point of view, compression,
email-notification, space reservation etc etc. (send me email if
you are interested).
Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.
All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.
Documentation is quite sparse at this time, more to come.
Details:
ATA and SCSI drivers should work as the dump formatting code has been
removed. The IDA, TWE and AAC have not yet been converted.
Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev. To implement the "off" argument, /dev/null
is used as the device.
Savecore will fail if handed any options since they are not (yet)
implemented. All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record. The header record
is dumped in readable format in the .info file. The kernel
is not saved. Only complete dumps will be saved.
All maintainer rights for this code are disclaimed: feel free to
improve and extend.
Sponsored by: DARPA, NAI Labs
I have not been able to find very much information about the PC98
extended partition layout so this is gleaned from the source in
our pc98 architecture. Corrections and patched very welcome.
Sponsored by: DARPA and NAI Labs.