are the output of AES/128/CBC or ARC4RANDOM. Encrypt the random data with which
we wipe when we get a BIO_DELETE to make such an algorithm useful.
Sponsored by: DARPA & NAI Labs
Approved by: re (blanket)
Replace ARC4 with SHA2-512.
Change lock-structure encoding to use random ordering rather for obscurity.
Encrypt lock-structure with AES/256 instead of AES/128.
Change kkey derivation to be MD5 hash based.
Watch for malloc(M_NOWAIT) failures and ditch our cache when they happen.
Remove clause 3 of the license with NAI Labs consent.
Many thanks to "Lucky Green" <shamrock@cypherpunks.to> and "David
Wagner" <daw@cs.berkeley.edu>, for code reading, inputs and
suggestions.
This code has still not been stared at for 10 years by a gang of
hard-core cryptographers. Discretion advised.
NB: These changes result in the on-disk format changing: dump/restore needed.
Sponsored by: DARPA & NAI Labs.
skip those. This handles the Protective MBR (PMBR) which consists
of a single partition of type 0xEE that covers the whole disk and
as such protects the GPT partitioning. We allow other partitions to
be present besides partitions of type 0xEE and as such interpret
partition type 0xEE as a "hands-off" partition only.
While here, fix g_mbrext_dumpconf to test if indent is NULL and
dump the data in a form that libdisk can grok. Change the logic
in g_mbr_dumpconf to match that of g_mbrext_dumpconf. This does
not change the output, but prevents a NULL-pointer dereference
when indent == NULL && pp == NULL.
expected under -current. This is a problem for GEOM because the up/down
threads cannot sleep waiting for memory to become free. The reason they
cannot sleep is that paging things out to disk may be the only way we can
clear up some RAM. Nice catch-22 there.
Implement a rudimentary ENOMEM recovery strategy: If an I/O request
fails with an error code of ENOMEM, schedule it for a retry, and
tell the down-thread to sleep hz/10 to get other parts of the system
a chance to free up some memory, in particular the up-path in GEOM.
All caches should probably start to monitor malloc(9) failures using the new
malloc_last_fail() function, and release when it indicates congestion.
Sponsored by: DARPA & NAI Labs.
WARNING: This is not a published interface, it is a stopgap measure for
WARNING: libdisk so we can get 5.0-R out of the door.
Sponsored by: DARPA & NAI Labs
WARNING: You need to backup and restore the _unencrypted_ contents
WARNING: of your GBDE disks when you take this update!
Sponsored by: DARPA & NAI Labs.
This is not quite the set of information I would want, but the tree where
I have the "correct" version is messed up with conflicts.
Sponsored by: DARPA & NAI Labs.
don't take the detour over the I/O path to discover them using getattr(),
we can just pick them out directly.
Do note though, that for now they are only valid after the first open
of the underlying disk device due compatibility with the old disk_create()
API. This will change in the future so they will always be valid.
Sponsored by: DARPA & NAI Labs.
This is an encryption module designed for to secure denial of access
to the contents of "cold disks" with or without destruction activation.
Major features:
* Based on AES, MD5 and ARC4 algorithms.
* Four cryptographic barriers:
1) Pass-phrase encrypts the master key.
2) Pass-phrase + Lock data locates master key.
3) 128 bit key derived from 2048 bit master key protects sector key.
3) 128 bit random single-use sector keys protect data payload.
* Up to four different changeable pass-phrases.
* Blackening feature for provable destruction of master key material.
* Isotropic disk contents offers no information about sector contents.
* Configurable destination sector range allows steganographic deployment.
This commit adds the kernel part, separate commits will follow for the
userland utility and documentation.
This software was developed for the FreeBSD Project by Poul-Henning Kamp and
NAI Labs, the Security Research Division of Network Associates, Inc. under
DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the DARPA CHATS
research program.
Many thanks to Robert Watson, CBOSS Principal Investigator for making this
possible.
Sponsored by: DARPA & NAI Labs.
So do GEOM. Not a pretty sight.
Take all the interesting stuff out of GEOM::disk_create(), and leave just
the creation of the fake dev_t. Schedule the topology munging to happen
in the g_event thread with g_call_me().
This makes disk_create() pretty lock-agnostic, almost lock-atheist.
Tripped over by: peter
Sponsored by: DARPA & NAI Labs
and therefore we need a way for ioctl handlers to run in that thread
in GEOM. Rather than invent a complicated registration system to
recognize which ioctl handler to use for a given ioctl, we still
schedule all ioctls down the tree as bio transactions but add a
special return code that means "call me directly" and have the
geom_dev layer do that.
Use this for all ioctls that make it as far as a diskdriver to
avoid any backwards compatibility problems.
Requested by: scottl
Sponsored by: DARPA & NAI Labs
NB: But it will enable it in all kernels not having options "NO_GEOM"
Put the GEOM related options into the intended order.
Add "options NO_GEOM" to all kernel configs apart from NOTES.
In some order of controlled fashion, the NO_GEOM options will be
removed, architecture by architecture in the coming days.
There are currently three known issues which may force people to
need the NO_GEOM option:
boot0cfg/fdisk:
Tries to update the MBR while it is being used to control
slices. GEOM does not allow this as a direct operation.
SCSI floppy drives:
Appearantly the scsi-da driver return "EBUSY" if no media
is inserted. This is wrong, it should return ENXIO.
PC98:
It is unclear if GEOM correctly recognizes all variants of
PC98 disklabels. (Help Wanted! I have neither docs nor HW)
These issues are all being worked.
Sponsored by: DARPA & NAI Labs.
that this will make people use this for their future copy&paste operations.
Rework the detection of raw-disk offsets in disklabels. This actually
unearthed a number of bugs in the (now) previous version.
Also accept labels which don't have a magic RAW_PART, provided they don't
confuse us too much.
Change the order of our sanity-checks on labels found on disks to be more
robust.
Check against MAXPARTITIONS in our sanity-check and reject disklabels
we cannot cope with.
Create new g_bsd_modify() function to implment disklabel modifying
ioctls.
Implement DIOCSDINFO and DIOCWDINFO with the provision that the latter
still not writes your change back to disk. I didn't have the nerves
for that yet.
In the start routine, use g_call_me() for complex ioctls to prevent
sleeping.
Sponsored by: DARPA & NAI Labs.
with support for trying, doing and forcing.
This will eventually replace g_slice_addslice() which gets changed from
grabbing topology to requing it in this commit as well.
Sponsored by: DARPA & NAI Labs.
work.
This prevents people from sleeping in the UP/DOWN I/O path by mistake
or design (doing so almost invariably result in deadlocks since it
stalls all I/O processing in the given direction.
Sponsored by: DARPA & NAI Labs.
a disklabel modification tries to change an open device, and no
counter-examples exists.
Be less facist about when we can do Setattr, the openmodes of devices
are so loosely managed that the "exclusive" count is almost useless.
Sponsored by: DARPA & NAI Labs.
Add a __unused.
Make the 2byte decoder functions return 16 bits for the benefits
of picky lints.
No need to grab giant around a tsleep() when we have a timeout.
Sponsored by: DARPA & NAI Labs.
to be performed in the event-thread.
To do this, we need to lock the eventlist with g_eventlock (nee g_doorlock),
since g_call_me() being called from the UP/DOWN paths will not be able to
aquire g_topology_lock.
This also means that for now these events are not referenced on any
particular consumer/provider/geom.
For UP/DOWN path use, this will not become a problem since the access()
function will make sure we drain any bio's before we dismantle.
Sponsored by: DARPA & NAI Labs.
and predictable way, and I apologize if I have gotten it wrong anywhere,
getting prior review on a patch like this is not feasible, considering
the number of people involved and hardware availability etc.)
If struct disklabel is the messenger: kill the messenger.
Inside struct disk we had a struct disklabel which disk drivers used to
communicate certain metrics to the disklayer above (GEOM or the disk
mini-layer). This commit changes this communication to use four
explicit fields instead.
Amongst the benefits is that the fields do not get overwritten by
wrong or bogus on-disk disklabels.
Once that is clear, <sys/disk.h> which is included in the drivers
no longer need to pull <sys/disklabel.h> and <sys/diskslice.h> in,
the few places that needs them, have gotten explicit #includes for
them.
The disklabel inside struct disk is now only for internal use in
the disk mini-layer, so instead of embedding it, we malloc it as
we need it.
This concludes (modulus any mistakes) the series of disklabel related
commits.
I belive it all amounts to a NOP for all the rest of you :-)
Sponsored by: DARPA & NAI Labs.
them visible from userland, if need be.
I wish that the C language contained this as part of struct definintions,
but failing that, I would settle for an agreed upon set of functions for
packing/unpacking integers in various sizes from byte-streams which may
have unfriendly alignment.
This really belongs in <sys/endian.h> I guess.
is currently conditional on both the GEOM and GEOM_GPT options to
avoid getting GPT by default and having the MBR and GPT classes
clash.
The correct behaviour of the MBR class would be to back-off (reject)
a MBR if it's a Protective MBR (a MBR with a single partition of type
0xEE that spans the whole disk (as far as the MBR is concerned).
The correct behaviour if the GPT class would be to back-off (reject)
a GPT if there's a MBR that's not a Protective MBR.
At this stage it's inconvenient to destroy a good MBR when working
with GPTs that it's more convenient to have the MBR class back-off
when it detects the GPT signature on disk and have the GPT class
ignore the MBR.
In sys/gpt.h UUIDs (GUIDs) for the following FreeBSD partitions
have been defined:
GPT_ENT_TYPE_FREEBSD
FreeBSD slice with disklabel. This is the equivalent of
the well-known FreeBSD MBR partition type.
GPT_ENT_TYPE_FREEBSD_{SWAP|UFS|UFS2|VINUM}
FreeBSD partitions in the context of disklabel. This is
speculating on the idea to use the GPT to hold partitions
instead if slices and removing the fixed (and low) limits
we have on the number of partitions.
This commit lacks a GPT image for the regression suite.
"The only hard problem in cryptography is key-management."
All sectors are encrypted with AES in CBC mode using a constant key,
currently compiled in and all zero.
To activate this module, write the magic header on the partition:
echo "<<FreeBSD-GEOM-AES>>" | dd conv=sync of=/dev/md98
The encrypted device will be one sector shorter and have ".aes"
appended to its name.
Sponsored by: DARPA & NAI Labs.
Printing daddr_t's using %d format was always an error, but gcc's
warning about it was ignored for supported 64-bit arches and not printed
for supported 32-bit arches. Hundreds if not thousands thousands of
previously "fixed" daddr_t printings are now broken on 32-bit machines
by casting daddr_t's to longs. daddr_t's should be printed using %jd
format, but this fix uses %lld since %j is not implemented in the
kernel yet.
Fixed some nearby format printf errors (style bugs).
the relevant classes.
Some methods may implement various "magic spaces", this is reserved
or magic areas on the disk, set a side for various and sundry purposes.
A good example is the BSD disklabel and boot code on i386 which occupies
a total of four magic spaces: boot1, the disklabel, the padding behind
the disklabel and boot2. The reason we don't simply tell people to
write the appropriate stuff on the underlying device is that (some of)
the magic spaces might be real-time modifiable. It is for instance
possible to change a disklabel while partitions are open, provided
the open partitions do not get trampled in the process.
Sponsored by: DARPA & NAI Labs.
Notice that if the device on which the dump is set is destroyed for
any reason, the dump setting is lost. This in particular will
happen in the case of spoilage. For instance if you set dump on
ad0s1b and open ad0 for writing, ad0s* will be spoilt and the dump
setting lost. See geom(4) for more about spoiling.
Sponsored by: DARPA & NAI Labs.
3.The only thing worse than generalizing from one example
is generalizing from no examples at all.
Remove the fwcylinders attribute before anybody gets the idea that we
alone have squared the circle.
Sponsored by: DARPA & NAI Labs.
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
Caveats:
The new savecore program is not complete in the sense that it emulates
enough of the old savecores features to do the job, but implements none
of the options yet.
I would appreciate if a userland hacker could help me out getting savecore
to do what we want it to do from a users point of view, compression,
email-notification, space reservation etc etc. (send me email if
you are interested).
Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.
All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.
Documentation is quite sparse at this time, more to come.
Details:
ATA and SCSI drivers should work as the dump formatting code has been
removed. The IDA, TWE and AAC have not yet been converted.
Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev. To implement the "off" argument, /dev/null
is used as the device.
Savecore will fail if handed any options since they are not (yet)
implemented. All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record. The header record
is dumped in readable format in the .info file. The kernel
is not saved. Only complete dumps will be saved.
All maintainer rights for this code are disclaimed: feel free to
improve and extend.
Sponsored by: DARPA, NAI Labs
I have not been able to find very much information about the PC98
extended partition layout so this is gleaned from the source in
our pc98 architecture. Corrections and patched very welcome.
Sponsored by: DARPA and NAI Labs.
The detection code in this method is written so that it should work on
all architectures which means that you can plug a Sun disk into a i386
now and access the partitions.
We still need an endian-agnostic ufs/ffs before this is really
interresting, but the main focus was to get sparc64 onto the GEOM
trail.
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
test and play with this.
This is not yet production quality and should be run only on dedicated
test boxes.
For people who want to develop transformations for GEOM there exist a
set of shims to run geom in userland (ask phk@freebsd.org).
Reports of all kinds to: phk@freebsd.org
Please include in report:
dmesg
sysctl debug.geomdot
sysctl debug.geomconf
Known significant limitations:
no kernel dump facility.
ioctls severely restricted.
Sponsored by: DARPA, NAI Labs
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
cloning infrastructure standard in kern_conf. Modules are now
the same with or without devfs support.
If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".
This happily removes an ugly hack from kern/vfs_conf.c.
This forces a rename of the eventhandler and the standard clone
helper function.
Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>
Remove all #includes of opt_devfs.h they no longer matter.
after the acquisition of any advisory locks. This fix corrects a case
in which a process tries to open a file with a non-blocking exclusive
lock. Even if it fails to get the lock it would still truncate the
file even though its open failed. With this change, the truncation
is done only after the lock is successfully acquired.
Obtained from: BSD/OS
<sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.
CCD not converted yet, casts to struct buf (still safe)
atapi-cd casts to struct buf to examine B_PHYS
(Much of this done by script)
Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.
Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.
Add bio_queue field for struct bio aware disksort.
Address a lot of stylistic issues brought up by bde.
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)
substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)
This patch is machine generated except for the ccd.c and buf.h parts.
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch were machine generated (Thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.