$FreeBSD$

ifs - inode filesystem
--


ifs is the beginning of a little experiment: removing the namespace
from ffs. FFS is a good generic filesystem; however, in high-volume
activities today (e.g. web, mail, cache, news) one thing causes a rather
huge resource drain: the namespace.

Having to maintain the directory structures means wasting a lot of
disk IO and/or memory. Since most applications these days have their
own database containing object->ufs namespace mappings, we can effectively
bypass the namespace altogether and talk instead to just inodes.

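As a rough illustration of the kind of application-side mapping meant
here, the sketch below keeps a small in-memory table from an object key
(say, a URL) to an inode number. Everything in it (the objmap_* names,
the fixed table size, the hash) is made up for illustration and is not
part of ifs; a real news or cache server would use its own database.

```c
#include <string.h>
#include <sys/types.h>

#define OBJMAP_SLOTS	1024	/* hypothetical fixed capacity */
#define OBJMAP_KEYLEN	128	/* hypothetical maximum key size */

struct objmap_entry {
	char	key[OBJMAP_KEYLEN];	/* empty string marks an unused slot */
	ino_t	ino;			/* inode number handed back by ifs */
};

static struct objmap_entry objmap[OBJMAP_SLOTS];

/* Simple multiplicative string hash; any reasonable hash would do. */
static unsigned long
objmap_hash(const char *s)
{
	unsigned long h = 5381;

	while (*s != '\0')
		h = h * 33 + (unsigned char)*s++;
	return (h);
}

/* Record key -> ino. Returns 0, or -1 on a bad key or a full table. */
int
objmap_put(const char *key, ino_t ino)
{
	size_t i, start, klen = strlen(key);

	if (klen == 0 || klen >= OBJMAP_KEYLEN)
		return (-1);
	start = objmap_hash(key) % OBJMAP_SLOTS;
	for (i = 0; i < OBJMAP_SLOTS; i++) {
		struct objmap_entry *e = &objmap[(start + i) % OBJMAP_SLOTS];

		/* Linear probing: take a free slot or overwrite the same key. */
		if (e->key[0] == '\0' || strcmp(e->key, key) == 0) {
			memcpy(e->key, key, klen + 1);
			e->ino = ino;
			return (0);
		}
	}
	return (-1);
}

/* Look up key. Returns 0 and stores the inode, or -1 if not found. */
int
objmap_get(const char *key, ino_t *ino)
{
	size_t i, start = objmap_hash(key) % OBJMAP_SLOTS;

	for (i = 0; i < OBJMAP_SLOTS; i++) {
		const struct objmap_entry *e = &objmap[(start + i) % OBJMAP_SLOTS];

		if (e->key[0] == '\0')
			return (-1);	/* reached an unused slot: not present */
		if (strcmp(e->key, key) == 0) {
			*ino = e->ino;
			return (0);
		}
	}
	return (-1);
}
```

The point is only that the application already does the name lookup
itself, so the filesystem does not need to do it a second time.
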
This is a big, big hack(1), but it's also a start. It should speed up news
servers and cache servers quite a bit, since the time spent in open()
and unlink() is drastically reduced. However, it is nowhere near
optimal. We'll cover that shortly.

(1) Not hack as in evil and ugly; hack as in non-optimal solution. The
    optimal solution hasn't quite presented itself yet. :-)


How it works:
--

Basically, ifs is a copy of ffs with some vfs/vnops overridden. (Yes, hack.)
I didn't see the need to duplicate all of sys/ufs/ffs to get this
off the ground.

File creation is done through a special file, 'newfile'. When newfile
is opened, the system allocates and returns a new inode. Note that newfile
works in a cloning fashion:

    fd = open("newfile", O_CREAT|O_RDWR, 0644);
    fstat(fd, &st);

    printf("new file is %d\n", (int)st.st_ino);

Once you have created a file, you can open() and unlink() it by the
inode number returned by the fstat call, i.e.:

    fd = open("5", O_RDWR);

Whether you can create files depends entirely on whether you have write
access to the root directory of the filesystem.

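The calls above can be wrapped up roughly as follows. This is only a
sketch: the ifs_* helper names and the 'mnt' mount point argument are
invented here for illustration, and nothing like them ships with ifs.
The helpers just build "<mnt>/newfile" or "<mnt>/<inode number>" and
hand that to the ordinary open() and unlink(). Inode numbers 0-2 are
refused, since ifs does not give access to them.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Build "<mnt>/<ino>" in buf. Returns 0 on success, or -1 if ino is one
 * of the inaccessible inodes (0-2) or the buffer is too small.
 */
int
ifs_name(const char *mnt, ino_t ino, char *buf, size_t len)
{
	int n;

	if (ino <= 2)
		return (-1);
	n = snprintf(buf, len, "%s/%llu", mnt, (unsigned long long)ino);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}

/*
 * Clone a new file through 'newfile'. On success returns the open fd
 * and stores the inode number of the new file in *ino; -1 on error.
 */
int
ifs_create(const char *mnt, ino_t *ino)
{
	char path[1024];
	struct stat st;
	int fd, n;

	n = snprintf(path, sizeof(path), "%s/newfile", mnt);
	if (n < 0 || (size_t)n >= sizeof(path))
		return (-1);
	if ((fd = open(path, O_CREAT | O_RDWR, 0644)) == -1)
		return (-1);
	if (fstat(fd, &st) == -1) {
		close(fd);
		return (-1);
	}
	*ino = st.st_ino;
	return (fd);
}

/* Open an existing file by inode number; returns the fd or -1. */
int
ifs_open(const char *mnt, ino_t ino, int flags)
{
	char path[1024];

	if (ifs_name(mnt, ino, path, sizeof(path)) == -1)
		return (-1);
	return (open(path, flags));
}

/* Remove a file by inode number; returns 0 or -1. */
int
ifs_unlink(const char *mnt, ino_t ino)
{
	char path[1024];

	if (ifs_name(mnt, ino, path, sizeof(path)) == -1)
		return (-1);
	return (unlink(path));
}
```

An application would call ifs_create() once per object, store the
returned inode number in its own database, and later use ifs_open()
and ifs_unlink() with that number.
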
Why it's nowhere near optimal
--

When allocating files, FFS tries to reduce disk seeks by allocating
new files inside the cylinder group of the parent directory, if
possible. In this scheme, we've had to drop that. Files are allocated
sequentially, filling up cylinder groups as we go along. It's not very
optimal; more research will have to be done into how cylinder group
locality can be brought back into this. (It entirely depends upon the
benefits here..)

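To make the difference concrete, here is a toy model of the two
policies. It is not the FFS or ifs allocator: the toy_* names, the
group count and the inodes-per-group figure are all hypothetical, and
the real FFS policy is considerably more involved. The only part taken
from FFS is the idea that an inode's cylinder group is its number
divided by the inodes per group.

```c
#define TOY_NCG		4	/* hypothetical number of cylinder groups */
#define TOY_IPG		8	/* hypothetical inodes per cylinder group */
#define TOY_NINODES	(TOY_NCG * TOY_IPG)
#define TOY_FIRST	3	/* inodes 0-2 are never handed out */

static unsigned char toy_used[TOY_NINODES];

/* Cylinder group holding inode 'ino': inode number / inodes per group. */
unsigned
toy_ino_to_cg(unsigned ino)
{
	return (ino / TOY_IPG);
}

/* Claim the first free inode in [lo, hi); returns 0 if none is free. */
static unsigned
toy_claim(unsigned lo, unsigned hi)
{
	unsigned ino;

	for (ino = (lo < TOY_FIRST ? TOY_FIRST : lo); ino < hi; ino++) {
		if (!toy_used[ino]) {
			toy_used[ino] = 1;
			return (ino);
		}
	}
	return (0);
}

/* ifs-like policy: fill group 0 first, then group 1, and so on. */
unsigned
toy_alloc_sequential(void)
{
	return (toy_claim(0, TOY_NINODES));
}

/*
 * FFS-like policy: prefer the cylinder group of the parent directory,
 * falling back to the sequential scan when that group is full.
 */
unsigned
toy_alloc_near(unsigned parent_ino)
{
	unsigned cg, ino;

	if (parent_ino >= TOY_NINODES)
		return (toy_alloc_sequential());
	cg = toy_ino_to_cg(parent_ino);
	ino = toy_claim(cg * TOY_IPG, (cg + 1) * TOY_IPG);
	return (ino != 0 ? ino : toy_alloc_sequential());
}
```

With no parent directory to consult, ifs has nothing to pass to
something like toy_alloc_near(), which is why locality is lost.
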
Allowing create by inode number requires quite a bit of code rewriting,
and in the test applications here I didn't need it. Maybe in the next
phase I'll look at allowing create by inode number. Feedback, please.

SOFTUPDATES will *NOT* work here, especially in unlink(), where I've just
taken a large axe to it. I've tried to keep as many of the softupdates
call stubs in as possible, but I haven't looked at the softupdates code.
My reasoning was that because there's no directory metadata anymore,
softupdates isn't as important. Besides, fscks are so damn quick ..

Extras
--

I've taken the liberty of applying a large axe to bits of fsck, stripping
out the namespace checks. As far as I can *TELL*, it's close; however, I'd
like it if someone fsck-clued poked me back on what I missed.

There's also a modified copy of mount that will mount a fs of type 'ifs'.
Again, it's just the normal mount with s/"ufs"/"ifs"/g; the
async/noatime/etc. mount options work just as normal.

I haven't supplied an ifs 'newfs'; use the FFS newfs to create a blank
drive. That creates the root directory, which you DEFINITELY still need.
However, ifs updates on the drive will not update directory entries in '.'.
There is a 1:1 mapping between the inode numbers in open()/stat() and the
inodes on disk. You don't get access to inodes 0-2. They don't show up
in a readdir. I'll work on making 2 available, but since the current ufs/ffs
code assumes things are locked against the root inode which is 2 ..

You can find these utilities in src/sbin/mount_ifs and src/sbin/fsck_ifs .
Yes, this means that you can tie in ifs partitions in your bootup
sequence.

TODO:
--

* Implement cookies for NFS

(Realise that this is a huge hack which uses the existing UFS/FFS code.
Therefore it's nowhere near as optimal as it could be, and things aren't
as easy to add as one might think. Especially 'fake' files. :-)



--
Adrian Chadd
<adrian@FreeBSD.org>