$FreeBSD$

ifs - inode filesystem
--


ifs is the beginning of a little experiment - to remove the namespace
from ffs. FFS is a good generic filesystem; however, in high volume
activities today (e.g. web, mail, cache, news) one thing causes a rather
huge resource drain - the namespace.

Having to maintain the directory structures means wasting a lot of
disk IO and/or memory. Since most applications these days have their
own database containing object->ufs namespace mappings, we can effectively
bypass the namespace altogether and talk instead to just inodes.

This is a big big hack(1), but it's also a start. It should speed up news
servers and cache servers quite a bit, since the time spent in open()
and unlink() is drastically reduced. However, it is nowhere near
optimal. We'll cover that shortly.

(1) not hack as evil and ugly, hack as in non-optimal solution. The
    optimal solution hasn't quite presented itself yet. :-)


How it works:
--

Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.)
I didn't see the need in duplicating all of sys/ufs/ffs to get this
off the ground.

File creation is done through a special file, 'newfile'. When newfile
is opened, the system allocates and returns an inode. Note that newfile
works in a cloning fashion, so every open hands back a fresh inode:

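/* Each open of "newfile" allocates a fresh inode and returns an fd for it. */
fd = open("newfile", O_CREAT|O_RDWR, 0644);
fstat(fd, &st);

/* st_ino is an ino_t; cast for a portable format. */
printf("new file is %ju\n", (uintmax_t)st.st_ino);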

Once you have created a file, you can open() and unlink() it by the inode
number returned from the fstat() call, i.e.:

fd = open("5", O_RDWR);

Permission to create files depends entirely on whether you have write
access to the root directory of the filesystem.
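
The two steps above can be wrapped in a pair of small helpers. This is only
a sketch: the names ifs_path(), ifs_create() and ifs_open_inode(), and the
idea of passing the mount point as a string, are made up for illustration
and are not part of ifs. The only ifs behaviour relied on is the 'newfile'
open and the open-by-inode-number described above.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Build "<mnt>/<name>" into buf. Returns 0 on success, -1 if it does not fit.
 */
static int
ifs_path(char *buf, size_t len, const char *mnt, const char *name)
{
	int n;

	assert(buf != NULL && mnt != NULL && name != NULL);
	n = snprintf(buf, len, "%s/%s", mnt, name);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}

/*
 * Hypothetical helper: allocate a new file on the ifs mounted at mnt by
 * opening the special file "newfile". On success returns an open descriptor
 * and stores the new inode number in *inop; returns -1 on failure.
 */
static int
ifs_create(const char *mnt, ino_t *inop)
{
	char path[1024];
	struct stat st;
	int fd;

	if (ifs_path(path, sizeof(path), mnt, "newfile") == -1)
		return (-1);
	if ((fd = open(path, O_CREAT | O_RDWR, 0644)) == -1)
		return (-1);
	if (fstat(fd, &st) == -1) {
		close(fd);
		return (-1);
	}
	*inop = st.st_ino;
	return (fd);
}

/*
 * Hypothetical helper: reopen an existing file by inode number. The file
 * name is simply the inode number printed in decimal.
 */
static int
ifs_open_inode(const char *mnt, ino_t ino)
{
	char name[32], path[1024];

	snprintf(name, sizeof(name), "%ju", (uintmax_t)ino);
	if (ifs_path(path, sizeof(path), mnt, name) == -1)
		return (-1);
	return (open(path, O_RDWR));
}
```

An application would store the inode number from ifs_create() in its own
database and hand it back to ifs_open_inode() later, which is the whole
point of dropping the namespace.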


Why it's nowhere near optimal
--

When doing file allocation, FFS tries to reduce disk seeks by allocating
new files inside the cylinder group of the parent directory, if possible.
In this scheme, we've had to drop that. Files are allocated sequentially,
filling up cylinder groups as we go along. It's not very optimal; more
research will have to be done into how cylinder group locality can be
brought back into this. (It entirely depends upon the benefits here..)
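
To make the trade-off concrete, here is a toy model of the sequential
policy. It is NOT the ifs allocator: the struct, the function names and
the geometry are invented for illustration. The one thing borrowed from
FFS is that an inode's cylinder group is its number divided by the
inodes-per-group count.

```c
#include <assert.h>
#include <stdint.h>

/* Toy geometry; a real filesystem takes these from the superblock. */
struct toy_fs {
	uint32_t ncg;		/* number of cylinder groups */
	uint32_t ipg;		/* inodes per cylinder group */
	uint32_t next;		/* next inode number to hand out */
};

/* Same arithmetic as FFS's ino_to_cg(): inode number / inodes per group. */
static uint32_t
toy_ino_to_cg(const struct toy_fs *fs, uint32_t ino)
{
	assert(fs->ipg != 0);
	return (ino / fs->ipg);
}

/*
 * Sequential policy: hand out inodes in increasing order, starting at 3
 * (0-2 are reserved). Returns 0 when the filesystem is full. Freed inodes
 * are never reused in this toy; a real allocator would consult the bitmaps.
 */
static uint32_t
toy_alloc_seq(struct toy_fs *fs)
{
	if (fs->next < 3)
		fs->next = 3;
	if (fs->next >= fs->ncg * fs->ipg)
		return (0);
	return (fs->next++);
}
```

With 4 groups of 8 inodes, the first five allocations (3 to 7) all land in
group 0 and the sixth (8) is the first one in group 1, no matter which
application object each file belongs to. That is the locality FFS would
otherwise have tried to preserve.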

Allowing create by inode number requires quite a bit of code rewrite, and in
the test applications here I didn't need it. Maybe in the next phase I might
look at allowing create by inode number. Feedback, please.

SOFTUPDATES will *NOT* work here - especially in unlink() where I've just
taken a large axe to it. I've tried to keep as many of the softupdates call
stubs in as possible, but I haven't looked at the softupdates code.
My reasoning was that because there's no directory metadata anymore,
softupdates isn't as important. Besides, fscks are so damn quick ..

Extras
--

I've taken the liberty of applying a large axe to bits of fsck - stripping out
namespace checks. As far as I can *TELL*, it's close; however, I'd like it if
someone fsck-clued poked me back on what I missed.

There's also a modified copy of mount that will mount a fs type 'ifs'. Again,
it's just the normal mount with s/"ufs"/"ifs"/g; async/noatime/etc mount
options work just as normal.

I haven't supplied an ifs 'newfs' - use FFS newfs to create a blank drive.
That creates the root directory, which you DEFINITELY still need.
However, ifs updates on the drive will not update directory entries in '.'.
There is a 1:1 mapping between the inode numbers in open()/stat() and the
inodes on disk. You don't get access to inodes 0-2. They don't show up
in a readdir. I'll work on making 2 available, but the current ufs/ffs
code assumes things are locked against the root inode, which is 2, so
that isn't trivial ..

You can find these utilities in src/sbin/mount_ifs and src/sbin/fsck_ifs.
Yes, this means that you can tie in ifs partitions in your bootup
sequence.

TODO:
--

* Implement cookies for NFS

  (Realise that this is a huge hack which uses the existing UFS/FFS code.
   Therefore it's nowhere near as optimal as it could be, and things aren't
   as easy to add as one might think. Especially 'fake' files. :-)


--
Adrian Chadd
<adrian@FreeBSD.org>


Commit message:
--

Initial commit of IFS - an inode-namespaced FFS.

To get the list of currently allocated inodes, VOP_READDIR has been added
which returns a directory listing of those currently allocated.

What this entails:

* patching conf/files and conf/options to include IFS as a new compile
  option (and since ifs depends upon FFS, include the FFS routines)
* An entry in i386/conf/NOTES indicating IFS exists and where to go for
  an explanation
* Unstaticize a couple of routines in src/sys/ufs/ffs/ which the IFS
  routines require (ffs_mount() and ffs_reload())
* a new bunch of routines in src/sys/ufs/ifs/ which implement the IFS
  routines. IFS replaces some of the vfsops, and a handful of vnops -
  most notably VFS_VGET(), VOP_LOOKUP(), VOP_UNLINK() and VOP_READDIR().
  Any other directory operation is marked as invalid.

What this results in:

* an IFS partition's create permissions are controlled by the
  perm/ownership of the root mount point, just like a normal directory
* Each inode has perm and ownership too
* IFS does NOT mean an FFS partition can be opened per inode. This is a
  completely separate filesystem here
* Softupdates doesn't work with IFS, and really I don't think it needs it.
  Besides, fscks are FAST. (Try it :-)
* Inodes 0 and 1 aren't allocatable because they are special (dump/swap
  IIRC). Inode 2 isn't allocatable since UFS/FFS locks all inodes in the
  system against this particular inode, and unravelling THAT code isn't
  trivial. Therefore, useful inodes start at 3.

Enjoy, and feedback is definitely appreciated!

2000-10-14 03:02:30 +00:00