23b590188f
against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release
$FreeBSD$

ifs - inode filesystem
--

ifs is the beginning of a little experiment - to remove the namespace
from ffs. FFS is a good generic filesystem; however, in high volume
activities today (eg web, mail, cache, news) one thing causes a rather
huge resource drain - the namespace. Having to maintain the directory
structures means wasting a lot of disk IO and/or memory. Since most
applications these days have their own database containing
object->ufs namespace mappings, we can effectively bypass the namespace
altogether and talk instead to just inodes.

This is a big big hack(1), but it's also a start. It should speed up
news servers and cache servers quite a bit - since the time spent in
open() and unlink() is drastically reduced - however, it is nowhere
near optimal. We'll cover that shortly.

(1) not hack as evil and ugly, hack as in non-optimal solution. The
    optimal solution hasn't quite presented itself yet. :-)

How it works:
--

Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.)
I didn't see the need in duplicating all of sys/ufs/ffs to get this
off the ground.

File creation is done through a special file - 'newfile'. When newfile
is opened, the system allocates and returns an inode. Note that newfile
is done in a cloning fashion:

	fd = open("newfile", O_CREAT|O_RDWR, 0644);
	fstat(fd, &st);

	printf("new file is %d\n", (int)st.st_ino);

Once you have created a file, you can open() and unlink() it by its
inode number, as returned by the fstat() call, ie:

	fd = open("5", O_RDWR);

Whether you may create files depends entirely on whether you have
write access to the root directory of the filesystem.

Why it's nowhere near optimal
--

When doing file allocation in FFS, it tries to reduce the disk seeks by
allocating new files inside the cylinder group of the parent directory,
if possible. In this scheme, we've had to drop that. Files are allocated
sequentially, filling up cylinder groups as we go along.
It's not very optimal, and more research will have to be done into how
cylinder group locality can be brought back into this. (It entirely
depends upon the benefits here..)

Allowing create by inode number requires quite a bit of code rewrite,
and in the test applications here I didn't need it. Maybe in the next
phase I might look at allowing create by inode number; feedback, please.

SOFTUPDATES will *NOT* work here - especially in unlink() where I've
just taken a large axe to it. I've tried to keep as many of the
softupdates call stubs in as possible, but I haven't looked at the
softupdates code. My reasoning was that because there's no directory
metadata anymore, softupdates isn't as important. Besides, fsck's are
so damn quick ..

Extras
--

I've taken the liberty of applying a large axe to bits of fsck -
stripping out namespace checks. As far as I can *TELL*, it's close;
however, I'd like it if someone fsck-clued poked me back on what I
missed.

There's also a modified copy of mount that will mount a fs type 'ifs'.
Again, it's just the normal mount with s/"ufs"/"ifs"/g;
async/noatime/etc mount options work just as normal.

I haven't supplied an ifs 'newfs' - use FFS newfs to create a blank
drive. That creates the root directory, which you DEFINITELY still
need. However, ifs updates on the drive will not update directory
entries in '.'. There is a 1:1 mapping between the inode numbers in
open()/stat() and the inodes on disk. You don't get access to inodes
0-2. They don't show up in a readdir. I'll work on making 2 available,
but since the current ufs/ffs code assumes things are locked against
the root inode which is 2 ..

You can find these utilities in src/sbin/mount_ifs and
src/sbin/fsck_ifs. Yes, this means that you can tie in ifs partitions
in your bootup sequence.

TODO:
--

* Implement cookies for NFS

(Realise that this is a huge hack which uses the existing UFS/FFS code.
Therefore it's nowhere near as optimal as it could be, and things aren't
as easy to add as one might think. Especially 'fake' files. :-)

--
Adrian Chadd
<adrian@FreeBSD.org>