freebsd-skq/sys/ufs/ifs
Bosko Milekic 9ed346bab0 Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarly, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
2001-02-09 06:11:45 +00:00
ifs_extern.h
ifs_lookup.c
ifs_subr.c
ifs_vfsops.c
ifs_vnops.c
README

$FreeBSD$

ifs - inode filesystem
--


ifs is the beginning of a little experiment - to remove the namespace
from ffs. FFS is a good generic filesystem; however, in high-volume
activities today (e.g. web, mail, cache, news) one thing causes a rather
huge resource drain - the namespace.

Having to maintain the directory structures means wasting a lot of
disk IO and/or memory. Since most applications these days have their
own database containing object->ufs namespace mappings, we can effectively
bypass the namespace altogether and talk directly to inodes.

This is a big big hack(1), but it's also a start. It should speed up news
servers and cache servers quite a bit - since the time spent in open()
and unlink() is drastically reduced - however, it is nowhere near
optimal. We'll cover that shortly.

(1) not hack as evil and ugly, hack as in non-optimal solution. The
    optimal solution hasn't quite presented itself yet. :-)



How it works:
--

Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.)
I didn't see the need to duplicate all of sys/ufs/ffs to get this
off the ground.

File creation is done through a special file - 'newfile'. When newfile
is opened, the system allocates and returns an inode. Note that newfile
works in a cloning fashion:

fd = open("newfile", O_CREAT|O_RDWR, 0644);
fstat(fd, &st);

printf("new file is %d\n", (int)st.st_ino); 

Once you have created a file, you can open() and unlink() it by the
inode number returned from the stat call, i.e.:

fd = open("5", O_RDWR);

Permission to create files depends entirely on whether you have write
access to the root directory of the filesystem.
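
As a rough illustration of the create/open/unlink-by-inode flow described
above, here is a minimal userland sketch. The helper names (ifs_inode_path,
ifs_create, ifs_open, ifs_unlink) and the idea of passing the mount point as
a string are assumptions made for this example; they are not part of ifs
itself, which only provides the 'newfile' entry and the numeric names.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: build "<mnt>/<ino>", the name under which an
 * ifs mount exposes an inode.  Returns 0 on success, -1 if the
 * buffer is too small.
 */
int
ifs_inode_path(char *buf, size_t len, const char *mnt, unsigned long ino)
{
	int n = snprintf(buf, len, "%s/%lu", mnt, ino);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: create a file by opening the special 'newfile'
 * entry, then fstat() the descriptor to learn which inode was handed
 * back.  Returns an open descriptor, or -1 on failure.
 */
int
ifs_create(const char *mnt, unsigned long *ino)
{
	char path[1024];
	struct stat st;
	int fd, n;

	n = snprintf(path, sizeof(path), "%s/newfile", mnt);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	fd = open(path, O_CREAT | O_RDWR, 0644);
	if (fd == -1)
		return -1;
	if (fstat(fd, &st) == -1) {
		close(fd);
		return -1;
	}
	*ino = (unsigned long)st.st_ino;
	return fd;
}

/* Hypothetical helper: reopen an existing file by inode number. */
int
ifs_open(const char *mnt, unsigned long ino, int flags)
{
	char path[1024];

	if (ifs_inode_path(path, sizeof(path), mnt, ino) == -1)
		return -1;
	return open(path, flags);
}

/* Hypothetical helper: remove a file by inode number. */
int
ifs_unlink(const char *mnt, unsigned long ino)
{
	char path[1024];

	if (ifs_inode_path(path, sizeof(path), mnt, ino) == -1)
		return -1;
	return unlink(path);
}
```

An application would typically store the number returned by ifs_create()
in its own database and later hand it back to ifs_open() or ifs_unlink().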


Why it's nowhere near optimal
--

When doing file allocation in FFS, it tries to reduce the disk seeks by
allocating new files inside the cylinder group of the parent directory, if
possible.  In this scheme, we've had to drop that. Files are allocated
sequentially, filling up cylinder groups as we go along. It's not optimal;
more research will have to be done into how cylinder group locality can be
brought back into this. (It entirely depends upon the benefits here..)
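
To make the difference between the two policies concrete, here is a toy
model, not the kernel allocator. The struct and function names are made up
for this sketch, and the "near the parent" fallback is a plain linear scan,
which is a simplification of what FFS really does when the preferred group
is full.

```c
#include <stdbool.h>

/* Toy filesystem: ncg cylinder groups holding ipg inodes each. */
struct toy_fs {
	unsigned ncg;	/* number of cylinder groups */
	unsigned ipg;	/* inodes per group */
	bool *used;	/* ncg * ipg flags, true = allocated */
};

/* Cylinder group holding inode `ino` (inode number / inodes per group). */
unsigned
toy_ino_to_cg(const struct toy_fs *fs, unsigned ino)
{
	return ino / fs->ipg;
}

/* Take the first free inode in group cg; return it, or -1 if cg is full. */
long
toy_alloc_in_cg(struct toy_fs *fs, unsigned cg)
{
	unsigned i, ino;

	for (i = 0; i < fs->ipg; i++) {
		ino = cg * fs->ipg + i;
		if (!fs->used[ino]) {
			fs->used[ino] = true;
			return (long)ino;
		}
	}
	return -1;
}

/* ifs-like policy: fill groups in order, ignoring locality. */
long
toy_alloc_sequential(struct toy_fs *fs)
{
	unsigned cg;
	long ino;

	for (cg = 0; cg < fs->ncg; cg++)
		if ((ino = toy_alloc_in_cg(fs, cg)) != -1)
			return ino;
	return -1;
}

/* FFS-like policy: try the parent directory's group first. */
long
toy_alloc_near(struct toy_fs *fs, unsigned parent_ino)
{
	long ino = toy_alloc_in_cg(fs, toy_ino_to_cg(fs, parent_ino));

	return (ino != -1) ? ino : toy_alloc_sequential(fs);
}
```

With no parent directory to consult, ifs has nothing to pass as parent_ino,
which is why only the sequential policy remains.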

Allowing create by inode number requires quite a bit of code rewrite, and in
the test applications here I didn't need it. Maybe in the next phase I will
look at allowing create by inode number; feedback, please.

SOFTUPDATES will *NOT* work here - especially in unlink() where I've just
taken a large axe to it. I've tried to keep as many of the softupdates call
stubs in as possible, but I haven't looked at the softupdates code.
My reasoning was that because there's no directory metadata anymore,
softupdates isn't as important. Besides, fsck's are so damn quick ..

Extras
--

I've taken the liberty of applying a large axe to bits of fsck - stripping out
namespace checks. As far as I can *TELL*, it's close; however, I'd like it if
someone fsck-clued poked me about what I missed.

There's also a modified copy of mount that will mount a fs type 'ifs'. Again,
it's just the normal mount with s/"ufs"/"ifs"/g; async/noatime/etc mount
options work just as normal.

I haven't supplied an ifs 'newfs' - use FFS newfs to create a blank drive.
That creates the root directory, which you DEFINITELY still need.
However, ifs updates on the drive will not update directory entries in '.'.
There is a 1:1 mapping between the inode numbers in open()/stat() and the
inodes on disk. You don't get access to inodes 0-2. They don't show up
in a readdir. I'll work on making 2 available, but the current ufs/ffs
code assumes things are locked against the root inode, which is 2 ..

You can find these utilities in src/sbin/mount_ifs and src/sbin/fsck_ifs.
Yes, this means that you can tie in ifs partitions in your bootup
sequence.
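
Since names on an ifs mount are just inode numbers and inodes 0-2 are not
reachable, an application may want to validate a name before handing it to
open(). The function below is a sketch of such a userland pre-check; the name
ifs_name_to_ino and the IFS_FIRST_USABLE_INO constant are assumptions for
this example, and the kernel's own lookup code may behave differently.

```c
#include <errno.h>
#include <stdlib.h>

/* Assumption taken from the text above: inodes 0-2 are not accessible. */
#define IFS_FIRST_USABLE_INO	3UL

/*
 * Parse a decimal file name into an inode number.
 * Returns 0 on success, or -1 if the name is not a plain decimal
 * number, overflows, or refers to one of the reserved inodes 0-2.
 */
int
ifs_name_to_ino(const char *name, unsigned long *ino)
{
	char *end;
	unsigned long v;

	/* strtoul() accepts leading blanks and signs; reject them here. */
	if (name[0] < '0' || name[0] > '9')
		return -1;
	errno = 0;
	v = strtoul(name, &end, 10);
	if (errno == ERANGE || *end != '\0')
		return -1;
	if (v < IFS_FIRST_USABLE_INO)
		return -1;
	*ino = v;
	return 0;
}
```

Note that this check would also reject "newfile", so creation has to be
handled as a separate case.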

TODO:
--

* Implement cookies for NFS

  (Realise that this is a huge hack which uses the existing UFS/FFS code.
   Therefore it's nowhere near as optimal as it could be, and things aren't
   as easy to add as one might think. Especially 'fake' files. :-)



--
Adrian Chadd
<adrian@FreeBSD.org>