140 lines
6.0 KiB
Plaintext
140 lines
6.0 KiB
Plaintext
|
# @(#)README 8.1 (Berkeley) 6/11/93
|
||
|
|
||
|
The file system is reasonably stable, but incomplete. There are
|
||
|
places where cleaning performance can be improved dramatically (see
|
||
|
comments in lfs_syscalls.c). For details on the implementation,
|
||
|
performance and why garbage collection always wins, see Dr. Margo
|
||
|
Seltzer's thesis available for anonymous ftp from toe.cs.berkeley.edu,
|
||
|
in the directory pub/personal/margo/thesis.ps.Z, or the January 1993
|
||
|
USENIX paper.
|
||
|
|
||
|
Missing Functionality:
|
||
|
Multiple block sizes and/or fragments are not yet implemented.
|
||
|
|
||
|
----------
|
||
|
The disk is laid out in segments. The first segment starts 8K into the
|
||
|
disk (the first 8K is used for boot information). Each segment is composed
|
||
|
of the following:
|
||
|
|
||
|
An optional super block
|
||
|
One or more groups of:
|
||
|
segment summary
|
||
|
0 or more data blocks
|
||
|
0 or more inode blocks
|
||
|
|
||
|
The segment summary and inode/data blocks start after the super block (if
|
||
|
present), and grow toward the end of the segment.
|
||
|
|
||
|
_______________________________________________
|
||
|
| | | | |
|
||
|
| summary | data/inode | summary | data/inode |
|
||
|
| block | blocks | block | blocks | ...
|
||
|
|_________|____________|_________|____________|
|
||
|
|
||
|
The data/inode blocks following a summary block are described by the
|
||
|
summary block. In order to permit the segment to be written in any order
|
||
|
and in a forward direction only, a checksum is calculated across the
|
||
|
blocks described by the summary. Additionally, the summary is checksummed
|
||
|
and timestamped. Both of these are intended for recovery; the former is
|
||
|
to make it easy to determine that it *is* a summary block and the latter
|
||
|
is to make it easy to determine when recovery is finished for partially
|
||
|
written segments. These checksums are also used by the cleaner.
|
||
|
|
||
|
Summary block (detail)
|
||
|
________________
|
||
|
| sum cksum |
|
||
|
| data cksum |
|
||
|
| next segment |
|
||
|
| timestamp |
|
||
|
| FINFO count |
|
||
|
| inode count |
|
||
|
| flags |
|
||
|
|______________|
|
||
|
| FINFO-1 | 0 or more file info structures, identifying the
|
||
|
| . | blocks in the segment.
|
||
|
| . |
|
||
|
| . |
|
||
|
| FINFO-N |
|
||
|
| inode-N |
|
||
|
| . |
|
||
|
| . |
|
||
|
| . | 0 or more inode daddr_t's, identifying the inode
|
||
|
| inode-1 | blocks in the segment.
|
||
|
|______________|
|
||
|
|
||
|
Inode blocks are blocks of on-disk inodes in the same format as those in
|
||
|
the FFS. However, spare[0] contains the inode number of the inode so we
|
||
|
can find a particular inode on a page. They are packed page_size /
|
||
|
sizeof(inode) to a block. Data blocks are exactly as in the FFS. Both
|
||
|
inodes and data blocks move around the file system at will.
|
||
|
|
||
|
The file system is described by a super-block which is replicated and
|
||
|
occurs as the first block of the first and other segments. (The maximum
|
||
|
number of super-blocks is MAXNUMSB). Each super-block maintains a list
|
||
|
of the disk addresses of all the super-blocks. The super-block maintains
|
||
|
a small amount of checkpoint information, essentially just enough to find
|
||
|
the inode for the IFILE (fs->lfs_idaddr).
|
||
|
|
||
|
The IFILE is visible in the file system, as inode number IFILE_INUM. It
|
||
|
contains information shared between the kernel and various user processes.
|
||
|
|
||
|
Ifile (detail)
|
||
|
________________
|
||
|
| cleaner info | Cleaner information per file system. (Page
|
||
|
| | granularity.)
|
||
|
|______________|
|
||
|
| segment | Space available and last modified times per
|
||
|
| usage table | segment. (Page granularity.)
|
||
|
|______________|
|
||
|
| IFILE-1 | Per inode status information: current version #,
|
||
|
| . | if currently allocated, last access time and
|
||
|
| . | current disk address of containing inode block.
|
||
|
| . | If current disk address is LFS_UNUSED_DADDR, the
|
||
|
| IFILE-N | inode is not in use, and it's on the free list.
|
||
|
|______________|
|
||
|
|
||
|
|
||
|
First Segment at Creation Time:
|
||
|
_____________________________________________________________
|
||
|
| | | | | | | |
|
||
|
| 8K pad | Super | summary | inode | ifile | root | l + f |
|
||
|
| | block | | block | | dir | dir |
|
||
|
|________|_______|_________|_______|_______|_______|_______|
|
||
|
^
|
||
|
Segment starts here.
|
||
|
|
||
|
Some differences from the Sprite LFS implementation.
|
||
|
|
||
|
1. The LFS implementation placed the ifile metadata and the super block
|
||
|
at fixed locations. This implementation replicates the super block
|
||
|
and puts each at a fixed location. The checkpoint data is divided into
|
||
|
two parts -- just enough information to find the IFILE is stored in
|
||
|
two of the super blocks, although it is not toggled between them as in
|
||
|
the Sprite implementation. (This was deliberate, to avoid a single
|
||
|
point of failure.) The remaining checkpoint information is treated as
|
||
|
a regular file, which means that the cleaner info, the segment usage
|
||
|
table and the ifile meta-data are stored in normal log segments.
|
||
|
(Tastes great, less filling...)
|
||
|
|
||
|
2. The segment layout is radically different in Sprite; this implementation
|
||
|
uses something a lot like network framing, where data/inode blocks are
|
||
|
written asynchronously, and a checksum is used to validate any set of
|
||
|
summary and data/inode blocks. Sprite writes summary blocks synchronously
|
||
|
after the data/inode blocks have been written and the existence of the
|
||
|
summary block validates the data/inode blocks. This permits us to write
|
||
|
everything contiguously, even partial segments and their summaries, whereas
|
||
|
Sprite is forced to seek (from the end of the data inode to the summary
|
||
|
which lives at the end of the segment). Additionally, writing the summary
|
||
|
synchronously should cost about 1/2 a rotation per summary.
|
||
|
|
||
|
3. Sprite LFS distinguishes between different types of blocks in the segment.
|
||
|
Other than inode blocks and data blocks, we don't.
|
||
|
|
||
|
4. Sprite LFS traverses the IFILE looking for free blocks. We maintain a
|
||
|
free list threaded through the IFILE entries.
|
||
|
|
||
|
5. The cleaner runs in user space, as opposed to kernel space. It shares
|
||
|
information with the kernel by reading/writing the IFILE and through
|
||
|
cleaner specific system calls.
|
||
|
|