One-line Summary
As a reimplementation of the UNIX file system, FFS addresses the problem of low data throughput rates/bandwidth usage by making the file system "disk aware".
Paper Structure Outline
New file system organization
Optimizing storage utilization
File system parameterization
File system functional enhancements
Background & Motivation
The old UNIX file system. As a predecessor of FFS, the original UNIX operating system (very-simple file system, VSFS) was simple and easy-to-use. It had some major problems, though:
The biggest problem was terrible performance: As measured in this paper, in some cases, the disk bandwidth utilization was a mere 3%. The throughput was also low.
This was because of low locality: The old UNIX file system treated the disk like it was a random-access memory, so inodes and data blocks were scattered across the disk.
Another problem was file system fragmentation: The free space was not carefully managed. As data come and go, the file system got fragmented in that the free list ended up pointing to blocks spread across the disk, so accessing a logically contiguous file required going back and forth across the disk. (This problem was originally solved by disk defragmentation tools.)
The original block size was too small (512 bytes): A smaller size was good for minimizing internal fragmentation (waste within the block) but bad for transfers as each block required a positioning overhead.
To resolve these issues, the solution is to make the file system "disk aware".
Design and Implementation
Cylinder groups
A disk is divided into several cylinder groups. IRL, as disks hide details of their geometry from clients, modern file systems organize the drive into block groups, each of which is a consecutive portion of the disk's address space. Note that IRL each group will contain many more blocks. What FFS keeps within a single cylinder group. ib and db are inote bitmap and data bitmap that tracks whether the inodes and data blocks of the group are allocated. Bitmaps replaces freelists as bitmaps are faster to update/lookup and are space efficient. By putting two files in the same group, FFS ensures that accessing one after the other will not result in long seeks across the disk.
The superblock is duplicated and stored in different cylinder groups with a rotated location/offset. This reduces the chance of data loss due to corruption of the superblock (top platter damage).
Layout policies
The basic mantra is simple: Keep related stuff together and keep unrelated stuff far apart.
Placing directories: FFS finds the cylinder group with a low number of allocated directories and a high number of free inodes.
Placing files: First, data blocks of a file in the same group as its inode are allocated. Second, file inodes are allocated in the same cylinder group with other files within the same directory.
A problem is the large files exception: Single, large files (> 48KB) can fill nearly all of a group, while ideally none of the cylinder groups should be completely full. The solution is to redirect block allocation to a different cylinder group when a file exceeds 48KB and at every megabyte thereafter.
Larger blocks & fragments within blocks
The size of a file system block is increased from 512 bytes to 4096 bytes (4 KB). This improves performance because:
Each disk transfer access more data
More files can be described without the need to access indirect blocks (since the direct blocks now contain more data)
File system parameterization (rotationally optimal block placement)
Think about this: In sequential reads, FFS firstly reads block 0. By the time the read is finished, block 1 had rotated under the head and to get to block 1, we now need a full rotation. FFS resolves this by figuring out the specific performance parameters of the disk and use those to decide on the exact staggered layout scheme. FFS is faster for both reads and writes and the disk bandwidth (~3% to 47%) FFS also introduced some neat file system functional enhancements that are routines of today's operating systems and likely help FFS gain a stronger user base:
Long file names: File names can now be of nearly arbitrary length (previously: 8 characters. now: 255 characters)
File locking: Programs can now apply advisory shared or exclusive locks at the granularity of files
Symbolic links: Users can now create an "alias" to any other file or directory on a system and thus are much more flexible
File renaming: Atomic rename() operation
Quotas: Restricts the amount of file system resources (#inodes & #disk blocks) that a user can obtain
Hard locks & advisory locks: A hard lock is always enforced when a program tries to access a file, while an advisory lock is only applied when it is requested by a program.
Prof. Andrea's slides on FFS and LFS