FFS: A Fast File System for UNIX

One-line Summary

FFS is a reimplementation of the UNIX file system that addresses the original file system's low throughput and poor disk bandwidth utilization by making the file system "disk aware".

Paper Structure Outline

  1. Introduction

  2. Old file system

  3. New file system organization

    1. Optimizing storage utilization

    2. File system parameterization

    3. Layout policies

  4. Performance

  5. File system functional enhancements

    1. Long file names

    2. File locking

    3. Symbolic links

    4. Rename

    5. Quotas

  6. Acknowledgments

Background & Motivation

FFS's predecessor, the original UNIX file system (sometimes called the "very simple file system", VSFS), was simple and easy to use. It had some major problems, though:

  • The biggest problem was terrible performance: as measured in this paper, disk bandwidth utilization was as low as ~3% in some cases, and throughput was correspondingly poor.

  • The root cause was poor locality: the old UNIX file system treated the disk like random-access memory, so inodes and data blocks ended up scattered across the disk.

  • Another problem was file system fragmentation: free space was not carefully managed, so as files were created and deleted, the free list ended up pointing to blocks spread all over the disk. Accessing a logically contiguous file then required seeking back and forth across the disk. (This is the problem that disk defragmentation tools help with.)

  • The original block size was too small (512 bytes): a small block size minimizes internal fragmentation (wasted space within a block), but it is bad for transfers, since each block incurs its own positioning overhead (see the sketch at the end of this section).

FFS's answer to all of these issues is to make the file system "disk aware".
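To see why the 512-byte block size hurt throughput so much, consider a back-of-the-envelope model in which every block transfer first pays a positioning (seek + rotation) cost before the data transfer itself. The parameter values below are illustrative assumptions, not measurements from the paper; the point is only that a larger block amortizes the fixed positioning cost over more bytes.

```python
# Illustrative sketch: effective bandwidth vs. block size when every block
# transfer pays a fixed positioning (seek + rotation) cost first.
# The disk parameters below are made-up examples, not figures from the paper.

def effective_bandwidth(block_size_bytes, position_ms=10.0, transfer_mb_per_s=1.0):
    """Bytes per second achieved if each block needs positioning before transfer."""
    transfer_s = block_size_bytes / (transfer_mb_per_s * 1024 * 1024)
    total_s = position_ms / 1000.0 + transfer_s
    return block_size_bytes / total_s

for size in (512, 4096):
    print(f"{size:>5}-byte blocks -> ~{effective_bandwidth(size) / 1024:.0f} KB/s")
```

Even with these toy numbers, moving from 512-byte to 4 KB blocks improves effective bandwidth by roughly 6x, which is the intuition behind the larger-blocks change described later.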

Design and Implementation

Cylinder groups

FFS divides the disk into a number of cylinder groups, each made up of a set of consecutive cylinders. (Modern disks hide the details of their geometry from clients, so modern file systems instead organize the drive into block groups, each of which is a consecutive portion of the disk's address space.) By putting two related files in the same group, FFS ensures that accessing one after the other will not result in long seeks across the disk.

Each cylinder group contains a copy of the superblock, an inode bitmap and a data bitmap that track which inodes and data blocks of the group are allocated, the inodes themselves, and the data blocks. Bitmaps replace the old free list because they are faster to update and search, and they are space-efficient.

The superblock copy is stored at a rotated location/offset in each cylinder group, so the copies do not all land on the same platter or track. This reduces the chance of losing the file system to corruption of the superblock (e.g., damage to the top platter).
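A minimal sketch of the per-group bookkeeping described above; the field names and types are hypothetical illustrations, not the paper's actual on-disk layout.

```python
# Illustrative sketch of the bookkeeping one FFS cylinder group keeps,
# per the notes above. Field names are hypothetical, not the on-disk layout.
from dataclasses import dataclass, field

@dataclass
class CylinderGroup:
    superblock_copy: bytes                    # replicated superblock, stored at a rotated offset
    inode_bitmap: list[bool]                  # True = inode slot in this group is allocated
    data_bitmap: list[bool]                   # True = data block in this group is allocated
    inodes: dict[int, object] = field(default_factory=dict)       # inode number -> inode
    data_blocks: dict[int, bytes] = field(default_factory=dict)   # block number -> contents

    def free_inodes(self) -> int:
        return sum(not used for used in self.inode_bitmap)

    def free_data_blocks(self) -> int:
        return sum(not used for used in self.data_bitmap)
```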

Layout policies

The basic mantra is simple: Keep related stuff together and keep unrelated stuff far apart.

  • Placing directories: FFS places a new directory in a cylinder group with a low number of allocated directories and a high number of free inodes.

  • Placing files: First, FFS allocates the data blocks of a file in the same group as its inode. Second, it allocates a file's inode in the same cylinder group as the other files in its directory (a sketch of both heuristics follows this list).
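A toy sketch of the two placement heuristics above. The group-selection criteria are simplified (the real FFS compares against per-group averages and considers more factors), and all names here are hypothetical.

```python
# Toy sketch of FFS-style placement heuristics; simplified, hypothetical names.
from dataclasses import dataclass

@dataclass
class Group:
    gid: int
    num_dirs: int          # allocated directories in this cylinder group
    free_inodes: int
    free_blocks: int

def pick_group_for_new_directory(groups):
    """Prefer groups with few directories and many free inodes."""
    return min(groups, key=lambda g: (g.num_dirs, -g.free_inodes)).gid

def pick_group_for_new_file(parent_dir_group, groups):
    """A file's inode goes in its parent directory's group (if it has space);
    its data blocks are then allocated in the same group as the inode."""
    parent = next(g for g in groups if g.gid == parent_dir_group)
    if parent.free_inodes > 0 and parent.free_blocks > 0:
        return parent.gid
    # Fall back to the least-loaded group if the parent's group is full.
    return max(groups, key=lambda g: g.free_blocks).gid

groups = [Group(0, 3, 10, 50), Group(1, 1, 80, 200), Group(2, 1, 40, 500)]
print(pick_group_for_new_directory(groups))   # -> 1 (few dirs, most free inodes)
print(pick_group_for_new_file(0, groups))     # -> 0 (stay near the parent directory)
```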

One exception is made for large files: a single large file (> 48 KB) could fill nearly an entire group, whereas ideally no cylinder group should be completely full. FFS therefore redirects block allocation to a different cylinder group once a file exceeds 48 KB, and again at every megabyte thereafter.

The first spill over point at 48 kilobytes is the point at which a file on a 4096 byte block file system first requires a single indirect block. This appears to be a natural first point at which to redirect block allocation. The other spillover points are chosen with the intent of forcing block allocation to be redirected when a file has used about 25% of the data blocks in a cylinder group. In observing the new file system in day to day use, the heuristics appear to work well in minimizing the number of completely filled cylinder groups.
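A sketch of that spillover rule: the first 48 KB of a file (its 12 direct blocks at 4 KB each) stay in the inode's group, and each subsequent 1 MB chunk is sent to a different group. Choosing the next group round-robin is a simplifying assumption here; the paper only says allocation is redirected to a different group.

```python
# Sketch of the FFS large-file spillover rule described above.
# Chunk 0 = first 48 KB (the 12 direct blocks on a 4 KB-block file system);
# every later 1 MB of the file is redirected to another cylinder group.
# Round-robin group choice is a simplifying assumption for illustration.

KB, MB = 1024, 1024 * 1024
BLOCK_SIZE = 4 * KB
DIRECT_BYTES = 48 * KB          # 12 direct pointers * 4 KB
CHUNK_BYTES = 1 * MB

def chunk_index(offset: int) -> int:
    """Which spillover chunk a byte offset of the file falls into."""
    if offset < DIRECT_BYTES:
        return 0
    return 1 + (offset - DIRECT_BYTES) // CHUNK_BYTES

def group_for_offset(offset: int, inode_group: int, num_groups: int) -> int:
    return (inode_group + chunk_index(offset)) % num_groups

for off in (0, 47 * KB, 48 * KB, 48 * KB + MB, 10 * MB):
    print(f"offset {off:>9}: chunk {chunk_index(off)}, "
          f"group {group_for_offset(off, inode_group=3, num_groups=16)}")
```

The 1 MB chunk size also implies that a cylinder group held roughly 4 MB of data blocks in the paper's configuration, since each chunk is meant to consume about 25% of a group.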

Larger blocks & fragments within blocks

The size of a file system block is increased from 512 bytes to 4096 bytes (4 KB). This improves performance because:

  1. Each disk transfer accesses more data, amortizing the positioning cost over more bytes

  2. More files can be described by the direct block pointers alone, without needing to access indirect blocks (since each direct block now holds more data)

The cost of larger blocks is more internal fragmentation for small files. FFS counters this by allowing a block to be divided into fragments (as small as the 512-byte disk sector size), so that small files and the tail ends of larger files do not each waste a whole 4 KB block; a sketch follows.
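A toy calculation of how fragments reduce wasted space, assuming a 4096-byte block divided into 1024-byte fragments (the 4096/1024 configuration discussed in the paper); the helper names are illustrative.

```python
# Toy calculation: space consumed by a small file with and without fragments.
# Assumes a 4096-byte block that can be split into 1024-byte fragments
# (the 4096/1024 configuration); helper names are illustrative.
import math

BLOCK = 4096
FRAGMENT = 1024

def space_without_fragments(file_size: int) -> int:
    """Whole blocks only: even a 1-byte file consumes a full 4 KB block."""
    return math.ceil(file_size / BLOCK) * BLOCK

def space_with_fragments(file_size: int) -> int:
    """Full blocks for the body of the file, fragments for the tail."""
    full_blocks, tail = divmod(file_size, BLOCK)
    tail_space = math.ceil(tail / FRAGMENT) * FRAGMENT
    return full_blocks * BLOCK + tail_space

for size in (100, 1500, 5000):
    print(size, space_without_fragments(size), space_with_fragments(size))
# 100   -> 4096 vs 1024
# 1500  -> 4096 vs 2048
# 5000  -> 8192 vs 5120
```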

File system parameterization (rotationally optimal block placement)

Consider a sequential read of consecutive blocks: FFS reads block 0, but by the time that read finishes and the next request is issued, block 1 has already rotated past the head, so reaching it costs nearly a full rotation. FFS avoids this by parameterizing itself with the specific performance characteristics of the disk (and of the processor issuing the requests) and using them to decide on an exact staggered layout scheme, so that the next logical block arrives under the head just as the file system is ready to ask for it.

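A rough sketch of the idea: given how long one block takes to pass under the head and how long the system needs to issue the next request, compute how many block slots to stride so that consecutive logical blocks land under the head at the right time. All parameter values here are illustrative assumptions, not the paper's numbers.

```python
# Rough sketch of rotationally optimal ("staggered") block placement.
# All parameter values are illustrative assumptions, not the paper's numbers.
import math

def stride(ms_per_block_under_head: float, ms_to_issue_next_request: float) -> int:
    """Physical slots between the starts of consecutive logical blocks: the block
    itself plus however many slots rotate by while the next request is prepared."""
    return 1 + math.ceil(ms_to_issue_next_request / ms_per_block_under_head)

def physical_slot(logical_block: int, step: int, slots_per_track: int) -> int:
    # Naive wrap-around; a real layout also spills onto other tracks/rotational positions.
    return (logical_block * step) % slots_per_track

step = stride(ms_per_block_under_head=1.0, ms_to_issue_next_request=2.0)
print("stride:", step)                                                   # -> 3
print([physical_slot(b, step, slots_per_track=8) for b in range(4)])     # -> [0, 3, 6, 1]
```
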
Evaluation

FFS is faster for both reads and writes than the old file system, raising disk bandwidth utilization from roughly 3% to as much as 47%.

FFS also introduced some neat file system functional enhancements that are routine in today's operating systems and likely helped FFS gain a stronger user base:

  • Long file names: File names can now be of nearly arbitrary length (previously limited to 14 characters, now up to 255 characters)

  • File locking: Programs can apply advisory shared or exclusive locks at the granularity of a whole file

  • Symbolic links: Users can create an "alias" to any other file or directory on the system, which is much more flexible than hard links

  • File renaming: An atomic rename() operation (see the sketch after this list)

  • Quotas: Restrict the amount of file system resources (number of inodes and disk blocks) that a user can consume
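An atomic rename() is what makes the classic "write a temporary file, then rename it over the original" pattern safe: readers see either the old file or the new one, never a half-written mix. The snippet below is a small modern illustration of that pattern, not FFS-specific code; the file name is hypothetical.

```python
# Classic safe-update pattern enabled by an atomic rename() (illustration only,
# not FFS-specific code). Readers of "config.json" never observe a partial write.
import json, os, tempfile

def atomic_write_json(path: str, data: dict) -> None:
    dir_name = os.path.dirname(os.path.abspath(path))
    # Write the new contents to a temporary file in the same directory...
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        # ...then atomically replace the old file with the new one.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise

atomic_write_json("config.json", {"version": 2})
```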

New Vocabulary

  • Hard locks & advisory locks: A hard lock is always enforced when a program tries to access a file, while an advisory lock only affects programs that explicitly check for it, i.e., locking is cooperative. FFS provides advisory locks (see the example below).
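For a modern taste of advisory locking, here is a small example using Python's fcntl.flock wrapper around the BSD flock() call, which descends from the 4.2BSD work described here. This is illustrative, not code from the paper; the lock only matters to other processes that also call flock() on the same file.

```python
# Advisory (cooperative) file locking via the BSD flock() call, exposed in
# Python's fcntl module (Unix only). Only processes that also call flock()
# on this file will respect the lock; others can still read/write it freely.
import fcntl

with open("shared.log", "a") as f:
    fcntl.flock(f.fileno(), fcntl.LOCK_EX)      # exclusive advisory lock; blocks if held
    try:
        f.write("exactly one writer at a time\n")
        f.flush()
    finally:
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)  # release (also released on close)
```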

Links

  • Paper PDF

  • FFS in OSTEP

  • FFS in CS 537 @ UW-Madison

  • Prof. Andrea's slides on FFS and LFS (L2-FFS+LFS.pptx, 2 MB)