OptFS: Optimistic Crash Consistency
One-line Summary
The authors present optimistic crash consistency and OptFS, an optimistic journaling file system that implements it. OptFS maintains consistency to the same extent as pessimistic journaling while achieving performance close to that of probabilistic consistency.
Paper Structure Outline
Introduction
Pessimistic Crash Consistency
Disk Interface
Pessimistic Journaling
Flushing Performance Impact
Probabilistic Crash Consistency
Quantifying Probabilistic Consistency
Factors affecting P_inc (probability of inconsistency)
Workload
Queue Size
Summary
Optimistic Crash Consistency
Asynchronous Durability Notification
Optimistic Consistency Properties
Optimistic Techniques
In-Order Journal Recovery
In-Order Journal Release
Checksums
Background Write after Notification
Reuse after Notification
Selective Data Journaling
Durability vs. Consistency
Implementation of OptFS
Asynchronous Durability Notifications
Handling Data Blocks
Optimistic Techniques
Evaluation
Reliability
Performance
Resource consumption
Journal size
Case Studies
Atomic Update within Gedit
Temporary Logging in SQLite
Related Work
Conclusion
Background & Motivation
In file system journaling, the default pessimistic approach issues a flush at every ordering point, which is unnecessary work in the common case where no crash occurs. Probabilistic crash consistency disables these flushes: typical workloads cause little reordering of writes in the disk cache, so the disk is only occasionally in an inconsistent state. Although this gives no consistency guarantee after a crash, many practitioners choose it to avoid the performance cost of flushes.
D: Data write
J_M: Logging Metadata
J_C: Logging Commit
M: Checkpointing
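To make the role of flushes concrete, here is a toy model using the four block labels above. It is a sketch under stated assumptions, not code from the paper: `CachedDisk`, `pessimistic_commit`, `flushless_commit`, and `is_safely_ordered` are hypothetical names, the disk cache is assumed to write back in an arbitrary order, and the flush placement (after D and J_M, after J_C, after M) is a simplified rendering of pessimistic journaling.

```python
import random


class CachedDisk:
    """Toy disk: writes sit in a volatile cache and reach the platter in arbitrary order."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.cache = []    # written but not yet durable
        self.durable = []  # blocks in the order they became durable

    def write(self, block):
        self.cache.append(block)

    def flush(self):
        # Everything cached becomes durable, in an order the disk chooses.
        self.rng.shuffle(self.cache)
        self.durable.extend(self.cache)
        self.cache.clear()


def pessimistic_commit(disk):
    """Flush at every ordering point: (D, J_M) -> J_C -> M."""
    disk.write("D")
    disk.write("J_M")
    disk.flush()
    disk.write("J_C")
    disk.flush()
    disk.write("M")
    disk.flush()
    return disk.durable


def flushless_commit(disk):
    """Probabilistic mode: no ordering flushes, so the disk may reorder freely."""
    for block in ("D", "J_M", "J_C", "M"):
        disk.write(block)
    disk.flush()  # stands in for write-back at a time and order the disk picks
    return disk.durable


def is_safely_ordered(durable):
    """True if J_C became durable after D and J_M, and M after J_C."""
    pos = {block: i for i, block in enumerate(durable)}
    return max(pos["D"], pos["J_M"]) < pos["J_C"] < pos["M"]
```

In this model the pessimistic sequence is safely ordered for every seed, while the flushless sequence is only sometimes safely ordered, which is the trade-off the paragraph above describes.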
The idea of optimistic crash consistency is borrowed from optimistic concurrency control (OCC) in distributed transaction systems.
Design and Implementation
OptFS decouples fsync() into two novel primitives: dsync() for immediate durability as well as ordering, and osync() for write ordering/atomic updates but only eventual durability.
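As an illustration of where osync() would fit, the following sketch shows the familiar write-temp-then-rename atomic update. Stock kernels do not provide osync() or dsync(), so both are hypothetical stand-ins here that simply call `os.fsync()`, which is stronger than osync() because it also waits for durability. The helper name `atomic_update` is likewise an assumption.

```python
import os
import tempfile


def osync(fd):
    # Hypothetical stand-in for OptFS osync(): ordering, eventual durability.
    # os.fsync() is used only because stock kernels have no osync().
    os.fsync(fd)


def dsync(fd):
    # Hypothetical stand-in for OptFS dsync(): ordering plus immediate durability.
    os.fsync(fd)


def atomic_update(path, data):
    """Replace `path` with `data` so a crash leaves the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
        tmp.flush()            # push Python's buffer to the kernel
        osync(tmp.fileno())    # order the data before the rename below
    os.replace(tmp_path, path)  # atomic rename over the old file
```

An application that needs the update on disk before proceeding would call dsync() at the ordering point instead; one that only needs old-or-new atomicity can use osync() and avoid waiting on the disk.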
Optimistic techniques
OptFS combines several techniques: in-order journal recovery and release, checksums, background writes after notification, reuse after notification, and selective data journaling.
Checksums
The commit block J_C stores checksums computed over D and J_M, which removes the need to order these writes during transaction commit. This generalizes metadata transactional checksums to cover data blocks as well. During recovery, a transaction whose checksums do not match is discarded.
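A minimal sketch of checksum-based commit and recovery follows. It assumes a single CRC32 over all D and J_M blocks stored in J_C; the function names `commit_checksum` and `should_replay` are hypothetical, and the checksum function and on-disk layout used by OptFS may differ.

```python
import zlib


def commit_checksum(data_blocks, metadata_blocks):
    """Compute the value stored in J_C: one CRC32 over all D and J_M blocks."""
    crc = 0
    for block in list(data_blocks) + list(metadata_blocks):
        crc = zlib.crc32(block, crc)  # running CRC across blocks
    return crc


def should_replay(data_blocks, metadata_blocks, stored_checksum):
    """Recovery check: replay only if the blocks on disk match J_C."""
    return commit_checksum(data_blocks, metadata_blocks) == stored_checksum


# Commit time: D, J_M, and J_C can all be issued without ordering flushes.
d = [b"new file contents"]
j_m = [b"inode: size=17"]
j_c = commit_checksum(d, j_m)

# Recovery after a crash in which D never reached the disk intact.
torn_d = [b"new file c0ntents"]
print(should_replay(d, j_m, j_c))       # blocks intact: replay
print(should_replay(torn_d, j_m, j_c))  # mismatch: discard the transaction
```

Because a mismatch reveals any D or J_M block that did not reach the disk, J_C no longer has to be written strictly after them.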
Asynchronous Durability Notifications (ADN)
ADNs are used to delay checkpointing a transaction until it has been committed durably: M is written only after D, J_M, and J_C have all been reported durable. Fortunately, this delay does not affect application performance, as applications block until the transaction is committed, not until it is checkpointed. Additional techniques are required for correctness in scenarios such as block reuse and overwrite. ADNs improve performance because:
The disk can schedule blocks from the cache to platter in the best order
The file system can do other work while waiting for ADN
(Main) The user applications do not have to wait for ADN
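The gating rule for checkpointing can be sketched as a small state machine. This is an illustrative model only: the class `OptimisticTransaction` and its methods are hypothetical names, notifications are assumed to arrive one block at a time in any order, and the in-kernel implementation is not shown in these notes.

```python
class OptimisticTransaction:
    """Tracks which journal writes the disk has reported durable."""

    REQUIRED = frozenset({"D", "J_M", "J_C"})

    def __init__(self):
        self.durable = set()       # blocks for which an ADN has arrived
        self.checkpointed = False  # True once M has been issued

    def committed_durably(self):
        return self.durable >= self.REQUIRED

    def on_durability_notification(self, block):
        """Called when the disk reports `block` durable; order is arbitrary."""
        if block not in self.REQUIRED:
            raise ValueError(f"unexpected block: {block}")
        self.durable.add(block)
        if self.committed_durably() and not self.checkpointed:
            self.write_checkpoint()

    def write_checkpoint(self):
        # M is issued only here, after D, J_M, and J_C are all durable.
        self.checkpointed = True


tx = OptimisticTransaction()
tx.on_durability_notification("J_C")  # the disk may persist J_C first
tx.on_durability_notification("D")
print(tx.checkpointed)                # still waiting on J_M
tx.on_durability_notification("J_M")
print(tx.checkpointed)                # all three durable, M has been issued
```

Nothing in this model blocks the caller: the application continues after commit, and the checkpoint happens whenever the last notification arrives.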
Selective data journaling
Implementation
OptFS is implemented as a variant of the ext4 file system inside Linux 3.2.
Evaluation
New Vocabulary
Links
Thanks to Guanzhou Hu & Pei-Hsuan Wu for the paper review notes!