ARC: A Self-Tuning, Low Overhead Replacement Cache

One-line Summary

ARC is a self-tuning, low-overhead cache replacement policy that balances recency and frequency by adaptively resizing two LRU lists, and it empirically performs as well as fixed policies tuned offline for each workload.

Paper Structure Outline

  1. Introduction

    1. The Problem

    2. Our Contributions

    3. A Brief Outline of the Paper

  2. Prior Work: A Brief Review

    1. Offline Optimal

    2. Recency

    3. Frequency

    4. Recency and Frequency

    5. Temporal Distance Distribution

    6. Caching using Multiple Experts

    7. Ghost Caches

    8. Summary

  3. A Class of Replacement Policies

    1. Double Cache and a Replacement Policy

    2. A New Class of Policies

    3. LRU

  4. Adaptive Replacement Cache

    1. Fixed Replacement Cache

    2. The Policy

    3. Learning

    4. Scan-Resistant

    5. Extra History

  5. Experimental Results

    1. Traces

    2. OLTP

    3. Two Traces: P8 and P12

    4. ARC and 2Q

    5. ARC and MQ

    6. ARC and LRU

    7. ARC is Self-Tuning and Empirically Universal

    8. A Closer Examination of Adaptation in ARC

  6. Conclusions

Background & Motivation

The goals/metrics of this work are:

  • High hit rate

  • Low overhead

    • computational & space

    • locking overhead (high concurrency)

  • Scan resistant: A large file does not blow out the cache

  • No parameters to tune (self-tuning)

  • Online algorithm: Handles any workload; Changes dynamically

  • Balance between recency & frequency

  • Empirically universal: Performs as well as the best fixed replacement policy with workload-specific tuning

Some assumptions are:

  • Only the replacement policy (choosing which page to evict) is studied

  • No prefetching (all demand paging is assumed)

  • Only the read path is considered (no write policies)

  • The cache receives a continuous stream of requests for pages

Existing Algorithms

Offline Optimal

  • Replaces the page whose next reference lies furthest in the future (Belady's MIN)

  • Provides an upper bound on the hit ratio achievable by any online policy
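
For concreteness, here is a minimal Python sketch of Belady's MIN (not from the paper); the function name and toy trace are illustrative, and the linear scan for the next reference is kept simple rather than efficient:

```python
def belady_min(trace, cache_size):
    """Offline optimal: on eviction, drop the page whose next
    reference lies furthest in the future (or is never used again)."""
    cache, hits = set(), 0
    for i, page in enumerate(trace):
        if page in cache:
            hits += 1
            continue
        if len(cache) >= cache_size:
            def next_use(p):
                try:
                    return trace.index(p, i + 1)   # next reference of p
                except ValueError:
                    return float("inf")            # never used again
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return hits / len(trace)

# Toy trace with a 2-page cache: MIN gets 2 hits out of 6 requests.
print(belady_min([1, 2, 3, 1, 2, 1], 2))
```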

LRU

  • Replaces the least recently used page

  • Not scan resistant

  • Lock overhead limits concurrency: every hit must move the page to the MRU position under a lock
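
A minimal LRU sketch using Python's OrderedDict (illustrative, not from the paper); note how a long sequential scan of cold pages would evict the entire working set, which is exactly the scan-resistance problem above:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU: an OrderedDict keeps pages in recency order;
    a hit moves the page to the MRU end, eviction pops the LRU end."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page):
        hit = page in self.pages
        if hit:
            self.pages.move_to_end(page)        # refresh recency
        else:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict LRU page
            self.pages[page] = True
        return hit
```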

LFU

  • Replaces the least frequently used page

  • High implementation complexity

  • Does not adapt to changes in access patterns (pages with previous high frequency counts may no longer be useful)
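
A minimal LFU sketch under the same caveats; a practical implementation would keep a priority queue rather than the linear min-scan below, which is the implementation complexity noted above:

```python
class LFUCache:
    """Minimal LFU: evict the page with the smallest reference count.
    Note the pathology flagged above: counts never age, so formerly
    hot pages linger even after the workload shifts."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}                        # page -> reference count

    def access(self, page):
        hit = page in self.counts
        if not hit and len(self.counts) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]             # evict least frequently used
        self.counts[page] = self.counts.get(page, 0) + 1
        return hit
```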

LRU-k (the paper uses LRU-2, the most common variant)

  • Track time of last 2 accesses of each page

  • Replaces the page with the least recent penultimate reference

  • Expensive algorithm (for maintaining a priority queue)
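
A rough LRU-2 sketch (my illustration, not the authors' code); a real implementation maintains a priority queue keyed on penultimate access time instead of the O(n) min-scan here, and breaks ties among once-accessed pages more carefully:

```python
class LRU2Cache:
    """Minimal LRU-2: evict the page whose second-to-last access is
    oldest; pages seen only once count as -inf and are evicted first
    (ties broken arbitrarily in this sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.history = {}   # page -> [penultimate, last] access times
        self.clock = 0

    def access(self, page):
        self.clock += 1
        hit = page in self.history
        if not hit and len(self.history) >= self.capacity:
            victim = min(self.history, key=lambda p: self.history[p][0])
            del self.history[victim]
        prev_last = self.history[page][1] if hit else float("-inf")
        self.history[page] = [prev_last, self.clock]
        return hit
```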

2Q (VLDB '94, used in PostgreSQL)

  • Keeps three working sets: the current working set, the previous working set, and the long-term working set

  • Scan resistant

  • Easy to implement

  • Takes into account both recency and frequency

  • 2Q has some sizing parameters (K_in and K_out)
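
A sketch of the full version of 2Q, assuming its three conventional queues (A1in, A1out, Am); K_in and K_out are the sizing parameters mentioned above:

```python
from collections import OrderedDict, deque

class TwoQCache:
    """Sketch of full 2Q: A1in is a FIFO of first-time pages, A1out a
    ghost FIFO of page IDs only, Am an LRU of proven-hot pages."""
    def __init__(self, k_in, k_out, am_size):
        self.k_in, self.k_out, self.am_size = k_in, k_out, am_size
        self.a1in = OrderedDict()   # resident, FIFO order
        self.a1out = deque()        # ghost: references only, no data
        self.am = OrderedDict()     # resident, LRU order

    def access(self, page):
        if page in self.am:                 # hot page: refresh LRU position
            self.am.move_to_end(page)
            return True
        if page in self.a1in:               # recent first-timer: stays put
            return True
        if page in self.a1out:              # re-reference: promote to Am
            self.a1out.remove(page)
            if len(self.am) >= self.am_size:
                self.am.popitem(last=False)
            self.am[page] = True
            return False                    # ghost hit is still a cache miss
        # Cold miss: admit into A1in, spilling its FIFO tail into A1out.
        if len(self.a1in) >= self.k_in:
            old, _ = self.a1in.popitem(last=False)
            self.a1out.append(old)
            if len(self.a1out) > self.k_out:
                self.a1out.popleft()
        self.a1in[page] = True
        return False
```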

MQ (Multiple queues, USENIX '01)

  • Focused on a replacement algorithm for buffer caches (second level on the storage system, below the traditional OS buffer cache)

  • Put items in m LRU queues according to their frequency. Q_i contains pages that have been seen at least 2^i times but no more than 2^(i+1)-1 times recently

  • An expiration time is associated with every item

  • Maintains Q_out: Ghost cache, contains references instead of actual data

  • Not robust across a wide range of workloads

  • Higher overhead than LRU, ARC, 2Q (need to check time stamps of pages on every request)
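
The queue-placement rule above reduces to a logarithm; a small sketch (the function name, the parameter m, and capping at the last queue are my assumptions):

```python
import math

def mq_queue_index(freq, m):
    """Queue selection in MQ: a page referenced f >= 1 times lives in
    queue i where 2^i <= f <= 2^(i+1) - 1, capped at the last of m queues."""
    return min(int(math.log2(freq)), m - 1)

# freq 1 -> Q0, 2..3 -> Q1, 4..7 -> Q2, 8 -> Q3, ...
assert [mq_queue_index(f, 8) for f in (1, 2, 3, 4, 7, 8)] == [0, 1, 1, 2, 2, 3]
```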

Design and Implementation

ARC is:

  • Parameter-less

  • Self-tuning

  • Simple to implement

  • Scan resistant

  • Considers both recency and frequency

Two LRU lists are maintained over a cache that holds c pages. L1 contains pages accessed exactly once recently (capturing recency) and is partitioned into a top portion T1 and a bottom portion B1; only T1 is resident in cache, while B1 is a "ghost" list holding page references without data. L2 contains pages accessed at least twice recently (capturing frequency), likewise partitioned into a resident top T2 and a ghost bottom B2. An adaptive target p for the size of T1 shifts the boundary between the two lists: the larger p is, the more of the cache is devoted to recency.

  • Hit in T1 or T2: move the page to the MRU position of T2

  • Cache miss but hit in ghost list B1: increase p (favor recency), evict via REPLACE, and move the page to the MRU position of T2

  • Cache miss but hit in ghost list B2: decrease p (favor frequency), evict via REPLACE, and move the page to the MRU position of T2

  • Miss everywhere (not in B1/B2): insert the page at the MRU position of T1; REPLACE evicts the LRU page of T1 or T2 depending on p (see the sketch below)
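
The case analysis above maps directly onto the paper's pseudocode. Below is a minimal runnable sketch (my reconstruction, not the authors' code); integer division approximates the paper's real-valued ratio when adapting p:

```python
from collections import OrderedDict

class ARCache:
    """Sketch of ARC: T1/T2 hold resident pages (|T1| + |T2| <= c);
    B1/B2 hold ghost references only; p is the adaptive target size of T1."""
    def __init__(self, c):
        self.c, self.p = c, 0
        self.t1, self.t2 = OrderedDict(), OrderedDict()  # resident
        self.b1, self.b2 = OrderedDict(), OrderedDict()  # ghosts

    def _replace(self, in_b2):
        # REPLACE: demote the LRU page of T1 or T2 to its ghost list.
        if self.t1 and (len(self.t1) > self.p or
                        (in_b2 and len(self.t1) == self.p)):
            page, _ = self.t1.popitem(last=False)
            self.b1[page] = True
        else:
            page, _ = self.t2.popitem(last=False)
            self.b2[page] = True

    def access(self, x):
        # Case I: hit in T1 or T2 -> move to MRU of T2.
        if x in self.t1 or x in self.t2:
            self.t1.pop(x, None)
            self.t2.pop(x, None)
            self.t2[x] = True
            return True
        # Case II: ghost hit in B1 -> grow p (favor recency).
        if x in self.b1:
            self.p = min(self.c, self.p + max(len(self.b2) // len(self.b1), 1))
            self._replace(in_b2=False)
            del self.b1[x]
            self.t2[x] = True
            return False
        # Case III: ghost hit in B2 -> shrink p (favor frequency).
        if x in self.b2:
            self.p = max(0, self.p - max(len(self.b1) // len(self.b2), 1))
            self._replace(in_b2=True)
            del self.b2[x]
            self.t2[x] = True
            return False
        # Case IV: complete miss -> make room, insert at MRU of T1.
        if len(self.t1) + len(self.b1) == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(in_b2=False)
            else:
                self.t1.popitem(last=False)
        elif (len(self.t1) + len(self.t2) +
              len(self.b1) + len(self.b2)) >= self.c:
            if (len(self.t1) + len(self.t2) +
                len(self.b1) + len(self.b2)) == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(in_b2=False)
        self.t1[x] = True
        return False
```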

New Vocabulary

  • Demand paging: A disk page is copied into physical memory only if an attempt is made to access it and that page is not already in memory.

Links

  • Thanks to Jane Chen for the paper review notes!

  • Paper PDF

  • Presentation slides by the authors @ University of Houston

  • Course notes by Prof. Andrea Arpaci-Dusseau