Rui's Blog
Big Data Systems - Index

Table of Contents

Infrastructure, Frameworks, and Paradigms

Scheduling & Resource Allocation

  • [NSDI '11] Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
  • [EuroSys '13] Omega: flexible, scalable schedulers for large compute clusters
  • [SoCC '13] Apache Hadoop YARN: Yet Another Resource Negotiator
  • [SoCC '14] Wrangler: Predictable and Faster Jobs using Fewer Resources
  • [OSDI '14] Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing (pdf)
  • [SIGCOMM '14] Tetris: Multi-Resource Packing for Cluster Schedulers (pdf)
  • [ASPLOS '14] Quasar: Resource-Efficient and QoS-Aware Cluster Management
  • [SIGCOMM '15] Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can
  • [OSDI '16] CARBYNE: Altruistic Scheduling in Multi-Resource Clusters (pdf)
  • [OSDI '16] Packing and Dependency-aware Scheduling for Data-Parallel Clusters
  • [NSDI '16] HUG: Multi-Resource Fairness for Correlated and Elastic Demands
  • [EuroSys '16] TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters
  • [SoCC '17] Selecting the Best VM across Multiple Public Clouds: A Data-Driven Performance Modeling Approach
  • [ATC '18] On the diversity of cluster workloads and its impact on research results

Cloud/Serverless Computing

  • [SoCC '17] Occupy the Cloud: Distributed Computing for the 99%
  • [arXiv '19] Cloud Programming Simplified: A Berkeley View on Serverless Computing
  • [SoCC '19] Centralized Core-granular Scheduling for Serverless Functions
  • [SoCC '19] Cirrus: a Serverless Framework for End-to-end ML Workflows
  • [NSDI '19] Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure
  • [SIGMOD '20] Le Taureau: Deconstructing the Serverless Landscape & A Look Forward
  • [SoCC '20] Serverless linear algebra
  • [ATC '20] Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider
  • [SIGMOD '21] Towards Demystifying Serverless Machine Learning Training
  • [OSDI '21] Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads
  • [SoCC '21] Atoll: A Scalable Low-Latency Serverless Platform
  • [NSDI '21] Caerus: Nimble Task Scheduling for Serverless Analytics
  • [ASPLOS '22] Serverless computing on heterogeneous computers
  • [arXiv '22] Groundhog: Efficient Request Isolation in FaaS (pdf)

Network Flow Scheduling

Graphs

Distributed Tracing

  • [Textbook] Distributed Tracing in Practice
  • [SOSP '15] Pivot tracing: dynamic causal monitoring for distributed systems (pdf)
  • [SoCC '16] Principled Workflow-Centric Tracing of Distributed Systems (pdf)
  • [SOSP '17] Canopy: An End-to-End Performance Tracing And Analysis System (pdf)
  • [SoCC '18] Weighted Sampling of Execution Traces: Capturing More Needles and Less Hay (pdf)
  • [SoCC '19] Sifter: Scalable Sampling for Distributed Traces, without Feature Engineering (pdf)
  • [HotNets '21] Snicket: Query-Driven Distributed Tracing (pdf)
  • [NSDI '23] The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems (pdf)

Caching

New Data, Hardware Models

  • [ISCA '17] In-Datacenter Performance Analysis of a Tensor Processing Unit

Databases

  • [SIGMOD '12] Towards a Unified Architecture for in-RDBMS Analytics
  • [arXiv '13] Bayesian Optimization in a Billion Dimensions via Random Embeddings
  • [SIGMOD '17] Automatic Database Management System Tuning Through Large-scale Machine Learning
  • [HotStorage '20] Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs
  • [arXiv '21] Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation
  • [VLDB '21] An Inquiry into Machine Learning-based Automatic Configuration Tuning Services on Real-World Database Management Systems (pdf)
  • [VLDB '22] LlamaTune: Sample-Efficient DBMS Configuration Tuning

Meta stuff
