Big Data Systems - Index
- [SOSP '09] FAWN: A Fast Array of Wimpy Nodes
- [NSDI '11] Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
- [EuroSys '13] Omega: flexible, scalable schedulers for large compute clusters
- [SoCC '13] Apache Hadoop YARN: Yet Another Resource Negotiator
- [SoCC '14] Wrangler: Predictable and Faster Jobs using Fewer Resources
- [ASPLOS '14] Quasar: Resource-Efficient and QoS-Aware Cluster Management
- [SIGCOMM '15] Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can
- [OSDI '16] Packing and Dependency-aware Scheduling for Data-Parallel Clusters
- [NSDI '16] HUG: Multi-Resource Fairness for Correlated and Elastic Demands
- [EuroSys '16] TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters
- [SoCC '17] Selecting the best VM across multiple public clouds: A data-driven performance modeling approach
- [ATC '18] On the diversity of cluster workloads and its impact on research results
- [SoCC '17] Occupy the Cloud: Distributed Computing for the 99%
- [arXiv '19] Cloud Programming Simplified: A Berkeley View on Serverless Computing
- [SoCC '19] Centralized Core-granular Scheduling for Serverless Functions
- [SoCC '19] Cirrus: a Serverless Framework for End-to-end ML Workflows
- [NSDI '19] Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure
- [SIGMOD '20] Le Taureau: Deconstructing the Serverless Landscape & A Look Forward
- [SoCC '20] Serverless linear algebra
- [ATC '20] Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider
- [SIGMOD '21] Towards Demystifying Serverless Machine Learning Training
- [OSDI '21] Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads
- [SoCC '21] Atoll: A Scalable Low-Latency Serverless Platform
- [NSDI '21] Caerus: Nimble Task Scheduling for Serverless Analytics
- [ASPLOS '22] Serverless computing on heterogeneous computers
- [SIGCOMM '11] Managing Data Transfers in Computer Clusters with Orchestra
- [SIGCOMM '14] Baraat: Decentralized task-aware scheduling for data center networks
- [SIGCOMM '16] CODA: Toward Automatically Identifying and Scheduling COflows in the DArk
- [SIGCOMM '18] Sincronia: Near-Optimal Network Design for Coflows
- [SPAA '19] Near Optimal Coflow Scheduling in Networks
- [OSDI '14] GraphX: Graph Processing in a Distributed Dataflow Framework
- [ATC '17] Garaph: Efficient GPU-accelerated Graph Processing on a Single Machine with Balanced Replication
- [OSDI '21] Marius: Learning Massive Graph Embeddings on a Single Machine
- [arXiv '22] Marius++: Large-Scale Training of Graph Neural Networks on a Single Machine
- [MLSys '22] Graphiler: Optimizing Graph Neural Networks with Message Passing Data Flow Graph
- [Textbook] Distributed Tracing in Practice
- [SoCC '11] Small Cache, Big Effect: Provable Load Balancing for Randomly Partitioned Cluster Services
- [NSDI '16] Be Fast, Cheap and in Control with SwitchKV
- [SOSP '17] NetCache: Balancing Key-Value Stores with Fast In-Network Caching
- [ISCA '17] In-Datacenter Performance Analysis of a Tensor Processing Unit
- [SIGMOD '12] Towards a Unified Architecture for in-RDBMS Analytics
- [arXiv '13] Bayesian Optimization in a Billion Dimensions via Random Embeddings
- [SIGMOD '17] Automatic Database Management System Tuning Through Large-scale Machine Learning
- [HotStorage '20] Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs
- [arXiv '21] Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation
- [VLDB '21] An Inquiry into Machine Learning-based Automatic Configuration Tuning Services on Real-World Database Management Systems
- [VLDB '22] LlamaTune: Sample-Efficient DBMS Configuration Tuning
- Reading lists
- Some other stuff
- CSE 599W @ U Washington slides: not a paper-reading class, but an end-to-end, comprehensive introduction to the foundations of DL systems