Machine Learning Systems - Index

Distributed Training & Parallelism Paradigms

[OSDI '14] Scaling Distributed Machine Learning with the Parameter Server (pdf)
[SoCC '18] Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training (pdf)
[OSDI '20] BytePS: A High Performance and Generic Framework for Distributed DNN Training (pdf)
[VLDB '20] PyTorch Distributed: Experiences on Accelerating Data Parallel Training (pdf)
[MLSys '20] Resource Elasticity in Distributed Deep Learning (pdf)
[NSDI '23] Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs (pdf)
Parallelism Paradigms & Strategies (Overview by Hugging Face)
- [NIPS '19] GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (pdf)
- [SOSP '19] PipeDream: Generalized Pipeline Parallelism for DNN Training (pdf)
- [arXiv '19] Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (pdf)
- [MLSys '19] FlexFlow: Beyond Data and Model Parallelism for Deep Neural Networks (pdf)
- [SC '20] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models (pdf)
- [ATC '20] HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism (pdf)
- [SC '21] ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning (pdf)
- [SC '21] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM (pdf)
- [SC '21] Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines (pdf)
- [ICML '21] Memory-Efficient Pipeline-Parallel DNN Training (pdf)
- [ATC '21] ZeRO-Offload: Democratizing Billion-Scale Model Training (pdf)
- [PPoPP '21] DAPPLE: A Pipelined Data Parallel Approach for Training Large Models (pdf)
- [OSDI '22] Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning (pdf)
- [OSDI '22] Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization (pdf)
- [EuroSys '22] Varuna: Scalable, Low-cost Training of Massive Deep Learning Models (pdf)
- [arXiv '22] Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model (pdf)
- [PPoPP '22] BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores (pdf)
- [NeurIPS '22] AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness (pdf)
- [VLDB '23] MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud (pdf)

Workload Scheduling, Cluster Resource Management

[NSDI '11] DRF: Fair Allocation of Multiple Resource Types (pdf)
[OSDI '18] Gandiva: Introspective Cluster Scheduling for Deep Learning (pdf)
[EuroSys '18] Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters (pdf)
[ATC '19] Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads (pdf)
[NSDI '19] Tiresias: A GPU Cluster Manager for Distributed Deep Learning (pdf)
[NSDI '20] Themis: Fair and Efficient GPU Cluster Scheduling (pdf)
[MLSys '20] Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications (pdf)
[OSDI '20] Gavel: Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads (pdf)
[OSDI '20] AntMan: Dynamic Scaling on GPU Clusters for Deep Learning (pdf)
[OSDI '20] HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees (pdf)
[EuroSys '20] Gandiva-Fair: Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning (pdf)
[EuroSys '20] AlloX: Compute Allocation in Hybrid Clusters (pdf)
[MLSys '21] Wavelet: Efficient DNN Training with Tick-Tock Scheduling (pdf)
[OSDI '21] Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning (pdf)
[ATC '21] Zico: Efficient GPU Memory Sharing for Concurrent DNN Training (pdf)
[SoCC '21] Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs (pdf)
[NSDI '21] AFS/CoDDL: Elastic Resource Sharing for Distributed Deep Learning (pdf)
[NSDI '22] MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters (pdf)
[OSDI '22] Synergy: Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters (pdf)
[SIGCOMM '22] Multi-Resource Interleaving for Deep Learning Training (pdf)
[arXiv '22] Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision (pdf)
[NSDI '23] Shockwave: Proactive, Fair, and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning
[NSDI '23] ModelKeeper: Accelerating DNN Training via Automated Training Warmup

Serving/Inference

[NSDI '17] Clipper: A Low-Latency Online Prediction Serving System (pdf)
[NIPS '17 MLSys workshop] TensorFlow-Serving: Flexible, High-Performance ML Serving (pdf)
[arXiv '18] Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications (pdf)
[NIPS '18] Dynamic Space-Time Scheduling for GPU Inference (pdf)
[SOSP '19] Parity Models: Erasure-Coded Resilience for Prediction Serving Systems (pdf)
[SOSP '19] Nexus: A GPU Cluster Engine for Accelerating DNN-Based Video Analysis (pdf)
[arXiv '19] No DNN left behind: Improving inference in the cloud with Multi-Tenancy (pdf)
[ATC '19] MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving (pdf)
[SoCC '20] GSLICE: controlled spatial sharing of GPUs for a scalable inference platform (pdf)
[SoCC '20] InferLine: Latency-Aware Provisioning and Scaling for Prediction Serving Pipelines (pdf)
[OSDI '20] Serving DNNs like Clockwork: Performance Predictability from the Bottom Up (pdf)
[OSDI '20] PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications (pdf)
[ATC '21] INFaaS: Automated Model-less Inference Serving (pdf)
[EuroMLSys '21] Interference-Aware Scheduling for Inference Serving (pdf)
[arXiv '21] Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem (pdf)
[arXiv '21] Gati: Accelerating Deep Learning Inference via Learned Caches (pdf)
[ICML '21] Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference Beyond Increasing Batch Size (pdf)
[ICML '22] DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale (pdf)
[OSDI '22] Achieving μs-scale Preemption for Concurrent GPU-accelerated DNN Inferences (pdf)
[OSDI '22] Orca: A Distributed Serving System for Transformer-Based Generative Models (pdf)
[ATC '22] Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing (pdf)
[SIGMOD '22] Serverless Data Science - Are We There Yet? A Case Study of Model Serving (pdf)

Optimizing Networks/Communications for ML

[ATC '17] Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters (pdf)
[MLSys '19] BlueConnect: Decomposing All-Reduce for Deep Learning on Heterogeneous Network Hierarchy (pdf)
[MLSys '19] TicTac: Accelerating Distributed Deep Learning with Communication Scheduling (pdf)
[MLSys '19] P3: Priority-Based Parameter Propagation for Distributed DNN Training (pdf)
[SOSP '19] ByteScheduler: A Generic Communication Scheduler for Distributed DNN Training Acceleration (pdf)
[NetAI '20] Is Network the Bottleneck of Distributed Training? (pdf)
[MLSys '20] Blink: Fast and Generic Collectives for Distributed ML (pdf)
[MLSys '20] PLink: Discovering and Exploiting Datacenter Network Locality for Efficient Cloud-based Distributed Training (pdf)
[SoCC '20] Network-accelerated Distributed Machine Learning for Multi-Tenant Settings (pdf)
[NSDI '21] SwitchML: Scaling Distributed Machine Learning with In-Network Aggregation (pdf)
[NSDI '21] ATP: In-network Aggregation for Multi-tenant Learning (pdf)
[SIGCOMM '21] Efficient Sparse Collective Communication and its application to Accelerate Distributed Deep Learning (pdf)
[MLSys '21] In-network Aggregation for Shared Machine Learning Clusters
[NSDI '23] Synthesizing Collective Communication Algorithms for Heterogeneous Networks with TACCL (pdf)
[arXiv '21] Cloud Collectives: Towards Cloud-aware Collectives for ML Workloads with Rank Reordering (pdf)
[PPoPP '21] Synthesizing Optimal Collective Algorithms (pdf)
[NSDI '22] Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks (pdf)
[NSDI '23] Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning using SYNDICATE
Optical Networks for ML
- [SIGCOMM '21] SiP-ML: High-Bandwidth Optical Network Interconnects for Machine Learning Training (pdf)
- [SIGCOMM '21 OptSys workshop] IOI: In-network Optical Inference (pdf)
- [OFC '22] Emerging Optical Interconnects for AI Systems (pdf)
- [NSDI '23] TOPOOPT: Optimizing the Network Topology for Distributed DNN Training (pdf)

ML for Systems, Video Analytics & Streaming

Kuntai Du's overview on video analytics
CS34702 @ UChi: Machine Learning for Networking and Systems
[SIGCOMM '17] Pensieve: Neural Adaptive Video Streaming with Pensieve
[HotNets '17] Congestion-Control Throwdown
[SIGCOMM '18] Chameleon: Scalable Adaptation of Video Analytics via Temporal and Cross-camera Correlations
[NSDI '18] PCC Vivace: Online-Learning Congestion Control
[NSDI '18] Salsify: Low-Latency Network Video through Tighter Integration between a Video Codec and a Transport Protocol
[HotEdge '19] Edge-based Transcoding for Adaptive Live Video Streaming (pdf)
[SIGCOMM '20] Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics
[SIGCOMM '20] DDS: Server-Driven Video Streaming for Deep Learning Inference
[MobiCom '20] OnRL: Improving Mobile Video Telephony via Online Reinforcement Learning
[NSDI '20] Learning in situ: a randomized experiment in video streaming
[OSDI '21] Polyjuice: High-Performance Transactions via Learned Concurrency Control (pdf)
[NSDI '22] Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers
[HotMobile '22] Understanding the Potential of Server-Driven Edge Video Analytics
[SIGCOMM '22] Genet: automatic curriculum generation for learning adaptation in networking
[NSDI '23] GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge

Tricks and Relaxations in Learning and Systems: Compression, Pruning, Freezing, and many more

[NIPS '13] More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server
[arXiv '16] Training Deep Nets with Sublinear Memory Cost
[ICLR '16] Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
[NIPS '17] Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
[ICLR '18] Mixed Precision Training
[ICLR '19] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
[arXiv '21] AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning
[PVLDB '21] BAGUA: Scaling up Distributed Learning with System Relaxations
[arXiv '22] BagPipe: Accelerating Deep Recommendation Model Training
[arXiv '22] Efficient DNN Training with Knowledge-Guided Layer Freezing
Hongyi Wang's talk: On the Utility of Gradient Compression in Distributed Training Systems
- [NIPS '18] ATOMO: Communication-efficient Learning via Atomic Sparsification (pdf)
- [MLSys '21] Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification (pdf)
- [MLSys '21] Pufferfish: Communication-efficient Models At No Extra Cost (pdf)
- [SOSP '21] Gradient Compression Supercharged High-Performance Data Parallel DNN Training (pdf)
- [MLSys '22] On the Utility of Gradient Compression in Distributed Training Systems (pdf)
- [arXiv '22] Cuttlefish: Factorized Model Training without All the Tuning
- [arXiv '22] ByteComp: Revisiting Gradient Compression in Distributed Training (pdf)

Misc: Storage, Hyperparameter Tuning, Federated Learning, DL Compilers, Green Datacenters

[NIPS '16 workshop] Federated Learning: Strategies for Improving Communication Efficiency
[ICML '18 workshop] Tune: A research platform for distributed model selection and training
[OSDI '18] TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
[MLSys '19] Bandana: Using Non-Volatile Memory for Storing Deep Learning Models
[MLSys '19] Towards Federated Learning at Scale: System Design
[SOSP '19] TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions
[MLSys '20] A System for Massively Parallel Hyperparameter Tuning
[ICLR '20] Federated Learning with Matched Averaging
[OSDI '20] Ansor: Generating High-Performance Tensor Programs for Deep Learning
[OSDI '20] Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks
[EuroSys '21] RubberBand: Cloud-based Hyperparameter Tuning
[FAST '21] CheckFreq: Frequent, Fine-Grained DNN Checkpointing
[VLDB '21] Analyzing and Mitigating Data Stalls in DNN Training
[MLSys '21] Fluid: Resource-aware Hyperparameter Tuning Engine
[OSDI '21] Oort: Efficient Federated Learning via Guided Participant Selection (pdf)
[OSDI '21] PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
[SoCC '21] Elastic Hyperparameter Tuning on the Cloud (pdf)
[NSDI '22] Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models
[ICML '22] FedScale: Benchmarking Model and System Performance of Federated Learning at Scale
[HotCarbon '22] Treehouse: A Case For Carbon-Aware Datacenter Software
[NSDI '23] Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training


Last updated 3 years ago
