Machine Learning Systems - Index
Distributed Training & Parallelism Paradigms
[SoCC '18] Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training (pdf)
[MLSys '20] Resource Elasticity in Distributed Deep Learning (pdf)
[NSDI '23] Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs (pdf)
Parallelism Paradigms & Strategies (Overview by Hugging Face)
[MLSys '19] FlexFlow: Beyond Data and Model Parallelism for Deep Neural Networks (pdf)
[ATC '20] HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism (pdf)
[SC '21] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM (pdf)
[ICML '21] Memory-Efficient Pipeline-Parallel DNN Training (pdf)
[PPoPP '21] DAPPLE: A Pipelined Data Parallel Approach for Training Large Models (pdf)
[OSDI '22] Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning (pdf)
[OSDI '22] Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization (pdf)
[EuroSys '22] Varuna: Scalable, Low-cost Training of Massive Deep Learning Models (pdf)
[arXiv '22] Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model (pdf)
[PPoPP '22] BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores (pdf)
[NeurIPS '22] AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness (pdf)
[VLDB '23] MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud (pdf)
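For context on this section: a minimal sketch (synthetic linear model, simulated workers, and learning rate are illustrative assumptions, not any specific system above) of the synchronous data parallelism that most of these papers extend. Each worker computes gradients on its own mini-batch, and explicit averaging stands in for an all-reduce across GPUs.

```python
# Minimal simulation of synchronous data parallelism for a linear model.
# Worker count, model, and learning rate are made-up assumptions; real
# systems replace the explicit mean with an all-reduce collective.
import numpy as np

rng = np.random.default_rng(0)
num_workers, dim, lr = 4, 8, 0.1
w = np.zeros(dim)                      # replicated model parameters
w_true = rng.normal(size=dim)          # synthetic target model

for step in range(100):
    grads = []
    for _ in range(num_workers):       # each worker draws its own mini-batch
        X = rng.normal(size=(32, dim))
        y = X @ w_true
        pred = X @ w
        grads.append(2 * X.T @ (pred - y) / len(X))   # MSE gradient
    g = np.mean(grads, axis=0)         # stands in for all-reduce averaging
    w -= lr * g                        # identical update on every replica

print("final error:", np.linalg.norm(w - w_true))
```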
Workload Scheduling, Cluster Resource Management
[EuroSys '18] Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters (pdf)
[OSDI '20] HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees (pdf)
[EuroSys '20] Gandiva-Fair: Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning (pdf)
[EuroSys '20] AlloX: Compute Allocation in Hybrid Clusters (pdf)
[ATC '21] Zico: Efficient GPU Memory Sharing for Concurrent DNN Training (pdf)
[SoCC '21] Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs (pdf)
[NSDI '21] AFS/CoDDL: Elastic Resource Sharing for Distributed Deep Learning (pdf)
[NSDI '22] MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters (pdf)
[SIGCOMM '22] Multi-Resource Interleaving for Deep Learning Training (pdf)
[arXiv '22] Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision (pdf)
[NSDI '23] Shockwave: Proactive, Fair, and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning
[NSDI '23] ModelKeeper: Accelerating DNN Training via Automated Training Warmup
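For intuition on the scheduling problem these papers tackle, the toy sketch below packs jobs onto a fixed GPU pool with a shortest-remaining-time-first policy. The job set, GPU demands, and the policy itself are illustrative assumptions, not any paper's actual algorithm.

```python
# Toy GPU-cluster scheduling sketch: shortest-remaining-time-first over a
# fixed pool of GPUs. Jobs and policy are illustrative; the papers above
# study far richer policies (fairness, deadlines, elasticity, placement).
def schedule(jobs, num_gpus):
    """jobs: list of (name, gpu_demand, remaining_time). Returns finish times."""
    queue = sorted(jobs, key=lambda j: j[2])      # shortest remaining time first
    finished, t = [], 0
    while queue:
        free, running = num_gpus, []
        for job in list(queue):                   # greedily pack the GPUs
            name, demand, rem = job
            if demand <= free:
                running.append(job)
                free -= demand
        if not running:
            break
        step = min(r[2] for r in running)         # run until the shortest job finishes
        t += step
        for job in running:
            queue.remove(job)
            name, demand, rem = job
            if rem - step > 0:
                queue.append((name, demand, rem - step))
            else:
                finished.append((name, t))
        queue.sort(key=lambda j: j[2])
    return finished

print(schedule([("a", 2, 3), ("b", 4, 1), ("c", 2, 2)], num_gpus=4))
```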
Serving/Inference
[NSDI '17] Clipper: A Low-Latency Online Prediction Serving System (pdf)
[NIPS '17 MLSys workshop] TensorFlow-Serving: Flexible, High-Performance ML Serving (pdf)
[arXiv '18] Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications (pdf)
[SOSP '19] Nexus: A GPU Cluster Engine for Accelerating DNN-Based Video Analysis (pdf)
[arXiv '19] No DNN Left Behind: Improving Inference in the Cloud with Multi-Tenancy (pdf)
[ATC '19] MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving (pdf)
[SoCC '20] GSLICE: controlled spatial sharing of GPUs for a scalable inference platform (pdf)
[SoCC '20] InferLine: Latency-Aware Provisioning and Scaling for Prediction Serving Pipelines (pdf)
[OSDI '20] Serving DNNs like Clockwork: Performance Predictability from the Bottom Up (pdf)
[OSDI '20] PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications (pdf)
[ATC '21] INFaaS: Automated Model-less Inference Serving (pdf)
[arXiv '21] Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem (pdf)
[arXiv '21] Gati: Accelerating Deep Learning Inference via Learned Caches (pdf)
[ICML '22] DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale (pdf)
[OSDI '22] Achieving μs-scale Preemption for Concurrent GPU-accelerated DNN Inferences (pdf)
[OSDI '22] Orca: A Distributed Serving System for Transformer-Based Generative Models (pdf)
[ATC '22] Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing (pdf)
[SIGMOD '22] Serverless Data Science - Are We There Yet? A Case Study of Model Serving (pdf)
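As background for the serving systems above, a small sketch of SLO-aware dynamic batching in the spirit of Clipper-style adaptive batching. The latency model, request arrival times, and SLO are made-up assumptions for illustration only.

```python
# Toy sketch of SLO-aware dynamic batching for inference serving.
# Latency model and arrivals are assumptions, not measured numbers.
def plan_batches(arrivals, slo_ms, base_ms=5.0, per_item_ms=1.0):
    """Greedily grow each batch while the earliest queued request still
    meets its latency SLO when the batch is dispatched."""
    batches, current = [], []
    for t in arrivals:
        if not current:
            current = [t]
            continue
        batch_latency = base_ms + per_item_ms * (len(current) + 1)
        if t - current[0] + batch_latency <= slo_ms:
            current.append(t)                      # still within the SLO: keep batching
        else:
            batches.append(current)                # flush and start a new batch
            current = [t]
    if current:
        batches.append(current)
    return batches

reqs = [0, 1, 2, 8, 9, 30, 31, 32, 33]             # request arrival times (ms)
for b in plan_batches(reqs, slo_ms=15):
    print("batch of", len(b), "dispatched for request arriving at", b[0], "ms")
```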
Optimizing Networks/Communications for ML
[ATC '17] Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters (pdf)
[MLSys '20] PLink: Discovering and Exploiting Datacenter Network Locality for Efficient Cloud-based Distributed Training (pdf)
[SoCC '20] Network-accelerated Distributed Machine Learning for Multi-Tenant Settings (pdf)
[NSDI '21] ATP: In-network Aggregation for Multi-tenant Learning (pdf)
[SIGCOMM '21] Efficient Sparse Collective Communication and its application to Accelerate Distributed Deep Learning (pdf)
[MLSys '21] In-network Aggregation for Shared Machine Learning Clusters
[arXiv '21] Cloud Collectives: Towards Cloud-aware Collectives for ML Workloads with Rank Reordering (pdf)
[PPoPP '21] Synthesizing Optimal Collective Algorithms (pdf)
[NSDI '22] Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks (pdf)
[NSDI '23] Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning using SYNDICATE
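Much of this section centers on the all-reduce collective; below is a toy reduce-scatter plus all-gather ring all-reduce over simulated workers. The worker count and tensor size are arbitrary assumptions, and real implementations overlap communication with computation.

```python
# Toy ring all-reduce (reduce-scatter + all-gather) over simulated workers.
import numpy as np

def ring_allreduce(tensors):
    n = len(tensors)
    chunks = [np.array_split(t.astype(float), n) for t in tensors]
    # Phase 1: reduce-scatter. After n-1 steps, worker i holds the fully
    # reduced chunk (i+1) % n.
    for step in range(n - 1):
        for i in range(n):
            src, dst = i, (i + 1) % n
            c = (i - step) % n
            chunks[dst][c] = chunks[dst][c] + chunks[src][c]
    # Phase 2: all-gather. Circulate the reduced chunks around the ring.
    for step in range(n - 1):
        for i in range(n):
            src, dst = i, (i + 1) % n
            c = (i + 1 - step) % n
            chunks[dst][c] = chunks[src][c]
    return [np.concatenate(c) for c in chunks]

grads = [np.full(8, rank, dtype=float) for rank in range(4)]
out = ring_allreduce(grads)
assert all(np.allclose(o, sum(range(4))) for o in out)
print(out[0])
```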
Optical Networks for ML
[SIGCOMM '21] SiP-ML: High-Bandwidth Optical Network Interconnects for Machine Learning Training (pdf)
[SIGCOMM '21 OptSys workshop] IOI: In-network Optical Inference (pdf)
[OFC '22] Emerging Optical Interconnects for AI Systems (pdf)
[NSDI '23] TOPOOPT: Optimizing the Network Topology for Distributed DNN Training (pdf)
ML for Systems, Video Analytics & Streaming
[SIGCOMM '17] Neural Adaptive Video Streaming with Pensieve
[HotNets '17] Congestion-Control Throwdown
[NSDI '18] PCC Vivace: Online-Learning Congestion Control
[NSDI '18] Salsify: Low-Latency Network Video through Tighter Integration between a Video Codec and a Transport Protocol
[HotEdge '19] Edge-based Transcoding for Adaptive Live Video Streaming (pdf)
[SIGCOMM '20] DDS: Server-Driven Video Streaming for Deep Learning Inference
[MobiCom '20] OnRL: Improving Mobile Video Telephony via Online Reinforcement Learning
[NSDI '20] Learning in situ: a randomized experiment in video streaming
[OSDI '21] Polyjuice: High-Performance Transactions via Learned Concurrency Control (pdf)
[NSDI '22] Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers
[HotMobile '22] Understanding the Potential of Server-Driven Edge Video Analytics
[SIGCOMM '22] Genet: automatic curriculum generation for learning adaptation in networking
[NSDI '23] GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge
Tricks and Relaxations in Learning and Systems: Compression, Pruning, Freezing, and More
[NIPS '13] More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server
[arXiv '16] Training Deep Nets with Sublinear Memory Cost
[ICLR '16] Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
[NIPS '17] Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
[ICLR '18] Mixed precision training
[ICLR '19] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
[arXiv '21] AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning
[PVLDB '21] BAGUA: Scaling up Distributed Learning with System Relaxations
[arXiv '22] BagPipe: Accelerating Deep Recommendation Model Training
[arXiv '22] Efficient DNN Training with Knowledge-Guided Layer Freezing
Hongyi Wang's talk: On the Utility of Gradient Compression in Distributed Training Systems
[NIPS '18] ATOMO: Communication-efficient Learning via Atomic Sparsification (pdf)
[MLSys '21] Pufferfish: Communication-efficient Models At No Extra Cost (pdf)
[SOSP '21] Gradient Compression Supercharged High-Performance Data Parallel DNN Training (pdf)
[MLSys '22] On the utility of gradient compression in distributed training systems (pdf)
[arXiv '22] Cuttlefish: Factorized Model Training without All the Tuning
[arXiv '22] ByteComp: Revisiting Gradient Compression in Distributed Training (pdf)
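A minimal sketch of top-k gradient sparsification with local error feedback, one of the relaxations studied in the compression work above. The value of k and this particular error-feedback variant are illustrative assumptions, not any single paper's scheme.

```python
# Minimal top-k gradient sparsification with local error feedback.
import numpy as np

def topk_compress(grad, k, residual):
    """Send only the k largest-magnitude entries; carry the rest forward."""
    acc = grad + residual                        # add back what was dropped last round
    idx = np.argpartition(np.abs(acc), -k)[-k:]  # indices of the k largest magnitudes
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]                       # values actually transmitted
    new_residual = acc - sparse                  # error kept locally for next round
    return sparse, new_residual

rng = np.random.default_rng(0)
grad = rng.normal(size=1000)
residual = np.zeros_like(grad)
for step in range(3):
    sent, residual = topk_compress(grad, k=50, residual=residual)
    print(f"step {step}: sent {np.count_nonzero(sent)} of {grad.size} values, "
          f"residual norm {np.linalg.norm(residual):.2f}")
```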
Misc: Storage, Hyperparameter Tuning, Federated Learning, DL Compilers, Green Datacenters
[NIPS '16 workshop] Federated Learning: Strategies for Improving Communication Efficiency
[ICML '18 workshop] Tune: A research platform for distributed model selection and training
[OSDI '18] TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
[MLSys '19] Bandana: Using Non-Volatile Memory for Storing Deep Learning Models
[MLSys '19] Towards Federated Learning at Scale: System Design
[SOSP '19] TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions
[MLSys '20] A System for Massively Parallel Hyperparameter Tuning
[ICLR '20] Federated Learning with Matched Averaging
[OSDI '20] Ansor: Generating High-Performance Tensor Programs for Deep Learning
[OSDI '20] Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks
[EuroSys '21] RubberBand: Cloud-based Hyperparameter Tuning
[MLSys '21] Fluid: Resource-aware Hyperparameter Tuning Engine
[OSDI '21] Oort: Efficient Federated Learning via Guided Participant Selection (pdf)
[OSDI '21] PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
[SoCC '21] Elastic Hyperparameter Tuning on the Cloud (pdf)
[NSDI '22] Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models
[ICML '22] FedScale: Benchmarking Model and System Performance of Federated Learning at Scale
[HotCarbon '22] Treehouse: A Case For Carbon-Aware Datacenter Software
[NSDI '23] Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training
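As a reference point for the federated-learning entries above, a toy FedAvg-style training loop: clients run a few local SGD steps and the server averages their models weighted by dataset size. The client data, the linear model, and all hyperparameters are made-up assumptions.

```python
# Toy FedAvg-style federated averaging on synthetic linear-regression clients.
import numpy as np

rng = np.random.default_rng(0)
dim, num_clients = 5, 10
w_global = np.zeros(dim)
w_true = rng.normal(size=dim)

# Each client holds a private dataset of a different size.
client_data = []
for _ in range(num_clients):
    X = rng.normal(size=(rng.integers(20, 60), dim))
    client_data.append((X, X @ w_true))

for rnd in range(20):
    updates, sizes = [], []
    for X, y in client_data:
        w = w_global.copy()
        for _ in range(5):                        # a few local SGD steps
            grad = 2 * X.T @ (X @ w - y) / len(X)
            w -= 0.05 * grad
        updates.append(w)
        sizes.append(len(X))
    # Server aggregates local models, weighted by local dataset size.
    w_global = np.average(updates, axis=0, weights=sizes)

print("error after 20 rounds:", np.linalg.norm(w_global - w_true))
```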