Machine Learning Systems - Index

Distributed Training & Parallelism Paradigms

Workload Scheduling, Cluster Resource Management

Serving/Inference

Optimizing Networks/Communications for ML

ML for Systems, Video Analytics & Streaming

Tricks and Relaxations in Learning and Systems: Compression, Pruning, Freezing, and many more

  • [NIPS '13] More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server

  • [arXiv '16] Training Deep Nets with Sublinear Memory Cost

  • [ICLR '16] Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

  • [NIPS '17] Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

  • [ICLR '18] Mixed Precision Training

  • [ICLR '19] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

  • [arXiv '21] AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning

  • [PVLDB '21] BAGUA: Scaling up Distributed Learning with System Relaxations

  • [arXiv '22] BagPipe: Accelerating Deep Recommendation Model Training

  • [arXiv '22] Efficient DNN Training with Knowledge-Guided Layer Freezing

  • Hongyi Wang's talk: On the Utility of Gradient Compression in Distributed Training Systems

Misc: Storage, Hyperparameter Tuning, Federated Learning, DL Compilers, Green Datacenters

  • [NIPS '16 workshop] Federated Learning: Strategies for Improving Communication Efficiency

  • [ICML '18 workshop] Tune: A Research Platform for Distributed Model Selection and Training

  • [OSDI '18] TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

  • [MLSys '19] Bandana: Using Non-Volatile Memory for Storing Deep Learning Models

  • [MLSys '19] Towards Federated Learning at Scale: System Design

  • [SOSP '19] TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions

  • [MLSys '20] A System for Massively Parallel Hyperparameter Tuning

  • [ICLR '20] Federated Learning with Matched Averaging

  • [OSDI '20] Ansor: Generating High-Performance Tensor Programs for Deep Learning

  • [OSDI '20] Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks

  • [EuroSys '21] RubberBand: Cloud-based Hyperparameter Tuning

  • [MLSys '21] Fluid: Resource-aware Hyperparameter Tuning Engine

  • [OSDI '21] Oort: Efficient Federated Learning via Guided Participant Selection (pdf)

  • [OSDI '21] PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections

  • [SoCC '21] Elastic Hyperparameter Tuning on the Cloud (pdf)

  • [NSDI '22] Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models

  • [ICML '22] FedScale: Benchmarking Model and System Performance of Federated Learning at Scale

  • [HotCarbon '22] Treehouse: A Case For Carbon-Aware Datacenter Software

  • [NSDI '23] Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training