[2020 OSDI] AntMan: Dynamic Scaling on GPU Clusters for Deep Learning
Summary

Background & Motivation


Design & Implementation
Dynamic Memory Scaling

Computation Management for Minimizing Interference

Evaluation



Links & References