# \[2020 OSDI] AntMan: Dynamic Scaling on GPU Clusters for Deep Learning

## Summary

AntMan is a cluster scheduler for GPU sharing. It introduces two techniques, dynamic memory scaling and opportunistic computation management, to co-locate multiple jobs on the same GPU while limiting interference between them.

![System architecture/workflow](/files/HSi1eNBYQii7jDYZNP0W)

## Background & Motivation

* GPUs in a shared cluster are under-utilized, in terms of both SMs and GPU memory (GRAM). One reason is that multi-GPU jobs require gang scheduling, which leaves GPUs idle. Moreover, the resource demand of DL training jobs changes over time.
* Training jobs in the Alibaba cluster have the following characteristics:
  * Small model size: Most GPU memory can be shared
  * Short mini-batch: Fast resource coordination
  * Similar mini-batch: Mini-batch time can be used to quantify inter-job interference

![](/files/hQXug93N47a01OPjlvhd)

![](/files/RZnu4VI6GJGfpb1pPTi9)
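
As a rough illustration of the last point above, interference can be expressed as the relative increase in mini-batch time when a job shares a GPU. The sketch below is not from the paper: the function name `interference_slowdown`, the use of the median, and the sample timings are all assumptions made for illustration.

```python
from statistics import median


def interference_slowdown(dedicated_times, shared_times):
    """Relative slowdown of a job's mini-batch time under GPU sharing.

    Hypothetical helper: compares the median mini-batch time measured on a
    dedicated GPU against the median measured while co-located.
    """
    baseline = median(dedicated_times)
    observed = median(shared_times)
    return observed / baseline - 1.0


# Made-up mini-batch times in seconds, for illustration only.
dedicated = [0.100, 0.102, 0.098, 0.101]
shared = [0.120, 0.125, 0.118, 0.122]

slowdown = interference_slowdown(dedicated, shared)
print(f"slowdown: {slowdown:.1%}")
```

Because mini-batches are short and similar in duration, a signal like this becomes available after only a few iterations.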

## Design & Implementation

### Dynamic Memory Scaling

AntMan dynamically co-locates jobs on shared GPUs. The goal is for resource-guarantee jobs to maintain the same performance as dedicated execution while co-locating opportunistic jobs to best utilize the resources.

AntMan monitors the memory usage of DL jobs and sets the corresponding memory upper bounds, allowing other jobs to use the spare memory. However, since the resource demand of DL jobs is dynamic, a job may later require more memory than before, which can trigger an out-of-memory (OOM) error and fail all jobs. In this case (Fig. 7a), the memory bursts are cached in host (CPU) memory and moved back to GRAM after re-allocation. The same technique is applied to jobs that need to shrink their memory footprint to make way for other jobs (Fig. 7b).

![](/files/H7i9Pg8J4CZp8pHoDqk1)
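
The grow and shrink behavior described above can be mimicked with a toy model. This is a minimal sketch, not AntMan's allocator: the `JobMemory` class, its method names, the eviction order, and the tensor sizes are all hypothetical. Tensors are tracked as plain dictionary entries instead of real GPU buffers.

```python
from dataclasses import dataclass, field


@dataclass
class JobMemory:
    """Toy model of one job's tensor placement under a GPU memory cap."""
    gpu_limit_mb: int
    gpu_tensors: dict = field(default_factory=dict)   # name -> size (MB) on GPU
    host_tensors: dict = field(default_factory=dict)  # name -> size (MB) on host

    @property
    def gpu_used_mb(self):
        return sum(self.gpu_tensors.values())

    def allocate(self, name, size_mb):
        """Place a tensor on the GPU if it fits under the cap, else on the host."""
        if self.gpu_used_mb + size_mb <= self.gpu_limit_mb:
            self.gpu_tensors[name] = size_mb
        else:
            # The burst is cached in host memory instead of raising OOM.
            self.host_tensors[name] = size_mb

    def set_limit(self, new_limit_mb):
        """Change the cap, then rebalance tensors between GPU and host."""
        self.gpu_limit_mb = new_limit_mb
        # Shrink: evict the most recently placed tensors until usage fits.
        while self.gpu_tensors and self.gpu_used_mb > self.gpu_limit_mb:
            name, size = self.gpu_tensors.popitem()
            self.host_tensors[name] = size
        # Grow: move spilled tensors back to the GPU while they fit.
        for name in list(self.host_tensors):
            if self.gpu_used_mb + self.host_tensors[name] <= self.gpu_limit_mb:
                self.gpu_tensors[name] = self.host_tensors.pop(name)


job = JobMemory(gpu_limit_mb=1000)
job.allocate("weights", 600)
job.allocate("activations", 300)
job.allocate("burst", 400)          # exceeds the cap, so it spills to the host
spilled_after_burst = dict(job.host_tensors)

job.set_limit(1500)                 # cap raised: the burst moves back to the GPU
used_after_grow = job.gpu_used_mb

job.set_limit(700)                  # cap lowered: tensors are evicted to the host
used_after_shrink = job.gpu_used_mb
```

In this sketch a job never fails when it exceeds its cap; it only pays the cost of having some tensors in host memory until the cap is raised.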

### Computation management for minimizing interference

The GpuOpManager is introduced in DL frameworks to opportunistically launch computation kernels during idle time slots to reduce interference.

![](/files/l7KK7JdGsdxH7r9tJ8Tb)
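
To make the idea of filling idle time slots concrete, the sketch below places an opportunistic job's kernels into the gaps left by a resource-guarantee job. It is an offline simplification under stated assumptions, not the GpuOpManager implementation: the function name `schedule_in_idle_slots`, the greedy in-order policy, and the timeline values are hypothetical, and kernel durations are assumed to be known in advance.

```python
def schedule_in_idle_slots(busy, horizon, kernels):
    """Greedily place opportunistic kernels into idle gaps.

    busy:    sorted, non-overlapping (start, end) intervals during which the
             resource-guarantee job occupies the GPU.
    horizon: end of the scheduling window.
    kernels: durations of the opportunistic job's kernels, in launch order.
    Returns (start, end) placements; kernels that do not fit stay unscheduled.
    """
    # Derive the idle gaps between busy intervals.
    gaps, cursor = [], 0
    for start, end in busy:
        if start > cursor:
            gaps.append([cursor, start])
        cursor = max(cursor, end)
    if cursor < horizon:
        gaps.append([cursor, horizon])

    placements = []
    pending = list(kernels)
    for gap in gaps:
        # Launch kernels in order while the next one still fits in this gap.
        while pending and gap[0] + pending[0] <= gap[1]:
            duration = pending.pop(0)
            placements.append((gap[0], gap[0] + duration))
            gap[0] += duration
    return placements


# Made-up timeline: the guaranteed job is busy during [0, 4) and [6, 9).
placements = schedule_in_idle_slots(
    busy=[(0, 4), (6, 9)], horizon=12, kernels=[1, 1, 2, 2]
)
print(placements)
```

No opportunistic kernel overlaps a busy interval here, which is the property that keeps the resource-guarantee job's performance close to dedicated execution. The last kernel does not fit in the remaining gap and is left for a later window.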

## Evaluation

![Micro benchmark 1: Memory scaling](/files/Adly3uubTOAhapgXsHE9)

![Micro benchmark 2: Computation management. Here, ESPnet is a resource-guaranteed job, while ResNet50 is an opportunistic job.](/files/KW0Y6oxAWyHDmKsC0ymB)

![End-to-end evaluation](/files/1iTTmUez5dzdCb5G45r3)

## Links & References

* [Paper PDF](https://www.usenix.org/system/files/osdi20-xiao.pdf)
* [Presentation video at OSDI '20](https://www.youtube.com/watch?v=8PSzcqL0eUA)
* [Presentation slides at OSDI '20](https://www.usenix.org/sites/default/files/conference/protected-files/osdi20_slides_xiao.pdf)
* [GPU-scheduler-for-deep-learning on GitHub](https://github.com/alibaba/GPU-scheduler-for-deep-learning)

