# \[2021 EuroMLSys] Interference-Aware Scheduling for Inference Serving

## Summary

This work proposes a scheduler for inference workloads on heterogeneous hardware. The scheduler is aware of, and proactively accounts for, interference between co-located jobs, and therefore outperforms baseline policies such as least-loaded.

## Background & Motivation

Inference serving schedulers co-locate models to improve resource utilization. However, the least-loaded scheduling policy, popular for VM task scheduling, is agnostic to the interference (i.e., latency degradation) created by co-location, and thus yields suboptimal placements.
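The contrast between the two policies can be shown with a toy sketch. The pairwise slowdown table, the rule that slowdowns combine multiplicatively, and the worst-slowdown objective are all illustrative assumptions, not the paper's interference model (the paper uses a learned latency predictor instead).

```python
# Hypothetical pairwise slowdown factors: PAIRWISE[a][b] is the factor by which
# model a's latency grows when it shares a machine with model b (1.0 = none).
PAIRWISE = {
    "resnet50": {"resnet50": 1.6, "bert": 1.8, "mobilenet": 1.1},
    "bert": {"resnet50": 1.7, "bert": 2.0, "mobilenet": 1.2},
    "mobilenet": {"resnet50": 1.1, "bert": 1.2, "mobilenet": 1.05},
}


def slowdown(model, others):
    """Toy assumption: pairwise slowdowns combine multiplicatively."""
    s = 1.0
    for o in others:
        s *= PAIRWISE[model][o]
    return s


def worst_slowdown(jobs):
    """Largest slowdown suffered by any job on a machine running `jobs`."""
    return max(slowdown(j, jobs[:i] + jobs[i + 1:]) for i, j in enumerate(jobs))


def least_loaded(machines, new_model):
    """Baseline: pick the machine with the fewest jobs, ignoring which jobs they are."""
    return min(range(len(machines)), key=lambda i: len(machines[i]))


def interference_aware(machines, new_model):
    """Pick the machine whose worst slowdown after placement is smallest."""
    return min(range(len(machines)),
               key=lambda i: worst_slowdown(machines[i] + [new_model]))


machines = [["bert"], ["mobilenet", "mobilenet"]]
print(least_loaded(machines, "resnet50"))        # 0: fewer jobs, but next to BERT
print(interference_aware(machines, "resnet50"))  # 1: two light neighbors hurt less
```

Least-loaded sees one job versus two and picks machine 0. The interference-aware policy notices that sharing with BERT inflates latency far more than sharing with two small models, so it picks machine 1.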

## Design & Implementation

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MMTslgmrrtRXvxD2lk9%2Fuploads%2FdEGxV1wPZ74vZGLEEuuD%2FScreen%20Shot%202022-06-30%20at%202.24.04%20PM.png?alt=media\&token=393ab5ff-5dad-4af4-a7b1-fa7f24bffefe)

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MMTslgmrrtRXvxD2lk9%2Fuploads%2FkxBZM95VJFvfR0Jok607%2FScreen%20Shot%202022-06-30%20at%202.29.56%20PM.png?alt=media\&token=fe87758d-c4ef-4c04-bef0-64644bf46392)

By using a single, unified predictor instead of maintaining separate predictors for each co-location degree and machine type, the authors (1) reduce the effort needed to train multiple predictors and (2) exploit the similarity across co-location configurations (e.g., the same set of models on an 8-vCPU VM vs. a 32-vCPU VM).
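A unified predictor can be sketched as one regression model whose features encode the target model, its co-located neighbors, and the machine size, so profiling data from different VM sizes and co-location degrees is pooled into a single fit. The feature set, the ridge-regularized linear model, the model names, and the synthetic latencies below are all hypothetical; the paper's actual predictor and features may differ.

```python
from itertools import combinations_with_replacement

MODELS = ["resnet50", "bert", "mobilenet"]  # hypothetical model catalog


def featurize(target, colocated, vcpus):
    """Encode (target model, co-located models, machine size) as one vector."""
    degree = 1 + len(colocated)
    x = [1.0 if m == target else 0.0 for m in MODELS]  # per-model intercept
    x.append(degree / vcpus)                           # co-location pressure per vCPU
    x += [colocated.count(m) / vcpus for m in MODELS]  # which neighbors, scaled by size
    return x


def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(samples, lam=1e-9):
    """Fit ONE ridge-regularized linear model over all configurations."""
    X = [featurize(t, c, v) for t, c, v, _ in samples]
    y = [lat for *_, lat in samples]
    d = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(w, target, colocated, vcpus):
    return sum(wi * xi for wi, xi in zip(w, featurize(target, colocated, vcpus)))


# Hypothetical ground truth, used only to generate toy profiling data.
SOLO_MS = {"resnet50": 20.0, "bert": 35.0, "mobilenet": 8.0}
PRESSURE_MS = {"resnet50": 60.0, "bert": 90.0, "mobilenet": 20.0}


def synthetic_latency(target, colocated, vcpus):
    degree = 1 + len(colocated)
    return (SOLO_MS[target] + 40.0 * degree / vcpus
            + sum(PRESSURE_MS[m] for m in colocated) / vcpus)


# Profile 8- and 32-vCPU VMs at co-location degrees 1..3, pooled into one dataset.
samples = [(t, others, v, synthetic_latency(t, others, v))
           for v in (8, 32) for t in MODELS
           for k in range(3) for others in combinations_with_replacement(MODELS, k)]
w = fit(samples)

# Query a 16-vCPU VM, a machine size that was never profiled.
print(round(predict(w, "bert", ("resnet50", "mobilenet"), 16), 2))  # 47.5
```

Because machine size enters only through per-vCPU features, data from 8- and 32-vCPU VMs jointly informs the prediction for an unprofiled 16-vCPU VM. The prediction is exact here only because the synthetic data is linear in these features by construction; real interference would not be.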

## Evaluation

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MMTslgmrrtRXvxD2lk9%2Fuploads%2FxW1Ja1wNq7sjmmwfiIjy%2FScreen%20Shot%202022-06-30%20at%202.37.57%20PM.png?alt=media\&token=acc690f2-1395-4e9b-aebf-b6e2281fb369)

## Links & References

* [Paper PDF](https://dl.acm.org/doi/pdf/10.1145/3437984.3458837)
