[2020 EuroSys] AlloX: Compute Allocation in Hybrid Clusters
One-line Summary
Allocate interchangeable resources in a hybrid cluster (CPU, GPU, FPGA, problem-specific accelerator);
Scheduling -> Min-cost bipartite matching problem, provides dynamic fair allocation
reduce avg jct while providing fairness and preventing starvation
Paper Structure Outline
Introduction
Background & Motivation
Interchangeable Resources
Algorithm Design
Optimal Approach for Queued Up Jobs
Generate input for the matching problem
Solve the matching problem
Convert the matching solution to job scheduling
Handling Online Arrivals
Incorporating Fairness
Existing Fair Allocation Algorithms are Insufficient
Our Idea
Incorporating Fairness into AlloX
AlloX Implementation
Estimator
Scheduler
Placer
Operational Issues
Evaluation
Experimental Methodology
Baselines
AlloX Performance
Experiments on a Cluster
Simulation Results
Starvation
Performance and Fairness Trade-offs
Sensitivity Analysis
Estimation Errors
Profiling Overhead
Related Work
Concluding Remarks
Acknowledgments
Background & Motivation
Background X: Heterogeneous/interchangeable resources
CPU, GPU, distinct speedup rates for different applications
Motivation X: Current schedulers do not consider
best fit: pick the optimal config for each job. This creates load imbalance (heavy load on GPUs while CPUs idle)
join the shortest queue: choose resource with the shortest completion time. This is short-sighted (each job optimizes for itself w/o considering later jobs)
shortest job first: maintain ordered queue of all jobs by increasing processing time. Whenever a resource becomes avaialable, scheulde job wiht the shortest run time
Design and Implementation
Evaluation
Links
Last updated