[2019 FAST] DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed...



This paper presents a new distributed caching mechanism for addressing load imbalance in large-scale storage systems.

Background & Motivation

Cloud service providers use large clusters to store data. The data access workload is skewed (power law distribution), which creates load imbalance, resulting in low throughput and long tail latencies. The objective is to achieve load balancing in distributed storage systems.

Thus, we need a layer of distributed caching as the load balancer.

Design & Implementation

Some key design choices include:

  • Cache allocation with independent hash functions: The intuition is that if one cache node in a layer is overloaded by receiving too many queries to its cached objects, because the hash functions of the two layers are independent, the set of hot objects would be distributed to multiple cache nodes in another layer with high probability.

  • Query routing with the power-of-two-choices: The sender of a query looks at the loads of the cache nodes that cache the queried object and sends the query to the less-loaded node.

These mechanisms can be applied recursively for multi-layer hierarchical caching.


Last updated