# RDP: Row-Diagonal Parity for Double Disk Failure Correction

## One-line Summary

RDP is an algorithm that protects against double disk failures. RDP can be applied to RAID systems. RDP is also known as RAID-DP/RAID-6 (There are other RAID-6 approaches to handle two disk failures, but RDP is the most intuitive).

## Paper Structure Outline

1. Introduction
2. Related Work
3. Double Disk Failure Models and Analysis
4. Row-Diagonal Parity Algorithm
5. Proof of Correctness
6. Performance Analysis
7. Algorithm Extensions
8. Implementation Experience
9. Measured Performance
10. Conclusions
11. Acknowledgments

## Background & Motivation

There are two types of disk failures: Individual disks can fail by whole-disk failure, whereby all the data on the disk becomes temporarily or permanently inaccessible, or by media failure, whereby a small portion of the data on a disk becomes temporarily inaccessible. The previous RAID only considers whole-disk failures.

Multiple disk errors are likely: the authors gave a detailed analysis of why this is the case in section 3 (which I'm not going to get into).

## Design and Implementation

RDP is built on RAID-4 or RAID-5. In this paper, we will focus on RAID-4.

![In this case, p = 5. We have (p+1) disks and (p-1) data disks.](/files/-MP_pPBNrgNRl3VyOsz7)

XOR is still used for parity. The figure shows the diagonal of each block. In the example above, if we have whole-disk failures on data disks 1 and 3, the data can be easily recovered in many ways.

RDP can also be extended to encompass multiple RAID-4 or RAID-5 disk arrays in a single RDP disk array.

## Evaluation

* Read performance is unaffected.
* Sequential write: Write p-1 stripes at once for best performance (update row and diagonal parity at the same time).
* Partial stripe writes: Writing d blocks by subtraction requires 2d+4 I/Os (d+2 for read, d+2 for write), and writing d blocks by additive requires n I/Os (n-d-2 for read, d+2 for write). Thus, we use a combination of additive and subtractive.
* Proof of correctness and optimality is covered in the paper.

![Write performance measured: RDP gives a much better reliability for the same cost and performance.](/files/-MP_rZejj5RKgRwnPV0F)

{% hint style="info" %}

* G: number of separate RAID groups connected to the filer
* d: number of data disks per RAID group
* p: number of parity disks per RAID group
  {% endhint %}

## New Vocabulary

* NetApp: A cloud data services and data management company.

## Links

* [Paper PDF](https://www.usenix.org/legacy/publications/library/proceedings/fast04/tech/corbett/corbett.pdf)

{% file src="/files/-MPkIt9XdUzLx2b-40Q4" %}
Prof. Andrea's slides on RAID and RDP
{% endfile %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://blog.ruipan.xyz/earlier-readings-and-notes/index/rdp-row-diagonal-parity-for-double-disk-failure-correction.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
