antek's tech blog - pythonZola2020-11-29T00:00:00+00:00https://anadoxin.org/blog/tags/python/atom.xmlReed-Solomon error recovery in RAID-62020-11-29T00:00:00+00:002020-11-29T00:00:00+00:00Unknownhttps://anadoxin.org/blog/error-recovery-in-raid6.html/<p>There are lots of resources on the Internet about RAID-6 error recovery and how you can create your own implementation of it, but most of those resources require spending a lot of time fighting with mathematical equations and figuring out the real algorithm.</p>
<p>In this post I'll try to give you a simple example how you can create your own error recovery solution based on what is used in RAID-6. More specifically, if you need to provide rendundancy across your mediums so that a failure of 1 or 2 mediums will be tolerated, look no further! ;)</p>
<p>If you'll read this post, as a bonus you'll gain knowledge about how RAID-5 error recovery works, because RAID-6 is an improved version of RAID-5 error recovery system.</p>
<span id="continue-reading"></span><h1 id="overview">Overview</h1>
<p>Let's assume you have 3 disk drives with some data. Let's name those drives as <code>D1</code>, <code>D2</code> and <code>D3</code>. In order to use the same error recovery technique as RAID-6 uses, you'll need two additional disk drives, the <code>PD</code> drive, and the <code>RS</code> drive. I'll describe what <code>PD</code> and <code>RS</code> means in few moments. So, you'll need a total of 5 disk drives: <code>D1</code>, <code>D2</code>, <code>D3</code>, <code>PD</code> and <code>RS</code>.</p>
<p><img src="https://anadoxin.org/blog/error-recovery-in-raid6.html/image1.svg" alt="" /></p>
<p>So, here's the situation:</p>
<ul>
<li><code>D1</code>, <code>D2</code> and <code>D3</code> contain <em>arbitrary user data</em>, and it doesn't matter what their contents are. FWIW you can assume those drives are full of cat pictures.</li>
<li>The special <code>PD</code> drive (named after <code>Parity Drive</code>, sometimes called <code>P</code> in whitepapers) contains the XOR data, generated automatically from <code>D1</code>, <code>D2</code> and <code>D3</code>.</li>
<li>The second special <code>RS</code> drive (named after <code>Reed-Solomon Drive</code>, sometimes also called <code>Q</code>) contains the Reed-Solomon codes, calculated from the same data as <code>PD</code>, namely from drives <code>D1</code>, <code>D2</code> and <code>D3</code>.</li>
</ul>
<p>Let's see how we can perform some basic operations on such disk array.</p>
<h1 id="how-the-recovery-works">How the recovery works</h1>
<p>If we have properly calculated <code>PD</code> and <code>RS</code> drives, we can lose up to 2 drives. How we recover from failures depends on which drives will fail. There are generally 7 cases that RAID-6 can handle. Next points will describe the scenarios, sorted from the easiest case, to the most complicated.</p>
<ol>
<li>
<p>Loss of the <code>PD</code> drive (failure of only one drive).</p>
<p><img src="https://anadoxin.org/blog/error-recovery-in-raid6.html/image-losspd.svg" alt="" /></p>
<p>This case is very straightforward. The <code>PD</code> drive contains only autogenerated data, so in case we lose the <code>PD</code> drive, we can regenerate it by using only user data (stored on disks <code>D1</code>, <code>D2</code> and <code>D3</code>).</p>
</li>
<li>
<p>Loss of one of the data drives: either <code>D1</code>, <code>D2</code> or <code>D3</code> (failure of only one drive).</p>
<p><img src="https://anadoxin.org/blog/error-recovery-in-raid6.html/image-datadrive.svg" alt="" /></p>
<p>In this case we're losing data, but since we only lose 1 disk, the recovery scenario is the same as in RAID-5 error recovery: we'll use <code>PD</code> drive <em>together with two non-missing data drives</em> to recover data from the missing data drive. It doesn't matter which data drive we lose, because if we have 2 data drives and the <code>PD</code> drive, we can always generate data for the third drive. The <code>RS</code> drive is not needed to regenerate the data drive in this case (and is not used at all in this failure case).</p>
</li>
<li>
<p>Loss of the <code>RS</code> drive (failure of only one drive).</p>
<p><img src="https://anadoxin.org/blog/error-recovery-in-raid6.html/image-lossrs.svg" alt="" /></p>
<p>Similar to the situation from point 1: we have all the data drives, and we can simply regenerate the <code>RS</code> drive by calculating Reed-Solomon codes from drives that did not fail.</p>
</li>
<li>
<p>Loss of the <code>PD</code> drive and the <code>RS</code> drive (failure of two drives).</p>
<p><img src="https://anadoxin.org/blog/error-recovery-in-raid6.html/image-losspdrs.svg" alt="" /></p>
<p>This case is very similar to points 1 or 3, we have all the data intact, so we can generate contents of <code>PD</code> drive and then <code>RS</code> drive very easily.</p>
</li>
<li>
<p>Loss of the <code>RS</code> drive and one data drive (failure of two drives).</p>
<p><img src="https://anadoxin.org/blog/error-recovery-in-raid6.html/image-lossdatars.svg" alt="" /></p>
<p>In this case we're losing two disks, but only one lost disk is filled with data. Since we have <code>PD</code> drive intact, we can use it to regenerate data from missing data drive, so this case is not so different than case #2. After that, we will have all the data drives, so we can regenerate the <code>RS</code> drive easily.</p>
</li>
<li>
<p>Loss of the <code>PD</code> drive and one data drive (failure of two drives).</p>
<p><img src="https://anadoxin.org/blog/error-recovery-in-raid6.html/image-lossdatapd.svg" alt="" /></p>
<p>This case is more complicated. We lose one user data drive (in this example <code>D3</code>), and we don't have the <code>PD</code> drive to aid with recovery, because we've lost it as well. We have to use the <code>RS</code> drive in conjunction with all user data drives that are still available (<code>D1</code> and <code>D2</code>) to regenerate the missing data drive <code>D3</code>. After we'll have all data drives regenerated, we can calculate the missing <code>PD</code> drive. This is the first case where recovery using Reed-Solomon codes comes into play.</p>
</li>
<li>
<p>Loss of two data drives (failure of two drives).</p>
<p><img src="https://anadoxin.org/blog/error-recovery-in-raid6.html/image-lossdatadata.svg" alt="" /></p>
<p>This is the most complicated scenario. We need to use both <code>PD</code> and <code>RS</code> to regenerate both data drives. Reed-Solomon coding makes this case possible.</p>
</li>
</ol>
<p>In the following sections, I'll try to describe those cases above with more detail, and provide some source code (in Python :P) that will perform actual recovery of data.</p>
<p>Please keep in mind that real RAID-6 arrays don't really dedicate whole disk for <code>PD</code> or <code>RS</code>. Real arrays span this additional checksum data across all disks. There are multiple methods used by various controllers: left asynchronous, right synchronous, there can be an offset to the RAID data, there can be pattern delays, etc. Why it's done this way, and how exactly RAID-6 stripes looks like is beyond the scope of this blog post. So let's stick only to Reed-Solomon codes.</p>
<h2 id="test-data">Test data</h2>
<p>Let's define how our "user data" looks like. To keep things simple, let's pretend that our "disk drives" are 5 bytes big.</p>
<table><thead><tr><th>Disk</th><th>Data in ASCII</th><th>Data in HEX</th></tr></thead><tbody>
<tr><td><code>D1</code></td><td>f i r s t</td><td>0x66, 0x69, 0x72, 0x73, 0x74</td></tr>
<tr><td><code>D2</code></td><td>s e c n d</td><td>0x73, 0x65, 0x63, 0x6e, 0x64</td></tr>
<tr><td><code>D3</code></td><td>t h i r d</td><td>0x74, 0x68, 0x69, 0x72, 0x64</td></tr>
</tbody></table>
<p>Let's go into the scenarios mentioned above in more details.</p>
<h1 id="case-1-loss-of-pd-drive">Case 1. Loss of <code>PD</code> drive.</h1>
<p>In order to generate the <code>PD</code> data, we need only the user data drives. In our case it's <code>D1</code>, <code>D2</code> and <code>D3</code>. The <code>PD</code> drive consists of nothing more than <a href="https://en.wikipedia.org/wiki/Bitwise_operation#XOR">XOR</a> of all user data.</p>
<ul>
<li>To generate offset 0 of the <code>PD</code> drive, you need to XOR all bytes from offset 0 from all disk drives.</li>
<li>Then, to generate offset 1 of the <code>PD</code> drive, you need to XOR bytes from offset 1 from all disk drives. E.g.:</li>
</ul>
<pre><code>PD[0] = D1[0] xor D2[0] xor D3[0]
PD[1] = D1[1] xor D2[1] xor D3[1]
PD[2] = D1[2] xor D2[2] xor D3[2]
PD[3] = D1[3] xor D2[3] xor D3[3]
PD[4] = D1[4] xor D2[4] xor D3[4]
</code></pre>
<p>Example:</p>
<pre><code>PD[0] = 0x66 xor 0x73 xor 0x74 => 0x61
PD[1] = 0x69 xor 0x65 xor 0x63 => 0x64
PD[2] = 0x72 xor 0x63 xor 0x69 => 0x78
PD[3] = 0x73 xor 0x6e xor 0x72 => 0x6f
PD[4] = 0x74 xor 0x64 xor 0x64 => 0x74
</code></pre>
<p>Yes, it's that simple. Do it for the whole drives (in our case, 5 bytes), and you'll have the properly generated <code>PD</code> drive:</p>
<table><thead><tr><th>Disk</th><th>Data in HEX</th></tr></thead><tbody>
<tr><td><code>PD</code></td><td>0x61, 0x64, 0x78, 0x6f, 0x74</td></tr>
</tbody></table>
<p>So, in case when only your <code>PD</code> drive will fail, you can see it's trivial to regenerate it from the data drives <code>D1</code>, <code>D2</code> and <code>D3</code>.</p>
<h1 id="case-2-loss-of-one-of-the-data-drives-either-d1-d2-or-d3">Case 2. Loss of one of the data drives: either <code>D1</code>, <code>D2</code> or <code>D3</code>.</h1>
<p>By the way, this is how RAID-5 error recovery works. If only one drive with user data will fail, we can use the <code>PD</code> drive to recalculate missing user data.</p>
<p>Let's say we're losing <code>D2</code>, so the drives that are still working are: <code>D1</code>, <code>D3</code>, <code>PD</code> and <code>RS</code>. In this case, we don't even look at <code>RS</code>. All we need are the <code>D1</code>, <code>D3</code> and <code>PD</code> drives. To calculate missing data, you can use the XOR function again, like in the previous point.</p>
<p>To recover user data from offset 0, XOR the bytes from offsets 0 of disks with user data you haven't lost (<code>D1</code> and <code>D3</code>) together with the byte from offset 0 from the <code>PD</code> drive. Do the same thing for offset 1, like this:</p>
<pre><code>D2[0] = D1[0] xor D3[0] xor PD[0]
D2[1] = D1[1] xor D3[1] xor PD[1]
D2[2] = D1[2] xor D3[2] xor PD[2]
D2[3] = D1[3] xor D3[3] xor PD[3]
D2[4] = D1[4] xor D3[4] xor PD[4]
</code></pre>
<p>Example:</p>
<pre><code>D2[0] = 0x66 xor 0x74 xor 0x61 => 0x73 (s)
D2[1] = 0x69 xor 0x63 xor 0x64 => 0x65 (e)
D2[2] = 0x72 xor 0x69 xor 0x78 => 0x63 (c)
D2[3] = 0x73 xor 0x72 xor 0x6f => 0x6e (n)
D2[4] = 0x74 xor 0x64 xor 0x74 => 0x64 (d)
</code></pre>
<p>As you can see, we can easily recover data from the missing drive. It doesn't matter which drive is missing; the <a href="https://en.wikipedia.org/wiki/Bitwise_operation#XOR">XOR function</a> will work anyway.</p>
<h1 id="case-3-loss-of-the-rs-drive">Case 3. Loss of the <code>RS</code> drive.</h1>
<p>Now we enter the realm of the Reed-Solomon codes and Galois fields. But don't worry, you don't have to be a mathematician in order to <em>use</em> it.</p>
<p>When we lose only <code>RS</code> drive, or when we initialize a new RAID-6-like redundation system, we simply need to regenerate it. In order to do that, we need to use the <code>gflog</code> and <code>gfilog</code> tables, which are always constant, plus data from our existing data drives <code>D1</code>, <code>D2</code> and <code>D3</code>.</p>
<p>This is the <code>gflog</code> table, it always stays the same:</p>
<pre data-lang="text" class="language-text "><code class="language-text" data-lang="text"> 0x00, 0x00, 0x01, 0x19, 0x02, 0x32, 0x1a, 0xc6, 0x03, 0xdf, 0x33, 0xee, 0x1b, 0x68, 0xc7, 0x4b,
0x04, 0x64, 0xe0, 0x0e, 0x34, 0x8d, 0xef, 0x81, 0x1c, 0xc1, 0x69, 0xf8, 0xc8, 0x08, 0x4c, 0x71,
0x05, 0x8a, 0x65, 0x2f, 0xe1, 0x24, 0x0f, 0x21, 0x35, 0x93, 0x8e, 0xda, 0xf0, 0x12, 0x82, 0x45,
0x1d, 0xb5, 0xc2, 0x7d, 0x6a, 0x27, 0xf9, 0xb9, 0xc9, 0x9a, 0x09, 0x78, 0x4d, 0xe4, 0x72, 0xa6,
0x06, 0xbf, 0x8b, 0x62, 0x66, 0xdd, 0x30, 0xfd, 0xe2, 0x98, 0x25, 0xb3, 0x10, 0x91, 0x22, 0x88,
0x36, 0xd0, 0x94, 0xce, 0x8f, 0x96, 0xdb, 0xbd, 0xf1, 0xd2, 0x13, 0x5c, 0x83, 0x38, 0x46, 0x40,
0x1e, 0x42, 0xb6, 0xa3, 0xc3, 0x48, 0x7e, 0x6e, 0x6b, 0x3a, 0x28, 0x54, 0xfa, 0x85, 0xba, 0x3d,
0xca, 0x5e, 0x9b, 0x9f, 0x0a, 0x15, 0x79, 0x2b, 0x4e, 0xd4, 0xe5, 0xac, 0x73, 0xf3, 0xa7, 0x57,
0x07, 0x70, 0xc0, 0xf7, 0x8c, 0x80, 0x63, 0x0d, 0x67, 0x4a, 0xde, 0xed, 0x31, 0xc5, 0xfe, 0x18,
0xe3, 0xa5, 0x99, 0x77, 0x26, 0xb8, 0xb4, 0x7c, 0x11, 0x44, 0x92, 0xd9, 0x23, 0x20, 0x89, 0x2e,
0x37, 0x3f, 0xd1, 0x5b, 0x95, 0xbc, 0xcf, 0xcd, 0x90, 0x87, 0x97, 0xb2, 0xdc, 0xfc, 0xbe, 0x61,
0xf2, 0x56, 0xd3, 0xab, 0x14, 0x2a, 0x5d, 0x9e, 0x84, 0x3c, 0x39, 0x53, 0x47, 0x6d, 0x41, 0xa2,
0x1f, 0x2d, 0x43, 0xd8, 0xb7, 0x7b, 0xa4, 0x76, 0xc4, 0x17, 0x49, 0xec, 0x7f, 0x0c, 0x6f, 0xf6,
0x6c, 0xa1, 0x3b, 0x52, 0x29, 0x9d, 0x55, 0xaa, 0xfb, 0x60, 0x86, 0xb1, 0xbb, 0xcc, 0x3e, 0x5a,
0xcb, 0x59, 0x5f, 0xb0, 0x9c, 0xa9, 0xa0, 0x51, 0x0b, 0xf5, 0x16, 0xeb, 0x7a, 0x75, 0x2c, 0xd7,
0x4f, 0xae, 0xd5, 0xe9, 0xe6, 0xe7, 0xad, 0xe8, 0x74, 0xd6, 0xf4, 0xea, 0xa8, 0x50, 0x58, 0xaf.
</code></pre>
<p>This is the <code>gfilog</code> table, it's also constant:</p>
<pre data-lang="text" class="language-text "><code class="language-text" data-lang="text"> 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1d, 0x3a, 0x74, 0xe8, 0xcd, 0x87, 0x13, 0x26,
0x4c, 0x98, 0x2d, 0x5a, 0xb4, 0x75, 0xea, 0xc9, 0x8f, 0x03, 0x06, 0x0c, 0x18, 0x30, 0x60, 0xc0,
0x9d, 0x27, 0x4e, 0x9c, 0x25, 0x4a, 0x94, 0x35, 0x6a, 0xd4, 0xb5, 0x77, 0xee, 0xc1, 0x9f, 0x23,
0x46, 0x8c, 0x05, 0x0a, 0x14, 0x28, 0x50, 0xa0, 0x5d, 0xba, 0x69, 0xd2, 0xb9, 0x6f, 0xde, 0xa1,
0x5f, 0xbe, 0x61, 0xc2, 0x99, 0x2f, 0x5e, 0xbc, 0x65, 0xca, 0x89, 0x0f, 0x1e, 0x3c, 0x78, 0xf0,
0xfd, 0xe7, 0xd3, 0xbb, 0x6b, 0xd6, 0xb1, 0x7f, 0xfe, 0xe1, 0xdf, 0xa3, 0x5b, 0xb6, 0x71, 0xe2,
0xd9, 0xaf, 0x43, 0x86, 0x11, 0x22, 0x44, 0x88, 0x0d, 0x1a, 0x34, 0x68, 0xd0, 0xbd, 0x67, 0xce,
0x81, 0x1f, 0x3e, 0x7c, 0xf8, 0xed, 0xc7, 0x93, 0x3b, 0x76, 0xec, 0xc5, 0x97, 0x33, 0x66, 0xcc,
0x85, 0x17, 0x2e, 0x5c, 0xb8, 0x6d, 0xda, 0xa9, 0x4f, 0x9e, 0x21, 0x42, 0x84, 0x15, 0x2a, 0x54,
0xa8, 0x4d, 0x9a, 0x29, 0x52, 0xa4, 0x55, 0xaa, 0x49, 0x92, 0x39, 0x72, 0xe4, 0xd5, 0xb7, 0x73,
0xe6, 0xd1, 0xbf, 0x63, 0xc6, 0x91, 0x3f, 0x7e, 0xfc, 0xe5, 0xd7, 0xb3, 0x7b, 0xf6, 0xf1, 0xff,
0xe3, 0xdb, 0xab, 0x4b, 0x96, 0x31, 0x62, 0xc4, 0x95, 0x37, 0x6e, 0xdc, 0xa5, 0x57, 0xae, 0x41,
0x82, 0x19, 0x32, 0x64, 0xc8, 0x8d, 0x07, 0x0e, 0x1c, 0x38, 0x70, 0xe0, 0xdd, 0xa7, 0x53, 0xa6,
0x51, 0xa2, 0x59, 0xb2, 0x79, 0xf2, 0xf9, 0xef, 0xc3, 0x9b, 0x2b, 0x56, 0xac, 0x45, 0x8a, 0x09,
0x12, 0x24, 0x48, 0x90, 0x3d, 0x7a, 0xf4, 0xf5, 0xf7, 0xf3, 0xfb, 0xeb, 0xcb, 0x8b, 0x0b, 0x16,
0x2c, 0x58, 0xb0, 0x7d, 0xfa, 0xe9, 0xcf, 0x83, 0x1b, 0x36, 0x6c, 0xd8, 0xad, 0x47, 0x8e, 0x01.
</code></pre>
<p>You don't have to include those tables in your program if you don't want to; you can use this algorithm to generate them in the runtime:</p>
<pre data-lang="python" class="language-python "><code class="language-python" data-lang="python"># gflog_tables.py
def generate_tables():
polynomial = 0x11d
s = 8
gf_elements = 1 << s
gflog = gf_elements * [0]
gfilog = gf_elements * [0]
b = 1
for i in range(0, gf_elements):
gflog[b] = i & 255
gfilog[i] = b & 255
b <<= 1
if b & gf_elements:
b ^= polynomial
gflog[1] = 0;
return (gflog, gfilog)
def dump_table(caption, tab):
item = 0
print("--- {} ---".format(caption))
for i in tab:
print("0x{:02x}, ".format(i), end="")
item += 1
if item % 16 == 0:
item = 0
print()
print("")
(gflog, gfilog) = generate_tables()
# Uncomment if you want to see the tables on the console:
#
# dump_table("gflog", gflog)
# dump_table("gfilog", gfilog)
</code></pre>
<p>After declaring those tables, we need to define some operations. We are working now in a <a href="https://en.wikipedia.org/wiki/Finite_field">finite field</a>, and basic arithmetic operations have a different implementation (although their meaning is kind of similar). We have to redefine basic operations like <code>add</code>, <code>multiply</code> or <code>divide</code>:</p>
<pre data-lang="python" class="language-python "><code class="language-python" data-lang="python"># rs_functions.py
from gflog_tables import *
# Addition
def gf_add(*args):
result = 0
for arg in args:
result ^= arg
return result
# Indexing
# First drive is 1, second drive is 2, etc...
def gf_drive(index):
global gfilog
return gfilog[index - 1]
# Multiplication
def gf_mul(a, b):
global gflog
global gfilog
if a == 0 or b == 0:
return 0
else:
return gfilog[(gflog[a] + gflog[b]) % 255]
# Division helper
def sub_gf8(a, b):
if a > b:
return a - b
else:
return (255 - (0 - (a - b)))
# Division
def gf_div(a, b):
global gfilog
global gflog
return gfilog[sub_gf8(gflog[a], gflog[b])]
</code></pre>
<p>Since we have our helper functions declared, let's try to generate the data of the <code>RS</code> drive.</p>
<pre data-lang="python" class="language-python "><code class="language-python" data-lang="python"># case 3 -- recover_rs.py
from rs_functions import *
# Here are our drives, together with their data.
image1 = [ ord('f'), ord('i'), ord('r'), ord('s'), ord('t') ]
image2 = [ ord('s'), ord('e'), ord('c'), ord('n'), ord('d') ]
image3 = [ ord('t'), ord('h'), ord('i'), ord('r'), ord('d') ]
# This is a placeholder for our RS drive. It will be regenerated
# in the lines below.
imageRS = [0] * 5
# And this is our loop that generates the RS data using nothing more
# than the user data drives.
for i in range(0, 5):
imageRS[i] = gf_add(gf_mul(gf_drive(1), image1[i]),
gf_mul(gf_drive(2), image2[i]),
gf_mul(gf_drive(3), image3[i]))
dump_table("imageRS", imageRS)
</code></pre>
<p>Thus, after running the <code>recover_rs.py</code> script, we can see that our <code>RS</code> drive contains this data:</p>
<table><thead><tr><th>Disk</th><th>Data in HEX</th></tr></thead><tbody>
<tr><td><code>RS</code></td><td>0x4d, 0x1e, 0x0d, 0x7a, 0x31</td></tr>
</tbody></table>
<p>As of this moment, our disks <code>D1</code>, <code>D2</code> and <code>D3</code> are guarded by a full RAID-6 error correction algorithm, because we have a properly generated <code>PD</code> and <code>RS</code> drives.</p>
<p>One important thing to remember that our current <code>RS</code> data is valid only for our drives <code>D1</code>, <code>D2</code> and <code>D3</code> <em>in this particular order</em>. So, <code>RS</code> for <code>D1</code>, <code>D2</code> and <code>D3</code> will be <em>different</em> than for disks <code>D3</code>, <code>D2</code> and <code>D1</code>, even if the actual data on the drives are the same. This is an important thing to remember, because when recovering data from broken RAID-6 arrays, you have to know the proper sequence of disks inside the array. Fortunately, if the array is not big, you can bruteforce generation of <code>RS</code> data so that you can discover the proper sequence of disks.</p>
<h1 id="case-4-loss-of-the-pd-drive-and-the-rs-drive">Case 4. Loss of the <code>PD</code> drive and the <code>RS</code> drive.</h1>
<p>This case is not complicated, because it can be solved by executing case #1 first, then proceeding with execution of case #3.</p>
<p>To reiterate, in this case we have all user data intact. We can use this data to generate the <code>PD</code> drive. Then, we can also use the same data to generate the <code>RS</code> drive. Both cases were already described in point 1 and point 3 above.</p>
<h1 id="case-5-loss-of-the-rs-drive-and-one-data-drive">Case 5. Loss of the <code>RS</code> drive and one data drive.</h1>
<p>This case is also not complicated. Since we've lost only 1 data drive, but we still have the <code>PD</code> drive, we can execute case #2 to regenerate missing data drive. Then, we can use all data disks to regenerate the <code>RS</code> drive, like in case #3. After that, we'll have all drives regenerated without errors.</p>
<h1 id="case-6-loss-of-the-pd-drive-and-one-data-drive">Case 6. Loss of the <code>PD</code> drive and one data drive.</h1>
<p>The general approach is to first regenerate the missing data drive using other data drives combined with the <code>RS</code> drive, then after we'll have all data drives regenerated we can proceed with regenerating the <code>PD</code> drive (case #2).</p>
<p>We have to do some calculations in order to recover in this case. Let's assume that together with losing <code>PD</code>, we're losing data drive <code>D2</code> as well. So, we have drives: <code>D1</code>, <code>D3</code>, and <code>RS</code>.</p>
<p>Thanks to the <code>RS</code> drive, we can regenerate <code>D2</code> by combining <code>D1</code>, <code>D3</code> and <code>RS</code>, like this:</p>
<pre data-lang="python" class="language-python "><code class="language-python" data-lang="python"># case 6 -- recover_d2_and_pd.py
from rs_functions import *
# We have these drives...
image1 = [ ord('f'), ord('i'), ord('r'), ord('s'), ord('t') ]
image3 = [ ord('t'), ord('h'), ord('i'), ord('r'), ord('d') ]
imageRS = [ 0x4d, 0x1e, 0x0d, 0x7a, 0x31 ]
# ...and these drives are dead
imagePD = [0] * 5
image2 = [0] * 5
for i in range(0, 5):
partialRS = gf_add(gf_mul(gf_drive(1), image1[i]),
imageRS[i], # Use RS drive instead of the dead drive.
gf_mul(gf_drive(3), image3[i]))
# gf_drive(2) is our dead drive.
div_result = gf_div(1, gf_drive(2))
# This will generate the data from the dead D2 drive.
image2[i] = gf_mul(div_result, partialRS)
# This will generate the data from the dead PD drive.
imagePD[i] = gf_add(image1[i], image2[i], image3[i])
dump_table("image2", image2)
dump_table("imagePD", imagePD)
</code></pre>
<p>First, we need to generate the <code>partialRS</code> value by <code>gf_add</code>'ing the return values of <code>gf_mul</code> for all bytes of all valid disks, together with the value of the <code>RS</code> drive in place for the missing data disk (in our case, <code>D2</code>).</p>
<p>Then, we use the <code>partialRS</code> value to regenerate the <code>D2</code> data by dividing <code>1</code> with the dead drive's index (<code>gf_drive(2)</code>), and multiplying the result by <code>partialRS</code>. The <code>gf_drive(2)</code> argument specifies the index of our dead disk -- if our <code>D1</code> drive would fail, we would use <code>gf_drive(1)</code> here.</p>
<p>After regeneration of <code>D2</code>, we will have all data disks regenerated. In this case we can proceed with regeneration of <code>PD</code> like in case #1: it's done in the code above by <code>gf_add</code>'ing data from all disk drives. If you remember, <code>gf_add</code> over a Galois field is a simple XOR operation, so instead of manually XOR'ing the bytes from all data drives, we can use the <code>gf_add</code> operation ;).</p>
<h1 id="case-7-loss-of-two-data-drives">Case 7. Loss of two data drives.</h1>
<p>This is the most interesting and most complicated scenario. Let's assume that the <code>D2</code> and <code>D3</code> drives failed. In this case, we need to somehow use <code>D1</code>, <code>PD</code> and <code>RS</code> drives to regenerate the missing drives.</p>
<p>This is a special approach, different than previous cases. The general approach is to generate the data for <code>D2</code> drive first, and then use the same appraoch as in case #2 to generate the data for <code>D3</code>. Here's the code:</p>
<pre data-lang="python" class="language-python "><code class="language-python" data-lang="python"># case 7 -- recover_d2_and_d3.py
from rs_functions import *
# These drives are still alive.
image1 = [ ord('f'), ord('i'), ord('r'), ord('s'), ord('t') ]
imagePD = [ 0x61, 0x64, 0x78, 0x6f, 0x74 ]
imageRS = [ 0x4d, 0x1e, 0x0d, 0x7a, 0x31 ]
# These drives are dead, we can't read from them.
image2 = [0] * 5
image3 = [0] * 5
for i in range(0, 5):
partialPD = gf_add(image1[i]) # add other drives if they exist
partialRS = gf_add(gf_mul(gf_drive(1), image1[i])) # add other drives if they exist
g = gf_div(1, gf_add(gf_drive(2), gf_drive(3)))
xoredPD = gf_add(partialPD, imagePD[i])
xoredRS = gf_add(partialRS, imageRS[i])
mid = gf_add(gf_mul(gf_drive(3), xoredPD), xoredRS) # gf_drive(3) is the second drive we've lost
# Regenerate data for D2.
data = gf_mul(mid, g)
image2[i] = data
# Regenerate data for D3.
image3[i] = gf_add(image1[i], image2[i], imagePD[i])
# or:
#
# image3[i] = gf_add(data, xoredPD)
dump_table("image2", image2)
dump_table("image3", image3)
</code></pre>
<p>First, we need to add all bytes from all existing data drives in order to generate a <code>partialPD</code>. In this example, all we have is just 1 data drive, so the <code>partialPD</code> parameter will simply be the content of the <code>D1</code> data drive. But, RAID-6 arrays can span across lots of disks, so if we'd have more than 1 data drive left, e.g. if we would have 3 still alive data drives, then the <code>partialPD</code> calculation would look like this:</p>
<pre data-lang="python" class="language-python "><code class="language-python" data-lang="python">partialPD = gf_add(image1[i], image2[i], image3[i])
</code></pre>
<p>Next, we'll need <code>partialRS</code> parameter. It can be calculated by adding the data from <em>existing drives</em> like this:</p>
<pre><code>partialRS = gf_add(A, B, C, ..., Z)
where A = gf_mul(gf_drive(1), image1[i])
B = gf_mul(gf_drive(2), image2[i]) if we have drive 2
C = gf_mul(gf_drive(3), image3[i]) if we have drive 3
etc.
</code></pre>
<p>In our case we have only 1 data drive left (<code>D1</code>), so our <code>partialRS</code> is just <code>gf_mul(gf_drive(1), image1[i])</code>.</p>
<p>Next, we need to generate the <code>g</code> parameter by dividing <code>1</code> with a sum of our dead drive indexes (<code>D2</code> and <code>D3</code>).</p>
<p>Next, the <code>xoredPD</code> parameter; it's calculated by adding the contents of the <code>PD</code> drive to our <code>partialPD</code> parameter, calculated earlier. Calculating the next <code>xoredRS</code> parameter is similar, it's a matter of adding <code>partialRS</code> to the contents of the <code>RS</code> drive.</p>
<p>Now comes the tricky part. We can calculate the data from the <em>first drive</em> that failed, so the data from <code>D2</code> drive. All we need to do is to multiply the index of the <em>second drive that failed</em> (drive <code>D3</code>) with <code>xoredPD</code> parameter, and add <code>xoredRS</code> parameter to the result. Then, after multiplying the result with <code>g</code> parameter, we'll have the contents of the <code>D2</code> drive. Whew!</p>
<p>Since we have just regenerated data for <code>D2</code> drive, from now on this case is no different than case #2 -- loss of one data drive (<code>D3</code>). In order to generate the <code>D3</code> drive, we simply have to add all our data drives that are alive (<code>D1</code> and <code>D2</code>) together with <code>PD</code>.</p>
<p>Done! We have all our data drives back.</p>
<h1 id="epilogue">Epilogue</h1>
<p>I've chosen the Python language to demonstrate that error recovery based on Reed-Solomon codes doesn't need much programming, nor computation power. It's very fast and the implementation can be pretty compact. Of course, in order to make things even faster, the implementation would probably need to be written using parallel processing, but since every byte can be calculated independently of others, the problem of parallelising isn't very hard.</p>
<p>It's also worth noting that the recovery method described in this post doesn't necessarily have to use whole disks in order to be effective. You can treat 'drives' as 'buffers' when transmiting data over a non-reliable medium and this error recovery still should be effective. It's more computation-heavy than e.g. Hamming codes, but the ability to correct 2 missing streams is pretty powerful when creating systems designed to be fail-safe.</p>
<p>Of course, RAID-6 is nowhere near being a new invention, and Reed-Solomon is even older. It was used even in Voyager 2's <a href="https://trs.jpl.nasa.gov/bitstream/handle/2014/34531/94-0881.pdf?sequence=1">exploration of the solar system mission</a>, which is pretty cool.</p>
<p>Newer alternatives for RS codes include <a href="https://blogrecherche.wp.imt.fr/en/2016/09/16/what-are-turbo-codes/">Turbocodes</a> -- I hope I'll have a chance to dig into them soon ;)</p>
<p>Anyway, have fun!</p>