Reed-Solomon error recovery in RAID-6
Table of contents:
- Overview
- How the recovery works
- Case 1. Loss of PD drive.
- Case 2. Loss of one of the data drives: either D1, D2 or D3.
- Case 3. Loss of the RS drive.
- Case 4. Loss of the PD drive and the RS drive.
- Case 5. Loss of the RS drive and one data drive.
- Case 6. Loss of the PD drive and one data drive.
- Case 7. Loss of two data drives.
- Epilogue
Sun, 29 November 2020 :: #datarecovery :: #python
There are lots of resources on the Internet about RAID-6 error recovery and how you can create your own implementation of it, but most of those resources require spending a lot of time fighting with mathematical equations and figuring out the real algorithm.
In this post, I'll try to give you a simple example of how you can create your own error recovery solution based on what is used in RAID-6. More specifically, if you need to provide redundancy across your storage media so that a failure of 1 or 2 of them will be tolerated, look no further! ;)
As a bonus, you'll also learn how RAID-5 error recovery works, because RAID-6 is an improved version of the RAID-5 error recovery scheme.
Overview
Let's assume you have 3 disk drives with some data. Let's name those drives D1, D2 and D3. In order to use the same error recovery technique as RAID-6, you'll need two additional disk drives: the PD drive and the RS drive. I'll describe what PD and RS mean in a few moments. So, you'll need a total of 5 disk drives: D1, D2, D3, PD and RS.
So, here's the situation:
- D1, D2 and D3 contain arbitrary user data, and it doesn't matter what their contents are. FWIW, you can assume those drives are full of cat pictures.
- The special PD drive (named after Parity Drive, sometimes called P in whitepapers) contains the XOR data, generated automatically from D1, D2 and D3.
- The second special RS drive (named after Reed-Solomon Drive, sometimes also called Q) contains the Reed-Solomon codes, calculated from the same data as PD, namely from drives D1, D2 and D3.
Let's see how we can perform some basic operations on such a disk array.
How the recovery works
If we have properly calculated PD and RS drives, we can lose up to 2 drives. How we recover from a failure depends on which drives fail. There are generally 7 cases that RAID-6 can handle. The points below describe the scenarios, sorted from the easiest case to the most complicated.
1. Loss of the PD drive (failure of only one drive). This case is very straightforward. The PD drive contains only autogenerated data, so if we lose the PD drive, we can regenerate it by using only the user data (stored on disks D1, D2 and D3).
2. Loss of one of the data drives: either D1, D2 or D3 (failure of only one drive). In this case we're losing data, but since we only lose 1 disk, the recovery scenario is the same as in RAID-5 error recovery: we use the PD drive together with the two non-missing data drives to recover the data from the missing data drive. It doesn't matter which data drive we lose, because if we have 2 data drives and the PD drive, we can always generate the data for the third drive. The RS drive is not needed to regenerate the data drive in this case (and is not used at all in this failure case).
3. Loss of the RS drive (failure of only one drive). Similar to the situation from point 1: we have all the data drives, and we can simply regenerate the RS drive by calculating Reed-Solomon codes from the drives that did not fail.
4. Loss of the PD drive and the RS drive (failure of two drives). This case is very similar to points 1 and 3: we have all the data intact, so we can regenerate the contents of the PD drive and then the RS drive very easily.
5. Loss of the RS drive and one data drive (failure of two drives). In this case we're losing two disks, but only one of the lost disks holds user data. Since we have the PD drive intact, we can use it to regenerate the data from the missing data drive, so this case is not so different from case #2. After that, we have all the data drives again, so we can regenerate the RS drive easily.
6. Loss of the PD drive and one data drive (failure of two drives). This case is more complicated. We lose one user data drive (in this example D3), and we don't have the PD drive to aid with recovery, because we've lost it as well. We have to use the RS drive in conjunction with all user data drives that are still available (D1 and D2) to regenerate the missing data drive D3. Once all data drives are regenerated, we can calculate the missing PD drive. This is the first case where recovery using Reed-Solomon codes comes into play.
7. Loss of two data drives (failure of two drives). This is the most complicated scenario. We need to use both PD and RS to regenerate both data drives. Reed-Solomon coding makes this case possible.
In the following sections, I'll describe the cases above in more detail and provide some source code (in Python :P) that performs the actual recovery of data.
Please keep in mind that real RAID-6 arrays don't dedicate a whole disk to PD or RS. Real arrays spread this additional checksum data across all disks. There are multiple layouts used by various controllers: left asynchronous, right synchronous, there can be an offset to the RAID data, there can be pattern delays, etc. Why it's done this way, and what exactly RAID-6 stripes look like, is beyond the scope of this blog post. So let's stick to the Reed-Solomon codes only.
Test data
Let's define what our "user data" looks like. To keep things simple, let's pretend that our "disk drives" are 5 bytes big.
Disk | Data in ASCII | Data in HEX |
---|---|---|
D1 | f i r s t | 0x66, 0x69, 0x72, 0x73, 0x74 |
D2 | s e c n d | 0x73, 0x65, 0x63, 0x6e, 0x64 |
D3 | t h i r d | 0x74, 0x68, 0x69, 0x72, 0x64 |
Let's go through the scenarios mentioned above in more detail.
Case 1. Loss of the PD drive.
In order to generate the PD data, we need only the user data drives. In our case those are D1, D2 and D3. The PD drive consists of nothing more than the XOR of all user data.
- To generate offset 0 of the PD drive, you need to XOR all bytes from offset 0 from all disk drives.
- Then, to generate offset 1 of the PD drive, you need to XOR the bytes from offset 1 from all disk drives. E.g.:
PD[0] = D1[0] xor D2[0] xor D3[0]
PD[1] = D1[1] xor D2[1] xor D3[1]
PD[2] = D1[2] xor D2[2] xor D3[2]
PD[3] = D1[3] xor D2[3] xor D3[3]
PD[4] = D1[4] xor D2[4] xor D3[4]
Example:
PD[0] = 0x66 xor 0x73 xor 0x74 => 0x61
PD[1] = 0x69 xor 0x65 xor 0x68 => 0x64
PD[2] = 0x72 xor 0x63 xor 0x69 => 0x78
PD[3] = 0x73 xor 0x6e xor 0x72 => 0x6f
PD[4] = 0x74 xor 0x64 xor 0x64 => 0x74
Yes, it's that simple. Do it for every offset of the drives (in our case, all 5 bytes), and you'll have a properly generated PD drive:
Disk | Data in HEX |
---|---|
PD | 0x61, 0x64, 0x78, 0x6f, 0x74 |
So, if only your PD drive fails, you can see it's trivial to regenerate it from the data drives D1, D2 and D3.
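The same calculation in Python takes just a few lines. This is a minimal sketch of my own (not one of the original scripts); it reuses the image1/image2/image3 naming used by the scripts later in this post:

# case 1 -- a sketch: generate the PD drive from the data drives
image1 = [ ord('f'), ord('i'), ord('r'), ord('s'), ord('t') ]
image2 = [ ord('s'), ord('e'), ord('c'), ord('n'), ord('d') ]
image3 = [ ord('t'), ord('h'), ord('i'), ord('r'), ord('d') ]

# PD[i] is just the XOR of byte i from every data drive.
imagePD = [ image1[i] ^ image2[i] ^ image3[i] for i in range(5) ]

print(", ".join("0x{:02x}".format(b) for b in imagePD))
# prints: 0x61, 0x64, 0x78, 0x6f, 0x74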
Case 2. Loss of one of the data drives: either D1, D2 or D3.
By the way, this is how RAID-5 error recovery works. If only one drive with user data fails, we can use the PD drive to recalculate the missing user data.
Let's say we're losing D2, so the drives that are still working are: D1, D3, PD and RS. In this case, we don't even look at RS. All we need are the D1, D3 and PD drives. To calculate the missing data, you can use the XOR function again, like in the previous point.
To recover the user data at offset 0, XOR the bytes from offset 0 of the data disks you haven't lost (D1 and D3) together with the byte from offset 0 of the PD drive. Do the same thing for offset 1, like this:
D2[0] = D1[0] xor D3[0] xor PD[0]
D2[1] = D1[1] xor D3[1] xor PD[1]
D2[2] = D1[2] xor D3[2] xor PD[2]
D2[3] = D1[3] xor D3[3] xor PD[3]
D2[4] = D1[4] xor D3[4] xor PD[4]
Example:
D2[0] = 0x66 xor 0x74 xor 0x61 => 0x73 (s)
D2[1] = 0x69 xor 0x68 xor 0x64 => 0x65 (e)
D2[2] = 0x72 xor 0x69 xor 0x78 => 0x63 (c)
D2[3] = 0x73 xor 0x72 xor 0x6f => 0x6e (n)
D2[4] = 0x74 xor 0x64 xor 0x74 => 0x64 (d)
As you can see, we can easily recover data from the missing drive. It doesn't matter which drive is missing; the XOR function will work anyway.
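In Python, the recovery of D2 is just as short (again a sketch of my own, not one of the original scripts):

# case 2 -- a sketch: recover D2 from D1, D3 and PD
image1  = [ ord('f'), ord('i'), ord('r'), ord('s'), ord('t') ]
image3  = [ ord('t'), ord('h'), ord('i'), ord('r'), ord('d') ]
imagePD = [ 0x61, 0x64, 0x78, 0x6f, 0x74 ]

# The missing drive is the XOR of the surviving data drives and PD.
image2 = [ image1[i] ^ image3[i] ^ imagePD[i] for i in range(5) ]

print("".join(chr(b) for b in image2))
# prints: secnd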
Case 3. Loss of the RS drive.
Now we enter the realm of Reed-Solomon codes and Galois fields. But don't worry, you don't have to be a mathematician in order to use them.
When we lose only the RS drive, or when we initialize a new RAID-6-like redundancy system, we simply need to regenerate it. In order to do that, we need the gflog and gfilog tables, which are always constant, plus the data from our existing data drives D1, D2 and D3.
This is the gflog table; it always stays the same:
0x00, 0x00, 0x01, 0x19, 0x02, 0x32, 0x1a, 0xc6, 0x03, 0xdf, 0x33, 0xee, 0x1b, 0x68, 0xc7, 0x4b,
0x04, 0x64, 0xe0, 0x0e, 0x34, 0x8d, 0xef, 0x81, 0x1c, 0xc1, 0x69, 0xf8, 0xc8, 0x08, 0x4c, 0x71,
0x05, 0x8a, 0x65, 0x2f, 0xe1, 0x24, 0x0f, 0x21, 0x35, 0x93, 0x8e, 0xda, 0xf0, 0x12, 0x82, 0x45,
0x1d, 0xb5, 0xc2, 0x7d, 0x6a, 0x27, 0xf9, 0xb9, 0xc9, 0x9a, 0x09, 0x78, 0x4d, 0xe4, 0x72, 0xa6,
0x06, 0xbf, 0x8b, 0x62, 0x66, 0xdd, 0x30, 0xfd, 0xe2, 0x98, 0x25, 0xb3, 0x10, 0x91, 0x22, 0x88,
0x36, 0xd0, 0x94, 0xce, 0x8f, 0x96, 0xdb, 0xbd, 0xf1, 0xd2, 0x13, 0x5c, 0x83, 0x38, 0x46, 0x40,
0x1e, 0x42, 0xb6, 0xa3, 0xc3, 0x48, 0x7e, 0x6e, 0x6b, 0x3a, 0x28, 0x54, 0xfa, 0x85, 0xba, 0x3d,
0xca, 0x5e, 0x9b, 0x9f, 0x0a, 0x15, 0x79, 0x2b, 0x4e, 0xd4, 0xe5, 0xac, 0x73, 0xf3, 0xa7, 0x57,
0x07, 0x70, 0xc0, 0xf7, 0x8c, 0x80, 0x63, 0x0d, 0x67, 0x4a, 0xde, 0xed, 0x31, 0xc5, 0xfe, 0x18,
0xe3, 0xa5, 0x99, 0x77, 0x26, 0xb8, 0xb4, 0x7c, 0x11, 0x44, 0x92, 0xd9, 0x23, 0x20, 0x89, 0x2e,
0x37, 0x3f, 0xd1, 0x5b, 0x95, 0xbc, 0xcf, 0xcd, 0x90, 0x87, 0x97, 0xb2, 0xdc, 0xfc, 0xbe, 0x61,
0xf2, 0x56, 0xd3, 0xab, 0x14, 0x2a, 0x5d, 0x9e, 0x84, 0x3c, 0x39, 0x53, 0x47, 0x6d, 0x41, 0xa2,
0x1f, 0x2d, 0x43, 0xd8, 0xb7, 0x7b, 0xa4, 0x76, 0xc4, 0x17, 0x49, 0xec, 0x7f, 0x0c, 0x6f, 0xf6,
0x6c, 0xa1, 0x3b, 0x52, 0x29, 0x9d, 0x55, 0xaa, 0xfb, 0x60, 0x86, 0xb1, 0xbb, 0xcc, 0x3e, 0x5a,
0xcb, 0x59, 0x5f, 0xb0, 0x9c, 0xa9, 0xa0, 0x51, 0x0b, 0xf5, 0x16, 0xeb, 0x7a, 0x75, 0x2c, 0xd7,
0x4f, 0xae, 0xd5, 0xe9, 0xe6, 0xe7, 0xad, 0xe8, 0x74, 0xd6, 0xf4, 0xea, 0xa8, 0x50, 0x58, 0xaf.
This is the gfilog table; it's also constant:
0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1d, 0x3a, 0x74, 0xe8, 0xcd, 0x87, 0x13, 0x26,
0x4c, 0x98, 0x2d, 0x5a, 0xb4, 0x75, 0xea, 0xc9, 0x8f, 0x03, 0x06, 0x0c, 0x18, 0x30, 0x60, 0xc0,
0x9d, 0x27, 0x4e, 0x9c, 0x25, 0x4a, 0x94, 0x35, 0x6a, 0xd4, 0xb5, 0x77, 0xee, 0xc1, 0x9f, 0x23,
0x46, 0x8c, 0x05, 0x0a, 0x14, 0x28, 0x50, 0xa0, 0x5d, 0xba, 0x69, 0xd2, 0xb9, 0x6f, 0xde, 0xa1,
0x5f, 0xbe, 0x61, 0xc2, 0x99, 0x2f, 0x5e, 0xbc, 0x65, 0xca, 0x89, 0x0f, 0x1e, 0x3c, 0x78, 0xf0,
0xfd, 0xe7, 0xd3, 0xbb, 0x6b, 0xd6, 0xb1, 0x7f, 0xfe, 0xe1, 0xdf, 0xa3, 0x5b, 0xb6, 0x71, 0xe2,
0xd9, 0xaf, 0x43, 0x86, 0x11, 0x22, 0x44, 0x88, 0x0d, 0x1a, 0x34, 0x68, 0xd0, 0xbd, 0x67, 0xce,
0x81, 0x1f, 0x3e, 0x7c, 0xf8, 0xed, 0xc7, 0x93, 0x3b, 0x76, 0xec, 0xc5, 0x97, 0x33, 0x66, 0xcc,
0x85, 0x17, 0x2e, 0x5c, 0xb8, 0x6d, 0xda, 0xa9, 0x4f, 0x9e, 0x21, 0x42, 0x84, 0x15, 0x2a, 0x54,
0xa8, 0x4d, 0x9a, 0x29, 0x52, 0xa4, 0x55, 0xaa, 0x49, 0x92, 0x39, 0x72, 0xe4, 0xd5, 0xb7, 0x73,
0xe6, 0xd1, 0xbf, 0x63, 0xc6, 0x91, 0x3f, 0x7e, 0xfc, 0xe5, 0xd7, 0xb3, 0x7b, 0xf6, 0xf1, 0xff,
0xe3, 0xdb, 0xab, 0x4b, 0x96, 0x31, 0x62, 0xc4, 0x95, 0x37, 0x6e, 0xdc, 0xa5, 0x57, 0xae, 0x41,
0x82, 0x19, 0x32, 0x64, 0xc8, 0x8d, 0x07, 0x0e, 0x1c, 0x38, 0x70, 0xe0, 0xdd, 0xa7, 0x53, 0xa6,
0x51, 0xa2, 0x59, 0xb2, 0x79, 0xf2, 0xf9, 0xef, 0xc3, 0x9b, 0x2b, 0x56, 0xac, 0x45, 0x8a, 0x09,
0x12, 0x24, 0x48, 0x90, 0x3d, 0x7a, 0xf4, 0xf5, 0xf7, 0xf3, 0xfb, 0xeb, 0xcb, 0x8b, 0x0b, 0x16,
0x2c, 0x58, 0xb0, 0x7d, 0xfa, 0xe9, 0xcf, 0x83, 0x1b, 0x36, 0x6c, 0xd8, 0xad, 0x47, 0x8e, 0x01.
You don't have to include those tables in your program if you don't want to; you can use this algorithm to generate them at runtime:
# gflog_tables.py

def generate_tables():
    polynomial = 0x11d
    s = 8
    gf_elements = 1 << s
    gflog = gf_elements * [0]
    gfilog = gf_elements * [0]
    b = 1
    for i in range(0, gf_elements):
        gflog[b] = i & 255
        gfilog[i] = b & 255
        b <<= 1
        if b & gf_elements:
            b ^= polynomial
    gflog[1] = 0
    return (gflog, gfilog)

def dump_table(caption, tab):
    item = 0
    print("--- {} ---".format(caption))
    for i in tab:
        print("0x{:02x}, ".format(i), end="")
        item += 1
        if item % 16 == 0:
            item = 0
            print()
    print("")

(gflog, gfilog) = generate_tables()

# Uncomment if you want to see the tables on the console:
#
# dump_table("gflog", gflog)
# dump_table("gfilog", gfilog)
After declaring those tables, we need to define some operations. We are now working in a finite field, and the basic arithmetic operations have a different implementation (although their meaning is somewhat similar). We have to redefine basic operations like add, multiply or divide:
# rs_functions.py
from gflog_tables import *

# Addition
def gf_add(*args):
    result = 0
    for arg in args:
        result ^= arg
    return result

# Indexing
# First drive is 1, second drive is 2, etc...
def gf_drive(index):
    global gfilog
    return gfilog[index - 1]

# Multiplication
def gf_mul(a, b):
    global gflog
    global gfilog
    if a == 0 or b == 0:
        return 0
    else:
        return gfilog[(gflog[a] + gflog[b]) % 255]

# Division helper
def sub_gf8(a, b):
    if a > b:
        return a - b
    else:
        return (255 - (0 - (a - b)))

# Division
def gf_div(a, b):
    global gfilog
    global gflog
    return gfilog[sub_gf8(gflog[a], gflog[b])]
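As a quick sanity check of those helpers (my own example, not part of the original post), you can verify that division undoes multiplication and that every element is its own additive inverse:

# gf_sanity.py -- a quick check of the field arithmetic (my own example)
from rs_functions import *

a, b = 0x66, 0x1d
c = gf_mul(a, b)
print(hex(gf_div(c, b)))   # prints 0x66 -- division undoes multiplication
print(hex(gf_add(a, a)))   # prints 0x0 -- adding an element to itself gives zero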
Now that we have our helper functions declared, let's generate the data for the RS drive. Each byte of RS is the finite-field sum of every data drive's byte multiplied by that drive's generator value (gf_drive(n)):
# case 3 -- recover_rs.py
from rs_functions import *

# Here are our drives, together with their data.
image1 = [ ord('f'), ord('i'), ord('r'), ord('s'), ord('t') ]
image2 = [ ord('s'), ord('e'), ord('c'), ord('n'), ord('d') ]
image3 = [ ord('t'), ord('h'), ord('i'), ord('r'), ord('d') ]

# This is a placeholder for our RS drive. It will be regenerated
# in the lines below.
imageRS = [0] * 5

# And this is our loop that generates the RS data using nothing more
# than the user data drives.
for i in range(0, 5):
    imageRS[i] = gf_add(gf_mul(gf_drive(1), image1[i]),
                        gf_mul(gf_drive(2), image2[i]),
                        gf_mul(gf_drive(3), image3[i]))

dump_table("imageRS", imageRS)
Thus, after running the recover_rs.py script, we can see that our RS drive contains this data:
Disk | Data in HEX |
---|---|
RS | 0x4d, 0x1e, 0x0d, 0x7a, 0x31 |
As of this moment, our disks D1, D2 and D3 are guarded by a full RAID-6 error correction algorithm, because we have properly generated PD and RS drives.
One important thing to remember is that our current RS data is valid only for our drives D1, D2 and D3 in this particular order. So, the RS for D1, D2 and D3 will be different than for disks D3, D2 and D1, even if the actual data on the drives is the same. This matters because, when recovering data from broken RAID-6 arrays, you have to know the proper sequence of disks inside the array. Fortunately, if the array is not big, you can brute-force the generation of RS data to discover the proper sequence of disks.
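For example, a brute-force search over drive orderings could look roughly like this (a sketch of my own, assuming you've already read the drive images and the RS data from the array):

# order_bruteforce.py -- a sketch, not from the original post
from itertools import permutations
from rs_functions import *

# The data drives in an unknown order, plus the RS data read from the array.
candidates = {
    "A": [ ord('t'), ord('h'), ord('i'), ord('r'), ord('d') ],
    "B": [ ord('f'), ord('i'), ord('r'), ord('s'), ord('t') ],
    "C": [ ord('s'), ord('e'), ord('c'), ord('n'), ord('d') ],
}
imageRS = [ 0x4d, 0x1e, 0x0d, 0x7a, 0x31 ]

for order in permutations(candidates):
    drives = [ candidates[name] for name in order ]
    rs = [ gf_add(*[ gf_mul(gf_drive(n + 1), drive[i]) for n, drive in enumerate(drives) ])
           for i in range(5) ]
    if rs == imageRS:
        print("drive order is:", order)   # prints: ('B', 'C', 'A')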
Case 4. Loss of the PD drive and the RS drive.
This case is not complicated; it can be solved by executing case #1 first and then case #3.
To reiterate, in this case we have all the user data intact. We can use this data to generate the PD drive. Then, we can use the same data to generate the RS drive. Both operations were already described in points 1 and 3 above.
Case 5. Loss of the RS drive and one data drive.
This case is also not complicated. We've lost only 1 data drive and we still have the PD drive, so we can execute case #2 to regenerate the missing data drive. Then, we can use all the data disks to regenerate the RS drive, like in case #3. After that, all drives are regenerated without errors.
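A combined script for this case could look like this (my own sketch, reusing the helpers and the naming from the other scripts):

# case 5 -- recover_d2_and_rs.py (a sketch, not one of the original scripts)
from rs_functions import *

# Surviving drives: D1, D3 and PD.
image1  = [ ord('f'), ord('i'), ord('r'), ord('s'), ord('t') ]
image3  = [ ord('t'), ord('h'), ord('i'), ord('r'), ord('d') ]
imagePD = [ 0x61, 0x64, 0x78, 0x6f, 0x74 ]

# Dead drives: D2 and RS.
image2  = [0] * 5
imageRS = [0] * 5

for i in range(0, 5):
    # Step 1 (case #2): XOR recovers the missing data drive.
    image2[i] = gf_add(image1[i], image3[i], imagePD[i])
    # Step 2 (case #3): recompute the Reed-Solomon byte from all data drives.
    imageRS[i] = gf_add(gf_mul(gf_drive(1), image1[i]),
                        gf_mul(gf_drive(2), image2[i]),
                        gf_mul(gf_drive(3), image3[i]))

dump_table("image2", image2)
dump_table("imageRS", imageRS)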
Case 6. Loss of the PD drive and one data drive.
The general approach is to first regenerate the missing data drive using the other data drives combined with the RS drive; then, once all data drives are regenerated, we can proceed with regenerating the PD drive (case #1).
We have to do some calculations in order to recover in this case. Let's assume that together with losing PD, we're losing data drive D2 as well. So, we have these drives left: D1, D3 and RS.
Thanks to the RS drive, we can regenerate D2 by combining D1, D3 and RS, like this:
# case 6 -- recover_d2_and_pd.py
from rs_functions import *

# We have these drives...
image1 = [ ord('f'), ord('i'), ord('r'), ord('s'), ord('t') ]
image3 = [ ord('t'), ord('h'), ord('i'), ord('r'), ord('d') ]
imageRS = [ 0x4d, 0x1e, 0x0d, 0x7a, 0x31 ]

# ...and these drives are dead
imagePD = [0] * 5
image2 = [0] * 5

for i in range(0, 5):
    partialRS = gf_add(gf_mul(gf_drive(1), image1[i]),
                       imageRS[i],  # Use the RS drive in place of the dead drive.
                       gf_mul(gf_drive(3), image3[i]))

    # gf_drive(2) is our dead drive.
    div_result = gf_div(1, gf_drive(2))

    # This will generate the data from the dead D2 drive.
    image2[i] = gf_mul(div_result, partialRS)

    # This will generate the data from the dead PD drive.
    imagePD[i] = gf_add(image1[i], image2[i], image3[i])

dump_table("image2", image2)
dump_table("imagePD", imagePD)
First, we need to generate the partialRS value by gf_add'ing the results of gf_mul for all bytes of all valid disks, together with the value of the RS drive in place of the missing data disk (in our case, D2).
Then, we use the partialRS value to regenerate the D2 data by dividing 1 by the dead drive's index (gf_drive(2)) and multiplying the result by partialRS. The gf_drive(2) argument specifies the index of our dead disk -- if our D1 drive had failed, we would use gf_drive(1) here.
After regenerating D2, we have all the data disks back. Now we can proceed with regenerating PD like in case #1: it's done in the code above by gf_add'ing the data from all data drives. If you remember, gf_add over a Galois field is a simple XOR operation, so instead of manually XOR'ing the bytes from all data drives, we can use the gf_add operation ;)
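If you want to convince yourself that the recovery really worked, a small sanity check can be appended at the end of recover_d2_and_pd.py (my own addition, not part of the original script):

# Appended to case 6 -- verify the regenerated drives against the known data.
assert image2 == [ ord('s'), ord('e'), ord('c'), ord('n'), ord('d') ]
assert imagePD == [ 0x61, 0x64, 0x78, 0x6f, 0x74 ]
print("D2 and PD regenerated correctly")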
Case 7. Loss of two data drives.
This is the most interesting and most complicated scenario. Let's assume that the D2 and D3 drives failed. In this case, we need to somehow use the D1, PD and RS drives to regenerate the missing drives.
This approach is different from the previous cases. The general idea is to generate the data for the D2 drive first, and then use the same approach as in case #2 to generate the data for D3. Here's the code:
# case 7 -- recover_d2_and_d3.py
from rs_functions import *

# These drives are still alive.
image1 = [ ord('f'), ord('i'), ord('r'), ord('s'), ord('t') ]
imagePD = [ 0x61, 0x64, 0x78, 0x6f, 0x74 ]
imageRS = [ 0x4d, 0x1e, 0x0d, 0x7a, 0x31 ]

# These drives are dead, we can't read from them.
image2 = [0] * 5
image3 = [0] * 5

for i in range(0, 5):
    partialPD = gf_add(image1[i])                        # add other drives if they exist
    partialRS = gf_add(gf_mul(gf_drive(1), image1[i]))   # add other drives if they exist

    g = gf_div(1, gf_add(gf_drive(2), gf_drive(3)))

    xoredPD = gf_add(partialPD, imagePD[i])
    xoredRS = gf_add(partialRS, imageRS[i])

    mid = gf_add(gf_mul(gf_drive(3), xoredPD), xoredRS)  # gf_drive(3) is the second drive we've lost

    # Regenerate data for D2.
    data = gf_mul(mid, g)
    image2[i] = data

    # Regenerate data for D3.
    image3[i] = gf_add(image1[i], image2[i], imagePD[i])
    # or:
    #
    # image3[i] = gf_add(data, xoredPD)

dump_table("image2", image2)
dump_table("image3", image3)
First, we need to add all bytes from all surviving data drives in order to generate partialPD. In this example we have just 1 data drive left, so the partialPD parameter will simply be the content of the D1 data drive. But RAID-6 arrays can span lots of disks, so if we had more data drives left, e.g. 3 surviving data drives, the partialPD calculation would look like this:
partialPD = gf_add(image1[i], image2[i], image3[i])
Next, we need the partialRS parameter. It can be calculated by adding the data from the existing drives like this:
partialRS = gf_add(A, B, C, ..., Z)
where A = gf_mul(gf_drive(1), image1[i])
B = gf_mul(gf_drive(2), image2[i]) if we have drive 2
C = gf_mul(gf_drive(3), image3[i]) if we have drive 3
etc.
In our case we have only 1 data drive left (D1), so our partialRS is just gf_mul(gf_drive(1), image1[i]).
Next, we need to generate the g parameter by dividing 1 by the sum of our dead drive indexes (gf_drive(2) and gf_drive(3), for D2 and D3).
Next comes the xoredPD parameter; it's calculated by adding the contents of the PD drive to our partialPD parameter, calculated earlier. Calculating the next parameter, xoredRS, is similar: it's a matter of adding partialRS to the contents of the RS drive.
Now comes the tricky part. We can calculate the data of the first drive that failed, i.e. the D2 drive. All we need to do is multiply the index of the second failed drive (gf_drive(3) for drive D3) by the xoredPD parameter, and add the xoredRS parameter to the result. Then, after multiplying that result by the g parameter, we'll have the contents of the D2 drive. Whew!
Since we have just regenerated the data of the D2 drive, from this point on the case is no different from case #2 -- loss of one data drive (D3). In order to generate the D3 drive, we simply add all our data drives that are alive (D1 and D2) together with PD.
Done! We have all our data drives back.
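If you'd like to generalize the walkthrough above to an arbitrary pair of failed data drives, a helper could look roughly like this (recover_two_data_drives is my own name and my own sketch, not part of the original scripts):

# two_data_drives.py -- a generalized sketch of case 7 (not from the original post)
from rs_functions import *

def recover_two_data_drives(alive, x, y, imagePD, imageRS, size=5):
    # 'alive' maps drive number -> data for every surviving data drive;
    # x and y are the numbers of the two dead data drives.
    data_x = [0] * size
    data_y = [0] * size
    for i in range(size):
        partialPD = gf_add(*[ img[i] for img in alive.values() ])
        partialRS = gf_add(*[ gf_mul(gf_drive(n), img[i]) for n, img in alive.items() ])
        xoredPD = gf_add(partialPD, imagePD[i])   # = Dx[i] xor Dy[i]
        xoredRS = gf_add(partialRS, imageRS[i])   # = gx*Dx[i] xor gy*Dy[i]
        g = gf_div(1, gf_add(gf_drive(x), gf_drive(y)))
        data_x[i] = gf_mul(gf_add(gf_mul(gf_drive(y), xoredPD), xoredRS), g)
        data_y[i] = gf_add(xoredPD, data_x[i])
    return data_x, data_y

# Usage: lose D2 and D3, keep D1, PD and RS.
image1  = [ ord('f'), ord('i'), ord('r'), ord('s'), ord('t') ]
imagePD = [ 0x61, 0x64, 0x78, 0x6f, 0x74 ]
imageRS = [ 0x4d, 0x1e, 0x0d, 0x7a, 0x31 ]
d2, d3 = recover_two_data_drives({ 1: image1 }, 2, 3, imagePD, imageRS)
print("".join(map(chr, d2)), "".join(map(chr, d3)))   # prints: secnd third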
Epilogue
I've chosen the Python language to demonstrate that error recovery based on Reed-Solomon codes doesn't need much programming, nor much computational power. It's very fast, and the implementation can be pretty compact. Of course, in order to make things even faster, the implementation would probably need to use parallel processing, but since every byte can be calculated independently of the others, parallelising it isn't a hard problem.
It's also worth noting that the recovery method described in this post doesn't necessarily have to use whole disks in order to be effective. You can treat the 'drives' as 'buffers' when transmitting data over an unreliable medium, and this error recovery should still be effective. It's more computation-heavy than e.g. Hamming codes, but the ability to correct 2 missing streams is pretty powerful when creating systems designed to be fail-safe.
Of course, RAID-6 is nowhere near being a new invention, and Reed-Solomon codes are even older. They were even used in Voyager 2's mission exploring the solar system, which is pretty cool.
Newer alternatives to RS codes include turbo codes -- I hope I'll have a chance to dig into them soon ;)
Anyway, have fun!