Losing 2 disks on a RAID-5 array

Today I was quite disappointed when I saw that my RAID-5 array had suddenly lost 2 out of 4 drives. As you may know, losing 1 drive on RAID-5 is OK, losing 2 is not ok at all, it usually means that you have lost all your data.

In fact, my failure today was due to some electrical problems. If you are following this blog you know that my RAID drives are plugged to the server (Debian GNU/Linux) using USB, which is an extremely bad idea (don’t do that at home ;-) ). And to add more on my stupidity, in order to reduce power consumption I changed my hard drives to laptop hard drives and have them powered through the USB hub… which was not plugged to the UPS. So today there was a power failure at home and since the server’s USB was not providing enough power, two drives went off.

Since nothing was being written when it occurred, I know that the content on every drive was still good, but mdadm reported the array as degraded and reading was not really possible anymore.

So, what to do in that case? From what I have seen, the first thing is to stop the array, then to try to reassemble it with various options (but do not try to re-add the “failing” drives). Obviously I did some mistake… So, if at some point mdadm --assemble with any kind of options does not work, re-creating the array might be your last solution. At least it worked for me.

But be careful, when creating the array, you have to provide the same options (chunk size…) as it was before, and you have to keep the drives in the SAME ORDER. And when you have drives on USB, the order is a bit random (maybe I should have looked at each disk’s UUID and write the order somewhere).

So I re created the array with the following command (DON’T FORGET “—assume-clean” otherwise mdadm will start re-synchronizing your disks and it’s something you may not want):

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 --assume-clean /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

After doing so… it was still not working. Why? Because I didn’t gave the right disk order! With 4 drives I have something like 24 different possibilities. How to find out if it’s the right one? Well, that’s quite easy, you should be able to mount the disk ;-) . Doing a fsck might also be a good idea (don’t forget the -n option as you don’t want to write on the drive until you are sure that it’s the correct order).

I was quite lucky since I found the right one on the second try.

Related posts:

Restoring your RAID array

I’m very lucky those days. 2 weeks ago my laptop hard drive died. Today I lost a drive in my RAID array.

Few month ago I moved my RAID array from SATA to USB but USB components seems to be of lower quality than SATA ones. So from time to time I see some “USB reset” when reading/writing a lot of data, fortunately it only stop transfers for a few seconds before resuming.

Today my server was fsck‘ing the array when a disk failed badly. It looks like the external drive’s USB controller crashed. The disk was not visible anymore from my server (I had to manually restart the external drive).

Since it’s a RAID 5 array, everything was still working fine, in degraded mode:

$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sat Jan 19 17:03:46 2008
     Raid Level : raid5
     Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent
 
    Update Time : Wed Jul 22 22:16:04 2009
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
 
         Layout : left-symmetric
     Chunk Size : 64K
 
           UUID : 02918692:35547ed8:ccb6a325:e4cda885 (local to host seth)
         Events : 0.3774
 
    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sde1
       1       0        0        1      removed
       2       8       81        2      active sync   /dev/sdf1
       3       8       33        3      active sync   /dev/sdc1

Then adding back the missing drive:

$ sudo mdadm --manage /dev/md0 --add /dev/sdd1
mdadm: re-added /dev/sdd1
$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sat Jan 19 17:03:46 2008
     Raid Level : raid5
     Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent
 
    Update Time : Wed Jul 22 22:19:23 2009
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1
 
         Layout : left-symmetric
     Chunk Size : 64K
 
 Rebuild Status : 0% complete
 
           UUID : 02918692:35547ed8:ccb6a325:e4cda885 (local to host seth)
         Events : 0.3780
 
    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sde1
       4       8       49        1      spare rebuilding   /dev/sdd1
       2       8       81        2      active sync   /dev/sdf1
       3       8       33        3      active sync   /dev/sdc1

Now I can monitor the slow process of rebuilding the array:

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[4] sde1[0] sdc1[3] sdf1[2]
      1465151808 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
      [>....................]  recovery =  2.2% (11184896/488383936) finish=661.5min speed=12019K/sec
 
unused devices: <none>

At least 11 hours to rebuild!!! So, for people as stupid as me: don’t use USB for a RAID array if you are not obliged to, it’s not very safe and it’s a bit slow.

Moving a RAID array

I blogged about how to create a RAID-5 array with GNU/Linux in a previous entry.

So my home server is running a 4 disks array (about 500 GB each, for a total of 1.5 TB available). I used SATA disks plugged in the server. For some silly reasons I wanted to move the disks out of the server and plug them using a USB interface (no, not a eSATA one, USB (I told you, it’s silly, but it could have been worse, like creating a floppy RAID array)).

I bought 4 Icy Box enclosures and a USB hub. Shut down the server, move the disks into the external enclosures, plug everything (lots of wires), switch on the server, cry.

As quite expected, it didn’t work right away. My Debian server stopped on a maintenance shell, complaining that it was not able to check /dev/md0 (the array).

No problem, I tried a simple command:

mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

Array assembled, exit the shell, Debian finished booting, everything works. But, in doubt, I did a reboot. Again, array not recognized. After a bit of googling and man reading, I tried the same command with a little option added:

mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 --update=homehost

Not really sure about what it did, but after rebooting, the array was recognized and assembled automatically.

So YES, you can change the controller interface used to plug your disks in an RAID array.

What about speed? Well… high speed is obviously not the purpose of this experiment ;-) (but at least it’s > 30 MB/s in continuous read).

What about reliability? Well… it’s Raid-5 so I might have to rebuild from time to time. So far I only get some USB reset events so transfer stall during ~20 seconds then resume. Of course I get more resets when doing a big transfer.