Losing 2 disks on a RAID-5 array

Today I was quite disappointed when I saw that my RAID-5 array had suddenly lost 2 out of 4 drives. As you may know, RAID-5 survives the loss of 1 drive, but losing 2 is not OK at all: it usually means you have lost all your data.
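Why one lost drive is survivable and two are not comes down to parity. The sketch below is a simplified model, not how the md driver actually lays out data (real RAID-5 rotates the parity chunk across the disks, e.g. with the left-symmetric layout). In each stripe, the parity chunk is the XOR of the data chunks, so any single missing chunk equals the XOR of the survivors. With two chunks missing, you have one equation and two unknowns. The chunk contents are made-up values for illustration.

```python
from functools import reduce


def xor_chunks(chunks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def make_stripe(data_chunks):
    """Return the data chunks followed by one parity chunk (simplified stripe)."""
    return list(data_chunks) + [xor_chunks(data_chunks)]


def rebuild(stripe, missing):
    """Recover the one chunk whose index is in `missing` from the survivors.

    With two or more chunks missing, XOR of the survivors only yields the
    XOR of the missing chunks, not each chunk, so we refuse.
    """
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing chunk")
    survivors = [c for i, c in enumerate(stripe) if i not in missing]
    return xor_chunks(survivors)


# Hypothetical 4-disk stripe: three data chunks plus one parity chunk.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
print(rebuild(stripe, {1}))  # b'BBBB'
```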

In fact, today’s failure was caused by an electrical problem. If you follow this blog, you know that my RAID drives are connected to the server (Debian GNU/Linux) over USB, which is an extremely bad idea (don’t do that at home ;-) ). And to add to my stupidity, in order to reduce power consumption I replaced my hard drives with laptop hard drives powered through the USB hub… which was not plugged into the UPS. So today there was a power failure at home, and since the server’s USB ports could not provide enough power, two drives went off.

Since nothing was being written when it happened, I knew the content of every drive was still good, but mdadm reported the array as degraded and reading from it was no longer really possible.

So, what should you do in that case? From what I have read, the first step is to stop the array, then try to reassemble it with various options (but do not try to re-add the “failing” drives). Obviously I made a mistake somewhere… So, if mdadm --assemble does not work with any combination of options, re-creating the array may be your last resort. At least it worked for me.

But be careful: when re-creating the array, you have to provide the same options as before (chunk size…), and you have to keep the drives in the SAME ORDER. And when your drives are on USB, the order is somewhat random (maybe I should have looked at each disk’s UUID and written the order down somewhere).
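One way to write the order down before trouble strikes: `mdadm --examine` on each member prints its superblock, including the member’s slot in the array. The parser below is a sketch that assumes the two formats I know of: with 0.90 metadata, the line starting with `this` (columns Number, Major, Minor, RaidDevice, State), and with 1.x metadata, a `Device Role : Active device N` line. The function names are hypothetical, and you should check the patterns against your own mdadm’s output before relying on them.

```python
import re

# 1.x superblocks report the slot as "Device Role : Active device N".
ROLE_1X = re.compile(r"Device Role\s*:\s*Active device (\d+)")


def raid_slot(examine_output):
    """Return the RaidDevice slot found in `mdadm --examine` output, or None."""
    for line in examine_output.splitlines():
        match = ROLE_1X.search(line)
        if match:
            return int(match.group(1))
        fields = line.split()
        # 0.90 superblocks: "this  Number  Major  Minor  RaidDevice  State..."
        if fields and fields[0] == "this":
            return int(fields[4])
    return None


def disk_order(examine_by_device):
    """Return device names sorted by slot, given {device: examine_output}.

    Devices without an active slot (e.g. spares) are left out.
    """
    found = []
    for dev, text in examine_by_device.items():
        slot = raid_slot(text)
        if slot is not None:
            found.append((slot, dev))
    return [dev for slot, dev in sorted(found)]
```

On a live system the text would come from something like `subprocess.run(["mdadm", "--examine", dev], capture_output=True, text=True).stdout`, run as root; saving the resulting list somewhere off the array is the point.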

So I re-created the array with the following command (DON’T FORGET “--assume-clean”, otherwise mdadm will start re-synchronizing your disks, which is something you probably don’t want):

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 --assume-clean /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

After doing so… it was still not working. Why? Because I hadn’t given the right disk order! With 4 drives there are 4! = 24 possible orders. How do you find out which one is right? Well, that’s quite easy: you should be able to mount the filesystem ;-) . Running fsck is also a good idea (don’t forget the -n option, as you don’t want to write anything to the drives until you are sure the order is correct).
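Trying orders by hand is error-prone, so here is a sketch that only prints a plan. It runs nothing itself: for each of the 24 permutations it emits a stop / create / read-only check sequence, meant to be run one attempt at a time and abandoned at the first order where `fsck -n` comes back clean. The `--metadata`, `--chunk` and `--layout` values are assumptions taken from the array details shown in another post on this blog (0.90 superblock, 64K chunk, left-symmetric). They matter because newer mdadm versions default to 1.2 metadata with a different data offset, so the original values must be passed explicitly. Adjust them to whatever your array actually used.

```python
from itertools import permutations

# Device names from the command above; the order is what we are searching for.
DEVICES = ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1", "/dev/sde1"]

# ASSUMED original creation parameters; they must match the original array.
CREATE_OPTS = "--level=5 --raid-devices=4 --metadata=0.90 --chunk=64 --layout=left-symmetric"


def plan(devices, md="/dev/md0"):
    """Yield one (stop, create, check) command triple per device order."""
    for order in permutations(devices):
        yield (
            f"mdadm --stop {md}",
            # --assume-clean: do not resync, so existing data is left untouched.
            f"mdadm --create {md} {CREATE_OPTS} --assume-clean {' '.join(order)}",
            # -n: read-only check, answers "no" to every repair question.
            f"fsck -n {md}",
        )


if __name__ == "__main__":
    for attempt, commands in enumerate(plan(DEVICES), start=1):
        print(f"# attempt {attempt}")
        for command in commands:
            print(command)
```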

I was quite lucky since I found the right one on the second try.


Goodbye TB

A long time ago (2004?) I bought a LaCie Bigger Disk Extreme 1TB. At the time it was a really huge hard drive (in both size and capacity): the biggest single disks were about 250GB, so given its size I assumed the LaCie contained four 250GB disks in some RAID-0 or JBOD array.

So, five years later (last week), it died. Even now, losing 1TB of data is a bit disappointing. Fortunately there was not much important data on it (I store important data on a RAID-5 array or on a part of my Mac that is backed up with Time Machine).

So, what to do with a dead disk? Dismantle it of course.

First, unscrew the back panel.

Then push the content out / pull the case off.

Finally, unscrew everything else.

So yes, there were 4 Maxtor MaXLine Plus II 250GB hard disks inside. I have already tested 2 of them: one is dead, the other is fine. I hope only one is dead, so that I can reuse 750GB of the original 1TB.

Overall, I think this is the third hard drive to die this year. Last year it was two motherboards. Why is electronic hardware so unreliable lately? (I do my best to take care of it, but apparently that’s not enough; hardware still fails at some point.)

Somewhere I still have an old Victor computer from the 80s (I can’t find it on the internet, but it’s not a Victor 9000), and it still works (at least it did the last time I plugged it in).

Posted in Uncategorized. Tags: , . No Comments »

Restoring your RAID array

I’m very lucky these days. Two weeks ago my laptop’s hard drive died. Today I lost a drive in my RAID array.

A few months ago I moved my RAID array from SATA to USB, but USB components seem to be of lower quality than SATA ones. From time to time I see a “USB reset” when reading or writing a lot of data; fortunately, it only stops transfers for a few seconds before they resume.

Today my server was fsck‘ing the array when a disk failed badly. It looks like the external drive’s USB controller crashed: the disk was no longer visible from my server (I had to restart the external drive manually).

Since it’s a RAID 5 array, everything was still working fine, in degraded mode:

$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sat Jan 19 17:03:46 2008
     Raid Level : raid5
     Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent
 
    Update Time : Wed Jul 22 22:16:04 2009
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
 
         Layout : left-symmetric
     Chunk Size : 64K
 
           UUID : 02918692:35547ed8:ccb6a325:e4cda885 (local to host seth)
         Events : 0.3774
 
    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sde1
       1       0        0        1      removed
       2       8       81        2      active sync   /dev/sdf1
       3       8       33        3      active sync   /dev/sdc1
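Two things can be read from this output. First, the device table shows which slot is missing (RaidDevice 1 is `removed`). Second, the sizes follow RAID-5’s capacity rule: with n members, one member’s worth of space holds parity, so Array Size = (n - 1) × Used Dev Size, here 3 × 488383936 = 1465151808 KiB (1397.28 GiB). The sketch below is hypothetical helper code that assumes the table columns are laid out exactly as above (Number, Major, Minor, RaidDevice, then the state).

```python
def missing_slots(detail_output):
    """Return the RaidDevice slots marked 'removed' in `mdadm --detail` output."""
    slots = []
    in_table = False
    for line in detail_output.splitlines():
        fields = line.split()
        if fields[:2] == ["Number", "Major"]:
            in_table = True  # header line of the device table
            continue
        # Rows look like: Number Major Minor RaidDevice State...
        if in_table and len(fields) >= 5 and fields[4] == "removed":
            slots.append(int(fields[3]))
    return slots


def raid5_capacity(members, member_size):
    """Usable RAID-5 size: one member's worth of space goes to parity."""
    return (members - 1) * member_size
```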

Then I added the missing drive back:

$ sudo mdadm --manage /dev/md0 --add /dev/sdd1
mdadm: re-added /dev/sdd1
$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sat Jan 19 17:03:46 2008
     Raid Level : raid5
     Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent
 
    Update Time : Wed Jul 22 22:19:23 2009
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1
 
         Layout : left-symmetric
     Chunk Size : 64K
 
 Rebuild Status : 0% complete
 
           UUID : 02918692:35547ed8:ccb6a325:e4cda885 (local to host seth)
         Events : 0.3780
 
    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sde1
       4       8       49        1      spare rebuilding   /dev/sdd1
       2       8       81        2      active sync   /dev/sdf1
       3       8       33        3      active sync   /dev/sdc1

Now I can monitor the slow process of rebuilding the array:

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[4] sde1[0] sdc1[3] sdf1[2]
      1465151808 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
      [>....................]  recovery =  2.2% (11184896/488383936) finish=661.5min speed=12019K/sec
 
unused devices: <none>

At least 11 hours to rebuild!!! So, for people as stupid as me: don’t use USB for a RAID array unless you have to; it’s not very safe and it’s rather slow.
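The `finish=` estimate is essentially the remaining work divided by the current speed. Progress is given in 1K blocks and speed in K/sec, so with the numbers above: (488383936 - 11184896) / 12019 ≈ 39704 seconds, about 662 minutes or 11 hours. The kernel’s own figure (661.5min) differs slightly, probably because it uses a smoothed speed. The sketch below parses that line, plus the `[U_UU]` map (one character per slot, `_` meaning missing). The regular expressions are assumptions based only on the lines shown here.

```python
import re

# "(done/total) finish=...min speed=NNNK/sec", sizes in 1K blocks.
RECOVERY = re.compile(r"\((\d+)/(\d+)\).*speed=(\d+)K/sec")
# Slot map such as [U_UU]: U = up, _ = missing.
STATUS = re.compile(r"\[([U_]+)\]")


def recovery_eta_minutes(line):
    """Estimate the minutes left from a /proc/mdstat recovery line."""
    done, total, speed = map(int, RECOVERY.search(line).groups())
    return (total - done) / speed / 60


def missing_from_status(line):
    """Return the slot indices marked '_' in a status map such as [U_UU]."""
    return [i for i, c in enumerate(STATUS.search(line).group(1)) if c == "_"]
```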

Restoring your system from a Linux based remote Time Machine backup

In a previous post I explained that I configured my Debian GNU/Linux server to act as a Time Machine server.

The purpose of Time Machine is to back up your Mac and to let you retrieve files you deleted or restore your whole system. Last week my MacBook’s hard drive suddenly died without warning. Since I was moving into my new apartment, the last backup was a few days old, but at least I had a fairly recent one.

The question was: would Time Machine let me restore my backup as expected?

I bought a new hard drive, installed it, booted from the Mac OS X install DVD, selected Utilities/Restore system from backup… then nothing. No backup was listed and “Connect to remote disk” was grayed out :-( .

I was a bit disappointed. I launched Terminal and tried to mount the remote backup manually, but failed. That was the right approach; I just didn’t know the right command to mount a remote Apple (AFP) volume. After googling a bit (thanks to my iPhone), I found the right instructions:

  • Create a mount point for the remote backup
# mkdir /Volumes/backup
  • Mount remote volume
# mount_afp afp://login:password@hostname/volumename /Volumes/backup

Then I relaunched the “Restore from backup” utility, and this time the remote volume was listed. I just had to select it and wait (restoring the system can take quite a long time).

Software RAID-5 on GNU/Linux: the failure

Sooner or later, a failure will happen, so you might as well prepare for it a little. Unfortunately, it is often the failure you don’t expect that actually occurs.

To test my RAID-5 setup, after putting about a hundred gigabytes of data on it, I unplugged a disk and then plugged it back in. As expected, the disk started resynchronizing.

Unfortunately, during the resynchronization one of the other disks disappeared. 2 unsynchronized disks = loss of all data. The reason the disk disappeared: a faulty SATA cable (a computer is full of very complicated, sensitive components, so of course it’s a dumb little cable that acts up).

After a few attempts with mdadm to get the disk array back on its feet, I managed to get 3 active disks out of 4, but it was impossible to start the array, even with the options that force it to start.

When Linux writes to the disks, it records a kind of transaction number (the event counter stored in each member’s md superblock). Unfortunately, the disk with the cable problem had an out-of-date number, even though I had not written anything to the RAID-5 since losing the 2 disks (that would have been impossible anyway). It’s a real shame to have 3 good disks (which in my case is enough) and still be unable to restart the array.
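The number in question is the `Events` value that `mdadm --examine` prints for each member. Below is a sketch of the comparison that marks a member as stale, assuming you have already collected each device’s `Events` string (some mdadm versions print 0.90 counters as two dot-separated parts, such as `0.3774`, so the value is compared as a tuple of integers; all values are assumed to come from the same mdadm). The device names and counts in the test are hypothetical. Any member behind the highest count is out of date, which is why a normal assemble refuses it; `mdadm --assemble --force` is the documented option for accepting such a member anyway, although in my case even the forcing options were not enough.

```python
def parse_events(value):
    """Turn an Events value like '0.3774' or '3774' into a comparable tuple."""
    return tuple(int(part) for part in value.strip().split("."))


def stale_members(events_by_device):
    """Return, sorted, the devices whose event count is behind the newest one."""
    counts = {dev: parse_events(v) for dev, v in events_by_device.items()}
    newest = max(counts.values())
    return sorted(dev for dev, count in counts.items() if count < newest)
```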

Thanks to my friend Google, I came across the following command:

# echo "clean" > /sys/block/md0/md/array_state

All it does is make the kernel believe the disk array is in good shape… and it works. Everything came back at once, with my data still accessible and uncorrupted.

The page that helped me a lot: http://www.linuxquestions.org/questions/linux-general-1/raid5-with-mdadm-does-not-ron-or-rebuild-505361/
