LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 10-19-2009, 05:24 AM   #1
petrijooste
LQ Newbie
 
Registered: Oct 2009
Posts: 3

Rep: Reputation: 0
Question Recovering data from out-of-sync RAID 5


I messed up my first recovery attempt and hope I have one last shot.

Here's what happened (Not all of it may be relevant. Feel free to skip this part and go to the question at the end):
I had RAID 5 arrays md1, md2 and md3 built from equal-sized partitions across 3 x 500GB disks. Then I had a disk failure (sda) and only a 1TB drive to replace it with. So I created correctly sized partitions on the first part of the new drive, and after adding them I experienced the joy of seeing the arrays resync.

Later a second 500GB disk was showing signs of imminent failure and I did not have a replacement. So I decided to use the open 500GB on the rest of the 1TB drive to build spares (sda8 sda9 sda10) for sda5, sda6 and sda7.
I then removed the troubled disk. The arrays were now on 2 physical disks and I decided to fork out the extra cash to buy a raid-edition 500GB disk and proceeded to add it to the system.

Since I don't have hot-swap, I shut down (several times) to remove and insert disks, partitioning after booting in recovery mode etc. In the process I probably changed a disk while resyncing of md2 was not complete.
I now have:
  • md1 running fine on sda5 sdc5 and sdd5
  • md3 running fine on sda7 sdc7 and sdd7
When trying to assemble md2, it added all the partitions I specified as spares with message: not enough disks to start array. I tried many combinations and some --force switches also.

After reading man pages and feedback in this forum I then decided to do a create:
Code:
mdadm --create /dev/md2 -v --level=5 --raid-disks=3 /dev/sda6 /dev/sdc6 /dev/sdd6
This gave me info on the partitions belonging to a raid array. I assumed (and did not check) that they all belonged to the same array (having the same ?superblock?). And when I was asked whether to actually go ahead, I answered y. I then spent several hours on the resync, but when I tried to mount /dev/md2 the file system was damaged. fsck.ext3 tried to rebuild it, but I did not know how to answer its questions, and although it warned of possible MASSIVE DATA LOSS, I pressed on with y y y y y y y y. Needless to say, I ended up with almost 300GB of unrecognizable files in lost+found.

My last hope? I have two partitions with raid content representing my data, but they are out of sync:
  • sda9 on the second half of the 1TB drive
  • sdb6 on the limping disk (not completely failed yet)

The partitions with the same size (sda6, sdc6 and sdd6) were at this point in md2. So I did the following:
Code:
mdadm --misc --stop /dev/md2
mkfs.ext2 /dev/sdd6

and then tried:

mdadm --create /dev/md2 -v --level=5 --raid-disks=3 /dev/sda9 /dev/sdb6 /dev/sdd6

and this gave me:

mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K

mdadm: /dev/sda9 appears to contain an ext2fs file system
    size=585938432K  mtime=Fri Oct 16 10:35:39 2009
mdadm: /dev/sda9 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Oct 17 09:27:10 2009

mdadm: /dev/sdb6 appears to contain an ext2fs file system
    size=585938432K  mtime=Fri Oct  9 13:42:45 2009
mdadm: /dev/sdb6 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Tue Feb 10 16:37:55 2009

mdadm: /dev/sdd6 appears to contain an ext2fs file system
    size=292969340K  mtime=Thu Jan  1 02:00:00 1970

mdadm: size set to 292969216K
Continue creating array? n
mdadm: create aborted.
So, sdd6 is empty and ready. sda9 and sdb6 have my data, but are out of sync.

How do I proceed?

In the howto at /tldp.org/HOWTO/Software-RAID-HOWTO-8.html#ss8.1 it has the warning "if it doesn't EXACTLY match ... will most likely completely obliterate whatever data you used to have on your disks".
I think this is what happened the first time doing mdadm --create

It suggests using mkraid with the failed-disk option. I don't have mkraid installed and sudo apt-get install mkraid cannot find it.

Some more info. It doesn't help me, but I think it helps explain my situation:

sda9 was created during a resync which started at Oct 17 9:27
Code:
sudo mdadm -E /dev/sda9

/dev/sda9:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : d0181738:bf787ff1:07a53285:f62db7fd (local to host rkv-lnx3)
  Creation Time : Sat Oct 17 09:27:10 2009
     Raid Level : raid5
  Used Dev Size : 292969216 (279.40 GiB 300.00 GB)
     Array Size : 585938432 (558.79 GiB 600.00 GB)
   Raid Devices : 3
  Total Devices : 4
Preferred Minor : 2

    Update Time : Sat Oct 17 16:28:29 2009
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1
       Checksum : ddb97346 - correct
         Events : 6

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8        9        1      active sync   /dev/sda9

   0     0       8        6        0      active sync   /dev/sda6
   1     1       8        9        1      active sync   /dev/sda9
   2     2       8       54        2      active sync   /dev/sdd6
   3     3       8       38        3      spare   /dev/sdc6
sdb6 was on the disk which was physically removed on Oct 16. It was part of the original array and was then known to the system as sdc6.
Code:
sudo mdadm -E /dev/sdb6

/dev/sdb6:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 0c36da7e:adb9753c:f9003a38:ceab2003
  Creation Time : Tue Feb 10 16:37:55 2009
     Raid Level : raid5
  Used Dev Size : 292969216 (279.40 GiB 300.00 GB)
     Array Size : 585938432 (558.79 GiB 600.00 GB)
   Raid Devices : 3
  Total Devices : 4
Preferred Minor : 2

    Update Time : Fri Oct 16 08:38:27 2009
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1
       Checksum : d0eba69a - correct
         Events : 2217384

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       38        0      active sync   /dev/sdc6

   0     0       8       38        0      active sync   /dev/sdc6
   1     1       8       22        1      active sync   /dev/sdb6
   2     2       8        6        2      active sync   /dev/sda6
   3     3       8        9        3      spare   /dev/sda9
So, sdd6 is empty and ready. sda9 and sdb6 have my data, but are out of sync.
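
The mismatch is visible in the two dumps above: the UUID and Creation Time differ, and the Events counters (6 versus 2217384) are nowhere near each other. As a minimal sketch, assuming each mdadm -E output has been saved to a text file, the comparison could be scripted like this. sb_field and same_array are hypothetical helper names, not mdadm features; they only read text and never touch a disk.

```shell
# Sketch: pull one field out of saved "mdadm -E" output and compare two members.
# sb_field and same_array are hypothetical helpers, not part of mdadm.

sb_field() {
    # $1 = field name (e.g. "UUID" or "Events"), $2 = file holding "mdadm -E" output
    awk -F' : ' -v key="$1" '
        { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
        name == key { split($2, parts, " "); print parts[1]; exit }
    ' "$2"
}

same_array() (
    # Succeeds only when both dumps carry the same, non-empty UUID and Events values.
    u1=$(sb_field UUID "$1");   u2=$(sb_field UUID "$2")
    e1=$(sb_field Events "$1"); e2=$(sb_field Events "$2")
    [ -n "$u1" ] && [ -n "$e1" ] && [ "$u1" = "$u2" ] && [ "$e1" = "$e2" ]
)

# Usage (file names are placeholders):
#   sudo mdadm -E /dev/sda9 > sda9.txt
#   sudo mdadm -E /dev/sdb6 > sdb6.txt
#   same_array sda9.txt sdb6.txt || echo "members disagree on UUID or Events"
```

With the two dumps shown above, same_array would fail, which would have been a cue to stop and check before answering y to a create prompt.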

How do I proceed?

Last edited by petrijooste; 10-19-2009 at 05:58 AM. Reason: more information about the problem
 
Old 10-19-2009, 06:54 AM   #2
xeleema
Member
 
Registered: Aug 2005
Location: D.i.t.h.o, Texas
Distribution: Slackware 13.x, rhel3/5, Solaris 8-10(sparc), HP-UX 11.x (pa-risc)
Posts: 988
Blog Entries: 4

Rep: Reputation: 254
Wow.

Firstly, I want to say that this is a great post. You have all the detail, output from the commands, and have listed step-by-step everything you've done. I only wish posts like this were in the majority.

Secondly, my gut tells me you're hosed, and if you have *any* backups, now would be the time to dust them off. It's things like this that make me stick to either RAID 1 or RAID 10.

However, see if you can forcefully start the RAID 5 with the failed/missing component;

mdadm --start --force /dev/md2
Had you posted before running any "mdadm --create" commands, I would have suggested against it.

If forcefully starting it doesn't work, then try this option;

Code:
--assume-clean
       Tell mdadm that the array pre-existed and is known to be clean.  It
       can be useful when trying to recover from a major  failure  as  you
       can be sure that no data will be affected unless you actually write
       to the array.  It can also be used when creating a RAID1 or  RAID10
       if  you  want to avoid the initial resync, however this practice --
       while normally safe -- is not recommended.   Use this only  if  you
       really know what you are doing.
Also, keep this in mind;

Code:
To create a "degraded" array in which some  devices  are  missing,  simply
give  the word "missing" in place of a device name.  This will cause mdadm
to leave the corresponding slot in the array empty.  For a RAID4 or  RAID5
array  at  most  one  slot can be "missing"; for a RAID6 array at most two
slots.  For a RAID1 array, only one real device needs to be given.  All of
the others can be "missing".

When  creating  a  RAID5 array, mdadm will automatically create a degraded
array with an extra spare drive.  This is because building the spare  into
a  degraded array is in general faster than resyncing the parity on a non-
degraded, but not clean, array.  This feature can be overridden  with  the
--force option.
Note: This is all from the mdadm man page.
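
A rough sketch of the slot rule quoted above, for checking a planned device list before handing it to mdadm. It assumes no spare devices are being requested, so the number of slots listed (counting each "missing") has to equal --raid-devices. slots_ok is a hypothetical helper that only counts its arguments and touches no disk.

```shell
# Sketch: check a planned member list against the man-page rule on "missing".
# slots_ok is a hypothetical helper; it is not an mdadm command.

slots_ok() (
    # $1 = RAID level (4, 5 or 6), $2 = --raid-devices value, rest = slot list
    level=$1; want=$2; shift 2
    case "$level" in
        4|5) max_missing=1 ;;   # RAID4/RAID5: at most one empty slot
        6)   max_missing=2 ;;   # RAID6: at most two empty slots
        *)   echo "level $level not handled by this sketch" >&2; exit 2 ;;
    esac
    missing=0
    for dev in "$@"; do
        [ "$dev" = "missing" ] && missing=$((missing + 1))
    done
    # The slot count must match, and the empty slots must stay within the limit.
    [ "$#" -eq "$want" ] && [ "$missing" -le "$max_missing" ]
)

# Usage:
#   slots_ok 5 3 /dev/sda9 missing /dev/sdb6 && echo "slot list is consistent"
```

Here slots_ok 5 3 /dev/sda9 missing /dev/sdb6 succeeds, while the same list with a device count of 4 does not. The check says nothing about whether the slot order is right.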

Good luck!

Last edited by xeleema; 10-19-2009 at 06:56 AM.
 
Old 10-20-2009, 02:58 AM   #3
petrijooste
LQ Newbie
 
Registered: Oct 2009
Posts: 3

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by xeleema View Post
Wow.

Firstly, I want to say that this is a great post. You have all the detail, output from the commands, and have listed step-by-step everything you've done. I only wish posts like this were in the majority.

Secondly, my gut tells me you're hosed, and if you have *any* backups, now would be the time to dust them off. It's things like this that make me stick to either RAID 1 or RAID 10.

However, see if you can forcefully start the RAID 5 with the failed/missing component;

mdadm --start --force /dev/md2
Had you posted before running any "mdadm --create" commands, I would have suggested against it.
Thanks. I've read many threads where info was requested and then given, so I thought it would save time if I started out with as much as possible.

I tried:
Code:
sudo mdadm --manage --start /dev/md2 /dev/sda9 /dev/sdb6 missing
and got:
mdadm: unrecognized option '--start'

so I checked the version: mdadm -V. It turns out I have "mdadm - v2.6.7 - 6th June 2008". Quite old.

From the "man mdadm" I found:
  • the --start option does not appear anywhere
  • --readonly as option to --create has the comment "start the array readonly — not supported yet." (I wanted to use this to stop the automatic resyncing which messed up my array the first time.)
  • simply give the word "missing" in place of a device name. This did not work. It kept complaining that there were not enough devices given.

It looks to me that I need a newer version of mdadm.
 
Old 10-21-2009, 03:01 PM   #4
xeleema
Member
 
Registered: Aug 2005
Location: D.i.t.h.o, Texas
Distribution: Slackware 13.x, rhel3/5, Solaris 8-10(sparc), HP-UX 11.x (pa-risc)
Posts: 988
Blog Entries: 4

Rep: Reputation: 254
"Magic" File Time.

Ah!

My apologies, you have to specify --raid-devices=#, then specify the devices, like this;

Code:
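# --create selects the mode; the three slots listed (one "missing") must match --raid-devices=3
mdadm --create /dev/md2 --assume-clean --level=5 --raid-devices=3 /dev/sda9 missing /dev/sdb6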
However, I must admit, the time to run this would have been right after the disk swap, when the second disk started to freak out on you.

As for your version of mdadm, 2.6.7 (06jun08), that's not terribly old. The current version is 2.6.9 released about seven months ago, 10mar09.
The project's GIT repository is over here.

I made a boo-boo referencing the "--start" command, my mistake.

On a side note, "mkraid" is a part of the raidtools package developed for Red Hat-based distributions, and from what I can tell, that's been deprecated in favor of mdadm. It seems it was just a set of wrapper scripts to begin with (but I could be wrong).

As for the current state of your filesystem: with the rebuilding you've done, I'm surprised there was a filesystem to fsck in the first place. Since you now have ~300GB in the lost+found on that filesystem, I think we're going to need to start from there.

This is where the "file" command comes in handy. If you have an extravagant "magic" file, then it should do a good job of identifying what those files are (JPEGs, Videos, MP3s etc). You can grab the latest magic file from ftp.astron.com:/pub/file.
The current one is file-5.03.tar.gz (as of May 6th, 2009).
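
A minimal sketch of that idea, assuming the recovered files sit under a single directory and that the installed file(1) supports the --brief and --mime-type options. classify_dir is a hypothetical helper name; it only reads the files and prints a count per detected type.

```shell
# Sketch: tally recovered files by the type that file(1) detects.
# classify_dir is a hypothetical helper, not a standard tool.

classify_dir() {
    # $1 = directory to scan, e.g. the lost+found of the damaged filesystem
    find "$1" -type f -exec file --brief --mime-type {} + |
        sort | uniq -c | sort -rn
}

# Usage (the mount point is a placeholder):
#   classify_dir /mnt/md2/lost+found
```

The output is one line per type with the most common first, which gives a quick idea of how much of the 300GB is worth sorting by hand.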

On the few occasions where I've run systems without backups, I've kept an MD5 of all the files, along with their filenames. Things like "md5deep" come in rather handy for that. (It also acts as a poor-man's TripWire.) A downside is that you have to fetch and compile it, as it's not included in any Linux distribution that I'm aware of. (Also helps if the information was generated before a data loss.)
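
As a sketch of the same idea using plain md5sum from coreutils instead of md5deep (a substitution, not the tool named above), assuming find and md5sum are installed. make_manifest and check_manifest are hypothetical helper names.

```shell
# Sketch: record an MD5 manifest of a directory tree and verify it later.
# make_manifest and check_manifest are hypothetical helpers.

make_manifest() {
    # $1 = directory to index, $2 = manifest file to write (keep it outside $1)
    ( cd "$1" && find . -type f -exec md5sum {} + | sort -k 2 ) > "$2"
}

check_manifest() {
    # $1 = directory to verify, $2 = manifest written earlier by make_manifest
    # Returns non-zero if any listed file is missing or has changed.
    ( cd "$1" && md5sum -c ) < "$2"
}

# Usage (paths are placeholders):
#   make_manifest /srv/data /root/data.md5
#   check_manifest /srv/data /root/data.md5
```

Paths are stored relative to the indexed directory, so the manifest can also be matched against the same tree mounted somewhere else.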

I know it's going to be arduous, but at this point, I don't think it's possible to return the filesystem to a usable state. You can give tools like e2undelete a shot (after all ext3 is just ext2 with journaling).

However, the first thing I would seriously consider doing is finding a good backup solution. There's NAS devices out there that you could pick up and dump your filesystem onto with rsync, NFS, or heck, SMB. Several BYOD (bring your own disks) NAS devices can be had for under $200.
I would recommend the NSLU2, as there's a great Linux project out there to tweak it up.

Good luck, and let me know how it goes!

P.S: I realize the irony of pointing out things that could have prevented this data loss. However, you do have my condolences, and I seriously hope there was no Production data on there.
 
  

