LinuxQuestions.org
Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
Old 05-16-2008, 10:21 AM   #1
Curlyau
LQ Newbie
 
Registered: Jan 2008
Posts: 3

Rep: Reputation: 0
Broken software RAID5 set - help!


I have a Debian box running kernel 2.6.18, and I appear to have broken my RAID5 set.

Previously I had 4 500GB drives, and it was working perfectly. I added another disk and did:

mdadm --add /dev/md1 /dev/sde1
mdadm --grow /dev/md1 --raid-devices=5
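In hindsight, mdadm can checkpoint a reshape's critical section with --backup-file, which would have made an interrupted grow resumable instead of fatal. A sketch of the same operation with a backup file (the backup path is just an example):

```shell
# Hedged sketch: grow a RAID5 from 4 to 5 devices with a checkpoint file.
# If the reshape is interrupted, the same --backup-file can be passed to
# mdadm --assemble to resume it. The path is an example, not what I ran.
mdadm --add /dev/md1 /dev/sde1
mdadm --grow /dev/md1 --raid-devices=5 \
      --backup-file=/root/md1-grow.backup
```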

It started to reshape the array, but then the new drive started giving errors and the machine hung. I think it was due to a problem with a PCI SATA card.

I resolved that, but now I've somehow broken the array.


cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : inactive sdd1[5] sdb1[3] sdc1[1]
1465151616 blocks super 1.0

unused devices: <none>




mdadm -D /dev/md1
/dev/md1:
Version : 01.00.03
Creation Time : Sun Dec 23 01:28:08 2007
Raid Level : raid5
Device Size : 488383744 (465.76 GiB 500.10 GB)
Raid Devices : 5
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Fri May 16 22:05:16 2008
State : clean, degraded, Not Started
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 128K

Delta Devices : 1, (4->5)

Name : 'Fuckyfucky3':1
UUID : 43eff327:8d1aa506:c0df2849:005c003f
Events : 1420750

Number Major Minor RaidDevice State
5 8 49 0 active sync /dev/sdd1
1 8 33 1 active sync /dev/sdc1
3 8 17 2 active sync /dev/sdb1
3 0 0 3 removed
4 0 0 4 removed



mdadm -E /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 01
Feature Map : 0x4
Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
Name : 'Fuckyfucky3':1
Creation Time : Sun Dec 23 01:28:08 2007
Raid Level : raid5
Raid Devices : 5

Device Size : 976767856 (465.76 GiB 500.11 GB)
Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
Used Size : 976767488 (465.76 GiB 500.10 GB)
Super Offset : 976767984 sectors
State : clean
Device UUID : a15eee10:6cd6b795:d18cb3b2:770139c2

Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
Delta Devices : 1 (4->5)

Update Time : Fri May 16 21:40:35 2008
Checksum : c6697c39 - correct
Events : 1420746

Layout : left-symmetric
Chunk Size : 128K

Array Slot : 4 (failed, 1, failed, 2, 3, 0)
Array State : uuuU_ 2 failed


mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 01
Feature Map : 0x4
Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
Name : 'Fuckyfucky3':1
Creation Time : Sun Dec 23 01:28:08 2007
Raid Level : raid5
Raid Devices : 5

Device Size : 976767856 (465.76 GiB 500.11 GB)
Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
Used Size : 976767488 (465.76 GiB 500.10 GB)
Super Offset : 976767984 sectors
State : clean
Device UUID : 5b38c5a2:798c6793:91ad6d1e:9cfee153

Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
Delta Devices : 1 (4->5)

Update Time : Fri May 16 22:05:16 2008
Checksum : 53542fac - correct
Events : 1420750

Layout : left-symmetric
Chunk Size : 128K

Array Slot : 3 (failed, 1, failed, 2, failed, 0)
Array State : uuU__ 3 failed


mdadm -E /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 01
Feature Map : 0x4
Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
Name : 'Fuckyfucky3':1
Creation Time : Sun Dec 23 01:28:08 2007
Raid Level : raid5
Raid Devices : 5

Device Size : 976767856 (465.76 GiB 500.11 GB)
Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
Used Size : 976767488 (465.76 GiB 500.10 GB)
Super Offset : 976767984 sectors
State : clean
Device UUID : 673ba6d4:6c46fd55:745c9c93:3fa8bf21

Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
Delta Devices : 1 (4->5)

Update Time : Fri May 16 22:05:16 2008
Checksum : 8ad7452f - correct
Events : 1420750

Layout : left-symmetric
Chunk Size : 128K

Array Slot : 1 (failed, 1, failed, 2, failed, 0)
Array State : uUu__ 3 failed



mdadm -E /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 01
Feature Map : 0x4
Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
Name : 'Fuckyfucky3':1
Creation Time : Sun Dec 23 01:28:08 2007
Raid Level : raid5
Raid Devices : 5

Device Size : 976767856 (465.76 GiB 500.11 GB)
Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
Used Size : 976767488 (465.76 GiB 500.10 GB)
Super Offset : 976767984 sectors
State : clean
Device UUID : 99b87c50:a919bd63:599a135f:9af385ba

Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
Delta Devices : 1 (4->5)

Update Time : Fri May 16 22:05:16 2008
Checksum : 78ab1ee2 - correct
Events : 1420750

Layout : left-symmetric
Chunk Size : 128K

Array Slot : 5 (failed, 1, failed, 2, failed, 0)
Array State : Uuu__ 3 failed


mdadm -E /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 01
Feature Map : 0x4
Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
Name : 'Fuckyfucky3':1
Creation Time : Sun Dec 23 01:28:08 2007
Raid Level : raid5
Raid Devices : 5

Device Size : 976767856 (465.76 GiB 500.11 GB)
Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
Used Size : 976767488 (465.76 GiB 500.10 GB)
Super Offset : 976767984 sectors
State : clean
Device UUID : 89b53542:d1d820bc:f2ece884:4785869a

Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
Delta Devices : 1 (4->5)

Update Time : Fri May 16 22:05:16 2008
Checksum : c89db84b - correct
Events : 1418968

Layout : left-symmetric
Chunk Size : 128K

Array Slot : 6 (failed, 1, failed, 2, failed, 0)
Array State : uuu__ 3 failed






What should I do next? Should I zero the superblock on one of the drives?


If I try to force the array to start:

mdadm --assemble --force /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
mdadm: forcing event count in /dev/sda1(3) from 1420746 upto 1420750
mdadm: clearing FAULTY flag for device 0 in /dev/md1 for /dev/sda1
mdadm: /dev/md1 has been started with 4 drives (out of 5).


Then in dmesg:

raid5:md1: read error not correctable (sector 96720 on sda1).
raid5:md1: read error not correctable (sector 96728 on sda1).
raid5:md1: read error not correctable (sector 96736 on sda1).
raid5:md1: read error not correctable (sector 96744 on sda1).
raid5:md1: read error not correctable (sector 96752 on sda1).
raid5:md1: read error not correctable (sector 96760 on sda1).
ata1: EH complete
md: md1: sync done.
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: (BMDMA stat 0x20)
ata1.00: tag 0 cmd 0xc8 Emask 0x9 stat 0x51 err 0x40 (media error)
ata1: EH complete
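Before experimenting any further, one way to make forced assembly attempts like the one above reversible is to run them against device-mapper copy-on-write overlays, so nothing is written to the real disks. A sketch of that technique (all paths, sizes, and device names here are assumptions, not something I have run on this box):

```shell
# Hedged sketch: assemble against COW overlays so any writes from a
# forced assemble/resync land in a scratch file, not on the real disks.
for d in sda1 sdb1 sdc1 sdd1 sde1; do
    truncate -s 4G "/tmp/overlay-$d"              # sparse COW store
    loop=$(losetup -f --show "/tmp/overlay-$d")   # back it with a loop dev
    sectors=$(blockdev --getsz "/dev/$d")         # size in 512-byte sectors
    # dm snapshot: reads fall through to /dev/$d, writes go to the overlay
    echo "0 $sectors snapshot /dev/$d $loop P 8" | dmsetup create "cow-$d"
done
# Assemble using the overlay devices instead of the raw partitions:
mdadm --assemble --force /dev/md1 /dev/mapper/cow-sd?1
```

If an attempt makes things worse, tearing down the snapshots (dmsetup remove) discards the writes and the originals are untouched.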



Any ideas or suggestions are much appreciated.
 
Old 05-16-2008, 11:52 PM   #2
JimBass
Senior Member
 
Registered: Oct 2003
Location: New York City
Distribution: Debian Sid 2.6.32
Posts: 2,100

Rep: Reputation: 49
With 3 failed drives, I suspect you're learning the hard way that software RAID5 is not the best of choices for data. Whatever crashed the system probably also screwed up the drives, which is the unfortunate nature of software RAID: if the OS goes down for any reason, and the OS is also controlling the array, then the array is left with no controller, and data will probably be lost.

It is also possible you have a hardware error with some of the drives, but I think that is a remote possibility at best.

I would expect the data could be recovered, but it would probably need recovery experts. Their cost would be way above the cost of a hardware controller.

Peace,
JimBass
 
Old 05-18-2008, 07:33 AM   #3
Curlyau
LQ Newbie
 
Registered: Jan 2008
Posts: 3

Original Poster
Rep: Reputation: 0
Hmm, yeah, that's probably not the answer I was looking for.

Thanks for your honesty though.

Now, the data is just a bunch of movies and music I'd ripped at home, so it's not overly critical that I get it all back. However, I'd like to think I didn't do anything rash or irreversible to lose it.

Can anyone point me towards some documentation I can read to better understand software RAID, so I can try to recover the array myself? Bear in mind that I have (within reason) all the time in the world to fiddle with the nuts and bolts, rather than looking for one magic command to rebuild the array.
 
Old 05-20-2008, 10:21 PM   #4
Curlyau
LQ Newbie
 
Registered: Jan 2008
Posts: 3

Original Poster
Rep: Reputation: 0
Well, just in case it helps anyone else out.

I pulled each drive from the box separately and ran a full SMART test on it. The new Seagate failed, as did one of the Samsungs.

I then did a full surface scan on the Samsung, which showed 5 faulty blocks.
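A rough sketch of those per-drive checks, for anyone in the same spot (the device name is an example; smartctl comes from the smartmontools package):

```shell
# Hedged sketch of per-drive health checks on a pulled disk.
smartctl -t long /dev/sdX   # start the extended (full-surface) self-test
smartctl -a /dev/sdX        # afterwards: test result, reallocated sectors
badblocks -sv /dev/sdX      # read-only surface scan (non-destructive)
```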

Both should be replaced under warranty, but that doesn't help me. Since the 'grow' operation only got a very small way through, almost all of my data should still be on the old array members, right?

So what I'm thinking is: get a couple of new drives, dd the faulty Samsung onto a new one, then do the same for the faulty Seagate, and assemble the array. That /should/ continue the grow operation, or at least bring the array up in a degraded state, right?
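A sketch of that cloning plan; GNU ddrescue (if available) copes with read errors far better than plain dd, which is the fallback. Device names and the map-file path are examples:

```shell
# Hedged sketch: clone a failing drive onto a replacement before
# reassembling. /dev/sdX = failing drive, /dev/sdY = new drive (examples).
ddrescue -f /dev/sdX /dev/sdY /root/sdX.map   # retries/skips bad areas,
                                              # logs progress in the map file
# Fallback with plain dd: conv=noerror keeps going past read errors,
# conv=sync pads failed reads with zeros to preserve offsets.
dd if=/dev/sdX of=/dev/sdY bs=64k conv=noerror,sync

# With the clones in place of the faulty drives, try assembling:
mdadm --assemble --force /dev/md1 /dev/sd[abcde]1
```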
 
  

