LinuxQuestions.org


tops008 08-12-2010 01:06 AM

Did my RAID10 storage array survive OS reinstall?
 
Hello,

I have 4 drives (sd[bcde]) in a RAID10 array. It has made it through a few OS reinstalls with few problems (the OS is on a different disk), but I'm having trouble recovering it after this last one. After running 'mdadm --assemble --scan', it came up as 'active, degraded, recovering' and went through that process. Afterwards things don't look so hot...

Code:

# mdadm -E /dev/sd[bcde] -vv
/dev/sdb:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 967aec6e:cbbe7390:aba2e195:aa6694ad
  Creation Time : Wed Aug 20 00:14:12 2008
    Raid Level : raid10
  Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
    Array Size : 1465148928 (1397.27 GiB 1500.31 GB)
  Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Thu Aug 12 01:35:17 2010
          State : clean
 Active Devices : 1
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 1
      Checksum : 22680f6c - correct
        Events : 2083120

        Layout : near=2, far=1
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    2      8      16        2      active sync  /dev/sdb

  0    0      0        0        0      removed
  1    1      0        0        1      faulty removed
  2    2      8      16        2      active sync  /dev/sdb
  3    3      0        0        3      faulty removed
  4    4      8      64        4      faulty  /dev/sde
/dev/sdc:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 967aec6e:cbbe7390:aba2e195:aa6694ad
  Creation Time : Wed Aug 20 00:14:12 2008
    Raid Level : raid10
  Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
    Array Size : 1465148928 (1397.27 GiB 1500.31 GB)
  Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Thu Aug 12 01:35:17 2010
          State : clean
 Active Devices : 1
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 1
      Checksum : 22680f7e - correct
        Events : 2083120

        Layout : near=2, far=1
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    6      8      32        6      spare  /dev/sdc

  0    0      0        0        0      removed
  1    1      0        0        1      faulty removed
  2    2      8      16        2      active sync  /dev/sdb
  3    3      0        0        3      faulty removed
  4    4      8      64        4      faulty  /dev/sde
/dev/sdd:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 967aec6e:cbbe7390:aba2e195:aa6694ad
  Creation Time : Wed Aug 20 00:14:12 2008
    Raid Level : raid10
  Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
    Array Size : 1465148928 (1397.27 GiB 1500.31 GB)
  Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Thu Aug 12 01:34:54 2010
          State : clean
 Active Devices : 1
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 2
      Checksum : 22680f66 - correct
        Events : 2083112

        Layout : near=2, far=1
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    5      8      48        5      spare  /dev/sdd

  0    0      0        0        0      removed
  1    1      0        0        1      faulty removed
  2    2      8      16        2      active sync  /dev/sdb
  3    3      0        0        3      faulty removed
  4    4      8      64        4      faulty  /dev/sde
  5    5      8      48        5      spare  /dev/sdd
/dev/sde:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 967aec6e:cbbe7390:aba2e195:aa6694ad
  Creation Time : Wed Aug 20 00:14:12 2008
    Raid Level : raid10
  Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
    Array Size : 1465148928 (1397.27 GiB 1500.31 GB)
  Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Thu Aug 12 01:19:01 2010
          State : clean
 Active Devices : 2
Working Devices : 4
 Failed Devices : 2
  Spare Devices : 2
      Checksum : 22680ba8 - correct
        Events : 2083110

        Layout : near=2, far=1
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    0      8      64        0      active sync  /dev/sde

  0    0      8      64        0      active sync  /dev/sde
  1    1      0        0        1      faulty removed
  2    2      8      16        2      active sync  /dev/sdb
  3    3      0        0        3      faulty removed
  4    4      8      48        4      spare  /dev/sdd
  5    5      8      32        5      spare  /dev/sdc

Here's what mdstat says:

Code:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdb[2](S) sdc[6](S) sdd[5](S) sde[0](S)
      2930297856 blocks
     
unused devices: <none>

Finally, when I try to examine /dev/md0 itself, it says there is no superblock:

Code:

# mdadm -E /dev/md0
mdadm: No md superblock detected on /dev/md0.

It seems odd to me that each drive reports something different under 'examine'. Can this array be recovered? To make matters worse, I made the stupid mistake of formatting a separate drive at the same time, and it held the only other backup of some critical files...

cbtshare 08-13-2010 10:21 AM

Try running 'fsck -y /dev/md0'. What happens?
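
If you'd rather see what it would find before letting it change anything, a read-only pass first should be harmless:

Code:

fsck -n /dev/md0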

tops008 08-13-2010 11:55 PM

Here's the output

Code:

# fsck -y /dev/md0
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
fsck.ext2: Invalid argument while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Running 'e2fsck -b 8193 /dev/md0' prints the same thing. If it helps, I have two drives that can work as spares in a pinch...

mahi_nix 08-14-2010 01:09 AM

Hi,

It seems that your primary filesystem superblock is corrupted, and now you have to restore it from a backup superblock.

To find the primary and backup superblock locations:

Code:

dumpe2fs <device> | grep superblock
e.g.
dumpe2fs /dev/sda2 | grep superblock

It will show you the available backup superblocks; then restore from a backup superblock with the e2fsck command.
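
For example, assuming the filesystem on the array (/dev/md0) is ext2/ext3 with a 4K block size (where the first backup copy usually sits at block 32768), it would look something like this:

Code:

# list the primary and backup superblock locations
dumpe2fs /dev/md0 | grep superblock
# then repair using one of the backup copies it reports
e2fsck -b 32768 /dev/md0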

Thanks,

Mahi

sem007 08-14-2010 02:43 AM

@ tops008

Quote:


Code:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdb[2](S) sdc[6](S) sdd[5](S) sde[0](S)
2930297856 blocks

unused devices: <none>

It seems your RAID device has not been started; all the members are listed as spares (S).

Assemble your RAID device:
Code:

# mdadm -A /dev/md0 <your raid partitions>

In my case:

Code:

# mdadm -A /dev/md0 /dev/sda5 /dev/sda6 /dev/hda7
After that, check the array status:

Code:

cat /proc/mdstat
mdadm --detail /dev/md0
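
If the array assembles and looks healthy, it may also be worth recording it in mdadm.conf so it is found automatically on the next boot or reinstall (the path is /etc/mdadm.conf or /etc/mdadm/mdadm.conf depending on the distro). Something like:

Code:

# mdadm --detail --scan >> /etc/mdadm.conf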

HTH

tops008 08-14-2010 12:14 PM

I ran dumpe2fs /dev/md0 at first, but again it said there was no superblock. Then I ran 'mdadm -A /dev/md0 /dev/sd[bcde]'. It started the array and the recovery process:

Code:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Wed Aug 20 00:14:12 2008
    Raid Level : raid10
    Array Size : 1465148928 (1397.27 GiB 1500.31 GB)
  Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
  Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Aug 14 03:35:45 2010
          State : clean, degraded, recovering
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2

        Layout : near=2, far=1
    Chunk Size : 64K

 Rebuild Status : 49% complete

          UUID : 967aec6e:cbbe7390:aba2e195:aa6694ad
        Events : 0.2083144

    Number  Major  Minor  RaidDevice State
      0      8      64        0      active sync  /dev/sde
      5      8      32        1      spare rebuilding  /dev/sdc
      2      8      16        2      active sync  /dev/sdb
      3      0        0        3      removed

      4      8      48        -      spare  /dev/sdd

After reaching 100% here's what it says:

Code:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Wed Aug 20 00:14:12 2008
    Raid Level : raid10
    Array Size : 1465148928 (1397.27 GiB 1500.31 GB)
  Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
  Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Aug 14 12:53:51 2010
          State : clean, degraded
 Active Devices : 1
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 1

        Layout : near=2, far=1
    Chunk Size : 64K

          UUID : 967aec6e:cbbe7390:aba2e195:aa6694ad
        Events : 0.2083172

    Number  Major  Minor  RaidDevice State
      0      0        0        0      removed
      1      0        0        1      removed
      2      8      16        2      active sync  /dev/sdb
      3      0        0        3      removed

      4      8      48        -      spare  /dev/sdd
      5      8      64        -      faulty spare  /dev/sde
      6      8      32        -      faulty spare  /dev/sdc

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid10 sdd[4](S) sde[5](F) sdc[6](F) sdb[2]
      1465148928 blocks 64K chunks 2 near-copies [4/1] [__U_]

unused devices: <none>

mdadm says the superblock is persistent, but that refers to the md superblock; when I run dumpe2fs it still can't find a filesystem superblock:

Code:

# dumpe2fs /dev/md0
dumpe2fs 1.41.11 (14-Mar-2010)
dumpe2fs: Attempt to read block from filesystem resulted in short read while trying to open /dev/md0
Couldn't find valid filesystem superblock.

# fsck -y /dev/md0
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
fsck.ext2: Attempt to read block from filesystem resulted in short read while trying to open /dev/md0
Could this be a zero-length partition?

It was in this state the last time I reassembled it, but after rebooting it went back to the state in my first post. In any case, I'll keep it like this and won't reboot until I get some more advice.
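
For the record, the event counters and update times from each member (shown in full above) can be pulled out side by side like this, since the drives with the highest event count should hold the most current copy of the metadata:

Code:

# mdadm -E /dev/sd[bcde] | grep -E '^/dev|Update Time|Events'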

