LinuxQuestions.org
Linux - Hardware This forum is for Hardware issues.
Old 11-07-2013, 01:57 AM   #1
reano
Member
 
Registered: Nov 2013
Posts: 39

Rep: Reputation: Disabled
RAID degraded, partition missing from md0


Hey guys,
We're having a very weird issue at work. Our Ubuntu server has 6 drives, set up with RAID1 as follows:

/dev/md0, consisting of:
/dev/sda1
/dev/sdb1

/dev/md1, consisting of:
/dev/sda2
/dev/sdb2

/dev/md2, consisting of:
/dev/sda3
/dev/sdb3

/dev/md3, consisting of:
/dev/sdc1
/dev/sdd1

/dev/md4, consisting of:
/dev/sde1
/dev/sdf1

As you can see, md0, md1 and md2 all use the same 2 drives (each split into 3 partitions). I should also note that this is Ubuntu software RAID, not hardware RAID.

Today, the /dev/md0 RAID1 array shows as degraded: it is missing /dev/sdb1. But since /dev/sdb1 is only a partition (and /dev/sdb2 and /dev/sdb3 are working fine), it's obviously not the whole drive that's gone AWOL; it seems the partition itself is missing.

How is that even possible? And what could we do to fix it?

My output of cat /proc/mdstat:

Code:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]

md1 : active raid1 sda2[0] sdb2[1]
      24006528 blocks super 1.2 [2/2] [UU]


md2 : active raid1 sda3[0] sdb3[1]
      1441268544 blocks super 1.2 [2/2] [UU]


md0 : active raid1 sda1[0]
      1464710976 blocks super 1.2 [2/1] [U_]


md3 : active raid1 sdd1[1] sdc1[0]
      2930133824 blocks super 1.2 [2/2] [UU]


md4 : active raid1 sdf2[1] sde2[0]
      2929939264 blocks super 1.2 [2/2] [UU]


unused devices: <none>
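
For a quick check, the degraded array can be picked out of that output mechanically. This is only a minimal sketch, assuming the usual /proc/mdstat layout shown above; the function name degraded_arrays is made up for illustration. It prints every array whose status brackets contain an underscore, meaning a member slot is empty.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose status
# field (for example [U_]) shows at least one empty member slot.
# Reads /proc/mdstat-formatted text on standard input.
degraded_arrays() {
    awk '
        # Header line of an array, e.g. "md0 : active raid1 sda1[0]"
        /^md[0-9]+ :/ { name = $1 }
        # Size line ends with the status brackets, e.g. "[2/1] [U_]"
        /blocks/ && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    '
}
```

It would be run as degraded_arrays < /proc/mdstat. It only reads text, so it does not touch the arrays themselves; against the output above it should print only md0.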

Any help would be greatly appreciated!

Last edited by reano; 11-07-2013 at 08:52 AM.
 
Old 11-07-2013, 02:09 AM   #2
evo2
LQ Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,724

Rep: Reputation: 1705
Hi,

it's not so unusual to have problems with just one partition on a disk.

You can try to rebuild with the existing sdb, or you can replace the sdb and then rebuild. See for example http://www.howtoforge.com/replacing_..._a_raid1_array for the latter option.

However, before doing anything make sure you are familiar with: https://raid.wiki.kernel.org/index.php/Linux_Raid

Evo2.
 
Old 11-07-2013, 02:13 AM   #3
reano
Member
 
Registered: Nov 2013
Posts: 39

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by evo2 View Post
Hi,

it's not so unusual to have problems with just one partition on a disk.

You can try to rebuild with the existing sdb, or you can replace the sdb and then rebuild. See for example http://www.howtoforge.com/replacing_..._a_raid1_array for the latter option.

However, before doing anything make sure you are familiar with: https://raid.wiki.kernel.org/index.php/Linux_Raid

Evo2.
Thanks Evo2. Can you please explain how I'd go about trying the first option (rebuild with the existing sdb)? Safely, that is :P

Last edited by reano; 11-07-2013 at 02:16 AM.
 
Old 11-07-2013, 02:23 AM   #4
evo2
LQ Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,724

Rep: Reputation: 1705
Hi,

Quote:
Originally Posted by reano View Post
Thanks Evo2. Can you please explain how I'd go about trying the first option (rebuild with the existing sdb)? Safely, that is :P
I didn't remember off the top of my head, but from a quick scan of https://raid.wiki.kernel.org/index.php/Reconstruction and the mdadm man page it looks like the first thing to try should be:
Code:
mdadm --assemble --scan
However, please check for yourself.

Evo2.
 
Old 11-07-2013, 02:28 AM   #5
reano
Member
 
Registered: Nov 2013
Posts: 39

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by evo2 View Post
Hi,



I didn't remember off the top of my head, but from a quick scan of https://raid.wiki.kernel.org/index.php/Reconstruction and the mdadm man page it looks like the first thing to try should be:
Code:
mdadm --assemble --scan
However, please check for yourself.

Evo2.
Thanks - I've been doing a bit of reading on mdadm --assemble as well. Will this not damage or endanger any of the other raid devices or the raid setup itself? I can't have any of the other partitions or md-devices go down, as our mail services etc run on this same server.

Last edited by reano; 11-07-2013 at 02:29 AM.
 
Old 11-07-2013, 06:02 AM   #6
reano
Member
 
Registered: Nov 2013
Posts: 39

Original Poster
Rep: Reputation: Disabled
Actually, let me clarify - if I do a:

Code:
mdadm --assemble --scan
Then it will essentially be doing the same as:

Code:
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
My main concern here is what happens to md0 while it's doing that. md0 is online right now (albeit without its sdb1 mirror, only with sda1) and the root filesystem is mounted on md0. So if I do an assemble, will it interrupt the filesystem in any way, or can I safely do it while the server is running with users connected to it (which is 24/7, unfortunately)?
 
Old 11-07-2013, 07:51 AM   #7
vishesh
Member
 
Registered: Feb 2008
Distribution: Fedora,RHEL,Ubuntu
Posts: 661

Rep: Reputation: 66
I think it's better to stop the md device. What is the output of mdadm --detail /dev/md0?

Thanks
 
Old 11-07-2013, 07:56 AM   #8
reano
Member
 
Registered: Nov 2013
Posts: 39

Original Poster
Rep: Reputation: Disabled
I can't stop the device.
Also, the / root filesystem is mounted on md0.

The output you requested is:

Code:
/dev/md0:
        Version : 1.2
  Creation Time : Sat Dec 29 17:09:45 2012
     Raid Level : raid1
     Array Size : 1464710976 (1396.86 GiB 1499.86 GB)
  Used Dev Size : 1464710976 (1396.86 GiB 1499.86 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Thu Nov  7 15:55:07 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : lia:0  (local to host lia)
           UUID : eb302d19:ff70c7bf:401d63af:ed042d59
         Events : 26216

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed
What's interesting is that it shows sdb1 as removed, not failed or spare.
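
A removed slot means md0 currently has no device attached in that position, as opposed to a member that is present but flagged faulty. A small sketch for pulling those slots out of the --detail output, assuming the table layout shown above (removed_slots is a hypothetical helper name, not part of mdadm):

```shell
#!/bin/sh
# Hypothetical helper: print the RaidDevice slot number of every member
# row marked "removed". Reads `mdadm --detail` output on standard input.
removed_slots() {
    awk '
        # The member table starts after this header row
        /Number +Major +Minor +RaidDevice/ { in_table = 1; next }
        # Columns are Number, Major, Minor, RaidDevice, State...
        in_table && $NF == "removed" { print $4 }
    '
}
```

It would be run as mdadm --detail /dev/md0 | removed_slots, which only reads the array status.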

Last edited by reano; 11-07-2013 at 07:57 AM.
 
Old 11-07-2013, 08:04 AM   #9
vishesh
Member
 
Registered: Feb 2008
Distribution: Fedora,RHEL,Ubuntu
Posts: 661

Rep: Reputation: 66
I think if it's showing removed then the following command should recover it:

mdadm /dev/md0 -a /dev/sdb1

Thanks
 
Old 11-07-2013, 08:13 AM   #10
reano
Member
 
Registered: Nov 2013
Posts: 39

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by vishesh View Post
I think if it's showing removed then the following command should recover it:

mdadm /dev/md0 -a /dev/sdb1

Thanks
Is that not the same as mdadm /dev/md0 --add /dev/sdb1 ? If so, that doesn't work (see above for the error message I got when I tried that).

Last edited by reano; 11-07-2013 at 08:18 AM.
 
Old 11-07-2013, 08:51 AM   #11
vishesh
Member
 
Registered: Feb 2008
Distribution: Fedora,RHEL,Ubuntu
Posts: 661

Rep: Reputation: 66
I am unable to see any error message above. Ideally, for replacing a device, I follow:

mdadm /dev/md0 -f /dev/sdb1
mdadm /dev/md0 -r /dev/sdb1
mdadm /dev/md0 -a /dev/sdb1

Thanks
 
Old 11-07-2013, 08:54 AM   #12
reano
Member
 
Registered: Nov 2013
Posts: 39

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by vishesh View Post
I am unable to see any error message above. Ideally, for replacing a device, I follow:

mdadm /dev/md0 -f /dev/sdb1
mdadm /dev/md0 -r /dev/sdb1
mdadm /dev/md0 -a /dev/sdb1

Thanks
Ah sorry, seems I didn't post the result in the original post. When I do the -a (or --add) I get the following:

Code:
mdadm: add new device failed for /dev/sdb1 as 2: Invalid argument
I haven't tried to do it in that order (first f, then r, then a). I can't damage anything further than it already is, can I? Keep in mind that sda1 and sdb1 (in other words, md0) contain the root filesystem. At the moment md0 seems to run only on sda1 (and not on sdb1). At least the server is still running.

Last edited by reano; 11-07-2013 at 09:00 AM.
 
Old 11-08-2013, 12:34 AM   #13
reano
Member
 
Registered: Nov 2013
Posts: 39

Original Poster
Rep: Reputation: Disabled
Got the following results:

Code:
root@lia:~# mdadm /dev/md0 -f /dev/sdb1
mdadm: set device faulty failed for /dev/sdb1:  No such device

root@lia:~# mdadm /dev/md0 -r /dev/sdb1
mdadm: hot remove failed for /dev/sdb1: No such device or address

root@lia:~# mdadm /dev/md0 -a /dev/sdb1
mdadm: add new device failed for /dev/sdb1 as 2: Invalid argument
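
The first two errors are consistent with the --detail output: md0 has no sdb1 member left to mark faulty or remove. Whether the kernel still exposes an sdb1 partition at all can be checked separately. A minimal sketch, assuming the standard /proc/partitions layout (partition_known is a hypothetical helper name, not an mdadm feature):

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the named partition is listed in a
# /proc/partitions-style file, exit 1 otherwise.
# $1 = partition name without /dev/ (e.g. sdb1)
# $2 = file to read (optional, defaults to /proc/partitions)
partition_known() {
    # Skip the header and the blank line; the 4th column is the name.
    awk -v dev="$1" 'NR > 2 && $4 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/partitions}"
}
```

For example: partition_known sdb1 && echo "kernel lists sdb1" || echo "kernel does not list sdb1". This only reads a file under /proc, so it is safe on a running server.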
 
Old 11-13-2013, 01:13 AM   #14
reano
Member
 
Registered: Nov 2013
Posts: 39

Original Poster
Rep: Reputation: Disabled
Hate to bump a thread, but I still need help with this. Any advice, anyone?
 
Old 11-13-2013, 01:18 AM   #15
evo2
LQ Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,724

Rep: Reputation: 1705
Hi,

mdadm doesn't seem to see /dev/sdb1 at all. I suggest you investigate its status with other tools, e.g. fdisk.
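
A sketch of what that investigation might look like, as read-only checks. The helper name inspect_commands is made up, and it only prints the commands so they can be reviewed before anything is run on a live server:

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) read-only checks for one partition.
# $1 = disk name (e.g. sdb), $2 = partition number (e.g. 1)
inspect_commands() {
    disk=$1
    part=$2
    # In order: partition table of the disk, kernel's view of the
    # partition, md superblock on the partition, kernel log for the disk.
    cat <<EOF
fdisk -l /dev/${disk}
grep -w ${disk}${part} /proc/partitions
mdadm --examine /dev/${disk}${part}
dmesg | grep -i ${disk}
EOF
}
```

Running inspect_commands sdb 1 lists the four commands; they can then be run one at a time as root. None of them writes to the disk or changes the array.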

Evo2.
 
  

