LinuxQuestions.org
Old 04-28-2016, 07:44 PM   #1
r00tk1ll
LQ Newbie
 
Registered: Apr 2016
Posts: 14

Rep: Reputation: Disabled
Recover Data From Striped Logical Volume Group With Failing Drive


Hello,
I am working on a CentOS 6 server that has 7 physical drives in a striped logical volume group. The server will not boot and fails with a kernel panic. I booted it with a live CentOS CD, and in the GUI under Utilities->Drives it shows one of the drives in red saying "Disk Likely to Fail Soon". The files I am looking for are in the /var/www/html directory.

My initial thought was to mount the LVG read-only and just copy the files in that directory to an external drive, but the entire /var directory doesn't even appear when I give the "ls" command on the mounted LVG. It does list other folders though, e.g. /etc, /boot, /usr, etc.

So my question is what would be the next step to try to recover the data?

Also, being that it's a "striped" logical volume, would I be able to just replace the failing drive to repair the system or would that make matters worse?

I'm new to working with logical volumes and would appreciate any help,
Thanks
 
Old 04-28-2016, 10:12 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,131

Rep: Reputation: 4121
Quote:
Originally Posted by r00tk1ll
My initial thought was to mount the LVG read-only and just copy the files in that directory to an external drive, but the entire /var directory doesn't even appear when I give the "ls" command on the mounted LVG. It does list other folders though, e.g. /etc, /boot, /usr, etc.
That might indicate /var was mounted using a separate LV, maybe on a separate VG. Can you get to /etc/fstab? Let's see it.
Quote:
Also, being that it's a "striped" logical volume, would I be able to just replace the failing drive to repair the system or would that make matters worse?
No, you don't want to do that. To quote the Linux RAID wiki:
Quote:
RAID-0 has no redundancy, so when a disk dies, the array goes with it.
Here's another from that site
Quote:
It is, however, very important to understand that RAID is not a general substitute for good backups.
I don't have any good news for you if you don't have backups - you may get lucky and be able to retrieve your data, but it's not looking good.
If you were using one of the higher RAID levels (5, 6, 10, ...) you would be much better placed to fail that disk and replace it - but I understand a "striped" disk to be RAID0.
 
1 members found this post helpful.
Old 04-28-2016, 11:11 PM   #3
r00tk1ll
LQ Newbie
 
Registered: Apr 2016
Posts: 14

Original Poster
Rep: Reputation: Disabled
Thanks syg00, I figured as much. It died during a backup at 75%, and the new files were in the other 25%.

Here is /etc/fstab on the failed LVG:
/dev/VolGroup00/LogVol00 / ext3 defaults 1 1
LABEL=/boot /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/VolGroup00/LogVol01 swap swap defaults 0 0


Thank you
 
Old 04-28-2016, 11:39 PM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,131

Rep: Reputation: 4121
Well that kills the separate /var theory.
There's not much else to do but replace the drive, rebuild the array and restore from an older backup.
 
1 members found this post helpful.
Old 04-28-2016, 11:49 PM   #5
gradinaruvasile
Member
 
Registered: Apr 2010
Location: Cluj, Romania
Distribution: Debian Testing
Posts: 731

Rep: Reputation: 158
Maybe some data MAY be recovered IF you can mount your RAID (boot from live media!) with the failing HDD still in it, which might mean very slow operations and HDD link resets. A failing HDD usually means a drive that has SMART errors logged; mainly, the Reallocated_Event_Count and Current_Pending_Sector attributes are non-zero.
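
To check those two attributes from the live CD, something along these lines might help. It is only a sketch: it assumes smartmontools is installed and that `smartctl -A` prints its usual attribute table with the raw value in the tenth column. The helper name `smart_warnings` is made up for illustration.

```shell
# Print the two attributes named above when their raw value is non-zero.
# Reads `smartctl -A` attribute-table text on stdin; field 10 is RAW_VALUE
# in the usual table layout (an assumption, check your own output).
smart_warnings() {
    awk '$2 ~ /^(Reallocated_Event_Count|Current_Pending_Sector)$/ && $10 + 0 > 0 {
        print $2 "=" $10
    }'
}
```

Typical use would be `smartctl -A /dev/sdd | smart_warnings` as root; no output would suggest both raw counters are zero.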
 
1 members found this post helpful.
Old 04-29-2016, 01:35 AM   #6
r00tk1ll
LQ Newbie
 
Registered: Apr 2016
Posts: 14

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by gradinaruvasile
Maybe some data MAY be recovered IF you can mount your RAID (boot from live media!) with the failing HDD still in it, which might mean very slow operations and HDD link resets. A failing HDD usually means a drive that has SMART errors logged; mainly, the Reallocated_Event_Count and Current_Pending_Sector attributes are non-zero.
Well, at this point anything is worth a shot. I can still mount the failing drive. What would be the commands?

I had the idea of taking a dd image of the LVG, but it would copy everything including empty space (which would take forever). Just for kicks, could I dd the individual failing disk and try some sort of data recovery technique, or would that only give partial data because it is "striped" over 7 drives?

Thanks in advance
 
Old 04-29-2016, 03:07 AM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,131

Rep: Reputation: 4121
I would use ddrescue (note: not dd) to image the bad drive from a liveCD. Rerun as necessary - see the docs; it will try to "fill in" what it missed previously.
Then introduce that drive to the array and see what happens. Nothing to lose by trying really.
 
1 members found this post helpful.
Old 05-01-2016, 03:29 PM   #8
r00tk1ll
LQ Newbie
 
Registered: Apr 2016
Posts: 14

Original Poster
Rep: Reputation: Disabled
Ok guys,
Thanks for the help so far. Here's where I'm at: I created an image of the failing drive using ddrescue and stored it on an external HDD. The drive was part of a Logical Volume Group written across 7 drives, which has been encrypted with a LUKS passphrase (which I found a way to unlock using cryptsetup luksOpen). So my question now is: can I mount the image ddrescue made like a regular drive to search for the lost files, or do I have to somehow add it back to the LVG array and then search that?

Any help would be appreciated, I'm really confused about what approach to take.

Thanks
 
Old 05-01-2016, 07:06 PM   #9
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,780

Rep: Reputation: 2213
It depends on just where the encryption layer was. You can encrypt the partition or full drive and put the LVM PV inside the encrypted container, or you can encrypt the LV. The output from lsblk would be useful here, and also indicate exactly what you used as the source volume for ddrescue. Depending on just how you did things, I see the possibility of an array with 6 stripes encrypted and 1 not, which would be a mess.

The simplest thing to do is probably to unplug the failing drive and boot with the external drive connected. That should allow LVM to assemble the array. Having both the failing and external drives connected means you have 2 PVs with the same UUID, and that complicates things.
 
1 members found this post helpful.
Old 05-01-2016, 07:43 PM   #10
r00tk1ll
LQ Newbie
 
Registered: Apr 2016
Posts: 14

Original Poster
Rep: Reputation: Disabled
Ok so I'll attach what I'm looking at in the GUI and describe what I did. As you can see I have 7 1 TB physical drives, with one (/dev/sdd) in red. I clicked on the lock icon and unlocked it with the passphrase. Then I mounted the 2 TB drive under the directory /mnt/win; it's an NTFS drive connected via USB.

I then ran the command "ddrescue -n -N -vvv /dev/sdd /mnt/win/sdd.img /mnt/win/sdd.log"

That ran all night, and now I have the resulting 1 TB image. It appears the encryption was copied over as well, because when I tried to mount the sdd.img file it said the file system type is LUKS.

Here is the output of lsblk:
NAME                                          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                               8:0  0 931.5G  0 disk
├─sda1                                            8:1  0   102M  0 part
└─sda2                                            8:2  0 931.4G  0 part
sdb                                              8:16  0 931.5G  0 disk
└─sdb1                                           8:17  0 931.5G  0 part
sdc                                              8:32  0 931.5G  0 disk
└─sdc1                                           8:33  0 931.5G  0 part
sdd                                              8:48  0 931.5G  0 disk
└─sdd1                                           8:49  0 931.5G  0 part
  └─luks-25db43f5-fa88-4ab8-8568-0e439e1b62df   253:3  0 931.5G  0 crypt
sde                                              8:64  0 931.5G  0 disk
└─sde1                                           8:65  0 931.5G  0 part
sdf                                              8:80  0 931.5G  0 disk
└─sdf1                                           8:81  0 931.5G  0 part
sdg                                              8:96  0 931.5G  0 disk
sdh                                             8:112  0   1.8T  0 disk
├─sdh1                                          8:113  0   100M  0 part
└─sdh2                                          8:114  0   1.8T  0 part
sr0                                              11:0  1   696M  0 rom   /run/initramfs/live
loop0                                             7:0  0    20K  1 loop
loop1                                             7:1  0   4.2M  1 loop
└─live-osimg-min                                253:2  0     8G  1 dm
loop2                                             7:2  0 626.1M  1 loop
loop3                                             7:3  0     8G  1 loop
├─live-rw                                       253:0  0     8G  0 dm    /
├─live-base                                     253:1  0     8G  1 dm
└─live-osimg-min                                253:2  0     8G  1 dm
loop4                                             7:4  0   512M  0 loop
└─live-rw                                       253:0  0     8G  0 dm    /

I hope that helps explain the situation I'm in now.
Attached Thumbnails: Screenshot from 2016-05-02 02_26_08.png
 
Old 05-01-2016, 08:47 PM   #11
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,780

Rep: Reputation: 2213
That is different from what I expected to see. When you said, "striped logical volume group," I thought you meant you had used LVM to do the striping. What I see in that lsblk output makes sense only for a hardware or software RAID array that is then encrypted as a single unit. My guess is that it's an MD RAID array with a version 0.9 or 1.0 superblock.

This would have been easier if you had used a raw disk drive rather than a file as the ddrescue destination. Trying to assemble a RAID array from 6 devices and 1 file would be difficult, perhaps impossible if that array is essential for booting.

What are your plans for reconstructing the system? If you already have a suitable replacement drive, the simplest thing to do would be to install that drive in place of the failing /dev/sdd (I presume that's the failing drive.) and copy the image back to the new drive:
Code:
dd if=/mnt/win/sdd.img of=/dev/sdd bs=256k
Then, everything should "just work." Do be absolutely sure that the "of=" is going to the right drive. It should be identifiable by its lack of a partition table prior to restoring the image to it. Get that wrong, and all is lost. That "bs=256k" is a fairly arbitrary block size. You just want something substantially larger than the default 512 bytes or the operation will be very slow.
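
Once the copy finishes, it may be worth confirming that the new drive really holds the image before relying on it. A minimal sketch, assuming GNU coreutils and diffutils (`stat -c`, `cmp -n`), which a CentOS live CD should have; `verify_restore` is just an illustrative name:

```shell
# Compare the image against the start of the target, byte for byte.
# The comparison is limited to the image length, so a slightly larger
# replacement drive does not cause a false mismatch.
verify_restore() {
    image=$1
    target=$2
    size=$(stat -c %s "$image") || return 2
    cmp -n "$size" "$image" "$target"
}
```

For the case above that would be `verify_restore /mnt/win/sdd.img /dev/sdd && echo "copy verified"`, run as root. It reads the whole image and the whole drive again, so expect it to take about as long as the copy itself.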

Last edited by rknichols; 05-01-2016 at 08:52 PM.
 
1 members found this post helpful.
Old 05-01-2016, 09:38 PM   #12
r00tk1ll
LQ Newbie
 
Registered: Apr 2016
Posts: 14

Original Poster
Rep: Reputation: Disabled
Hello rknichols,
Thank you for your help.

Quote:
What I see in that lsblk output makes sense only for a hardware or software RAID array that is then encrypted as a single unit
That makes perfect sense because when the OS was installed, it was handled through the CentOS installation DVD, specifying to use all the detected SATA drives as one file system and encrypt it. There is no hardware RAID controller; all the drives are plugged directly into the motherboard.

The goal for me is to just extract any retrievable data from the /var/www/html directory, I do have a replacement drive for the failed one.

I wanted to consult with those who have more experience in this area before risking my chance of recovering any data, so I made the ddrescue image file in case the drive failed.

Replacing the drive and copying the image over to a new physical drive and putting it in place of the failed one does seem like the most logical solution, so my last questions are:

1.) Once I copy the ddrescue image of the failed drive to a new drive and introduce it into the array, will the LVG become confused and need to be rebuilt? If so, where should I look for how to do that? (Given that it's a LUKS-encrypted image, will it be different from a standard LVG rebuild procedure?)

2.) I've read in some other postings about multiple passes with ddrescue. Would it be worth the trouble to do that, or should I just try to recover with the image I already copied from the failed drive? Here is the post I'm referring to:
Either way, I really appreciate everyone's help and if I solve this I will post a full breakdown of what it took to accomplish it as to help someone else out.

Last edited by r00tk1ll; 05-01-2016 at 09:39 PM.
 
Old 05-01-2016, 10:07 PM   #13
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,780

Rep: Reputation: 2213
1. The system won't see any difference with the new drive (except that it works, of course). Do be sure to unplug the old drive. You really don't want to have two identically labeled drives in the system.

2. How much was ddrescue able to recover? What did the final status display look like? If there are still unrecovered sectors, you can rerun it with a nonzero number for --retry-passes. Be sure to use the same image and log files, and ddrescue will pick up where it left off.

Last edited by rknichols; 05-01-2016 at 10:11 PM. Reason: Add, "Do be sure ..."
 
1 members found this post helpful.
Old 05-01-2016, 10:55 PM   #14
r00tk1ll
LQ Newbie
 
Registered: Apr 2016
Posts: 14

Original Poster
Rep: Reputation: Disabled
The log file only shows the start time and the finish time. From the original CLI output it seemed like everything was OK; nothing was printed that said unrecoverable or anything of that nature. So from this point, would the procedure be to copy the ddrescue image over to the new disk, then use the CentOS DVD in recovery mode to mount the LVG as if it were never disturbed?

Thanks
 
Old 05-02-2016, 09:09 AM   #15
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,780

Rep: Reputation: 2213
The statistics are just displayed on the screen, not recorded in the log file. The only indication that some sectors were not recovered is a non-zero number for "errors: " in the stats. If you just run the ddrescue command again (without requesting retries) you can see it. Or, you can look through the log file for lines with a status other than "+". From the manpage, the meaning of those status characters is
Code:
'?' 	non-tried block
'*' 	failed block non-trimmed
'/' 	failed block non-scraped
'-' 	failed block bad-sector(s)
'+' 	finished block
Since ddrescue did finish, I believe you should not see any status other than "+" or "-".
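
If reading through the log by eye is tedious, a small helper can total the bytes per status. This is a rough sketch that assumes the log layout as I understand it from the ddrescue manual: "#" comment lines, one current-position line, then "position size status" lines with hexadecimal numbers. The function name `summarize_map` is made up.

```shell
# Total the bytes in a ddrescue log file by block status.
# The function body runs in a subshell, so its variables stay private.
summarize_map() (
    header_seen=0 good=0 bad=0 other=0
    while read -r pos size status _rest; do
        # Skip blank lines and '#' comments.
        case $pos in ''|'#'*) continue ;; esac
        # The first remaining line is the current-position line, not a block.
        if [ "$header_seen" -eq 0 ]; then header_seen=1; continue; fi
        case $status in
            '+') good=$(( good + $size )) ;;    # finished block
            '-') bad=$(( bad + $size )) ;;      # bad sector(s)
            *)   other=$(( other + $size )) ;;  # '?', '*', '/': not finished
        esac
    done < "$1"
    printf 'rescued=%d bad=%d unfinished=%d\n' "$good" "$bad" "$other"
)
```

Running `summarize_map /mnt/win/sdd.log` prints the three totals in bytes; non-zero "bad" or "unfinished" values would mean some sectors were not recovered.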

Back in #11, I gave the command for copying the image back to the new drive. Then you can assemble the array, unlock the encryption, and run lvscan to find the LVM logical volumes. I recommend running "fsck -f -n" on each of the filesystems to verify its condition. Do include the "-n" option so that fsck won't try to "fix" anything. That could be disastrous if there are unrecovered blocks in the filesystem metadata. You need to see the extent of any damage first.
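
Roughly, that sequence might look like the sketch below. Treat the names as assumptions: I am guessing the array assembles as /dev/md0, and "recovered" and /mnt/recover are arbitrary names I picked; only VolGroup00/LogVol00 comes from the fstab posted earlier. It prints the commands by default and runs them only when DRY_RUN=0, so you can read through the steps first.

```shell
# Dry-run sketch: prints each step; set DRY_RUN=0 to really execute (as root).
# /dev/md0, "recovered" and /mnt/recover are assumed names, not from the thread.
ARRAY=${ARRAY:-/dev/md0}
MAPPER=${MAPPER:-recovered}
ROOT_LV=${ROOT_LV:-/dev/VolGroup00/LogVol00}   # root LV named in the posted fstab
MNT=${MNT:-/mnt/recover}

# Echo the command, then run it unless we are in dry-run mode (the default).
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-1}" -ne 0 ] || "$@"
}

# Each step runs only if the previous one succeeded. A non-zero exit from
# fsck -n (damage found) therefore stops the chain before mounting.
recover_steps() {
    run mdadm --assemble --scan &&
    run cryptsetup luksOpen "$ARRAY" "$MAPPER" &&
    run vgchange -ay &&
    run lvscan &&
    run fsck -f -n "$ROOT_LV" &&
    run mkdir -p "$MNT" &&
    run mount -o ro "$ROOT_LV" "$MNT"
}
```

Calling `recover_steps` with no settings just lists the commands. If the real device names differ, override them first, for example `ARRAY=/dev/md127 recover_steps`. After a successful read-only mount, the files should be under "$MNT"/var/www/html.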

Last edited by rknichols; 05-02-2016 at 09:43 AM.
 
1 members found this post helpful.
All times are GMT -5.