CentOS software RAID 1 disk replacement

LQParsons · 08-11-2018, 06:37 AM

Hi.
I have two 2TiB drives. When I installed CentOS on my new machine oh so many years ago, I chose to let it do a software RAID. Next, I chose to let LVM handle everything. So everything was set up with the usual incantations; I didn't make many choices.

The RAID is checked occasionally and comes up clean.
However, logwatch is reporting lots of UC (uncorrectable) errors on my sdb.
So perhaps I should swap it out.
This link seems to describe a clear and simple procedure to follow.

Code:

https://linuxadminonline.com/replace-faulty-hard-disk-software-raid-1-centos-7/

My question is, does my use of LVM complicate matters?
Is there something else I need to do before or after?

My mdadm output follows:

Code:

$ sudo mdadm --detail /dev/md127

/dev/md127:
           Version : 1.2
     Creation Time : Mon Jun 15 20:18:05 2015
        Raid Level : raid1
        Array Size : 1952870400 (1862.40 GiB 1999.74 GB)
     Used Dev Size : 1952870400 (1862.40 GiB 1999.74 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Mon Aug  6 09:33:52 2018
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : localhost:pv00
              UUID : 3683bca4:d82b68ff:fa27cb84:d69dd1b1
            Events : 698350

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       17        1      active sync   /dev/sdb1
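Before pulling sdb it can be worth confirming from the drive's own SMART data that the uncorrectable errors logwatch reports really point to a failing disk. A minimal sketch, assuming smartmontools is installed and the suspect disk is /dev/sdb; `run`, `check_disk_health` and `start_long_selftest` are hypothetical helper names, and the commands are only printed unless DRY_RUN=0:

```shell
#!/usr/bin/env bash
# Sketch: look at a suspect disk's SMART data before replacing it.
# Assumes smartmontools is installed; the helper names are hypothetical.
# Commands are always printed; they only execute when DRY_RUN=0 (run as root).

run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then
    eval "$*"
  fi
}

check_disk_health() {
  local dev="$1"                 # e.g. /dev/sdb
  run "smartctl -H $dev"         # overall health verdict (PASSED/FAILED)
  run "smartctl -A $dev"         # attributes; look at Reallocated_Sector_Ct,
                                 # Current_Pending_Sector and Offline_Uncorrectable
  run "smartctl -l error $dev"   # the drive's own ATA error log
}

start_long_selftest() {
  local dev="$1"
  # The test runs inside the drive; read the result later with: smartctl -l selftest <dev>
  run "smartctl -t long $dev"
}
```

For example, `DRY_RUN=0 check_disk_health /dev/sdb` as root. Non-zero and growing pending or offline-uncorrectable counts would support replacing the disk rather than waiting.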

jlinkels · 08-11-2018, 09:51 AM

You should check whether your physical volume is on the md device (/dev/md0 or similar), just to be sure. That is the normal way: LVM on top of RAID, not the other way around. Use pvdisplay.
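The same check can be done non-interactively by asking `pvs` for just the PV and VG names and confirming that every PV path is an md device. A small sketch; `pvs_all_on_md` is a hypothetical helper name, and it reads the `pvs` output on stdin so it can be tried without touching LVM:

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every LVM physical volume is an md RAID device.
# Expects the output of `pvs --noheadings -o pv_name,vg_name` on stdin.
# pvs_all_on_md is a hypothetical helper name.

pvs_all_on_md() {
  # Count non-empty lines, and those whose first field (the PV path) starts with /dev/md.
  # Exit status 0 only when there is at least one PV and all of them are md devices.
  awk 'NF { total++; if ($1 ~ /^\/dev\/md/) md++ }
       END { exit !(total > 0 && md == total) }'
}
```

Usage: `sudo pvs --noheadings -o pv_name,vg_name | pvs_all_on_md && echo "LVM sits on top of RAID"`.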
It also seems that your boot partition is on /dev/sda1. The RAID comprises sda2 and sdb1, so it seems you can safely replace sdb.
Note that mdadm is very resilient. If you fail one disk, and things go wrong with the new disk you can re-install and re-add the old disk and that is surprisingly successful.
As always, I recommend creating a VM and simulating the complete process in a test environment before doing this in production.
I don't see a boot partition on /dev/sdb. That means you cannot boot if /dev/sda fails. That should have been covered during installation of the RAID, but apparently it was not.
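Because the physical volume sits on /dev/md127 (as the pvdisplay and lsblk output later in this thread shows), LVM never sees the member disks, so swapping sdb1 is purely an mdadm job. The usual sequence is roughly the one below. It is a sketch assuming the names from this thread (array /dev/md127, member /dev/sdb1 on disk /dev/sdb), an MBR (dos) partition table, and a replacement disk at least as large as the old one; the helper and function names are hypothetical, and the commands are only printed unless DRY_RUN=0:

```shell
#!/usr/bin/env bash
# Sketch of replacing one member of an mdadm RAID1 (names are examples from this thread).
# Assumes an MBR (dos) label: CentOS 7's sfdisk does not handle GPT (sgdisk does).
# Commands are always printed; they only execute when DRY_RUN=0 (run as root).

run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then
    eval "$*"
  fi
}

# Phase 1: before shutting down to pull the old disk.
before_swap() {
  local md="$1" disk="$2" part="$3" layout="$4"
  run "sfdisk -d $disk > $layout"          # save the partition layout (keep the file off $disk)
  run "mdadm --manage $md --fail $part"    # mark the member faulty
  run "mdadm --manage $md --remove $part"  # detach it; the array keeps running degraded
}

# Phase 2: after the new, blank disk is installed and shows up under the same name.
after_swap() {
  local md="$1" disk="$2" part="$3" layout="$4"
  run "sfdisk $disk < $layout"             # recreate the same partition(s) on the new disk
  run "mdadm --manage $md --add $part"     # add it back; the resync starts on its own
  run "cat /proc/mdstat"                   # check resync progress
}

# Fallback if the new disk misbehaves: put the old member back.
readd_old() {
  local md="$1" part="$2"
  # With the internal bitmap this may only resync changed blocks; if --re-add
  # is refused, --add does a full resync instead.
  run "mdadm --manage $md --re-add $part"
}
```

For example: `before_swap /dev/md127 /dev/sdb /dev/sdb1 /root/sdb-layout.txt`, power down and swap the disk, confirm with `lsblk` that the new disk really came up as /dev/sdb, then run `after_swap` with the same arguments.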

jlinkels

LQParsons · 08-12-2018, 02:20 PM

Hi.
Thanks.
I can't be sure of the order I built things in; it was so long ago.
I suspect the installer would have done it correctly, and that neophytes at building a CentOS system, as I was at the time, would just follow its line of questioning.

As to /dev/md, this is what I get, and it's the only 'md' device.

Code:

sudo pvdisplay
  --- Physical volume ---
  PV Name               /dev/md127
  VG Name               centos
  PV Size               <1.82 TiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              476774
  Free PE               31
  Allocated PE          476743
  PV UUID               mq3sH4-h8wr-ciOf-BsQ1-Tnh5-6wNL-qVufCT

The boot worries me as well.
The "fdisk.txt" attached to my original post shows /dev/sda2 equal in size to /dev/sdb1, so those two are RAIDed together. That leads me to believe I can easily replace my second 2TiB drive (which, fortunately, is the problem one at the moment), but I'm S*OuttaLuck if I ever need to replace my first 2TiB drive, because of the missing "boot" designation. I was hoping I was misreading something, but it seems you've confirmed my fears.

LQParsons · 08-12-2018, 02:38 PM

I'd love to set up a VM and test before I go live, but my 'raid' is my physical system.
Essentially the only thing I do when I physically boot the system, other than check root's email and do a

Code:

# yum update

weekly, is run the KVM (initiates at boot) then with

Code:

virt-manager

access the VMs that I use for work-stations.

If it's of any interest/use, my disk in my Linux VM looks like this:

Code:

sudo fdisk -l

Disk /dev/vda: 34.4 GB, 34359738368 bytes, 67108864 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0008c089

   Device Boot      Start         End      Blocks   Id  System
/dev/vda1   *        2048     1026047      512000   83  Linux
/dev/vda2         1026048    67108863    33041408   8e  Linux LVM

Disk /dev/mapper/centos_gamgee-root: 30.3 GB, 30349983744 bytes, 59277312 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/centos_gamgee-swap: 3435 MB, 3435134976 bytes, 6709248 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Looks like I'm sunk if my primary 2TiB drive goes bad.

syg00 · 08-12-2018, 06:46 PM

It's only a boot partition - the data should be safe. And of course, you could always restore your backup.
It is interesting that anaconda would build a system like that when a boot partition was allocated. Let's see this.

Code:

lsblk -f

jlinkels · 08-12-2018, 07:19 PM

Like syg00 said, it is only a boot sector that is missing. It is just annoying: if the boot disk fails, you'd have more work, and it would take longer to get a running system again. Fortunately, "not booting" is a small problem in Linux.

As for the VM, you could build one on this very system, the one with the failing disk. With 10GB of space you have plenty to install CentOS and a few RAID disks. It is all in the VDI or VMX file, remember?
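On KVM/libvirt, which this host already runs, the counterpart of those VDI/VMX files is a pair of disk images. A sketch, assuming qemu-img and virt-install are installed; the VM name `raidtest`, the image directory and the ISO path are placeholders, `run` and `make_test_vm` are hypothetical helper names, and the commands are only printed unless DRY_RUN=0:

```shell
#!/usr/bin/env bash
# Sketch: a throw-away two-disk KVM guest for rehearsing the RAID1 disk swap.
# raidtest, the image directory and the ISO path are placeholders.
# Commands are always printed; they only execute when DRY_RUN=0 (run as root).

run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then
    eval "$*"
  fi
}

make_test_vm() {
  local dir="$1" iso="$2"
  # qcow2 images are thin-provisioned: they only use host space as the guest writes.
  run "qemu-img create -f qcow2 $dir/raidtest-a.qcow2 10G"
  run "qemu-img create -f qcow2 $dir/raidtest-b.qcow2 10G"
  # Two virtual disks, so the installer can build the same RAID1 + LVM layout.
  run "virt-install --name raidtest --memory 2048 --vcpus 2 \
--disk path=$dir/raidtest-a.qcow2,format=qcow2 \
--disk path=$dir/raidtest-b.qcow2,format=qcow2 \
--cdrom $iso --os-variant centos7.0"
}
```

In the installer, put both disks into a RAID1 with LVM on top, as on the real host; afterwards you can fail a member with mdadm, or detach one virtual disk in virt-manager, and rehearse the whole replacement.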

jlinkels

LQParsons · 08-13-2018, 09:10 AM

Code:

sudo lsblk -f
[sudo] password for petc: 
NAME              FSTYPE            LABEL          UUID                                   MOUNTPOINT
sda                                                                                       
├─sda1            xfs                              5ce4e9c8-6a9f-49d8-b82d-a40a26c39383   /boot
└─sda2            linux_raid_member localhost:pv00 3683bca4-d82b-68ff-fa27-cb84d69dd1b1   
  └─md127         LVM2_member                      mq3sH4-h8wr-ciOf-BsQ1-Tnh5-6wNL-qVufCT 
    ├─centos-swap swap                             b5da9226-c983-48e0-90cc-1fa7f6a62600   [SWAP]
    ├─centos-root xfs                              2eca4343-7f4b-411c-a260-f96fe60c0d1b   /
    └─centos-home xfs                              23c9f0a4-ebf0-4b12-8dbc-599020e6e4b8   /home
sdb                                                                                       
└─sdb1            linux_raid_member localhost:pv00 3683bca4-d82b-68ff-fa27-cb84d69dd1b1   
  └─md127         LVM2_member                      mq3sH4-h8wr-ciOf-BsQ1-Tnh5-6wNL-qVufCT 
    ├─centos-swap swap                             b5da9226-c983-48e0-90cc-1fa7f6a62600   [SWAP]
    ├─centos-root xfs                              2eca4343-7f4b-411c-a260-f96fe60c0d1b   /
    └─centos-home xfs                              23c9f0a4-ebf0-4b12-8dbc-599020e6e4b8   /home
sr0

jlinkels · 08-13-2018, 09:15 AM

Quote:

Originally Posted by syg00

Code:

lsblk -f

Woooow... that is a nice command!

jlinkels

LQParsons · 08-13-2018, 09:19 AM

Thanks.

Quote:

Originally Posted by jlinkels

Like syg00 it is only a boot sector which is missing. It is just annoying if the boot disk fails you'd have more work and it takes more time to get a running system again. Fortunately "not booting" is a small problem in Linux.

As for the VM, you'd be able to build a VM on this very system on which you have the failing disk problem. With 10GB space you have plenty to install Centos and a few RAID disks. It is all in the VDI or VMX file, remember?

I'll keep a link to this discussion in my notes.
So far, the uncorrectable errors on the 'spare' disks are annoying.
When I'm ready, later in the Fall, I'll start doing the step-by-step as you recommend.

Thank you for your help, counsel and advice.
I'll NOT mark this 'solved', as it won't be solved until I actually do it -- I may have further questions later.
(Unless you'd rather I do otherwise.)

Enjoy.
-d

syg00 · 08-13-2018, 08:09 PM

It's not just the MBR code that would need replacing: the boot partition would need creating, and the grub package itself re-installed to get the code onto the second disk as well. Then grub2-install (I assume; Fedora uses that naming), then grub2-mkconfig.

Presumably there is enough free space on that second disk that you could allocate a partition for /boot to be mirrored to. It's not trivial, but you could then arrange a RAID1 set yourself for /boot; the MBR would still need updating on that (second) disk as well. Not sure about dracut for the booting; there must be some doco on the web somewhere.
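A rough sketch of the first part of that plan, assuming legacy BIOS boot (not UEFI) and a hypothetical spare partition on the new sdb (passed in as NEWPART, e.g. /dev/sdb2) exactly the size of /dev/sda1. It starts a degraded RAID1 for /boot with metadata 1.0 (superblock at the end of the partition, the traditional choice for /boot), copies the current /boot onto it, and puts GRUB on both disks. It deliberately leaves out the remaining steps: pointing /etc/fstab at the new array, recording it in /etc/mdadm.conf, possibly regenerating the initramfs with dracut, and finally adding /dev/sda1 as the second member. `/dev/md/boot` and the helper names are examples, and commands are only printed unless DRY_RUN=0:

```shell
#!/usr/bin/env bash
# Sketch: start a /boot mirror on the second disk and install GRUB on both disks.
# Assumes BIOS boot; NEWPART (e.g. /dev/sdb2) is a hypothetical spare partition
# exactly the size of /dev/sda1. /dev/md/boot is an example array name.
# Commands are always printed; they only execute when DRY_RUN=0 (run as root).

run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then
    eval "$*"
  fi
}

start_boot_mirror() {
  local newpart="$1"
  # Degraded RAID1: "missing" leaves the slot that /dev/sda1 will fill later.
  # Metadata 1.0 keeps the superblock at the end of the partition.
  run "mdadm --create /dev/md/boot --level=1 --raid-devices=2 --metadata=1.0 $newpart missing"
  run "mkfs.xfs /dev/md/boot"
  run "mount /dev/md/boot /mnt"
  run "cp -a /boot/. /mnt/"    # copy kernels, initramfs images and grub2 files
  run "umount /mnt"
}

install_grub_everywhere() {
  local disk
  for disk in "$@"; do
    run "grub2-install $disk"  # boot code into each disk's MBR area
  done
  run "grub2-mkconfig -o /boot/grub2/grub.cfg"
}
```

`install_grub_everywhere /dev/sda /dev/sdb` only pays off once /boot is actually mounted from the mirror; until then, GRUB on sdb still depends on /boot living on sda1.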