LinuxQuestions.org
Linux - Server: This forum is for the discussion of Linux software used in a server-related context.

Old 03-31-2016, 11:02 AM   #1
Manager
LQ Newbie
 
Registered: Dec 2011
Location: Bangkok
Distribution: CentOS
Posts: 8

Rep: Reputation: Disabled
RAID 1 disk errors: how to fail and then securely wipe one disk at a remote data center


This must be a common problem for those diligent about security, but I've had difficulty finding answers which are clear enough, so I wonder if anybody here knows the answer.

I have a root Linux server at a giant remote data center, for which I have a long-term contract; I am just one of their countless rack-server customers. I have no physical access and no personal help or support there, except that they are obligated to replace any failed hardware.

I run a standard raid 1 system with two disks. One disk is starting to fail so it must be replaced very soon now, but it has sensitive information on it, so I want to wipe the disk before requesting a replacement disk, for obvious security reasons. The disks are not encrypted (and maybe that's a different question for another day... but let's not talk encryption now, just talk about wiping).

Now, I want to:
1. fail the disk
2. wipe the disk
without affecting the other disk so I don't lose any data.

(After years of flawless service, one of those disks, sda, has started flaking out over the past few weeks, with SMART reporting rapidly increasing Reallocated Sector Count, and other errors, quite unlike sdb which is completely stable, and the server is getting very slow due to long sync attempts between the two disks.)

So, if I understand correctly, I should:
1. Make sure GRUB is mirrored to sdb (this is what I read elsewhere)
2. Fail sda
3. Wipe the failed sda (this is what I'm not sure how to do safely)
4. Request they replace sda with a working disk
5. Boot up on sdb
6. Start the sync process to the new sda

I don't know how to do step 3 above safely without endangering the data on sdb. It is loaded with data. I also don't want to have the server down for a long time. It is possible to take this server offline overnight if necessary, or even a little longer, but the less downtime, the better. The disks are only 250 GB each, of which much less than half has been used, so it's not a lot of data. The main issue is just replacing the bad disk without losing data.

Does anybody know how to do this, or have any clues which might point in the right direction?
 
Old 03-31-2016, 11:12 AM   #2
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,679
Blog Entries: 4

Rep: Reputation: 3947
I would suggest installing LVM (Logical Volume Management) into the operating-system, even "retroactively," then use this to define a physical storage pool consisting of both the failed drive and its replacement. Once you have done this, you can instruct LVM to migrate all data off of the failed drive, and not to put anything else onto it. Once the migration is complete, you can then remove the failed drive from the pool.

In the future, LVM will make storage management much easier for you to handle. It is a very thorough solution to a common, and vexing, problem.
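Roughly, that migration would look something like the following (a sketch only; the volume-group name and device names here are invented, and the replacement disk has to be present and initialized first):
Code:
# Sketch: drain the failing disk through LVM once both disks are physical volumes.
pvcreate /dev/sdNEW              # label the replacement disk as an LVM physical volume
vgextend vg_pool /dev/sdNEW      # add it to the existing volume group
pvmove /dev/sdOLD                # move every allocated extent off the failing disk
vgreduce vg_pool /dev/sdOLD      # drop the failing disk from the volume group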

 
Old 03-31-2016, 01:25 PM   #3
Manager
LQ Newbie
 
Registered: Dec 2011
Location: Bangkok
Distribution: CentOS
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by sundialsvcs View Post
I would suggest installing LVM (Logical Volume Management) ...
Thanks, but I don't think that is a solution, unless I've misunderstood something.
The disks already had LVM set up years back, and have always been LVM disks.
Since the two disks are mirrors and hence identical, I don't see any benefit to copying the bad disk to the good disk as you suggest.
The bad disk can simply be removed and replaced, and no data is lost.

The issue is wiping the bad one before removal, so my users' data cannot be accessed by anybody who handles the disk.

I must first fail the bad disk so there is no longer any mirroring.
Then I must wipe the bad disk after failing it, maybe using one of these programs:

wipe
shred
badblocks
hdparm
fdisk (using some trick)
dd
[maybe something else; I'm using CentOS]

... all making sure that the wiped disk is not mirrored to the disk with the data, and making sure the disk with all the data is not affected at all.
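From the man pages, the zero-fill invocations I have in mind would be roughly as follows (untested; /dev/sdX is just a placeholder for the bad disk after it has been failed and removed from the array):
Code:
shred -v -n 0 -z /dev/sdX           # shred: skip the random passes, do one final zero pass
dd if=/dev/zero of=/dev/sdX bs=1M   # dd: straight zero fill of the entire device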

I've never used any of the above programs except fdisk, and this is quite scary because it would affect a lot of users if I make a mistake and accidentally delete their data. Any recommendations would be appreciated.

I'm also not sure I can access a disk which mdadm has failed, or which programs can access it for the purpose of wiping it. I'm learning here ... but there's a first time for everything in life. :-)
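Presumably something along these lines would confirm that the failed disk really is out of the array before wiping anything (the array and partition numbers are just examples; mine may differ):
Code:
cat /proc/mdstat                  # the disk to be wiped must no longer appear in any md array
/sbin/mdadm --detail /dev/mdN     # each array should show only the good disk as an active member
/sbin/mdadm --examine /dev/sdXN   # inspect the leftover RAID superblock on the removed partition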

This situation needs to be thought through and planned step by step before starting the process.

 
Old 03-31-2016, 01:57 PM   #4
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,783

Rep: Reputation: 2214
Are you using the whole, unpartitioned disk as a RAID member, or did you partition the disk and then set up the partition as a RAID member?

Actually, that doesn't make a lot of difference. Since you're mentioning "/dev/sda" and "/dev/sdb" you are presumably talking about software RAID with mdadm. Once you "--fail" the drive or partition and "--remove" it from the array, you can then access the drive independently and run "dd if=/dev/zero of=/dev/sda bs=1M" to zero it. (The "bs=1M" is for efficiency and is fairly arbitrary. You just want something a good deal larger than the default 512 byte block size.) Note that there is no way to wipe the bad sectors that were reallocated, and more of the pending sectors will get reallocated when you try to zero them. The data in those sectors might be revealed by forensic recovery, and there is nothing you can do about that.

The level of your concern about losing data on /dev/sdb suggests that you might not have any backup. If that is the case, your first task is to write the words "RAID is not a backup!" one hundred times with a permanent marker all over the interior walls of your residence.
 
Old 03-31-2016, 06:21 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,145

Rep: Reputation: 4124Reputation: 4124Reputation: 4124Reputation: 4124Reputation: 4124Reputation: 4124Reputation: 4124Reputation: 4124Reputation: 4124Reputation: 4124Reputation: 4124
+1.
But "s/residence/office" - don't want to upset the family too much ...

Having just yesterday set up an LVM RAID1 on a Pi 3 that I am using as a gateway/firewall for the home LAN, I would suggest you keep an eye on LVM RAID support these days. That is, just LVM-defined RAID, not LVM over mdadm. It has some nice features, like failure policy. And yes, it uses md and dm under the covers.
Doesn't help with your concern about scrubbing the wrong disk, but will be more usable in the future methinks.
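For illustration, an LVM-defined mirror is created along these lines (recent LVM2 syntax; the volume group and LV names are made up), and the failure policy lives in lvm.conf:
Code:
# Create a RAID1 logical volume directly in LVM (no separate mdadm array underneath):
lvcreate --type raid1 -m 1 -L 50G -n lv_data vg00
# Automatic repair behaviour is configured in /etc/lvm/lvm.conf, e.g.:
#   activation { raid_fault_policy = "allocate" }   # or "warn"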

Does your vendor use someone (certified) to dispose of the disks? That might save all the angst.
 
Old 03-31-2016, 07:30 PM   #6
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,783

Rep: Reputation: 2214
Replacing a member of a RAID array is about as basic a procedure as you're going to find. Here's a sample session from a 2-disk RAID 1 device that I set up quickly on a VM, with a couple of looks at /proc/mdstat to show what is going on. There was an ext4 filesystem mounted from that device throughout the procedure:
Code:
[localhost ~]# mdadm /dev/md0 --fail /dev/sda
mdadm: set /dev/sda faulty in /dev/md0
[localhost ~]# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sdb[1] sda[0](F)
      10477568 blocks super 1.2 [2/1] [_U]
      
unused devices: <none>
[localhost ~]# mdadm /dev/md0 --remove /dev/sda
mdadm: hot removed /dev/sda from /dev/md0
[localhost ~]# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sdb[1]
      10477568 blocks super 1.2 [2/1] [_U]
      
unused devices: <none>
Now that it can be seen that /dev/sda is no longer part of the array, it's safe to wipe it:
Code:
[localhost ~]# dd if=/dev/zero of=/dev/sda bs=1M
dd: writing `/dev/sda': No space left on device
10241+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 56.5405 s, 190 MB/s
And here is the "replacement" disk being added to the array:
Code:
[localhost ~]# mdadm /dev/md0 --add /dev/sda
mdadm: added /dev/sda
[localhost ~]# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sda[2] sdb[1]
      10477568 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  4.2% (449728/10477568) finish=2.2min speed=74954K/sec
      
unused devices: <none>
[localhost ~]# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sda[2] sdb[1]
      10477568 blocks super 1.2 [2/1] [_U]
      [=========>...........]  recovery = 45.7% (4792960/10477568) finish=1.2min speed=77560K/sec
      
unused devices: <none>
[localhost ~]# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sda[2] sdb[1]
      10477568 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>
Assuming your provider can hotplug the disks (and there's not much sense in offering RAID support if they can't), your system can remain up the entire time.
 
Old 03-31-2016, 08:59 PM   #7
JJJCR
Senior Member
 
Registered: Apr 2010
Posts: 2,162

Rep: Reputation: 449
Just my 2 cents in understanding your issue.

RAID 1 is mirroring, so both disks are identical.

If there's a way to break the RAID temporarily and mount the bad disk as a local drive, then you can wipe it.

If you had physical access, you could just destroy the disk rather than going through all these difficulties; even if you format or wipe the drive and fill it with zeroes, who knows, there might be a way to reverse it and get back the data.

But if you smash the disk into pieces, as some companies do, who would even try to bring it back to life just to get the data?

Cheers!!!

 
Old 04-01-2016, 04:52 AM   #8
Manager
LQ Newbie
 
Registered: Dec 2011
Location: Bangkok
Distribution: CentOS
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by rknichols View Post
Replacing a member of a RAID array is about as basic a procedure as you're going to find. Here's a sample session from a 2-disk RAID 1 device that I set up quickly on a VM ...
Thank you, RK Nichols, this is a complete answer, I think the simplest and best answer possible, and very well written, including using dd to wipe the disk. I think this topic can be closed now.

Thanks, everybody. And yes, a mirror is not a backup, and everybody should keep offsite backups too, e.g., with rsync over SSH to another system. Actually, it's better to keep more than one backup, and be careful, because you may not know how good your backups are until you must use them. An occasional test / audit / restore can help you get a good night's sleep.
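For example, something along these lines (host and paths purely illustrative):
Code:
# Illustrative only: push selected trees to an offsite host over SSH,
# preserving permissions/ownership and pruning files that were deleted locally.
rsync -a --delete -e ssh /home/ backup@offsite.example.com:/backups/server1/home/
rsync -a --delete -e ssh /var/  backup@offsite.example.com:/backups/server1/var/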

As y'all know, a mirror just keeps the system up and running in case of a disk failure, and prevents the need for a reinstallation and reconfiguration of everything in such a situation.
 
Old 04-01-2016, 05:47 AM   #9
TenTenths
Senior Member
 
Registered: Aug 2011
Location: Dublin
Distribution: Centos 5 / 6 / 7
Posts: 3,483

Rep: Reputation: 1556
If you go with full disk encryption, remember that you'll either have to set it up to mount and automatically use the decryption password (which totally defeats the purpose on a mirrored volume!) or someone will have to manually enter the password on the console whenever the server is rebooted.

Alternatively, create an unencrypted partition for "everything" and an encrypted partition that has to be mounted manually; that way you can log in to the server after a reboot and issue the mount command and passphrase for that partition.
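Very roughly, with LUKS that looks like this (the partition, mapper name, and mount point are invented for illustration):
Code:
# One-time setup -- this DESTROYS whatever is currently on the chosen partition:
cryptsetup luksFormat /dev/sdXN
# After every reboot, unlock and mount by hand (passphrase entered over SSH):
cryptsetup luksOpen /dev/sdXN securedata
mount /dev/mapper/securedata /srv/private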
 
Old 04-03-2016, 12:41 PM   #10
hortageno
Member
 
Registered: Aug 2015
Distribution: Ubuntu 22.04 LTS
Posts: 240

Rep: Reputation: 67
Just to make you aware: as far as I know, dd exits if it can't write to a bad sector. So if you don't see the "No space left on device" message at the end, it might not have wiped the whole disk. I use ddrescue with a logfile for that reason; if it aborts, you can resume wiping the remainder of the disk.
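If it helps, the kind of invocation I mean is roughly the following (GNU ddrescue; the size has to be given explicitly because /dev/zero has no size of its own, and the target device name needs triple-checking):
Code:
# Sketch only: zero-fill /dev/sdX (a placeholder name) while keeping a mapfile,
# so an aborted run can be re-run and resumed rather than started from scratch.
ddrescue --force --size=$(blockdev --getsize64 /dev/sdX) /dev/zero /dev/sdX /root/wipe.map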
 
Old 04-13-2016, 05:00 PM   #11
Manager
LQ Newbie
 
Registered: Dec 2011
Location: Bangkok
Distribution: CentOS
Posts: 8

Original Poster
Rep: Reputation: Disabled
Does the following mean that both sdb and sda are bootable in case I fail and remove sda?

# cat /boot/grub/device.map
(fd0) /dev/fd0
(hd0) /dev/sda
(hd1) /dev/sdb

#/sbin/grub

grub> find /boot/grub/stage1
find /boot/grub/stage1
(hd0,0)
(hd1,0)
grub> find /boot/grub/grub.conf
find /boot/grub/grub.conf
(hd0,0)
(hd1,0)

[GNU GRUB version 0.97]

[Some advice on the web seems to suggest that I should switch sdb to be hd0 by these commands in order to boot off sdb:

# grub-install /dev/sdb

#grub
grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)

... but is this really necessary? It appears to me from the above that sdb is already bootable and I don't need to do the latter steps.]

[Here is my system:]

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[0] sda3[1]
238324160 blocks [2/2] [UU]
[i.e., 238 GB]

md1 : active raid1 sdb1[1] sda1[0]
3911680 blocks [2/2] [UU]
[i.e., 4 GB]

[I am a little bit confused that there is no md2 but I guess it is swap and this is not important?]

# /sbin/fdisk -l /dev/sda

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 1 487 3911796 fd Linux raid autodetect
/dev/sda2 488 731 1959930 82 Linux swap / Solaris
/dev/sda3 732 30401 238324275 fd Linux raid autodetect

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 487 3911796 fd Linux raid autodetect
/dev/sdb2 488 731 1959930 82 Linux swap / Solaris
/dev/sdb3 732 30401 238324275 fd Linux raid autodetect

[Note that there is no * Boot partition. This is confusing to me.]

# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md1 3.7G 1.2G 2.6G 32% /
/dev/mapper/vg00-usr 20G 1.3G 19G 7% /usr
/dev/mapper/vg00-var 80G 25G 56G 31% /var
/dev/mapper/vg00-home 40G 18G 23G 43% /home
none 989M 8.0K 989M 1% /tmp

If everything above looks good, then I'll go ahead and fail sda and remove it for replacement.
 
Old 04-13-2016, 05:43 PM   #12
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,783

Rep: Reputation: 2214
The output from lsblk might give you a better view of what is where. Devices /dev/sda1 and /dev/sdb1 are the ~4G RAID 1 array that holds your root filesystem, which includes the /boot directory. Since it is RAID 1, the same thing exists on both drives.

If you remove /dev/sda from the system, the drive that is now /dev/sdb (hd1) will then be /dev/sda (hd0), and /dev/sdb will no longer exist. You do, however, want to be sure that there is a boot loader installed on /dev/sdb, and that what is now /dev/sdb will indeed be (hd0) if that boot loader is ever used. The boot loader is installed in sectors that are outside the RAID partitions, and so would not be automatically replicated.

Since you have 2 RAID arrays, you need to "--fail" and "--remove" both /dev/sda1 and /dev/sda3, so that /proc/mdstat no longer has any reference to /dev/sda. The example I gave above was for a single RAID array on the whole disks /dev/sda and /dev/sdb.
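Concretely, for the layout you posted, that comes to something like this (a sketch; verify the device names against /proc/mdstat before running anything):
Code:
/sbin/grub-install /dev/sdb                # make sure a boot loader is present on sdb
/sbin/mdadm /dev/md1 --fail /dev/sda1 && /sbin/mdadm /dev/md1 --remove /dev/sda1
/sbin/mdadm /dev/md3 --fail /dev/sda3 && /sbin/mdadm /dev/md3 --remove /dev/sda3
cat /proc/mdstat                           # confirm that no /dev/sda partitions remain listed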
 
Old 04-16-2016, 12:48 PM   #13
Manager
LQ Newbie
 
Registered: Dec 2011
Location: Bangkok
Distribution: CentOS
Posts: 8

Original Poster
Rep: Reputation: Disabled
RAID 1 /sda replacement, step by step, done successfully

Here is a step-by-step report of a RAID 1 disk replacement, which I carried out successfully this weekend during a time of minimal server demand.
The steps below worked for me for replacing a faulty /sda in a RAID 1 array consisting of only /sda and /sdb, with the configuration noted below.
This is a hotplugged server.
I am running CentOS 5.11.

[First, make sure your offsite backup is current.]

[Get the serial number of the disk drive to replace, both to double check that the correct disk is later removed, and also for the data center's records (and they might not remove the disk for you unless you provide this information):]

# /usr/sbin/smartctl -i /dev/sda

Model Family: Seagate Barracuda 7200.10
Device Model: ST3250310AS
Serial Number: 5KLMNP6
User Capacity: 250,059,350,016 bytes [250 GB]

[Also get the serial number(s) of the other disk(s), just in case they take out the wrong one; if they report back a serial number, you will then recognize which disk it was.]

# /usr/sbin/smartctl -i /dev/sdb

[Note its serial number, too. Two disks. I won't repeat it here, but it looks similar to the above.]

[Make sure the other disk is bootable:]

# /sbin/grub-install /dev/sdb

[Double check everything:]

#/sbin/grub

grub> find /boot/grub/stage1
find /boot/grub/stage1
(hd0,0)
(hd1,0)
grub> find /boot/grub/grub.conf
find /boot/grub/grub.conf
(hd0,0)
(hd1,0)

# cat /boot/grub/device.map
(fd0) /dev/fd0
(hd0) /dev/sda
(hd1) /dev/sdb

[Get the current status of the disks:]

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[0] sda3[1]
238324160 blocks [2/2] [UU]
[i.e., 238 GB]

md1 : active raid1 sdb1[1] sda1[0]
3911680 blocks [2/2] [UU]
[i.e., 4 GB]

[Note: My configuration has md1 and md3 for some odd reason. Yours may vary.]
[Note: I needed to fail sda3 on md3 and sda1 on md1 .]

# /sbin/mdadm /dev/md3 --fail /dev/sda3
mdadm: set /dev/sda3 faulty in /dev/md3

# /sbin/mdadm /dev/md1 --fail /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md1

[Now double check the new current status of the disks:]

# cat /proc/mdstat

Personalities : [raid1]
md3 : active raid1 sdb3[0] sda3[2](F)
238324160 blocks [2/1] [U_]

md1 : active raid1 sdb1[1] sda1[2](F)
3911680 blocks [2/1] [_U]

unused devices: <none>

# /sbin/mdadm /dev/md3 --remove /dev/sda3
mdadm: hot removed /dev/sda3

# /sbin/mdadm /dev/md1 --remove /dev/sda1
mdadm: hot removed /dev/sda1

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[0]
238324160 blocks [2/1] [U_]

md1 : active raid1 sdb1[1]
3911680 blocks [2/1] [_U]

unused devices: <none>

[Wipe your data on the bad disk, to make sure any bad guys cannot access your private data on the discarded disk.]
[Of course, be extremely careful in the next step that you reference the correct disk, the one to be removed.]

# dd if=/dev/zero of=/dev/sda bs=1M
dd: writing `/dev/sda': No space left on device
238476+0 records in
238475+0 records out
250059350016 bytes (250 GB) copied, 2833.18 seconds, 88.3 MB/s

[Of course, this took a long time. If you don't get "No space left on device", run it again: as forum member hortageno notes above, dd may have exited early because of an error writing to a bad sector, so rerun it as many times as necessary until you do get "No space left on device". For me, it worked the first time, so there was no need to rerun it.]
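[One extra sanity check, for what it's worth: the disk's size in bytes can be read with blockdev (part of util-linux) and compared with the "bytes copied" figure dd printed, which here was 250059350016:]

# /sbin/blockdev --getsize64 /dev/sda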

[Request the disk be removed and replaced. Wait for tech support to report that they have completed their work replacing the disk.]

[ DISK REPLACED ]

[Look for the disk.]
# /usr/sbin/smartctl -i /dev/sda
Smartctl open device: /dev/sda failed: No such device

# /usr/sbin/smartctl -i /dev/sdc
[Gives lots of details on the new device, including the new serial number. Make sure this is a new serial number, and thus a new disk!]

[NOTE: After replacement, the new disk was recognized by Linux as /sdc , so there is no /sda on the system right now! I assume this is because it was a hotplugged replacement. However, I will deal with it as /sdc for now, set it up, and then reboot, for better or for worse. Opinions may vary on this.]

[Check from the above smartctl -i commands that the new hard disk has at least as much disk space as the original.]
[Record the new serial number of the replacement disk in your records.]
[I suggest you also run this command to check the health of the disk:]

# /usr/sbin/smartctl -a /dev/sdc

[Check the Reallocated Sector Count, and any self-test log. You will also notice the Power On Hours, and the Start Stop Count, which tells you whether or not they gave you a new disk, and if it's a used disk, then how many hours the disk had previously been operating. If all is acceptable, then proceed with the following steps.]

[First, clone the partition table from the working disk to the new disk:]

# sfdisk -d /dev/sdb | sfdisk /dev/sdc

[The next step may really slow down your server once it starts copying one disk to the other, so you may want to wait for a time of minimum user load.]
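[If the resync load is a worry, the kernel's md resync rate can also be capped temporarily and restored afterwards; these are the standard knobs, values in KB/s (200000 is the usual default for the maximum):]

# echo 20000 > /proc/sys/dev/raid/speed_limit_max
[... and once the resync has finished:]
# echo 200000 > /proc/sys/dev/raid/speed_limit_max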

# /sbin/mdadm /dev/md1 --add /dev/sdc1
mdadm: added /dev/sdc

[After that command, it automatically starts syncing the disks. You don't need to do anything more as regards md1, except wait for them to sync. To display status:]

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[0]
238324160 blocks [2/1] [U_]

md1 : active raid1 sdc1[2] sdb1[1]
3911680 blocks [2/1] [_U]
[======>..............] recovery = 32.1% (1259840/3911680) finish=0.6min speed=69991K/sec

unused devices: <none>

[I waited until that completed, just rerunning the command. Finally, it will complete and show this:]

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[0]
238324160 blocks [2/1] [U_]

md1 : active raid1 sdc1[0] sdb1[1]
3911680 blocks [2/2] [UU]

unused devices: <none>

[It has completed for md1, so next it's time to start on md3.]

# /sbin/mdadm /dev/md3 --add /dev/sdc3

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdc3[2] sdb3[0]
238324160 blocks [2/1] [U_]
[>....................] recovery = 0.1% (308992/238324160) finish=102.7min speed=38624K/sec

md1 : active raid1 sdc1[0] sdb1[1]
3911680 blocks [2/2] [UU]

unused devices: <none>

[Wait for that to complete.]

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdc3[1] sdb3[0]
238324160 blocks [2/2] [UU]

md1 : active raid1 sdc1[0] sdb1[1]
3911680 blocks [2/2] [UU]

unused devices: <none>

[Finally confirmed completed.]

[Now make the new disk bootable.]

# /sbin/grub-install /dev/sdc
/dev/sdc does not have any corresponding BIOS drive.

[Oops. Must find a solution to that.]

# cat /boot/grub/device.map

(fd0) /dev/fd0
(hd0) /dev/sda
(hd1) /dev/sdb

[The /sda listed there no longer exists at the moment, and device.map does not mention the new /sdc , so to proceed further we need GRUB to recognize /sdc . Instead of editing device.map by hand, I did this to add the new disk:]

# /sbin/grub-install --recheck /dev/sdc
Probing devices to guess BIOS drives. This may take a long time.
Installation finished. No error reported.
This is the contents of the device map /boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(fd0) /dev/fd0
(hd0) /dev/sdb
(hd1) /dev/sdc

[... which automatically edited device.map for me.]

# cat /boot/grub/device.map

(fd0) /dev/fd0
(hd0) /dev/sdb
(hd1) /dev/sdc

# /sbin/grub
Probing devices to guess BIOS drives. This may take a long time.
GNU GRUB version 0.97 (640K lower / 3072K upper memory)

[Find the grub setup files.]
[Some instructions say to find /boot/grub/grub.conf , but the examples I saw on the internet for RAID 1 said to use /boot/grub/stage1 and did not mention grub.conf .]

grub> find /boot/grub/stage1
find /boot/grub/stage1
(hd0,0)
(hd1,0)

[Install grub on the MBR:]

grub> device (hd1) /dev/sdc
device (hd1) /dev/sdc

grub> root (hd1,0)
root (hd1,0)
Filesystem type is ext2fs, partition type 0xfd

[(Oh my goodness, this server is still on ext2fs ?? ... but moving to ext3 or ext4 is another task for another time.)]

grub> setup (hd1)
setup (hd1)
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/e2fs_stage1_5" exists... yes
Running "embed /boot/grub/e2fs_stage1_5 (hd1)"... 15 sectors are embedded.
succeeded
Running "install /boot/grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
Done.

grub> quit

# /sbin/grub-install /dev/sdc
Installation finished. No error reported.
This is the contents of the device map /boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(fd0) /dev/fd0
(hd0) /dev/sdb
(hd1) /dev/sdc

[At this point, this task is finished, but I was curious what would happen with a reboot, especially since there was no /sda . So I rebooted later (at a time of minimum server demand).]

# reboot

[Wait until it boots back up.]

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[0] sda3[1]
238324160 blocks [2/2] [UU]

md1 : active raid1 sdb1[1] sda1[0]
3911680 blocks [2/2] [UU]

unused devices: <none>

# cat /boot/grub/device.map
(fd0) /dev/fd0
(hd0) /dev/sdb
(hd1) /dev/sdc

[Note that /proc/mdstat no longer shows /sdc , it having been reidentified as /sda after the boot, but grub's device.map still has /sdc and no /sda .]

# /sbin/grub-install --recheck /dev/sda
Probing devices to guess BIOS drives. This may take a long time.
Installation finished. No error reported.
This is the contents of the device map /boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(fd0) /dev/fd0
(hd0) /dev/sda
(hd1) /dev/sdb

[It has now mapped hd0 to /sda and hd1 to /sdb .]

[Rebooted again, and things look the same.]

[Finally, check the serial numbers of the disks again, to see which one is the new /sda .]

# /usr/sbin/smartctl -i /dev/sda
# /usr/sbin/smartctl -i /dev/sdb

[I found that /sdc has become /sda . The old /sdb is still the same old /sdb .]

[This completes the entire process of replacing an /sda hard disk.]

[You may want to record the results of S.M.A.R.T., such as the reallocated sector count for potential future reference.]

# /usr/sbin/smartctl -a /dev/sda

[You might also want to run a smartctl disk test on the new disk drive, during a period of low server demand. I won't cover that here.]
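[(For reference, the basic invocations are along these lines; see the smartctl man page for the details I am skipping:)]

# /usr/sbin/smartctl -t long /dev/sda
[Then, after the estimated duration has passed:]
# /usr/sbin/smartctl -l selftest /dev/sda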

[As a side note, I find the fdisk output to be perhaps notable, especially since there is no asterisk * under the Boot column, but the server still rebooted fine:]

# /sbin/fdisk -l

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 1 487 3911796 fd Linux raid autodetect
/dev/sda2 488 731 1959930 82 Linux swap / Solaris
/dev/sda3 732 30401 238324275 fd Linux raid autodetect

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 487 3911796 fd Linux raid autodetect
/dev/sdb2 488 731 1959930 82 Linux swap / Solaris
/dev/sdb3 732 30401 238324275 fd Linux raid autodetect

Disk /dev/md1: 4005 MB, 4005560320 bytes
2 heads, 4 sectors/track, 977920 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md3: 244.0 GB, 244043939840 bytes
2 heads, 4 sectors/track, 59581040 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md3 doesn't contain a valid partition table

Disk /dev/dm-0: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/dm-0 doesn't contain a valid partition table

Disk /dev/dm-1: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/dm-1 doesn't contain a valid partition table

Disk /dev/dm-2: 42.9 GB, 42949672960 bytes
255 heads, 63 sectors/track, 5221 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/dm-2 doesn't contain a valid partition table

# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md1 3.7G 1.2G 2.6G 32% /
/dev/mapper/vg00-usr 20G 1.3G 19G 7% /usr
/dev/mapper/vg00-var 80G 25G 56G 31% /var
/dev/mapper/vg00-home
40G 33G 7.2G 83% /home
none 989M 0 989M 0% /tmp

[Perhaps notably, the above is definitely NOT how I would partition and map my disks in the future, but it puts the above md1 md3 formatting into context. Next time, maybe I will just put everything into one md0 mounted on / instead.]

[That's all, folks! Hope this helps!]
 
  

