Linux - Hardware
This forum is for Hardware issues. Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?
I ran out of space on my computer, so I installed a new 2TB Western Digital "Red" drive, partitioned, formatted, restored from backup, did the "mount --bind", chroot, lilo ... but it wouldn't boot. I tried an older backup, same thing. I even scratch installed from the Slackware 15.0 .iso, then again from 14.2 letting the setup allocate swap space, root partition, install lilo, MBR, etc. Nada. It always just hung after the BIOS splash screen.
Finally, after many hours wasted, I went to the store and bought a new 2TB Western Digital "Blue" drive (cheaper than RED), did the partition, format, restore and voila!, it booted just fine, less than 1 hour, mostly waiting for the restore.
Any idea what happened to the "Red" drive? Could I have done something? I seem to be able to read/write this drive just fine, but it won't boot. I suppose I could use it in a non-booting RAID, but I hate to hang onto a drive I might accidentally forget about and stick into a new computer as the boot drive.
Might be a good idea to show us output from fdisk -l or parted -l and lsblk -f from both drives.
I suspect something simply went wrong setting up the Red. You could use rescue or install media to full disk clone the Blue to the Red, then remove the Blue and reinstall the Red to find out.
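Something like the following, from rescue media. The device names here are assumptions (sda = Blue, sdb = Red); double-check with lsblk before running anything destructive:

```shell
fdisk -l /dev/sda /dev/sdb    # partition tables of both drives (or: parted -l)
lsblk -f                      # filesystems, labels, UUIDs
# Destructive: clone the working Blue wholesale onto the Red, to test
# whether the Red can boot an identical image
dd if=/dev/sda of=/dev/sdb bs=4M status=progress conv=fsync
```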
I'd much rather be depending on the Red than the Blue. My poor experience with WD Blue is exceeded only by WD Green.
Did you do a SMART check?
Writing zeros with dd is also good to try. It might fail on the first sector because of hardware error(s).
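For example, assuming the suspect drive shows up as /dev/sdb (the zero-fill is destructive):

```shell
smartctl -a /dev/sdb    # health status, attribute table, error log
# Destructive: overwrite the whole drive with zeros; an early failure
# here points at a hardware problem
dd if=/dev/zero of=/dev/sdb bs=4M status=progress conv=fsync
```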
Agree with mrmazda about the quality of the red vs blue and green.
I've worked with over 1000 different HDDs. Seagate was even shittier. A lot of them died after a couple of years of sitting unused.
There is no such thing as a non-booting drive per se, yet some things can make a drive unbootable. For instance, I was once baffled by a root fs that was missing during boot but annoyingly, mysteriously accessible and healthy when booted from other media. It turned out that a RAID superblock takes precedence over the partition table when the dmraid module is loaded, and this disk had once been used as a RAID volume. Or it could be a hardware incompatibility, e.g. the power disable pin.
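A stale signature like that is easy to check for; e.g. with the drive on /dev/sdb (an assumption, adjust to your system):

```shell
wipefs /dev/sdb        # list leftover signatures; touches nothing
# If an old RAID or filesystem signature turns up:
# wipefs -a /dev/sdb   # destructive: erase all signatures
```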
If you need any data from the "WD Red", back that data up first.
Unplug all HDDs & SSDs except the "WD Red".
Boot a live Linux into the console. Don't boot into X / a GUI!
Check the SMART values of the "WD Red" with smartctl -a /dev/sda. If Reallocated_Sector_Ct, Reallocated_Event_Count or Current_Pending_Sector show values > 0, replace this HDD.
Run badblocks -wsv /dev/sda. This may take a while, about one day for a 2 TB HDD. All data on the HDD will be wiped, leaving it clean for a fresh install without any remains of previous use. If badblocks doesn't finish with 0 bad blocks found, replace this HDD.
Check the SMART values of the "WD Red" again with smartctl -a /dev/sda. If those same attributes now show values > 0, replace this HDD.
If errors occur, this HDD should no longer be used in a computer for daily use or for handling critical data. Depending on the errors detected, it may still be suitable for a museum computer that doesn't handle anything critical.
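In command form, the whole sequence is roughly this (assuming the Red is the only drive attached, appearing as /dev/sda):

```shell
smartctl -a /dev/sda | grep -E 'Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector'
badblocks -wsv /dev/sda    # destructive write test, about a day for 2 TB
smartctl -a /dev/sda | grep -E 'Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector'
```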
Maybe the non-booting drive has a gpt partition table, and the booting drive is msdos
The non-booting drive was new out of a sealed box and had no partition table. I created the table, not as gpt. Which is not to say I didn't mess something up.
The booting (blue) drive was also new, out of the box and, as far as I know, I created the exact same partitions on it as on the "Red" drive. Quite simple, an 8G swap partition and the rest ext4:
Code:
# fdisk -l /dev/sda
Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x07ea64e1
Device Boot Start End Sectors Size Id Type
/dev/sda1 2048 16779263 16777216 8G 82 Linux swap
/dev/sda2 * 16779264 3907029167 3890249904 1.8T 83 Linux
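For what it's worth, the swap numbers check out: 16777216 sectors of 512 bytes is exactly 8 GiB:

```shell
echo $((8 * 1024 * 1024 * 1024 / 512))    # sectors in 8 GiB -> 16777216
```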
The "Red" drive is back in its box, but when I take it out to try the suggestions in this thread, I'll check the partition table.
Quote:
Maybe the non-booting drive has a gpt partition table, and the booting drive is msdos
And maybe the reverse is true as your fdisk output shows a dos disklabel.
Is that the new 'blue' drive? Is this drive the only drive you are now using?
Do you have the old drive still attached, the one on which you ran out of space? Did you ever investigate what that problem was? Did you get warning messages? If so, what were they?
Did you check /var/log for old log files and delete them? Did you use the find command to find large files which you no longer needed/wanted?
Did you have multiple partitions on the 'old' drive? Did you have a separate 'home' or 'data' partition? How large are/were these partitions? Posting the output of df -h would have shown the partition sizes and used/free space on partitions.
When you attempted to install Slackware on either of the new drives, was the 'old' drive still attached or was the 'new' drive the only drive attached when attempting with both the 'red' and 'blue' drives?
Was your earlier install a Legacy/CSM install or was it EFI?
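On that last question, a quick way to tell from a running Linux system (this only reports how the current boot was done):

```shell
# /sys/firmware/efi exists only when the kernel was booted via UEFI
[ -d /sys/firmware/efi ] && echo "booted via UEFI" || echo "booted via Legacy/CSM BIOS"
```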
Quote:
And maybe the reverse is true as your fdisk output shows a dos disklabel.
Wow! lots of questions. Interesting that you point out the disk label as "dos". I didn't notice that. This was a new drive. All I did was partition as shown and format ext4. I did not do e2label, so that "dos" label must have been as-shipped, or a default when partitioning a drive with no partitions.
Quote:
Is that the new 'blue' drive? Is this drive the only drive you are now using?
Yes and yes.
Quote:
Do you have the old drive still attached, the one on which you ran out of space?
No, but I have it and could if there were a reason to.
Quote:
Did you ever investigate what that problem was? Did you get warning messages? If so, what were they? Did you check /var/log for old log files and delete them? Did you use the find command to find large files which you no longer needed/wanted? Did you have multiple partitions on the 'old' drive? Did you have a separate 'home' or 'data' partition? How large are/were these partitions? Posting the output of df -h would have shown the partition sizes and used/free space on partitions.
The partitioning on the "too-small" drive was the same: 8G swap, the rest ext4. No multiple partitions. The problem was simple: I was storing backups for the drive, and backups of the VirtualMachine, on that drive. It just ran out of space, 100% used. The VM backups are rather large. I had the choice of not backing up to that drive or getting a bigger drive. Since I want to keep a month's worth of full/differential backups on the drive (yes, I also back up to an external), my choice was to install a bigger drive. Usually a simple process.
Quote:
When you attempted to install Slackware on either of the new drives, was the 'old' drive still attached or was the 'new' drive the only drive attached when attempting with both the 'red' and 'blue' drives?
With both new drives, the old "too-small" drive was attached. First I booted from the ISO, partitioned the new drive, formatted it, mounted both drives with the old one read-only, restored from the backup on the old drive to the new one, and did the chroot/lilo thing. Booting worked on "blue", not on "red".
Furthermore, before buying "blue" I tried scratch-installing Slackware 15.0 on "red" and, failing that, 14.2, allowing the setup to format the partition. Of course I did have to partition before the setup. I don't remember whether I kept the partitions from the backup attempt, or deleted and recreated them before attempting the scratch install.
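For the record, the chroot/lilo thing was roughly the following. The device names here are placeholders (sda = new drive, sdb = old drive), not necessarily what my system used:

```shell
mount /dev/sda2 /mnt              # new root partition
mount -o ro /dev/sdb2 /media/old  # old drive, read-only
# ... restore the backup from /media/old onto /mnt ...
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys
chroot /mnt lilo -v               # write lilo to the new drive's MBR
```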
Quote:
Was your earlier install a Legacy/CSM install or was it EFI?
Not EFI. I wasn't keen on EFI generally, and have only been messing with it the past 6 months. This machine's "old" drive is probably 5 years old -- wasn't doing EFI then.
The restore from backup to "blue" worked as expected.
I will be experimenting with this thread's suggestions tomorrow.
I would be very surprised if either new disk you purchased came with a 'dos' disklabel as mostly they have been gpt for several years. If you had run fdisk when the drives were new you would have known. e2label won't change that. You could change it using parted/gparted or similar partition manager by creating a new partition table, usually gpt or dos but there are others.
Quote:
was storing backups for the drive and backups of the VirtualMachine on that drive.
That's a copy, not a backup. For a backup to be useful it needs to be on a separate drive, since the primary reason for having backups is hard drive failure. If you want multiple backups, you need multiple drives.
I'm guessing the problem could have been that the Red drive was gpt: if you simply copied your old system onto a gpt drive from a dos-labeled one, it would not boot, because a Legacy/CSM install on a gpt drive needs a bios_boot partition.
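If the drive had been gpt, creating that small BIOS boot partition would look something like this. The device is hypothetical, and note the bios_grub flag is used by GRUB; lilo has no equivalent on gpt:

```shell
parted /dev/sdb mklabel gpt               # destructive: new gpt table
parted /dev/sdb mkpart biosboot 1MiB 2MiB # tiny unformatted partition
parted /dev/sdb set 1 bios_grub on        # mark it as a BIOS boot partition
```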
You haven't indicated the size of the older/original drive but, would it not have been simpler to just copy your backups from the original OS drive to the new drive? You could also have used the find command to locate large files you may not want/need any longer, or gone to the /var/log directory and deleted old log files.
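The find approach would be something like -size with du; here's the idea demonstrated in a throwaway directory so it's safe to copy-paste:

```shell
dir=$(mktemp -d)
head -c 2097152 /dev/zero > "$dir/big.img"     # 2 MB file
head -c 1024    /dev/zero > "$dir/small.txt"   # 1 KB file
find "$dir" -type f -size +1M -exec du -h {} + | sort -rh   # only big.img is listed
rm -rf "$dir"
# Real-world use against the whole filesystem would look like:
# find / -xdev -type f -size +500M -exec du -h {} + 2>/dev/null | sort -rh | head
```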
Quote:
I would be very surprised if either new disk you purchased came with a 'dos' disklabel as mostly they have been gpt for several years. If you had run fdisk when the drives were new you would have known. e2label won't change that. You could change it using parted/gparted or similar partition manager by creating a new partition table, usually gpt or dos but there are others.
I've checked the "dos" label issue. All my Slackware computers with the boot drive formatted for MBR have the "dos" label. Apparently, the Slackware (14.2) install setup does this during the partition identification and formatting step.
As arnulf suggested, I ran 'smartctl -t long' on this drive, then 'smartctl -a'. All tests passed, zero raw values in Reallocated_Sector_Ct, Reallocated_Event_Count and Current_Pending_Sector. I then did 'badblocks -wsv /dev/sda'. That did take a day to run! I then re-partitioned, restored the backup and it booted!
I can't say what went wrong with my initial attempt. I did not format the drive to GPT (proven by the "dos" label) and I doubt it came that way from the store, but I didn't check. The "blue" drive had no problem out of the box and I would be surprised if "red" came GPT and "blue" did not.
Anyway, whatever the initial cause, the 'badblocks' probably fixed it. I now have formatted it GPT and will set up UEFI.