
ptf 10-03-2021 11:40 AM

Fedora 34 - degraded RAID not starting at boot.
 
Hello to everyone.

This one has me scratching my head a bit. I have seen similar questions when searching the forum, but none (so far as I can see) exactly describes the situation I found myself in.

I have a media/mail server running F34, generally reliably.

It hosts various filesystems over 11 mostly 8TB disks, but the main media tree is a 15TB ext4 consisting of two RAID1 arrays striped together using LVM. Quite why it was set up that way is lost in the mists of time (probably I thought it would be flexible for adding storage, and it would be, but it needs pairs of PVs to continue striping, so in practice I'd need to add four disks and there isn't room in the case - but I digress).
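
For context, the layout is roughly what you would get from something like the following - a sketch only, with hypothetical device, VG and LV names rather than my actual ones:

    # two RAID1 mirrors, each built from a pair of partitions (names are placeholders)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1

    # both mirrors become LVM PVs and the logical volume stripes across the two of them
    pvcreate /dev/md0 /dev/md1
    vgcreate media_vg /dev/md0 /dev/md1
    lvcreate -n media_lv -i 2 -l 100%FREE media_vg
    mkfs.ext4 /dev/media_vg/media_lv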

I wanted to migrate one of the disks to an SSD - in fact over time I'd like all of them to migrate to SSD but that's an expensive undertaking so I bought a single Samsung 870 QVO 8TB to play with/kick off the process. Yes, I know it's QLC.

Here starts the pain. I assumed I could just power down, remove one of the RAID disks, replace it with the SSD, power back up (the array at that point running on a single device) and add the new disk back to the array after partitioning the drive.
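
The intended swap, roughly, was along these lines - hypothetical device names, with sdY standing for the surviving mirror member and sdX for the new SSD:

    sgdisk -R /dev/sdX /dev/sdY      # copy the surviving member's partition table onto the SSD
    sgdisk -G /dev/sdX               # give the copy its own disk/partition GUIDs
    mdadm /dev/md0 --add /dev/sdX1   # add the new partition into the (degraded) mirror
    cat /proc/mdstat                 # watch the resync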

But no, the array did not start, so one of the PVs was missing, so the logical volume was in turn missing, and systemd grumbled loudly about that, dropping me to the single-user shell on the console.

No worries, I thought, I'll do it manually - but as soon as I tried to start the array I lost keyboard input (except that Ctrl-Alt-Del still worked to reboot the machine).

Perplexed, I powered down and put the old drive back in - the system started fine and I manually failed and removed the drive I wanted to replace, assuming this would fix MD's expectations on reboot.
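
That is, roughly the following, with sdZ as a stand-in for the drive being retired:

    mdadm /dev/md0 --fail /dev/sdZ1     # mark the outgoing member as failed
    mdadm /dev/md0 --remove /dev/sdZ1   # remove it from the array
    mdadm --detail /dev/md0             # confirm: one active member, one removed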

But, nope, the system would still not start the array.

Finally I managed it by removing the volume and its dependent mounts from fstab, rebooting, and manually starting the array from a normal console session - this time, running multi-user, starting the array did not kill the keyboard input, and I could finally add in the new drive.
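
For the record, the manual sequence that worked was along these lines (device, VG and mount point names are illustrative placeholders, not my real ones):

    mdadm --run /dev/md127                   # force the degraded mirror to run with one member
    # (or, if it hadn't been assembled at all: mdadm --assemble --run /dev/md127 /dev/sdY1)
    mdadm /dev/md127 --add /dev/sdX1         # add the freshly partitioned SSD and let it resync
    vgchange -ay media_vg                    # re-activate the LV now the PV is back
    mount /dev/media_vg/media_lv /srv/media  # and mount it again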

I am, however, puzzled:

- Why would Fedora not start the array? It is a mirrored pair, so there is no reason not to run degraded with one drive - indeed that's the whole raison d'être of RAID, so not doing so seems a bit of a fail.

- What the heck was the business with "mdadm --run /dev/mdXXX" killing the single-user console keyboard input?

Anyone got any suggestions?

smallpond 10-07-2021 08:17 AM

Some questions: Are your arrays on partitions or whole disks? Is the boot drive RAID? What's in mdadm.conf? Did you rebuild the initramfs after creating the RAID? Did you check the state of the drives and the RAID with --examine and --detail?
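
Something along these lines, substituting your actual md device and member partitions:

    cat /proc/mdstat            # quick view of which arrays exist and their state
    mdadm --detail /dev/md0     # array-level view: clean/degraded, member list, events
    mdadm --examine /dev/sda1   # per-member superblock: array UUID, homehost, event count
    cat /etc/mdadm.conf         # if it exists at all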


Since the system has no idea what is going on while powered down, if you have a hot-plug enclosure it's actually better to swap drives while running. That way the RAID driver can detect the loss of the drive and mark it failed in the metadata on the other drives. As it is, it is expecting the other drive to appear and finish building the RAID. RAID assembly at boot time is incremental, and the RAID isn't put online until it has finished being assembled. Large storage systems don't start all drives at once, so md can't assume that all drives will be present at the same time.
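
In other words, the boot-time flow is roughly the following, performed per device by udev as each one appears (illustrative names only):

    mdadm --incremental /dev/sdb1   # add this member; the array stays inactive while incomplete
    mdadm --run /dev/md127          # what it takes to bring an incomplete (degraded) array online anyway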

Not sure what happened with the keyboard input being blocked. I assume the kernel was stuck attempting some operation. I/O timeouts are normally 30 seconds and various subsystems may attempt retries, so I would wait at least 5 minutes before calling it dead.

ptf 10-07-2021 08:37 AM

No, the boot drive isn't RAID (long story; it should be), but it would not be that RAID array even if it were. The arrays are on partitions, not that I think that would make a difference.

mdadm.conf is empty (in fact not present) - everything auto-configures. Maybe one is generated for the initramfs; I'll check.
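
Something like this should tell me whether dracut baked one in (assuming the default image for the running kernel):

    lsinitrd | grep -i mdadm      # list anything mdadm-related in the current initramfs
    lsinitrd -f /etc/mdadm.conf   # print an embedded mdadm.conf, if there is one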

Did I rebuild the initramfs - no, and I am wondering if this would have helped. But you can't always predict when this is going to be needed, or when your disks are going to fail, so I am not at all clear it should be necessary. At some point I will be transitioning the other disks in the array, so I can see if it makes a difference. Or I will put together a test system and see if it happens with a fresh install.
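
If it does turn out to matter, my understanding is the usual recipe would be something like this (run as root):

    mdadm --detail --scan >> /etc/mdadm.conf   # record the current arrays in the config
    dracut --force                             # rebuild the initramfs for the running kernel so it picks that up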

Hot-swapping drives - yes, but the only truly safe way to pull a drive from a running system is if the hardware actually allows the power to be removed from the drive first.

Consumer-grade NAS enclosures rarely allow this (certainly the one I'm using, a U-NAS case, does not). I don't care how many people think it should work, or how many people have done it and got away with it: I have killed more than one drive and one SATA interface trying, so now I just go with my instinct that hot-swapping with power applied is not a good idea.

Interestingly the array seems to have started, and then maybe been shut down - I can see the "continuing with 1 drive" message in the boot log, but it wasn't active by the time LVM got going.
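
(That came from digging through the previous boot's kernel messages, roughly like so:)

    journalctl -k -b -1 | grep -iE 'md/raid1|md[0-9]'   # kernel log from the previous boot, filtered for md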

I/O timeouts - yes, maybe, but starting the array succeeded, and I'd expect characters to still be echoed at the console even if the mdadm command were waiting for something before exiting.

smallpond 10-07-2021 01:04 PM

Boot on RAID isn't great on typical home systems, because the BIOS won't try to boot from the second drive if the first one is present and failed. This is considered a server feature. If you aren't booting from RAID then rebuilding initramfs may not be required.

Arrays should always be on partitions so the disks are labeled.
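
For example, preparing a member disk might look something like this (device name is just a placeholder):

    # GPT label plus a single partition flagged as Linux RAID
    parted -s /dev/sdX mklabel gpt mkpart md-member 1MiB 100% set 1 raid on

    # or the sfdisk equivalent, using the Linux RAID partition type GUID
    printf 'label: gpt\ntype=A19D880F-05FC-4D3B-A006-743F0F84911E\n' | sfdisk /dev/sdX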

Not having mdadm.conf means you are missing some things, like consistent array names and a program to run when a monitored disk fails, but auto-assembly should still work. The one thing I'm wondering about with no arrays configured is homehost. By default it won't assemble an array for a foreign host, so the correct hostname needs to be available at assembly time.
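
A minimal one might look something like this - the UUID, array name and alert script here are placeholders, not something your system will already have:

    # /etc/mdadm.conf - illustrative only
    HOMEHOST <system>                 # accept arrays whose metadata names this host
    MAILADDR root                     # where mdmonitor sends failure mail
    PROGRAM /usr/sbin/my-raid-alert   # hypothetical script run on monitored events
    ARRAY /dev/md/media0 metadata=1.2 UUID=00000000:00000000:00000000:00000000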

The "continuing with 1 drive" message is interesting. I don't know why it would shut down the array once it was started, unless a surviving disk got an error. I suppose it was this message: https://elixir.bootlin.com/linux/v5..../raid1.c#L1629

ptf 10-07-2021 03:54 PM

Taking your points in turn:

It should be possible to have grub installed on both drives in the array and two BIOS boot disks (main and "backup") configured; as long as a disk fails completely and isn't enumerated by the BIOS, that should be OK - and even if not, most modern BIOSes have a boot menu so the good disk can be chosen manually. I'd stick to RAID1 for boot, though - RAID5 or 6 is asking for problems (plus you really have to duplicate the grub installation on every device in the array, which is a pain). I'm not clear that even server boards always handle "present but failed" elegantly (if it boots, it will probably be with lots of timeouts).
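
On a BIOS-boot Fedora box, duplicating the boot loader is just something like the following, assuming sda and sdb are the two mirror members:

    # install grub onto both members so either disk can boot (BIOS/MBR boot; UEFI needs a different arrangement)
    grub2-install /dev/sda
    grub2-install /dev/sdb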

Yes, I agree array components should be on partitions but, as I said, mine are, and I don't think it would matter in this case if they weren't.

I'm not convinced that mdadm.conf is necessary or even desirable on modern systems - components should be found by UUID anyway, not by block device name. Besides, the device number is available in the component's metadata area.

Yes, that looks like the source of the kernel message.

I need to go and grub around (sorry) for some old drives and hook them up to the test rig - I think I have some 320G drives in the shed though I had a clear-out a while ago and sold everything that still had enough capacity to be worth the effort of listing on eBay.

smallpond 10-08-2021 10:39 AM

Was cleaning out an office and found a brand new, unopened, 1GB drive I can let you have cheap. Make an offer.

ptf 10-08-2021 10:39 AM

I'm good, thanks :)

PS: 1GB, eh - when was that bought? 1990?

ptf 10-17-2021 08:14 AM

Well, one fried motherboard later (don't ask), I can't reproduce the problem in a fresh installation of F34, which means it must be specific to my multiply renewed setup (cruft has accumulated, it would seem).

I can't *quite* replicate the setup, as I can only find two working HDDs, but I got close, with two RAID1 pairs striped into one logical volume with LVM; pulling the power from one disk does not cause the system to throw its toys out of the pram, though.

Hmmm.

