Weird issue! More than 100 servers are going down if they reboot!
Hi,
First, my Linux knowledge is basic.
I have around 100 servers with OVH. Some run CentOS 6.x, most run CentOS 7.x, and they use KVM virtualization. I recently found out that if any of them reboots, it goes into a 'kernel panic' (that is what support said) and does not come back up. Unfortunately, I do not have KVM over IP access, and from the Debian rescue mode I could not locate any relevant logs.
I have looked at the logs below but did not find anything relevant.
Here are some of the last log entries:
/var/log/messages
Code:
Sep 29 06:58:58 server01 named[1584]: validating @0x7f51f80597f0: x SOA: no valid signature found
Sep 29 06:58:58 server01 named[1584]: validating @0x7f51e40028e0: 1x NSEC: no valid signature found
Sep 29 06:58:58 server01 named[1584]: error (network unreachable) resolving 'xA/IN': 2001:41d0:1:1982::1#53
Sep 29 06:58:58 server01 named[1584]: error (network unreachable) resolving 'x': 2001:41d0:1:4a81::1#53
Sep 29 07:00:45 server01 init: tty (/dev/tty1) main process (2495) killed by TERM signal
Sep 29 07:00:45 server01 init: tty (/dev/tty2) main process (2497) killed by TERM signal
Sep 29 07:00:45 server01 init: tty (/dev/tty3) main process (2499) killed by TERM signal
Sep 29 07:00:45 server01 init: tty (/dev/tty4) main process (2501) killed by TERM signal
Sep 29 07:00:45 server01 init: tty (/dev/tty5) main process (2503) killed by TERM signal
Sep 29 07:00:45 server01 init: tty (/dev/tty6) main process (2505) killed by TERM signal
Sep 29 07:00:54 server01 ntpd[26063]: ntpd exiting on signal 15
Code:
root@rescue:/mnt/var/log# tail dmesg
parport0: PC-style at 0x378, irq 5 [PCSPP]
ppdev: user-space parallel port driver
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
EXT3-fs (sda2): using internal journal
kjournald starting. Commit interval 5 seconds
EXT3-fs (sda1): using internal journal
EXT3-fs (sda1): mounted filesystem with ordered data mode
EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts:
Adding 4193276k swap on /dev/sda3. Priority:-1 extents:1 across:4193276k SS
Code:
root@rescue:/mnt/var/log# tail boot.log
Starting ksmtuned: [ OK ]
Starting crond: [ OK ]
Starting atd: [ OK ]
Starting libvirtd daemon: [ OK ]
-
Starting php-fpm: Done...
Starting nginx: Done...
Starting MySQL.. SUCCESS!
RTNETLINK answers: No such process
RTNETLINK answers: No such file or directory
I could not find anything on the Internet. Any suggestion would be helpful.
The log displayed by dmesg is volatile; it's held in a ring buffer in kernel memory until a process like klogd writes it to disk. All dmesg will tell you is what happened since last reboot. The interesting stuff should be in the logs, but it's entirely possible that whatever causes the kernel panic also prevents the system from dumping anything related to the issue to disk.
I'm not familiar with OVH, but perhaps it would be possible to set up a virtual serial port? Perhaps one that connects to another VM? If so, you could redirect the console to the serial port and capture the actual panic screen.
Do the systems panic when shutting down as well, or does it only happen when you try to reboot?
During troubleshooting, you might want to add "panic=s" (where s is a number of seconds) to the kernel command line. It causes the system to reboot automatically after s seconds in case of a kernel panic, and I figure since you can't see the panic screen anyway, the system might as well just reboot.
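As a rough sketch of the "panic=s" suggestion: the helper below appends panic=<seconds> to the GRUB_CMDLINE_LINUX line of a GRUB defaults file. The function name add_panic_param is made up for illustration, and the /etc/default/grub layout is an assumption that fits CentOS 7 (GRUB 2). On CentOS 6 the kernel line lives in /boot/grub/grub.conf and would need a different edit.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): append panic=<seconds> to the
# GRUB_CMDLINE_LINUX line of a GRUB 2 defaults file.
# Assumption: the value is wrapped in double quotes, as in the stock
# CentOS 7 /etc/default/grub.
add_panic_param() {
    file=$1
    secs=$2

    # Do nothing if a panic= option is already on that line.
    if grep -Eq '^GRUB_CMDLINE_LINUX=.*[" ]panic=' "$file"; then
        return 0
    fi

    # Insert " panic=<secs>" just before the closing double quote.
    # Write to a temporary file first so the original is never half-written.
    sed "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"[[:space:]]*\$/\1 panic=${secs}\"/" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example (run as root on the affected system, or inside a chroot from rescue mode):
#   add_panic_param /etc/default/grub 10
#   grub2-mkconfig -o /boot/grub2/grub.cfg   # regenerate the config on CentOS 7 (BIOS boot)
```

On a system that is still running, sysctl -w kernel.panic=10 sets the same timeout until the next reboot, without touching the boot loader.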
Thank you, I will ask their support about a virtual serial port. They do offer KVM over IP, but it costs $30 - $40 per day. The cost is not a problem, but I would like to get a better idea of what is going on before I go down that path.
Also, is there a section here for hiring a Linux expert for this kind of problem (I could not find one), or somewhere else I can do that?
Without decent messages, we are more blind than you are. I don't use hosting but if a headless box won't boot, I'm dead in the water until I connect screen and keyboard, so for you that KVM is essential.
If everything is falling over on boot, that sounds like an infrastructure problem. Have you rebooted these images successfully before? What did you change? If you changed nothing, chase them about any changes they made.
You were right.
OVH finally gave me KVM access (after I paid two days ago; I do not know what would happen if something urgent came up, which is why I plan to move to a different provider with at least basic support and IPMI access). I found out that the engineer who left a month ago had messed with /etc/fstab on all of the servers using Ansible, and that is what was preventing them from booting. Now that I have found the problem, I have to plan how to fix it on all the servers.
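The thread does not say what exactly was wrong in /etc/fstab, so the following is only a sketch of a sanity check that could be run against each server from rescue mode before rebooting it. The function name check_fstab is made up, and the rules it applies (4 to 6 whitespace-separated fields, and a mount point that is an absolute path, "none" or "swap") are assumptions about what a healthy entry looks like, not a full validation.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print fstab lines that look
# malformed and return non-zero if any are found.
# Assumptions: a healthy entry has 4 to 6 whitespace-separated fields
# (the last two are optional), and its mount point is an absolute path,
# "none", or "swap".
check_fstab() {
    awk '
        BEGIN { bad = 0 }
        /^[[:space:]]*(#|$)/ { next }    # skip comments and blank lines
        NF < 4 || NF > 6 {
            printf "line %d: expected 4-6 fields, found %d: %s\n", NR, NF, $0
            bad = 1
            next
        }
        $2 != "none" && $2 != "swap" && $2 !~ /^\// {
            printf "line %d: mount point is not an absolute path: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example from rescue mode, with the root filesystem mounted on /mnt:
#   check_fstab /mnt/etc/fstab || echo "fix /mnt/etc/fstab before rebooting"
```

This does not check that the listed devices or UUIDs actually exist. Comparing the first column against the output of blkid would be a reasonable next step.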