LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

09-30-2018, 05:26 AM   #1
Yakooza
LQ Newbie
 
Registered: Sep 2018
Posts: 25

Rep: Reputation: Disabled
Weird issue! More than 100 servers are going down if they reboot!


Hi,

First, my Linux knowledge is basic.

I have around 100 servers with OVH. Some are installed with CentOS 6.x, most of them with CentOS 7.x, and they use KVM virtualization. Recently I found out that if any of them reboots, it goes into 'Kernel Panic' (that is what the support said) and does not come back up. Unfortunately, I do not have KVM over IP access, and from the Debian rescue mode I could not locate any relevant logs.

I have looked at the logs below but did not find anything relevant.
Here are some of the last logs

/var/log/messages

Code:
Sep 29 06:58:58 server01 named[1584]:   validating @0x7f51f80597f0: x SOA: no valid signature found
Sep 29 06:58:58 server01 named[1584]:   validating @0x7f51e40028e0: 1x NSEC: no valid signature found
Sep 29 06:58:58 server01 named[1584]: error (network unreachable) resolving 'xA/IN': 2001:41d0:1:1982::1#53
Sep 29 06:58:58 server01 named[1584]: error (network unreachable) resolving 'x': 2001:41d0:1:4a81::1#53
Sep 29 07:00:45 server01 init: tty (/dev/tty1) main process (2495) killed by TERM signal
Sep 29 07:00:45 server01 init: tty (/dev/tty2) main process (2497) killed by TERM signal
Sep 29 07:00:45 server01 init: tty (/dev/tty3) main process (2499) killed by TERM signal
Sep 29 07:00:45 server01 init: tty (/dev/tty4) main process (2501) killed by TERM signal
Sep 29 07:00:45 server01 init: tty (/dev/tty5) main process (2503) killed by TERM signal
Sep 29 07:00:45 server01 init: tty (/dev/tty6) main process (2505) killed by TERM signal
Sep 29 07:00:54 server01 ntpd[26063]: ntpd exiting on signal 15
Code:
root@rescue:/mnt/var/log# tail dmesg
parport0: PC-style at 0x378, irq 5 [PCSPP]
ppdev: user-space parallel port driver
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
EXT3-fs (sda2): using internal journal
kjournald starting.  Commit interval 5 seconds
EXT3-fs (sda1): using internal journal
EXT3-fs (sda1): mounted filesystem with ordered data mode
EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts:
Adding 4193276k swap on /dev/sda3.  Priority:-1 extents:1 across:4193276k SS

Code:
root@rescue:/mnt/var/log# tail boot.log
Starting ksmtuned:                                         [  OK  ]
Starting crond:                                            [  OK  ]
Starting atd:                                              [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]
-
Starting php-fpm: Done...
Starting nginx: Done...
Starting MySQL.. SUCCESS!
RTNETLINK answers: No such process
RTNETLINK answers: No such file or directory
I could not find anything on the Internet. Any suggestion would be helpful.

Also, the servers do not all have the same configuration.

Thank you.
 
09-30-2018, 06:01 PM   #2
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,347

Rep: Reputation: Disabled
The log displayed by dmesg is volatile; it's held in a ring buffer in kernel memory until a process like klogd writes it to disk. All dmesg will tell you is what happened since last reboot. The interesting stuff should be in the logs, but it's entirely possible that whatever causes the kernel panic also prevents the system from dumping anything related to the issue to disk.
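
One way to check whether anything panic-related did reach the disk is to search the saved logs from rescue mode. The following is only a sketch: `find_panic_lines` is a hypothetical helper name, the search patterns are a guess at common kernel messages, and it assumes the installed system's log directory is reachable at a path such as /mnt/var/log, as in the rescue prompts earlier in the thread.

```shell
# Hypothetical helper: search saved syslog files for kernel panic traces.
# $1 is the log directory of the installed system, e.g. /mnt/var/log.
find_panic_lines() {
    local logdir="$1"
    # -i ignore case, -E extended regex, -H always print the file name.
    # messages* also matches rotated copies of the messages file.
    grep -i -E -H 'kernel panic|oops|call trace' "$logdir"/messages* 2>/dev/null
}
```

Running `find_panic_lines /mnt/var/log` prints nothing when no matching lines were written, which would fit the case where the panic happens before anything can be flushed to disk.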

I'm not familiar with OVH, but perhaps it would be possible to set up a virtual serial port? Perhaps one that connects to another VM? If so, you could redirect the console to the serial port and capture the actual panic screen.

Do the systems panic when shutting down as well, or does it only happen when you try to reboot?

During troubleshooting, you might want to add "panic=s" (where s is a number of seconds) to the kernel command line. It causes the system to reboot automatically after s seconds in case of a kernel panic, and I figure since you can't see the panic screen anyway, the system might as well just reboot.
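
As a rough illustration of the two suggestions above on CentOS 7 (GRUB 2), the kernel command line could be extended as follows. The serial device `ttyS0`, the speed `115200`, and the 10-second timeout are assumed values, not something taken from this thread; whatever options are already set should be kept.

```shell
# /etc/default/grub (CentOS 7), a sketch with assumed values.
# Keep the options that are already present and append these three:
#   console=tty0            keep output on the normal screen
#   console=ttyS0,115200    also send the console to the first serial port
#   panic=10                reboot 10 seconds after a kernel panic
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200 panic=10"
```

After editing the file, the GRUB 2 configuration has to be regenerated, for example with `grub2-mkconfig -o /boot/grub2/grub.cfg` on a BIOS system. On CentOS 6 (legacy GRUB) the same parameters would instead be appended to the `kernel` line in /boot/grub/grub.conf. From rescue mode, both steps would need to be done against the installed system, typically inside a chroot.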
 
09-30-2018, 06:07 PM   #3
Yakooza
LQ Newbie
 
Registered: Sep 2018
Posts: 25

Original Poster
Rep: Reputation: Disabled
Thank you, I will ask their support about a virtual serial port. They do offer KVM over IP, but it is $30 - $40 per day. That is not a problem, but I would like to get a better idea before I go down that path.

Also, I am wondering if there is a section here for hiring a Linux expert for this (I could not find one), or anywhere else I can do that?

Thanks
 
10-02-2018, 11:29 PM   #4
Yakooza
LQ Newbie
 
Registered: Sep 2018
Posts: 25

Original Poster
Rep: Reputation: Disabled
Any suggestions for finding out what is going on? I requested KVM access 2 days ago. Still nothing. Their cheap brand Soyoustart is not good!
 
10-03-2018, 01:18 AM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,151

Rep: Reputation: 4125
Without decent messages, we are more blind than you are. I don't use hosting, but if a headless box won't boot, I'm dead in the water until I connect a screen and keyboard, so for you that KVM is essential.
If everything is falling over on boot, that sounds like an infrastructure problem. Have you rebooted these images successfully before? What did you change? If you changed nothing, chase them for any changes they made.
 
10-03-2018, 05:53 PM   #6
Yakooza
LQ Newbie
 
Registered: Sep 2018
Posts: 25

Original Poster
Rep: Reputation: Disabled
You were right.
Finally, OVH gave me KVM access (two days after I paid for it; I do not know what would happen if something urgent came up! That is why I plan to move to a different provider with at least basic support and IPMI access). I found out that the engineer who left a month ago had messed with /etc/fstab on all of the servers using Ansible. That is what was preventing them from booting. Now that I have found the problem, I have to plan how to fix it on all the servers.
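
For checking many servers, a small script can at least flag structurally broken lines in each fstab. This is a minimal sketch: `check_fstab` is a hypothetical helper, and since the exact change made to /etc/fstab is not described here, it only tests that every entry has between four and six whitespace-separated fields (the last two fields are optional). It does not verify that the listed devices or mount points exist.

```shell
# Hypothetical helper: report fstab entries with an unexpected field count.
# $1 is the fstab to check, e.g. /mnt/etc/fstab when working from rescue mode.
# Returns 0 if every entry looks well-formed, 1 otherwise.
check_fstab() {
    local fstab="$1"
    awk '
        # Skip comment lines and blank lines.
        /^[ \t]*#/ || /^[ \t]*$/ { next }
        # A valid entry has 4 to 6 fields: device, mount point, type,
        # options, and optionally dump and fsck pass.
        NF < 4 || NF > 6 {
            printf "line %d: expected 4-6 fields, found %d: %s\n", NR, NF, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$fstab"
}
```

From rescue mode this could be run as `check_fstab /mnt/etc/fstab`; a non-zero exit status marks a server whose file needs a closer look by hand.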

Thanks
 
  



Tags
kernel panic




All times are GMT -5. The time now is 03:06 PM.
