Determine what freezes a server

TB0ne · 04-25-2024, 05:06 PM

Quote:

Originally Posted by lenainjaune

There was a freeze on 23/04, but unfortunately someone rebooted before a photo was taken. We have now told users to take a photo before restarting. At least, since the problem persists with the cloned system disk, this shows that the problem is not related to the disk.

Yes we will! We are considering running 2 systems in redundancy, so that when the first becomes unresponsive the second takes over ... and, more importantly, we will manage our telephony ourselves with our Asterisk system.

As we managed to keep the screen always on (parameter consoleblank=0 in the GRUB configuration), we noticed that a simulated crash with a kernel panic (we followed this method to achieve it) is also displayed on the login screen.

Is that enough to get information from before the freeze?

We also experimented with running a journalctl command at boot (so before login) by modifying /etc/rc.local to run the detached command journalctl --follow &. In this case the screen is flooded continuously with no pause (however, we discovered that Ctrl + s stops it, and that Ctrl + Alt + F2 or similar gives access to another tty). This flood is strange, because when the same command is run after logging in there is no flood. We suppose this is not the right way to do it.

Bolded a piece for emphasis only...you have been told this several times now, and that is the ONLY information that can help diagnose this issue. Not sure of the thought process in your diagnostic methods, but after the system freezes...it's ALREADY FROZEN. Quite obviously nothing will get logged after that point. You don't tell us what version/distro of Linux you're using, but a 4.x kernel is pretty old. You can either look in /var/log for a file (messages, syslog, etc...usual suspects) and inspect those, or you can look at "journalctl --list-boots" for a list of the recorded boots and look at an older one with "journalctl -b <whatever number>", which will have the info in it.
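A rough sketch of that journalctl route, in case it helps. The function names are just illustrative, and it assumes a systemd system: journald only keeps earlier boots when its storage is persistent, which with the default Storage=auto means /var/log/journal has to exist.

```shell
#!/bin/sh
# Sketch only: check that the journal survives reboots, then read the
# end of the previous boot. Function names are made up for illustration.

# Returns 0 if a persistent journal directory exists.
# With journald's default Storage=auto, logs are kept across reboots only
# when /var/log/journal exists; otherwise they live under /run and are lost.
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

# Print the last N lines (default 200) logged during the previous boot.
# Usually needs root or membership in a group allowed to read the journal.
tail_previous_boot() {
    journalctl -b -1 -n "${1:-200}" --no-pager
}

# Typical use on the server after a freeze and reboot:
#   journal_is_persistent || echo "journal is volatile: create /var/log/journal and restart systemd-journald"
#   journalctl --list-boots
#   tail_previous_boot 300
```

The lines just before the end of the previous boot are the ones worth reading; if that boot is not listed at all, the journal was most likely volatile.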

Have you checked to see if disk space is running out? You mention asterisk as a PBX...a full disk can also cause problems, especially with long/undeleted voicemails. Regardless, it sounds like you need to actually hire a consultant to come take care of this, based on what you're posting.
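For the disk space check, something along these lines would do as a quick test. It is a sketch: the 90% threshold is arbitrary, and the voicemail location mentioned in the comment is only the common default, so verify it against your own Asterisk configuration.

```shell
#!/bin/sh
# Sketch only: report how full a filesystem is and warn above a threshold.

# Print the used-space percentage (0-100) of the filesystem holding PATH.
usage_pct() {
    df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Warn (and return 1) when usage reaches the threshold.
# The default of 90 is an arbitrary choice, not a recommendation.
check_disk() {
    pct=$(usage_pct "$1")
    if [ "$pct" -ge "${2:-90}" ]; then
        echo "WARNING: $1 is ${pct}% full"
        return 1
    fi
    echo "OK: $1 is ${pct}% full"
}

# Typical use (Asterisk voicemail is commonly under /var/spool/asterisk,
# but that path is an assumption to check on the actual system):
#   check_disk /
#   check_disk /var 85
```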

lenainjaune · 04-26-2024, 10:15 AM

Quote:

Originally Posted by wpeckham

Information from before the freeze, in particular JUST before the freeze so it is likely to capture the cause, is the ONLY information that might be seriously helpful. AT the freeze logging will stop and you will get no information, and AFTER the freeze is also after the reboot and the cause information may be gone for good.

Ah! We supposed that what was displayed when it freezes would indicate the cause ... So what is the purpose of what is displayed?

How can we track the problem before the freeze?

Are external monitoring tools like Nagios the only solution, or can we do the same locally (change the debug level or audit different targets more deeply, since the traditional logs did not capture it)?
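To illustrate what "locally" could look like, here is a minimal sketch of a recorder that writes load and memory figures to a file at short intervals and flushes them to disk, so that the last entries before a freeze are still readable after the reboot. The log path and function names are hypothetical, and it assumes a Linux /proc with MemAvailable in /proc/meminfo.

```shell
#!/bin/sh
# Sketch only: periodic snapshot of system state, flushed to disk each time.

# Print one line: timestamp, the three load averages, available memory in kB.
snapshot() {
    load=$(cut -d' ' -f1-3 /proc/loadavg)
    mem=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
    echo "$(date '+%Y-%m-%d %H:%M:%S') load=$load mem_available_kb=$mem"
}

# Append a snapshot to the given log file and flush it to disk.
record() {
    snapshot >> "$1"
    sync
}

# Typical use, in a loop started at boot (hypothetical log path):
#   while true; do record /var/log/freeze-watch.log; sleep 10; done
```

After a freeze, the tail of that file gives the state in the seconds before logging stopped, which can show whether memory or load was climbing.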

Quote:

Originally Posted by wpeckham

If I understand correctly:
1. If you move the drive to a different identical machine, that one does freeze.
That would eliminate the original machine hardware EXCEPT the drive.
2. IF cloned to a new drive, it will still freeze. That eliminates the drive itself.

If those are both true, we have eliminated all of the hardware and only a software issue can be left.

What has changed about the software or configuration in the few weeks just before this started?

Yes, you have understood correctly. We have long supposed that the problem is not hardware. The only thing we are not really sure about is the RAM: did we use the RAM already in place, or the RAM of the computer we moved from? We will test it again to make sure we did not miss this step and to definitively rule out a hardware cause.

The freezing problem has been here for years, but until now we left it as is, since it occurred about every 1 or 2 months (before that we had a PABX which crashed really often, so this discomfort turned out to be more acceptable).

But over the last few months the problem has become more frequent, reaching one freeze per week.

---

We also tried to install netdata to monitor what happens in the system, but there is a conflict when installing it. For now we have abandoned this idea and did not dig further, to preserve the system and avoid a bigger problem.

lenainjaune · 04-26-2024, 10:37 AM

Quote: