LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Hardware
05-09-2009, 03:13 PM   #1
<Ol>Origy
Member
 
Registered: Aug 2003
Location: Slovenia
Distribution: Arch, Debian, Embedded
Posts: 136

Rep: Reputation: 15
Debian: Strange CPU overheating issues


I am having some strange issues with my Debian server box. I ran Ubuntu (server) on this PC for a long time without any problems. Recently I decided to switch to Debian to try out the difference. Some time after installing the new Linux distro, a strange problem began to trouble my server box. Every once in a while the server would become unresponsive to my SSH login requests, and turning on the server's monitor would show a nice little kernel panic message:

Quote:
CPU0: Machine Check Exception: 0000000000000004
CPU0: Bank 0: 3200008000000800
CPU0: Bank 3: 3200000000080a01
Kernel panic - not syncing: CPU context corrupt
It was pretty much the same kernel panic each time. I searched the internet for this message and came across a number of websites claiming that it is caused by CPU overheating. I leaned back in my chair for a moment and said hmmmm... The CPU has never overheated before, so what's the chance of it doing so now? I decided to check anyway. Upon removing the box cover, it turned out that the CPU case was indeed extremely hot to the touch. So the kernel panic was indeed caused by overheating, but what could have caused the overheating? My first thought was that the CPU heatsink was clogged with dust and needed cleaning, but that was not the case, as it had recently been cleaned. Another possibility was that the CPU fan had died, but that wasn't the case either, since it was spinning nicely each time I powered on the comp.
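The first value in the panic message is the IA32_MCG_STATUS register. A minimal decoding sketch, assuming the bit layout documented in Intel's Software Developer's Manual (bit 0 RIPV, bit 1 EIPV, bit 2 MCIP); `decode_mcg_status` is a hypothetical helper name, and it deliberately does not try to interpret the per-bank values:

```shell
#!/usr/bin/env bash
# Decode the low flag bits of IA32_MCG_STATUS, i.e. the value printed after
# "Machine Check Exception:" (bit layout as in the Intel SDM).
decode_mcg_status() {
    local status=$(( 16#${1#0x} ))        # accepts "0000000000000004" or "0x4"
    local flags=()
    (( status & 1 )) && flags+=("RIPV")   # restart IP valid
    (( status & 2 )) && flags+=("EIPV")   # error IP valid
    (( status & 4 )) && flags+=("MCIP")   # machine check in progress
    if (( ${#flags[@]} == 0 )); then
        echo "none"
    else
        echo "${flags[*]}"
    fi
    # With RIPV clear, execution cannot safely resume where it was interrupted.
    (( status & 1 )) || echo "not restartable (RIPV clear)"
}

decode_mcg_status 0000000000000004
# prints:
# MCIP
# not restartable (RIPV clear)
```

The register only says that a machine check was in progress and could not be resumed; on its own it does not identify what triggered the machine check.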

Normally when I power on the box, it does not overheat immediately. The CPU case stays cold to the touch for a long while, sometimes even for a few days! But at some random point it begins to heat up. I have no way of reading the temperature directly, so I open the case several times and feel the CPU case with my hand. It turns out to be cold most of the time, and the CPU fan is spinning nicely. I do, however, notice when the Caps Lock light on the keyboard starts flashing, which tells me that the kernel panic has taken place.
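If the kernel has a hardware-monitoring driver for the board's sensor chip, the temperature can be read without opening the case. A sketch, assuming the standard hwmon sysfs interface, where `temp*_input` files hold millidegrees Celsius; an old Pentium board may have no supported sensor at all, in which case the script only reports that nothing was found. `millideg_to_c` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print every temperature exposed through the kernel's hwmon sysfs interface.
# temp*_input files hold millidegrees Celsius (non-negative values assumed).
millideg_to_c() {
    local m=$1
    printf '%d.%d\n' $(( m / 1000 )) $(( (m % 1000) / 100 ))
}

found=0
# Depending on the driver, the files live in hwmonN/ or in hwmonN/device/.
for f in /sys/class/hwmon/hwmon*/temp*_input /sys/class/hwmon/hwmon*/device/temp*_input; do
    [ -r "$f" ] || continue   # also skips glob patterns that matched nothing
    found=1
    echo "$f: $(millideg_to_c "$(cat "$f")") C"
done
[ "$found" -eq 1 ] || echo "no hwmon temperature sensors found" >&2
```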

The server box is an old 300 MHz Pentium with 256 MB of RAM. It only runs an HTTP server plus a few other services such as MySQL, Samba, CUPS, Webmin, and SSH for remote logins. I suspected that the overheating might be caused by a rogue process keeping the CPU at 100% all the time, so I left a process monitor running on the main terminal, listing the active processes and their CPU usage. When the panic took place, it froze the screen, leaving the last process list available for me to review. The CPU usage turned out to be almost zero; the highest entry on the list was "top" itself, at 1.3% CPU.
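A snapshot of current CPU usage can miss a job that ran hard earlier and then went quiet. As a complementary check, a sketch that ranks the processes still alive by the CPU time they have accumulated since starting (utime + stime, fields 14 and 15 of /proc/PID/stat as described in proc(5)). `top_cpu_time` is a hypothetical helper name, and processes that have already exited will not appear in its output:

```shell
#!/usr/bin/env bash
# Rank running processes by accumulated CPU time (user + system), in seconds.
top_cpu_time() {
    local n=${1:-10} hz f line pid comm rest
    hz=$(getconf CLK_TCK)     # clock ticks per second used in /proc/PID/stat
    for f in /proc/[0-9]*/stat; do
        # A process may exit between the glob and the read; skip it quietly.
        line=$(cat "$f" 2>/dev/null) || continue
        pid=${line%% *}
        comm=${line#*"("}; comm=${comm%")"*}   # command name sits inside (...)
        rest=${line##*") "}                    # fields 3 onwards
        set -- $rest
        [ $# -ge 13 ] || continue
        # $1 is now field 3 (state), so utime/stime (fields 14/15) are ${12}/${13}.
        echo "$(( (${12} + ${13}) / hz )) $pid $comm"
    done | sort -k1,1nr | head -n "$n"
}

top_cpu_time 10
```

Each output line is "seconds pid command". With procps installed, `ps -eo pid,time,comm` shows similar cumulative figures.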

Now here's my dilemma. I have no idea what causes this strange overheating. It never happened before, it started a short time after I installed Debian, it doesn't seem to be caused by a rogue process, and, strangest of all, it seems to happen at random intervals. Any ideas or suggestions on how to diagnose the problem further?

~Ol
 
05-10-2009, 03:19 PM   #2
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 863

Rep: Reputation: 45
Funny, I came here with a similar issue.

I have a dual-core AMD on an Asus motherboard. It has been absolutely trouble-free since I installed it in September. I am running Debian Unstable and upgraded to KDE4 a couple of weeks ago.

My 500W EZ Cool PSU went snap-crackle-pop on Friday morning and I replaced it with a 700W Storm unit.

A short while ago I was playing Oolite when suddenly the machine rebooted. I got a hot CPU warning and, sure enough, the CPU temp indicated in the BIOS Hardware Monitor was 95 deg C. The heat sink felt cool.

I let the machine cool down for about 15 min and restarted it. Temperature was 77 deg and climbing at about 1 deg every 3 seconds.

I repeated this and observed the same behaviour.

So I tried changing the clock multiplier from 14 to 8. The PC would not boot at all.

I reset the CMOS and the machine came up OK. The temperature is now dropping.

I don't know if the overheating is "real", but it seems to be happening outside of Debian. Could some software be zapping the CMOS settings?

Can anybody help?
 
05-11-2009, 12:37 AM   #3
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 863

Rep: Reputation: 45
VirtualBox?

Are you running VirtualBox?

It is taking one core to 100% for longish periods and the other one occasionally. It did not do this previously, so maybe it is interacting badly with a recently updated package.

I will try to update VBox.
 
05-11-2009, 07:20 PM   #4
jim80net
LQ Newbie
 
Registered: May 2009
Location: San Antonio, TX
Distribution: Debian
Posts: 15

Rep: Reputation: 1
You know, just 'cuz your heatsink is cool doesn't mean your CPU is too. I'd check your thermal paste; with older computers that stuff can harden and stop passing heat well. In addition, your CPU isn't the only thing that generates heat. Your HDDs are a big source, and one that a lot of people overlook is the RAM: RAM is somewhat sensitive to heat, and it can get fairly hot. I'd make sure your chassis cooling is in order too.

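To monitor your box's temperature (installing packages, sensors-detect and hddtemp need root, e.g. via sudo or su):

Code:
$ sudo apt-get install lm-sensors hddtemp
$ sudo sensors-detect
$ sensors
$ sudo hddtemp /dev/{s,h}d?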

Last edited by jim80net; 05-11-2009 at 07:23 PM. Reason: include hddtemp and lm-sensors
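To keep an eye on the readings from a script, `sensors -u` prints raw values as lines such as `temp1_input: 42.000`, which are easier to parse than the normal layout. A small sketch that extracts the highest temperature from that output; `max_temp` is a hypothetical name, and which `temp*_input` lines exist depends on the sensor chip driver:

```shell
#!/usr/bin/env bash
# Read "sensors -u" output on stdin and print the highest tempN_input value.
# Exits with status 1 if no temperature line was found.
max_temp() {
    awk -F': *' '
        /temp[0-9]+_input:/ {
            v = $2 + 0
            if (!seen || v > max) { max = v; seen = 1 }
        }
        END { if (seen) print max; else exit 1 }'
}

# Usage on a machine where lm-sensors has been set up:
#   sensors -u | max_temp
```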
 
05-11-2009, 11:17 PM   #5
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 863

Rep: Reputation: 45
Solved, I think.

I think I've nailed the problem.

It occurs when there is heavy disc activity in the Virtual Machine.

What I did was open Task Manager in the Windows guest, and System Monitor and KSensors in Linux. The CPU load in Linux tracked the load in Windows, but with a significant multiplier. At the same time, the temperatures rose with CPU activity, and rose most in the more active core.

Perfectly obvious I suppose once one twigs what's happening.

First I watched AVG Free scan and update. The temperature in one core reached 89 deg C. After the system cooled down I copied a large directory but had to abort when both cores reached 85 deg C with a long way yet to go.

The system cools off very quickly once the CPU load drops.

Conclusions:

1. The stock AMD cooler cannot cope with a sustained high CPU load.
2. The CPU has to work extremely hard to cope with high CPU utilisation within VBox.

One would hope that VBox will improve matters, but similar problems seem to have been around for more than 2 years. However, I still think that a recent Debian upgrade (I don't know which) has exacerbated the situation.

Resolution:

I spent some time last night reading CPU cooler tests. I'm off to Scan to buy an Akasa 967 cooler this evening.

jim80net, thanks for your comment. However, my PC is well cooled by normal standards: large case, 2 chassis fans, PSU with a 120mm fan, round IDE cables so as not to impede air flow, and Arctic Silver paste on the CPU. It has to run all day in an ambient temperature of 35 degrees plus in summer.
 
05-13-2009, 02:07 PM   #6
<Ol>Origy
Member
 
Registered: Aug 2003
Location: Slovenia
Distribution: Arch, Debian, Embedded
Posts: 136

Original Poster
Rep: Reputation: 15
As much as I like seeing other people get their problems fixed, this does not solve my original dilemma. Having tested the CPU usage a second time, I am now fairly certain that a rogue process isn't causing the overheating. Next I will try to catch the CPU red-handed, that is, before the kernel panic shows up, giving me time to do some analysis.
 
05-13-2009, 11:02 PM   #7
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 863

Rep: Reputation: 45
From solving my own problem, I suspect the cause is a recently updated Debian package that has developed this behaviour. In my case it was disc activity in VirtualBox causing 100% CPU utilisation.

Can you establish whether, in your case, you are getting a similar chain of events: Disc Activity ----> CPU Utilisation ----> Overheating?

That would be a start. If it turns out to be the case then it could be that some normal, disc intensive, process such as an indexing run is triggering the overheating.
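One way to check for that chain is to log disk throughput next to the temperature. A sketch, assuming the /proc/diskstats layout described in the kernel's I/O statistics documentation (field 6 = sectors read, field 10 = sectors written); `disk_sectors` is a hypothetical helper, and its name filter only matches whole IDE/SCSI disks such as hda or sda, not their partitions:

```shell
#!/usr/bin/env bash
# Print the total sectors read + written so far on whole hd*/sd* disks.
disk_sectors() {
    awk '$3 ~ /^[hs]d[a-z]+$/ { s += $6 + $10 } END { print s + 0 }' \
        "${1:-/proc/diskstats}"
}

# Usage: print disk activity every 5 seconds (stop with Ctrl-C), e.g.
# alongside "watch sensors" running in another terminal.
#   prev=$(disk_sectors)
#   while sleep 5; do
#       cur=$(disk_sectors)
#       echo "$(date +%T) sectors in last 5 s: $(( cur - prev ))"
#       prev=$cur
#   done
```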
 
05-15-2009, 10:18 AM   #8
<Ol>Origy
Member
 
Registered: Aug 2003
Location: Slovenia
Distribution: Arch, Debian, Embedded
Posts: 136

Original Poster
Rep: Reputation: 15
Please note that this system is a very basic Debian installation. It does not use VirtualBox, and it doesn't even have X11 installed. The only way to interact with it is via the command line; I normally ssh into the box from another machine. Suppose there isn't a rogue process causing 100% CPU use: what other factors could cause the CPU to overheat? I can't think of any. Could it be some rogue kernel module that doesn't show up in the "top" process monitor?
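Kernel-side work is counted in the system-wide CPU totals even when no single process in top's list stands out. A sketch that samples the aggregate `cpu` line of /proc/stat twice (field order user, nice, system, idle, iowait, irq, softirq, steal, as described in proc(5)) and reports the busy share and the kernel-side share (system + irq + softirq) of the interval; `cpu_snapshot` and `cpu_usage` are hypothetical names:

```shell
#!/usr/bin/env bash
# Print "busy kernel total" jiffies from the aggregate cpu line of /proc/stat.
cpu_snapshot() {
    local _label user nice system idle iowait irq softirq steal _rest
    read -r _label user nice system idle iowait irq softirq steal _rest < /proc/stat
    local kern=$(( system + ${irq:-0} + ${softirq:-0} ))
    local idle_all=$(( idle + ${iowait:-0} ))
    local total=$(( user + nice + kern + idle_all + ${steal:-0} ))
    echo "$(( total - idle_all )) $kern $total"
}

# Sample twice, INTERVAL seconds apart, and print percentages of that interval.
cpu_usage() {
    local interval=${1:-5} b1 k1 t1 b2 k2 t2
    read -r b1 k1 t1 <<< "$(cpu_snapshot)"
    sleep "$interval"
    read -r b2 k2 t2 <<< "$(cpu_snapshot)"
    local dt=$(( t2 - t1 ))
    (( dt > 0 )) || dt=1
    echo "busy=$(( 100 * (b2 - b1) / dt ))% kernel=$(( 100 * (k2 - k1) / dt ))%"
}

cpu_usage "${1:-2}"
```

A high kernel share with an almost empty process list would point towards drivers or interrupt handling rather than a user-space process.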
 
05-15-2009, 12:03 PM   #9
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 863

Rep: Reputation: 45
I suggest "stressing" the PC by copying a fairly large chunk of data and watching the temperature. If it heats up, then I would blame a recent update to Debian (or maybe Linux). In your case, since it is a recent installation, you would have installed this as part of the distro and would be unlikely to have an update history you can consult.

No matter what, you will be able to either pin the problem on disc activity or eliminate it entirely.
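A sketch of such a disk stress test using GNU dd to write a large file; `conv=fdatasync` makes dd wait until the data has physically been written before it finishes, rather than leaving it in the page cache. `disk_stress` and its default path and size are hypothetical choices; watch the temperature from a second session while it runs:

```shell
#!/usr/bin/env bash
# Write SIZE_MB megabytes of zeros to FILE, flush them to disk, then delete
# the file. dd reports the elapsed time and throughput on stderr.
disk_stress() {
    local file=${1:-/var/tmp/disk-stress.bin} size_mb=${2:-1024}
    dd if=/dev/zero of="$file" bs=1M count="$size_mb" conv=fdatasync
    local rc=$?
    rm -f "$file"
    return "$rc"
}

# Usage, e.g. a 2 GB run (make sure the file system has room):
#   disk_stress /var/tmp/disk-stress.bin 2048
```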

Your method of detecting which process, if any, has a high CPU usage may not be foolproof. It is possible that the CPU intensive process ended just as the CPU reached the critical temperature. Far fetched? Maybe but stranger things happen regularly :-)

Something else: Could the Power Supply be causing the problem? Have you checked whether the air intakes are dust free and the fan is rotating at full speed? When the PC gets hot is the PSU hot too? If it is then I would suspect it.

Have you checked the voltages? Does your setup screen have a hardware monitor? You could install lm-sensors. Setup is a bit of a pain but, in my experience, well worth it. You could even set up a cron job to run sensors every few minutes; if the PC hangs, the last run may give you useful information. Better still, set up a job which runs

Code:
sensors > sensors.txt
top -b -n 1 > top.txt
or even, if you have the disc space, append to the files so every run is kept (with a timestamp, since sensors prints none):


Code:
date >> sensors.txt
sensors >> sensors.txt
top -b -n 1 >> top.txt
That should give you a record of what happened, and you can correlate the tasks with temperatures, voltages, fan speeds....
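A sketch of the same idea as a single script suitable for cron: each run appends a timestamped snapshot and then calls `sync`, so the latest snapshot has been flushed to disk before a possible hang. The function name, the /var/tmp/thermal-log directory and the script path in the crontab comment are all hypothetical; /var/tmp is used because, unlike /tmp, it is meant to survive a reboot:

```shell
#!/usr/bin/env bash
# Append one timestamped snapshot of sensor readings and the busiest
# processes to DIR/snapshot.log, then flush it to disk.
log_snapshot() {
    local dir=${1:-/var/tmp/thermal-log}
    mkdir -p "$dir" || return 1
    {
        echo "==== $(date '+%Y-%m-%d %H:%M:%S')"
        if command -v sensors >/dev/null 2>&1; then
            sensors
        else
            echo "sensors: lm-sensors not installed"
        fi
        top -b -n 1 | head -n 20    # summary lines plus the top processes
    } >> "$dir/snapshot.log" 2>&1
    sync    # push the new lines to disk in case the machine hangs next
}

log_snapshot "$@"

# Example crontab entry (crontab -e), assuming this file was saved as
# /usr/local/sbin/thermal-log.sh and made executable:
#   */5 * * * * /usr/local/sbin/thermal-log.sh
```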

Back on the hardware side, you could try swapping out the CPU cooler and the PSU. Incidentally, Arctic Silver on the heat sink can drop the temperature by 2 or 3 degrees C.

I hope this helps.
 
06-25-2009, 06:44 AM   #10
<Ol>Origy
Member
 
Registered: Aug 2003
Location: Slovenia
Distribution: Arch, Debian, Embedded
Posts: 136

Original Poster
Rep: Reputation: 15
Okay, I've found what the problem was. rkhunter was causing the overheating behaviour for some reason. After uninstalling it, the problems are gone.
 
06-25-2009, 11:56 PM   #11
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 863

Rep: Reputation: 45
Rkhunter was probably stressing the PC while looking for problems.

Since this thread started I have upgraded to an aftermarket CPU cooler, and the difference it has made is tremendous. I think you may find that the problem is only solved until you next get a hyperactive program.
 
  


All times are GMT -5.
