LinuxQuestions.org

Old 11-14-2005, 08:22 PM   #1
tinksmartbstupi
Member
 
Registered: Dec 2004
Location: Coram, NY
Distribution: Slackware
Posts: 47

Rep: Reputation: 15
Kernel Panic, Machine Check exception


I just finished installing Slackware 10. Whenever I download a newer kernel and compile it (no matter how I compile it), I always end up with a Machine Check Exception and the kernel panics.

What is it and how do I fix it?

(Tried with kernels 2.6.11.1 and 2.6.14.)
 
Old 11-15-2005, 07:44 AM   #2
bejiita
Member
 
Registered: Feb 2004
Location: Upstate NY
Distribution: Slackware
Posts: 79

Rep: Reputation: 15
What happens when you pass nomce at boot time?
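Once the machine is up, one way to confirm that the option really reached the kernel is to look at /proc/cmdline. Below is a minimal Python sketch of that check; the function names `has_boot_param` and `running_with_nomce` are made up for illustration and are not part of any tool mentioned in this thread.

```python
from pathlib import Path


def has_boot_param(cmdline: str, name: str) -> bool:
    """Return True if `name` appears as a bare flag or as name=value."""
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


def running_with_nomce(path: str = "/proc/cmdline") -> bool:
    """Check the command line of the currently running kernel (Linux only)."""
    return has_boot_param(Path(path).read_text(), "nomce")
```

Calling `running_with_nomce()` on the affected machine should return True only if the boot loader actually passed the option along.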
 
Old 11-15-2005, 08:26 AM   #3
runlevel0
Member
 
Registered: Mar 2005
Location: Hilversum/Holland
Distribution: Debian GNU/Linux 5.0 (“Lenny”)
Posts: 290

Rep: Reputation: 31
Bad news:
MCEs are raised by the CPU itself and almost always point to a hardware problem.

These exceptions are triggered when the processor detects a hardware malfunction such as a TLB error, a bus error, or another unrecoverable hardware failure.

They can be caused by b0rked motherboard or graphics card components, but most frequently they come down to one of the causes below:
  1. Bad RAM modules
  2. Overstressed or deficient power supply
  3. Improperly configured components
  4. Extreme thermal conditions

You can check point 1 with memtest (Memtest86+), a utility that runs from a live CD. Most modern distros' live CDs offer it as a boot option; try Knoppix. Also check that the modules are properly seated on the mainboard. If the test reports errors, the definitive solution is obviously to replace the faulty modules.

Regarding point 2: check that your power supply runs smoothly without noises or vibrations, check the connections, and make sure the total power drawn by your devices isn't higher than the nominal power of the supply. If you don't feel like doing the math, try disconnecting devices such as CD-ROM/DVD drives, USB devices, etc. A solution is getting a more powerful supply.
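For anyone who does feel like doing the math, it is just a sum compared against the rating printed on the supply. A small sketch in Python follows; every wattage in it is a hypothetical placeholder, so take real numbers from the spec sheets of your own components.

```python
from typing import Mapping


def power_headroom(psu_watts: float, device_watts: Mapping[str, float]) -> float:
    """Return the rated PSU power minus the summed device draw, in watts."""
    return psu_watts - sum(device_watts.values())


# Hypothetical figures for illustration only, not measurements.
devices = {
    "cpu": 65.0,
    "graphics card": 40.0,
    "two hard disks": 20.0,
    "dvd drive": 15.0,
    "motherboard and RAM": 40.0,
}
headroom = power_headroom(300.0, devices)
print(f"headroom: {headroom:.0f} W")
```

A negative or very small headroom suggests the supply is overstressed. Nominal ratings are optimistic on cheap supplies, so a generous margin is the safer reading.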

Point 3 includes overclocking of the bus or the CPU. I can't state whether it also applies to the GPU, but I'm almost sure it does. Set your components to the vendor-rated settings.

Point 4 is mostly related to cooling device malfunction: check the fans and replace any that don't behave properly (vibration, excess noise, or simply not running at all). You can also try booting after letting the system rest for a while so that it cools down, and check the temperature using your OS's sensor software.
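If no sensor software is installed, the temperature can often be read straight from the kernel. The sketch below assumes a current kernel that exposes readings under /sys/class/thermal in thousandths of a degree Celsius; kernels from the 2.6 era discussed here reported ACPI temperatures elsewhere (under /proc/acpi, as far as I know), so treat the path as an assumption. The function names are illustrative.

```python
from pathlib import Path
from typing import Dict


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a sysfs reading such as '47500' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base: str = "/sys/class/thermal") -> Dict[str, float]:
    """Return {zone name: temperature in Celsius} for every readable zone."""
    readings = {}
    for temp_file in sorted(Path(base).glob("thermal_zone*/temp")):
        try:
            readings[temp_file.parent.name] = millidegrees_to_celsius(
                temp_file.read_text()
            )
        except (OSError, ValueError):
            # Some zones refuse reads or return garbage; skip them.
            continue
    return readings
```

Comparing a reading taken right after a cold boot with one taken under load gives a rough idea of whether cooling is keeping up.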

Unfortunately I don't know the exact translation of the MCE codes. Perhaps they are documented in the specs of your processor, but IMHO the above checklist will be enough to find the culprit.
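On the codes themselves: to the best of my knowledge, the hex value that 32-bit kernels of this era print after "Machine Check Exception:" is the global machine-check status register (MCG_STATUS), whose three lowest bits are defined in the Intel and AMD manuals. The Python sketch below decodes only those three bits. It says nothing about the per-bank status lines, which are where the actual cause is reported.

```python
from typing import List

# Low bits of the global machine-check status register (MCG_STATUS).
MCG_STATUS_BITS = {
    0: ("RIPV", "restart instruction pointer valid"),
    1: ("EIPV", "error instruction pointer valid"),
    2: ("MCIP", "machine check in progress"),
}


def decode_mcg_status(value: int) -> List[str]:
    """Return the names of the known flags set in `value`, lowest bit first."""
    return [
        name
        for bit, (name, _description) in sorted(MCG_STATUS_BITS.items())
        if value & (1 << bit)
    ]
```

For example, `decode_mcg_status(0x4)` yields only MCIP, which just confirms that a machine check was in progress and does not identify the failing component.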




Last edited by runlevel0; 11-15-2005 at 08:42 AM.
 
Old 11-15-2005, 08:45 AM   #4
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,691
Blog Entries: 4

Rep: Reputation: 3947
Another very subtle cause of system problems is an insufficient or unstable power supply. Computer systems rely upon low-voltage DC circuits, and if the line voltage coming into the box is not exactly "on spec," weird and unreproducible problems can occur. I first encountered this when a new office photocopier was installed on the wrong circuit.

The solution is to buy and install a UPS (Uninterruptible Power Supply) box. Even a very small one will do just fine. These boxes combine a surge-protector element with a battery, which allows them to fill in for undervoltage, and they beep to warn you that it's happening.

However, in your case I would expect that the first thing to do is to have the motherboard and equipment diagnosed for possible problems. Make sure that all of the cards, including the RAM modules, are firmly seated in their sockets.
 
Old 11-15-2005, 09:28 PM   #5
tinksmartbstupi
Member
 
Registered: Dec 2004
Location: Coram, NY
Distribution: Slackware
Posts: 47

Original Poster
Rep: Reputation: 15
Thanks. After searching around, I found that with my laptop you have to pass nomce to the kernel at boot.

I haven't tried it yet, but I'll let you know when I do.

I don't understand why, though: my laptop is brand new. The only thing I can think of is something AMD did, similar to the Duron chips where they just cut the L1 and L2 pins and sold them for cheaper.

*shrugs* oh well.
 
Old 11-16-2005, 03:18 PM   #6
runlevel0
Member
 
Registered: Mar 2005
Location: Hilversum/Holland
Distribution: Debian GNU/Linux 5.0 (“Lenny”)
Posts: 290

Rep: Reputation: 31
Question

Quote:
Originally posted by tinksmartbstupi
Thanks. After searching around, I found that with my laptop you have to pass nomce to the kernel at boot.

I haven't tried it yet, but I'll let you know when I do.

I don't understand why, though: my laptop is brand new. The only thing I can think of is something AMD did, similar to the Duron chips where they just cut the L1 and L2 pins and sold them for cheaper.

*shrugs* oh well.
Cool, you could write a hardware review so that others can avoid the problem.

I was almost sure that MCE was Intel specific, but in the kernel tree, if you disable MCE you also disable some AMD thermal control features. I don't know whether disabling it would cause any problems, since all this is also handled by ACPI and (on the mobile CPUs) by AMD's K7 PowerNow! extensions.

As far as I can see, my Duron supports both MCE and MCA (Pentium Pro specific).
So now I'm really confused.

Any expert in the house to resolve this mystery?
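For anyone who wants to check their own CPU: the feature flags the kernel reports, including mce and mca, are listed on the "flags" line of /proc/cpuinfo. Here is a minimal Python sketch of pulling them out; the function name and the sample text in the usage note are hypothetical, not output from the Duron mentioned above.

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()
```

On a Linux box, something like `{"mce", "mca"} <= cpu_flags(open("/proc/cpuinfo").read())` tells you whether both features are advertised.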
 
  

