LinuxQuestions.org
09-21-2006, 09:41 AM   #1
humbletech99
Member
 
Registered: Jun 2005
Posts: 374

Rep: Reputation: 30
Machine Check Exception on new Opteron server


I set up a new amd64 Gentoo server on an Opteron yesterday, but within a few hours of it being up I got a "Machine Check Exception" and the machine froze. I had to go to the local console to see the error and then hard reboot. It wasn't doing much at the time other than compiling a couple of things. The server is a dual-CPU, dual-core machine (4 cores in total) with 8GB of RAM, 12 SCSI disks, and 2 SATA disks for the OS.

The error from the console is below:

Code:
HARDWARE ERROR
CPU 2: Machine Check Exception:                                    4 Bank 4:  f615200133000813
TSC 5ac60e50b6a ADDR 1d251ec00
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check
I have been googling around since yesterday but haven't found anything conclusive.

I've tried running mcelog and got the following:
Code:
# mcelog --k8 /dev/mcelog
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a4d0cd72d5a8
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a56b2eba7649
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a60591585bda
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 3
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a69ff2a635e8
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 4
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a73a53f42ca9
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 5
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a7d4b6934fdf
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 6
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 4 northbridge TSC a86f17e0a6a8
ADDR 191b0b000
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = c12f
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d417c000c1080a13 MCGSTATUS 0
MCE 7
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a86f17e0c311
ADDR 23c400000
  Northbridge GART error
       bit61 = error uncorrected
  TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
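
For anyone who wants to check the STATUS words by hand, here is a minimal Python sketch of the bit tests. It is not how mcelog itself does it. Bits 46, 61 and 62 are the ones named in the output above; bit 63 ("valid") and bit 58 ("address valid") come from the x86 machine-check architecture. The names `STATUS_BITS` and `decode_status` are made up for this example.

```python
# Bit positions named in the mcelog output above (46, 61, 62), plus
# bit 63 ("valid") and bit 58 ("address valid") from the x86
# machine-check architecture. All names here are illustrative only.
STATUS_BITS = {
    63: "valid",
    62: "error overflow (multiple errors)",
    61: "error uncorrected",
    58: "address valid",
    46: "corrected ecc error",
}


def decode_status(status_hex):
    """Return descriptions of the known bits that are set in a status word."""
    value = int(status_hex, 16)
    return [name for bit, name in STATUS_BITS.items() if (value >> bit) & 1]


# The two distinct STATUS values from the mcelog output above.
for status in ("a40000000005001b", "d417c000c1080a13"):
    print(status, decode_status(status))
```

The first value has bit 61 set and the second has bits 62 and 46 set, which agrees with what mcelog printed for the GART and Chipkill ECC records.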

Does anybody know anything about this?

Last edited by humbletech99; 09-21-2006 at 10:14 AM.
 
09-21-2006, 10:54 AM   #2
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 335
Quote:
generic read mem transaction
memory access, level generic
This makes me think that you had a memory access error. You could try swapping the memory, or you could try running a very stable distro like Debian just to see if you get the same problem. Since you say that you set this machine up yesterday, it couldn't possibly be running in a mission-critical role yet. I think I'd try Debian first; then, if you get other hardware errors related to memory, you could change the memory boards.

If you have no more than four gigabytes of RAM, I would use the 32-bit kernel as well.
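
To see how the logged records split between memory (ECC) events and GART events, they can be tallied by CPU, error kind and address. This is only a rough sketch: the parsing rules are guessed from the mcelog text posted above, `SAMPLE` is an abbreviated copy of two of the eight records, and `summarize` is a made-up helper name.

```python
import re
from collections import Counter

# Abbreviated sample in the format of the mcelog output posted above
# (two of the eight records). The parsing rules are an assumption
# based only on that output.
SAMPLE = """\
MCE 0
CPU 0 4 northbridge TSC a4d0cd72d5a8
ADDR 23c400000
  Northbridge GART error
STATUS a40000000005001b MCGSTATUS 0
MCE 6
CPU 2 4 northbridge TSC a86f17e0a6a8
ADDR 191b0b000
  Northbridge Chipkill ECC error
STATUS d417c000c1080a13 MCGSTATUS 0
"""


def summarize(text):
    """Count (cpu, error kind, address) triples across mcelog records."""
    counts = Counter()
    cpu = addr = None
    for line in text.splitlines():
        line = line.strip()
        if m := re.match(r"CPU (\d+)", line):
            cpu = int(m.group(1))
        elif m := re.match(r"ADDR ([0-9a-f]+)", line):
            addr = m.group(1)
        elif m := re.match(r"Northbridge (.+) error$", line):
            # The error-kind line follows CPU and ADDR within a record.
            counts[(cpu, m.group(1), addr)] += 1
    return counts


for key, n in summarize(SAMPLE).items():
    print(key, n)
```

Running it on the full output from the first post instead of `SAMPLE` would show whether the ECC events keep landing on the same CPU node and address, which is useful to know before swapping memory.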

Last edited by stress_junkie; 09-21-2006 at 10:56 AM.
 
09-21-2006, 10:57 AM   #3
humbletech99
Member
 
Registered: Jun 2005
Posts: 374

Original Poster
Rep: Reputation: 30
I'm not really gonna install Debian just for this. I'd be much more inclined to let it run memtest86 overnight, since then I don't have to go through all the compiling and custom settings again.

I was thinking that this is either a CPU problem or a RAM problem. From what I can tell from googling, it seems to only happen with Opterons.

Last edited by humbletech99; 09-21-2006 at 03:28 PM.
 
  


All times are GMT -5.
