I set up a new amd64 Gentoo server on an Opteron yesterday, but within a few hours of it being up I got a "Machine Check Exception" and the machine froze. I had to go to the local console to see this, and then had to hard-reboot the machine. It wasn't doing much at the time other than compiling a couple of things. The server is a dual-CPU, dual-core machine (4 cores in total) with 8GB of RAM, 12 SCSI disks, and 2 SATA disks for the OS.
The error from the console is below:
Code:
HARDWARE ERROR
CPU 2: Machine Check Exception: 4 Bank 4: f615200133000813
TSC 5ac60e50b6a ADDR 1d251ec00
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check
I have been googling around since yesterday but haven't found anything conclusive.
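In case it helps anyone reading the panic line by hand, here is a rough Python sketch that splits it into fields and lists the flag bits set in the STATUS word. Treat it as my best guess, not an authoritative decoder. The names for bits 46, 61 and 62 match what mcelog prints in this post. The other bit names are my reading of the AMD documentation, and so is the assumption that the number after "Machine Check Exception:" is MCGSTATUS. `parse_panic_line` and `decode_status` are names I made up for illustration.

```python
import re

# Flag bits in a K8 machine-check STATUS word. Bits 46, 61 and 62 are
# labelled this way by mcelog in this post; the remaining names are
# assumptions taken from my reading of the AMD documentation.
STATUS_FLAGS = {
    63: "valid",
    62: "error overflow (multiple errors)",
    61: "error uncorrected",
    60: "error reporting enabled",
    58: "ADDR field valid",
    57: "processor context corrupt",
    46: "corrected ecc error",
    45: "uncorrected ecc error",
}

# Assumed layout of the kernel panic line:
#   CPU <n>: Machine Check Exception: <MCGSTATUS> Bank <n>: <STATUS>
LINE_RE = re.compile(
    r"CPU (\d+): Machine Check Exception:\s*([0-9a-f]+) Bank (\d+): ([0-9a-f]+)"
)


def parse_panic_line(line):
    """Split the panic line into integer fields (hex fields are converted)."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a machine-check panic line")
    cpu, mcgstatus, bank, status = m.groups()
    return {
        "cpu": int(cpu),
        "mcgstatus": int(mcgstatus, 16),
        "bank": int(bank),
        "status": int(status, 16),
    }


def decode_status(status):
    """Return the names of the flag bits set in a 64-bit STATUS value,
    highest bit first."""
    return [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]


if __name__ == "__main__":
    rec = parse_panic_line(
        "CPU 2: Machine Check Exception: 4 Bank 4: f615200133000813"
    )
    print("CPU", rec["cpu"], "bank", rec["bank"])
    for flag in decode_status(rec["status"]):
        print("  " + flag)
```

If the bit names are right, the STATUS word from the panic has the uncorrected, overflow and uncorrected-ECC bits set, which would at least be consistent with the "This is not a software problem!" message.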
I've tried running mcelog and got the following:
Code:
# mcelog --k8 /dev/mcelog
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a4d0cd72d5a8
ADDR 23c400000
Northbridge GART error
bit61 = error uncorrected
TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a56b2eba7649
ADDR 23c400000
Northbridge GART error
bit61 = error uncorrected
TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a60591585bda
ADDR 23c400000
Northbridge GART error
bit61 = error uncorrected
TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 3
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a69ff2a635e8
ADDR 23c400000
Northbridge GART error
bit61 = error uncorrected
TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 4
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a73a53f42ca9
ADDR 23c400000
Northbridge GART error
bit61 = error uncorrected
TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 5
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a7d4b6934fdf
ADDR 23c400000
Northbridge GART error
bit61 = error uncorrected
TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 6
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 4 northbridge TSC a86f17e0a6a8
ADDR 191b0b000
Northbridge Chipkill ECC error
Chipkill ECC syndrome = c12f
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS d417c000c1080a13 MCGSTATUS 0
MCE 7
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC a86f17e0c311
ADDR 23c400000
Northbridge GART error
bit61 = error uncorrected
TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
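Most of the records above look identical apart from the TSC, so here is a small Python sketch that tallies them by CPU, address and error type. The `RECORDS` list is copied by hand from the mcelog output in this post, and `tally` is just a helper name I picked. It does not parse mcelog output itself.

```python
from collections import Counter

# (MCE index, CPU, ADDR, error type), copied by hand from the mcelog
# output in this post.
RECORDS = [
    (0, 0, 0x23C400000, "Northbridge GART error"),
    (1, 0, 0x23C400000, "Northbridge GART error"),
    (2, 0, 0x23C400000, "Northbridge GART error"),
    (3, 0, 0x23C400000, "Northbridge GART error"),
    (4, 0, 0x23C400000, "Northbridge GART error"),
    (5, 0, 0x23C400000, "Northbridge GART error"),
    (6, 2, 0x191B0B000, "Northbridge Chipkill ECC error"),
    (7, 0, 0x23C400000, "Northbridge GART error"),
]


def tally(records):
    """Count records per (CPU, ADDR, error type)."""
    return Counter((cpu, addr, kind) for _, cpu, addr, kind in records)


if __name__ == "__main__":
    for (cpu, addr, kind), count in tally(RECORDS).most_common():
        print(f"{count} x CPU {cpu} ADDR {addr:x} {kind}")
```

That leaves seven GART errors on CPU 0, all at ADDR 23c400000, and a single Chipkill ECC error on CPU 2 at ADDR 191b0b000.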
Does anybody know what these errors mean, or which piece of hardware is likely to be at fault?