GART TLB error generic level generic
I posted this in the hardware forum too, because we really don't know whether this is a software or a hardware issue. I think it's bogus, but...
We are having an issue with several LS20 blades (AMD) running Red Hat Linux 3.2.3-53. While investigating other issues, we found this error on a number of the machines. We do not think it is memory related, but we are not certain. The BIOS on each blade is updated to the current level:
Aug 8 09:07:42 xtky8205pap kernel: CPU 1: Silent Northbridge MCE
Aug 8 09:07:42 xtky8205pap kernel: Northbridge status a6000001:0005001b
Aug 8 09:07:42 xtky8205pap kernel: Error gart error
Aug 8 09:07:42 xtky8205pap kernel: GART TLB error generic level generic
Aug 8 09:07:42 xtky8205pap kernel: err cpu1
Aug 8 09:07:42 xtky8205pap kernel: processor context corrupt
Aug 8 09:07:42 xtky8205pap kernel: error uncorrected
Aug 8 09:07:42 xtky8205pap kernel: previous error lost
Aug 8 09:07:42 xtky8205pap kernel: NB error address 0000000037ff0008
Aug 9 11:17:49 xtky8205pap kernel: nfs_safe_remove: Engine/.#ds_updating busy, d_count=2
Aug 10 03:24:52 xtky8205pap kernel: nfs_safe_remove: Engine/.#ds_updating busy, d_count=2
Aug 10 10:38:16 xtky8205pap kernel: CPU 1: Silent Northbridge MCE
Aug 10 10:38:16 xtky8205pap kernel: Northbridge status a6000002:0005001b
Aug 10 10:38:16 xtky8205pap kernel: Error gart error
Aug 10 10:38:16 xtky8205pap kernel: GART TLB error generic level generic
Aug 10 10:38:16 xtky8205pap kernel: err cpu0
Aug 10 10:38:16 xtky8205pap kernel: processor context corrupt
Aug 10 10:38:16 xtky8205pap kernel: error uncorrected
Aug 10 10:38:16 xtky8205pap kernel: previous error lost
Aug 10 10:38:16 xtky8205pap kernel: NB error address 0000000037ff0010
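For anyone who wants to cross-check the kernel's decode, here is a rough sketch that pulls the two status words out of a log line and splits the low word into fields. The field layout (extended error code in bits 19:16, transaction type in bits 3:2, cache level in bits 1:0) is my assumption based on the usual AMD machine check error code format, and the function names are made up for illustration. It deliberately leaves the high word undecoded.

```python
import re

# Assumed layout of the low 32 status bits (AMD-style MCA error code):
#   bits 19:16 = extended error code
#   bits 3:2   = transaction type (TT)
#   bits 1:0   = cache level (LL)
TRANSACTION = ["instruction", "data", "generic", "reserved"]
LEVEL = ["L0", "L1", "L2", "generic"]

# Matches the "Northbridge status HHHHHHHH:LLLLLLLL" part of the kernel line.
STATUS_RE = re.compile(r"Northbridge status ([0-9a-fA-F]{8}):([0-9a-fA-F]{8})")


def decode_low(low):
    """Split the low status word into the assumed fields."""
    return {
        "extended_error": (low >> 16) & 0xF,
        "transaction": TRANSACTION[(low >> 2) & 0x3],
        "level": LEVEL[low & 0x3],
    }


def decode_line(line):
    """Return decoded fields for a status line, or None if it does not match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    fields = decode_low(int(m.group(2), 16))
    fields["high"] = int(m.group(1), 16)  # kept raw, not decoded here
    return fields


if __name__ == "__main__":
    sample = ("Aug 8 09:07:42 xtky8205pap kernel: "
              "Northbridge status a6000001:0005001b")
    print(decode_line(sample))
```

Under those assumptions, the low word 0005001b comes out as extended error 5 with transaction "generic" and level "generic", which lines up with the "gart error" and "GART TLB error generic level generic" lines in the log.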
The blades do not show any outward signs of stability trouble. Any help would be greatly appreciated.
CD
Last edited by Clydesdale; 08-13-2007 at 06:25 PM.