LinuxQuestions.org
Red Hat This forum is for the discussion of Red Hat Linux.

11-06-2013, 03:40 AM   #1
jackd1000
Member
 
Registered: Jul 2007
Posts: 67

Rep: Reputation: 15
Why do EDAC and my memtest disagree?


All,

I get complaints from EDAC in /var/log/messages that seem to indicate there is a problem with my memory. They look like this:

****************************

Oct 27 04:05:53 ct-on-db kernel: EDAC MC0: CE page 0x5820a0, offset 0x480, grain 8, syndrome 0x8fd1, row 2, channel 1, label "": k8_edac
Oct 27 04:05:53 ct-on-db kernel: EDAC MC0: CE - no information available: k8_edac Error Overflow set
Oct 27 04:05:53 ct-on-db kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
Oct 27 04:07:19 ct-on-db kernel: EDAC k8 MC0: general bus error: participating processor(local node response), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(

*********************************

These messages run continuously, but they only start a few days after the system has been rebooted, even though all the memory gets used up pretty quickly (within hours).

I've left memtest on the machine overnight and haven't had a single error. The machine has had some 20+ hours running memtest with no complaint.

My question is: what is EDAC doing that memtest doesn't? I'd have thought memtest would hit the same motherboard issues as EDAC (assuming it's not a memory problem but possibly a motherboard problem), so I don't see why EDAC complains so regularly and memtest does not.

Johnnie
 
11-06-2013, 05:06 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
EDAC might be better at finding (and, in this case, correcting) possible memory problems.

This is from edac.txt (found in the Linux kernel Documentation directory):
Quote:
Detecting CE events, then harvesting those events and reporting them,
CAN be a predictor of future UE events. With CE events, the system can
continue to operate, but with less safety. Preventive maintenance and
proactive part replacement of memory DIMMs exhibiting CEs can reduce
the likelihood of the dreaded UE events and system 'panics'.
I would keep an eye on this if I were you, especially if these errors have shown up recently and keep pointing to the same DIMM (MC0/row 2/channel 1).
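
One way to check whether the corrected errors really do keep pointing at the same location is to tally the CE lines from the log. The following is only a rough sketch: the regular expression is written against the single "CE page" line format quoted in the first post, and the function name `count_ce_locations` is made up for illustration. Other kernel versions or EDAC drivers may format these lines differently.

```python
import re
from collections import Counter

# Assumption: CE lines look like the k8_edac line quoted in the first post.
CE_LINE = re.compile(
    r"EDAC (?P<mc>MC\d+): CE page (?P<page>0x[0-9a-fA-F]+), "
    r"offset (?P<offset>0x[0-9a-fA-F]+), grain (?P<grain>\d+), "
    r"syndrome (?P<syndrome>0x[0-9a-fA-F]+), "
    r"row (?P<row>\d+), channel (?P<channel>\d+)"
)


def count_ce_locations(lines):
    """Count corrected errors per (controller, row, channel).

    Lines that do not carry location details (for example the
    "CE - no information available" line) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = CE_LINE.search(line)
        if match:
            key = (match["mc"], int(match["row"]), int(match["channel"]))
            counts[key] += 1
    return counts


# Two lines copied from the log excerpt in the first post.
sample = [
    'Oct 27 04:05:53 ct-on-db kernel: EDAC MC0: CE page 0x5820a0, '
    'offset 0x480, grain 8, syndrome 0x8fd1, row 2, channel 1, '
    'label "": k8_edac',
    "Oct 27 04:05:53 ct-on-db kernel: EDAC MC0: CE - no information "
    "available: k8_edac Error Overflow set",
]

for location, count in count_ce_locations(sample).most_common():
    print(location, count)
```

To run it against the real log, pass an open file instead of the sample list, e.g. `count_ce_locations(open("/var/log/messages"))`. If one (controller, row, channel) tuple dominates the counts, that supports the idea of a single suspect DIMM.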

Have you tried using dmidecode -t memory to get more information?

BTW: Which memtest are you using (Memtest86 or Memtest86+)? The latter is supposed to be better equipped to deal with modern hardware.
 
11-06-2013, 08:51 AM   #3
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 17,148
Blog Entries: 2

Rep: Reputation: 4886
Quote:
Originally Posted by jackd1000
My question is: what is EDAC doing that memtest doesn't? I'd have thought memtest would hit the same motherboard issues as EDAC (assuming it's not a memory problem but possibly a motherboard problem), so I don't see why EDAC complains so regularly and memtest does not.
Memtest is a fairly simple test: it just tests all memory cells with different patterns. It can't test every possible real-life situation. You simply have to take Memtest results with this in mind: if Memtest finds errors, you definitely have a problem; if it doesn't find errors, your machine is likely to be error-free, but there is no guarantee.
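
To make "tests all memory cells with different patterns" concrete, here is a toy sketch of the idea. It is not how Memtest is actually implemented (the real tool runs on bare metal and uses many more patterns and addressing tests); it only writes a few fixed byte patterns to a buffer and reads them back. `pattern_test` and `StuckBitBuffer` are hypothetical names invented for this illustration.

```python
def pattern_test(buffer, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern to every cell, read it back, collect mismatches.

    Returns a list of (index, written, read) tuples.
    """
    errors = []
    for pattern in patterns:
        for i in range(len(buffer)):
            buffer[i] = pattern
        for i in range(len(buffer)):
            value = buffer[i]
            if value != pattern:
                errors.append((i, pattern, value))
    return errors


class StuckBitBuffer:
    """Simulated memory where one cell has bits that always read as 1."""

    def __init__(self, size, bad_index, stuck_mask=0x01):
        self._data = bytearray(size)
        self._bad_index = bad_index
        self._stuck_mask = stuck_mask

    def __len__(self):
        return len(self._data)

    def __setitem__(self, index, value):
        self._data[index] = value

    def __getitem__(self, index):
        value = self._data[index]
        if index == self._bad_index:
            value |= self._stuck_mask  # the faulty bit is forced high
        return value


print(pattern_test(bytearray(64)))           # healthy buffer: no mismatches
print(pattern_test(StuckBitBuffer(64, 10)))  # cell 10 fails for 0x00 and 0xAA
```

Note that the simulated fault is only caught by the patterns that try to store a 0 in the stuck bit. A fault that shows up only under particular load, timing or temperature conditions would pass a test like this, which is the "no guarantee" part.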
 
  



All times are GMT -5. The time now is 05:39 AM.
