LinuxQuestions.org


ajatiti 02-04-2008 04:34 PM

Machine check exception on RHEL4
 
I am facing a problem with a Dell PowerEdge 2950 server. It is running RHEL4 (kernel 2.6.9-5).

The server hangs with the output below on the screen. I took the screenshot while connected through the DRAC console.
Dell recommended updating the firmware (BIOS and BMC). We did that and are still having the same problem.
DRAC logs all the hardware events, and we can see a "cpu mach chk" error in those logs. The front panel on the physical server displays the same error.

Also, in September we had a similar problem that occurred twice, and then we replaced the motherboard, CPU, and riser.

Now Dell says they cannot see any hardware problem, and they want us to look for OS issues.

Do you think it could be an OS issue? Has anyone had the same issue?

This is what I saw on the console:

stack: ffffffff8011ba9a 0000000000000000 0000000000000002 0000000000000000
0000000000000000 0000000000000900 00000000ffffffff ffffffff803beea0
00007730a18eb238 ffffffff8011bad7
Call Trace:<ffffffff8011ba9a>{smp_really_stop_cpu+0} <ffffffff8011bad7>{smp_send_stop+52}
<ffffffff80135106>{panic+235} <ffffffff8011744f>{print_mce+159}
<ffffffff80117510>{mce_available+0} <ffffffff80117855>{do_machine_check+811}
<ffffffff8010e6cc>{mwait_idle+86} <ffffffff8010e6cc>{mwait_idle+86}
<ffffffff8011115b>{machine_check+127} <ffffffff8010e6cc>{mwait_idle+86}
<EOE> <ffffffff8010e65c>{cpu_idle+26}

Code: eb f6 85 db 7e 0a 8b 45 14 44 39 e0 74 02 eb f6 31 c0 85 db
console shuts up ...
NMI Watchdog detected LOCKUP on CPU1, registers:
CPU1
Modules linked in: e1000(U) md5 ipv6(U) autofs4 i2c_dev i2c_core sunrpc ds yen
_socket pcmcia_core button battery ac sr_mod(U) usb_storage joydev uhci_hcd eh
_hcd bnx2(U) dm_snapshot dm_zero dm_mirror ext3 jbd(U) dm_mod mptfc(U) mptsas(
mptspi(U) mptscsih(U) mptbase(U) megaraid_mbox(U) megaraid_mm(U) megaraid_sas
sd_mod scsi_mod
Pid:3864, comm: hald Tainted: GF M 2.6.9-5.ELsmp
RIP: 0010:[<ffffffff802f88c4>]

thanks..

slacksite 02-13-2008 10:06 AM

What are the physical specs of the CPUs on this server?

Newer processors require later versions of RHEL4 to work properly.

In particular, there were some OS changes made post RHEL4U5 to address Clovertown and Harpertown CPUs. Odd things would happen, including MCEs, particularly on 64-bit. Based on the addresses in your stack trace, you are running 64-bit as well.
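To show how I read the trace, here is a rough Python sketch, not a proper oops decoder. It pulls the address/symbol pairs out of the Call Trace lines you posted and checks that every address is 16 hex digits wide, which is what a 64-bit kernel prints. The names `FRAME_RE`, `parse_frames` and `looks_64bit` are my own, and the sample text is just the first lines of your trace with the wrapped `smp_send_stop` symbol rejoined.

```python
import re

# First lines of the posted Call Trace (wrapped symbol rejoined by hand).
TRACE = """Call Trace:<ffffffff8011ba9a>{smp_really_stop_cpu+0} <ffffffff8011bad7>{smp_send_stop+52}
<ffffffff80135106>{panic+235} <ffffffff8011744f>{print_mce+159}
<ffffffff80117510>{mce_available+0} <ffffffff80117855>{do_machine_check+811}
"""

# One frame looks like <address>{symbol+offset} in this 2.6.9-era format.
FRAME_RE = re.compile(r"<([0-9a-f]+)>\{(\w+)\+(\d+)\}")


def parse_frames(text):
    """Return a list of (address, symbol, offset) tuples found in the text."""
    return [(addr, sym, int(off)) for addr, sym, off in FRAME_RE.findall(text)]


def looks_64bit(frames):
    """True if there is at least one frame and all addresses are 16 hex digits."""
    return bool(frames) and all(len(addr) == 16 for addr, _, _ in frames)


frames = parse_frames(TRACE)
symbols = [sym for _, sym, _ in frames]

print("64-bit addresses:", looks_64bit(frames))
print("panic reached via machine check handler:",
      "panic" in symbols and "do_machine_check" in symbols)
```

Seeing `do_machine_check` below `print_mce` and `panic` is what tells you the kernel panicked because the CPU reported a machine check, rather than because of an ordinary software oops.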

It's also probably worth mentioning that you are running an OS that is about 3 years old (RHEL4 GA). I would *strongly* recommend you update to the latest RHEL4 update and errata.
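If you want to check which update level a box is on from its kernel string, something like the Python sketch below works. The build-number table is from my memory of Red Hat's release listings (2.6.9-5 for GA up to 2.6.9-67 for U6), so treat it as an assumption and verify it against Red Hat's own documentation. The function name `rhel4_update_level` is made up for this example.

```python
import re

# Assumed mapping of RHEL4 base kernel build numbers to update levels.
# Recalled from memory; verify against Red Hat's release listings.
RHEL4_KERNEL_BUILDS = {
    5: "GA",
    11: "U1",
    22: "U2",
    34: "U3",
    42: "U4",
    55: "U5",
    67: "U6",
}


def rhel4_update_level(release):
    """Map a kernel release like '2.6.9-5.ELsmp' to an RHEL4 update level.

    Returns None if the string is not a 2.6.9-N kernel or N is below 5.
    Errata kernels (e.g. 2.6.9-55.0.2.EL) map to the update they belong to.
    Builds newer than the last table entry are reported as that last entry.
    """
    match = re.match(r"2\.6\.9-(\d+)", release)
    if not match:
        return None
    build = int(match.group(1))
    level = None
    for base in sorted(RHEL4_KERNEL_BUILDS):
        if build >= base:
            level = RHEL4_KERNEL_BUILDS[base]
    return level


# The kernel string from the "Tainted:" line in the posted console output.
print(rhel4_update_level("2.6.9-5.ELsmp"))
```

On a live system you would feed it the output of `uname -r` instead of a hard-coded string.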

ajatiti 02-14-2008 10:38 AM

The processor is an Intel Xeon 5150, 2.66 GHz, 64-bit.

What is the latest update for RHEL4?
Where can I get the update from? Will there be any impact on the working server?
Appreciate your help!!

slacksite 02-15-2008 10:26 AM

RHEL4U6 is the latest update release, but there are already errata to RHEL4U6.

Do you have a subscription to RHN? Take a look at the following KB articles:

http://kbase.redhat.com/faq/FAQ_80_4293.shtm

http://kbase.redhat.com/faq/FAQ_80_3929.shtm

ajatiti 02-18-2008 01:50 PM

Thanks a Ton..

