Hard Drive Problems: timeout waiting for DMA; error waiting for DMA
OK, I've driven myself sick with this problem. I've been living off Google answers for the past 4 days, with no help.
First I'll describe the problem; at the end I show my dmesg and lspci -vvv output.

I have an AMD Athlon 700 on some mobo (don't know which, doesn't matter). Currently I have one 20GB HD as the boot drive and six 120GB drives made into a software RAID5, using a self-compiled 2.4.22 kernel. Earlier I was using a self-compiled 2.4.20 kernel (same hardware) and there was never a problem. Four of the six 120GB drives are on an Adaptec 1200A ATA100 IDERAID Controller. It uses a HighPoint driver.

Anyway, after compiling the new kernel, I get many, many DMA timeout errors. It takes 15 minutes to boot the machine now (it used to take 45 seconds). The errors look like:

blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff)
hdg1 hdh:<4>hdh: dma_timer_expiry: dma status == 0x61
hdh: 0 bytes in FIFO
hdh: timeout waiting for DMA
hdh: error waiting for DMA
hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

This is the drive that's on the mobo's IDE slot.

hdparm -tT /dev/hda:
/dev/hda:
Timing buffer-cache reads: 128 MB in 0.80 seconds =160.00 MB/sec
Timing buffered disk reads: 64 MB in 3.97 seconds = 16.12 MB/sec

This is the RAID device.

hdparm -tT /dev/md0:
/dev/md0:
Timing buffer-cache reads: 128 MB in 0.84 seconds =152.38 MB/sec
Timing buffered disk reads: 64 MB in 14.83 seconds = 4.32 MB/sec

The speeds are horrible, and I've done everything I can think of. I've set the parameters for the drives with hdparm in multiple different ways, with no luck. I've recompiled the kernel using both the default PCI IDE driver and the chipset-specific drivers. I just don't know what else to do. I've gone over and over the 2.4.20 kernel config I had, but it's no use. Anyone who can help, please email me at mint@freshstation.org or reply to this post. Thanks in advance.
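For anyone reading the hex values in the error excerpt above, here is a minimal Python sketch that decodes them into flag names. The ATA status names are the same labels the kernel prints in the log. The bus-master DMA status layout is an assumption based on the common SFF-8038i register convention, and the names `ATA_STATUS_BITS`, `BM_STATUS_BITS` and `decode` are made up for this illustration.

```python
# ATA status register bits, using the labels the Linux IDE driver prints.
ATA_STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Bus-master IDE status register bits (ASSUMED layout, SFF-8038i convention).
# The names are hypothetical labels chosen for this sketch.
BM_STATUS_BITS = {
    0x80: "SimplexOnly",
    0x40: "Drive1DmaCapable",
    0x20: "Drive0DmaCapable",
    0x04: "Interrupt",
    0x02: "Error",
    0x01: "Active",
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


print(decode(0x58, ATA_STATUS_BITS))  # the "status=0x58" from the log
print(decode(0x61, BM_STATUS_BITS))   # the "dma status == 0x61" from the log
```

Under that assumed layout, 0x61 has the Active bit set and the Interrupt bit clear, which would be consistent with a transfer that was started but never signalled completion. That is only a reading of the bits, not a diagnosis.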
************************************* ********** lspci -vvv *************** ************************************* 00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-751 [Irongate] System Controller (rev 25) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- Latency: 64 Region 0: Memory at d0000000 (32-bit, prefetchable) [size=128M] Region 1: Memory at df002000 (32-bit, prefetchable) [size=4K] Region 2: I/O ports at d000 [disabled] [size=4] Capabilities: [a0] AGP version 1.0 Status: RQ=15 SBA+ 64bit- FW- Rate=x1,x2 Command: RQ=0 SBA- AGP+ 64bit- FW- Rate=<none> 00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-751 [Irongate] AGP Bridge (rev 01) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 64 Bus: primary=00, secondary=01, subordinate=01, sec-latency=64 I/O behind bridge: 0000c000-0000cfff Memory behind bridge: dc000000-ddffffff Prefetchable memory behind bridge: d8000000-dbffffff BridgeCtl: Parity- SERR+ NoISA+ VGA+ MAbort- >Reset- FastB2B- 00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-756 [Viper] ISA (rev 01) Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-756 [Viper] IDE (rev 03) (prog-if 8a [Master SecP PriP]) Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 64 Region 4: I/O ports at f000 [size=16] 00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-756 [Viper] ACPI (rev 03) Control: 
I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- 00:07.4 USB Controller: Advanced Micro Devices [AMD] AMD-756 [Viper] USB (rev 06) (prog-if 10 [OHCI]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 16 (20000ns max), cache line size 08 Interrupt: pin D routed to IRQ 11 Region 0: Memory at df000000 (32-bit, non-prefetchable) [size=4K] 00:08.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 78) Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 64 (2500ns min, 2500ns max), cache line size 08 Interrupt: pin A routed to IRQ 5 Region 0: I/O ports at d400 [size=128] Region 1: Memory at df001000 (32-bit, non-prefetchable) [size=128] Expansion ROM at <unassigned> [disabled] [size=128K] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=2 PME- 00:09.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 78) Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 64 (2500ns min, 2500ns max), cache line size 08 Interrupt: pin A routed to IRQ 10 Region 0: I/O ports at d800 [size=128] Region 1: Memory at df003000 (32-bit, non-prefetchable) [size=128] Expansion ROM at <unassigned> [disabled] [size=128K] Capabilities: [dc] 
Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=2 PME- 00:0b.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 03) Subsystem: Triones Technologies, Inc.: Unknown device 0005 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 120 (2000ns min, 2000ns max), cache line size 08 Interrupt: pin A routed to IRQ 11 Region 0: I/O ports at dc00 [size=8] Region 1: I/O ports at e000 [size=4] Region 2: I/O ports at e400 [size=8] Region 3: I/O ports at e800 [size=4] Region 4: I/O ports at ec00 [size=256] Expansion ROM at <unassigned> [disabled] [size=128K] 01:05.0 VGA compatible controller: ATI Technologies Inc: Unknown device 5446 (prog-if 00 [VGA]) Subsystem: ATI Technologies Inc: Unknown device 0408 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 64 (2000ns min), cache line size 08 Interrupt: pin A routed to IRQ 10 Region 0: Memory at d8000000 (32-bit, prefetchable) [size=64M] Region 1: I/O ports at c000 [size=256] Region 2: Memory at dd000000 (32-bit, non-prefetchable) [size=16K] Expansion ROM at <unassigned> [disabled] [size=128K] Capabilities: [50] AGP version 2.0 Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2 Command: RQ=0 SBA+ AGP- 64bit- FW- Rate=<none> Capabilities: [5c] Power Management version 2 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- *********************************** ********** dmesg ******************* *********************************** It's a big output so I placed it here freshstation.org/dmesg.txt But here is some of the important parts: Partition check: hda: hda1 hda2 hda3 hda4 
hdb: hdb1 hdd: hdd1 hde:<4>hde: dma_timer_expiry: dma status == 0x61 hde: 0 bytes in FIFO hde: timeout waiting for DMA hde: error waiting for DMA hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff) hde1 hdf:<4>hdf: dma_timer_expiry: dma status == 0x61 hdf: 0 bytes in FIFO hdf: timeout waiting for DMA hdf: error waiting for DMA hdf: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff) hdf1 hdg:<4>hdg: dma_timer_expiry: dma status == 0x61 hdg: 0 bytes in FIFO hdg: timeout waiting for DMA hdg: error waiting for DMA hdg: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff) hdg1 hdh:<4>hdh: dma_timer_expiry: dma status == 0x61 hdh: 0 bytes in FIFO hdh: timeout waiting for DMA hdh: error waiting for DMA hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } blk: queue c03df038, I/O limit 4095Mb (mask 0xffffffff) hdh1 Highpoint HPT370 Softwareraid driver for linux version 0.02 hde: dma_timer_expiry: dma status == 0x61 hde: 0 bytes in FIFO hde: timeout waiting for DMA hde: error waiting for DMA hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff) hdf: dma_timer_expiry: dma status == 0x61 hdf: 0 bytes in FIFO hdf: timeout waiting for DMA hdf: error waiting for DMA hdf: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff) hdg: dma_timer_expiry: dma status == 0x61 hdg: 0 bytes in FIFO hdg: timeout waiting for DMA hdg: error waiting for DMA hdg: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff) hdh: dma_timer_expiry: dma status == 0x61 hdh: 0 bytes in FIFO hdh: timeout waiting for DMA hdh: error waiting for 
DMA hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } blk: queue c03de790, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03df038, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03de33c, I/O limit 4095Mb (mask 0xffffffff) hde: dma_timer_expiry: dma status == 0x61 hdh: dma_timer_expiry: dma status == 0x61 hde: 0 bytes in FIFO hde: timeout waiting for DMA hde: error waiting for DMA hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } hdh: 0 bytes in FIFO hdh: timeout waiting for DMA hdh: error waiting for DMA hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } hdf: 0 bytes in FIFO hdf: timeout waiting for DMA hdf: status error: status=0x58 { DriveReady SeekComplete DataRequest } hdf: drive not ready for command hdf: dma_timer_expiry: dma status == 0x41 hdf: 0 bytes in FIFO hdf: timeout waiting for DMA hdf: error waiting for DMA hdf: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } blk: queue c03de790, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03df038, I/O limit 4095Mb (mask 0xffffffff) hde: dma_timer_expiry: dma status == 0x61 hdh: dma_timer_expiry: dma status == 0x61 hde: 0 bytes in FIFO hde: timeout waiting for DMA hde: error waiting for DMA hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } hdh: 0 bytes in FIFO hdh: timeout waiting for DMA hdh: error waiting for DMA hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } hdf: 0 bytes in FIFO hdf: timeout waiting for DMA hdf: status error: status=0x58 { DriveReady SeekComplete DataRequest } hdf: drive not ready for command hdf: 
dma_timer_expiry: dma status == 0x41 hdf: 0 bytes in FIFO hdf: timeout waiting for DMA hdf: error waiting for DMA hdf: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } blk: queue c03de200, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03de33c, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03de790, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff) blk: queue c03df038, I/O limit 4095Mb (mask 0xffffffff) hde: dma_timer_expiry: dma status == 0x61 hdh: dma_timer_expiry: dma status == 0x61 hde: 0 bytes in FIFO hde: timeout waiting for DMA hde: error waiting for DMA hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } hdh: 0 bytes in FIFO hdh: timeout waiting for DMA hdh: error waiting for DMA hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } hdf: 0 bytes in FIFO hdf: timeout waiting for DMA hdf: status error: status=0x58 { DriveReady SeekComplete DataRequest } hdf: drive not ready for command hdf: dma_timer_expiry: dma status == 0x41 hdf: 0 bytes in FIFO hdf: timeout waiting for DMA hdf: error waiting for DMA hdf: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } blk: queue c03de33c, I/O limit 4095Mb (mask 0xffffffff) hdb: DMA disabled blk: queue c03de790, I/O limit 4095Mb (mask 0xffffffff) hdd: DMA disabled blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff) hde: DMA disabled blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff) hdf: DMA disabled blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff) hdg: DMA disabled blk: queue c03df038, I/O limit 4095Mb (mask 0xffffffff) hdh: DMA disabled blk: queue c03de200, I/O limit 4095Mb (mask 0xffffffff) hda: DMA disabled blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff) hde: dma_timer_expiry: dma status == 0x21 hde: 0 bytes in FIFO hde: timeout waiting for DMA hde: error 
waiting for DMA hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest } |
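With a dmesg this repetitive it can help to tally the timeouts per drive, to see whether every channel is affected or only some. The following is a rough Python sketch under the assumption that the messages keep the exact "hdX: timeout waiting for DMA" wording shown above; `count_dma_timeouts` and the short `sample` string are illustrative only, not part of any tool.

```python
import re
from collections import Counter

# Matches e.g. "hde: timeout waiting for DMA" and captures the drive name.
TIMEOUT_RE = re.compile(r"\b(hd[a-z]): timeout waiting for DMA")


def count_dma_timeouts(dmesg_text):
    """Count 'timeout waiting for DMA' messages per drive in a dmesg dump."""
    return Counter(TIMEOUT_RE.findall(dmesg_text))


# A short hand-made sample in the same style as the excerpt above.
sample = (
    "hde: 0 bytes in FIFO hde: timeout waiting for DMA hde: error waiting for DMA "
    "hdf: 0 bytes in FIFO hdf: timeout waiting for DMA "
    "hde: timeout waiting for DMA"
)

for drive, hits in sorted(count_dma_timeouts(sample).items()):
    print(drive, hits)
```

To run it on a real log, read the saved dmesg file into a string and pass that to `count_dma_timeouts` instead of `sample`.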
For completeness, could you include your hdparm settings for the drives?
That's "hdparm <device>", no flags. |
root@nebula:~# hdparm /dev/hde
/dev/hde:
multcount = 0 (off)
I/O support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 0 (off)
keepsettings = 1 (on)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 2 (on)
geometry = 14593/255/63, sectors = 234441648, start = 0

Other than that, all the other drives are the same. |
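To compare the settings across all seven drives without reading each listing by eye, the output can be turned into a dictionary. This is a minimal Python sketch that assumes the "name = number (description)" shape seen in the output above; `parse_hdparm` and `hde_output` are hypothetical names for this example, and the geometry line is deliberately skipped because it does not follow that shape.

```python
import re

# Captures "name = number (" pairs; allows a two-word name such as "I/O support".
SETTING_RE = re.compile(r"([A-Za-z_/]+(?: [A-Za-z_/]+)?)\s*=\s*(\d+)\s*\(")


def parse_hdparm(text):
    """Return {setting name: integer value} from `hdparm <device>` output."""
    return {name: int(value) for name, value in SETTING_RE.findall(text)}


# The output posted above for /dev/hde, flattened onto one line.
hde_output = (
    "/dev/hde: multcount = 0 (off) I/O support = 1 (32-bit) unmaskirq = 1 (on) "
    "using_dma = 0 (off) keepsettings = 1 (on) nowerr = 0 (off) readonly = 0 (off) "
    "readahead = 2 (on) geometry = 14593/255/63, sectors = 234441648, start = 0"
)

settings = parse_hdparm(hde_output)
if settings.get("using_dma") == 0:
    print("DMA is off on this drive")
```

Here `using_dma` comes back as 0, which matches the "DMA disabled" lines in the dmesg and may go some way toward explaining the low buffered read speed.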
I had nearly the same problem. On my SuSE 9.0 system the solution was simple in the end:
Changing "/etc/sysconfig/hardware" from:
DEVICES_FORCE_IDE_DMA="/dev/hda:udma5 /dev/hdb:udma2 /dev/hdd:on"
to:
DEVICES_FORCE_IDE_DMA=""
and everything is working (and booting) fine again. Hope it helps you. schultzm |
What you have suggested is to turn off DMA. This is not, in my view, the proper corrective action.
Of course, if you do not use DMA, then there will be no DMA timeouts!! |
AFAIK, this kind of error is usually an indication of faulty hardware. It may be the disk; it may also be the power supply.
If you have swapped a few disks and it is still giving such error messages, then I suggest you look for a good power supply for your system. At least, this is what happened to me. |
Let me describe my problem, which gives me the same error.
I have:
Quantum Fireball 30GB HDD - master on ide1 - DMA mode
CDROM - slave on ide1 - PIO mode
I can boot from my CDROM (live CD, bootable disks, etc.) with no problem at all. This might be due to the use of PIO mode. However, when the HDD is accessed (that is, using DMA mode), I keep getting the same errors reported by mintee. I tried several HDDs on the system but got the same error consistently. I came to the conclusion that there might be a problem in the DMA hardware on my motherboard, and I am now about to replace it. I haven't done so yet, however.
ghostdancer, what makes you say it is related to the power supply? Based on my setup, since the CDROM uses PIO and works fine, I concluded it might be the DMA controller. Both the CDROM and the HDD use the same power unit. |
However, if a system uses a better-grade PSU but runs it beyond its design limits, that can also result in PSU failure. For example, if a particular hardware spec is designed to run with only one hard drive, then adding another hard drive or an external USB component makes it consume more power, and the PSU may end up running beyond its capacity. You should also be aware that an active drive (e.g. loading applications, swapping memory, or other activity) uses more power than an idle one. Once this happens, some components in the system may get less power than they need, and some may start to fail.

In our case, we bought a few systems from a Taiwanese manufacturer (these days they have a lot of small form factor and 1U systems). We basically set up these systems as computer appliances for sending and receiving SMS via a USB GSM modem. When we started selling them, we noticed that a few series of hardware always failed after a period of operation, regardless of changing the main board or the hard drive. After some investigation, we realised the problem was related to the USB GSM modem we were using. It seems that when the modem is transmitting or receiving SMS, it uses more power (similar to a normal mobile phone: when idle it can last a few days without recharging, but once we start making calls or sending and receiving SMSes it needs recharging frequently because it consumes more power). The systems we bought were not designed to operate under such conditions and, worse, used a lower-grade PSU. The symptoms were a flickering front panel display (our system has a front LED display for system status and a simple menu) and DMA errors from the drive. In some situations there were even random reboots (basically, the PSU had degraded to a state where it could not provide stable power).

After we upgraded the systems to a better-grade PSU, they ran fine with nothing else changed (however, sometimes we needed to change the drive as well, since it may already have been damaged during operation). A main board fault is possible, but in my experience it is quite rare (of the many boxes we sold, I think it was fewer than 10). If it really is a main board issue, the usual symptoms are that it emits certain sounds, simply won't start, or has intermittent start-up failures. Unless you bought a cheap or non-branded main board? Anyway, this is just my experience. It may be different in your situation. |
Mmm... yes, I begin to see what you mean, ghostdancer.
FYI: see http://www.z-a-recovery.com/art-powe...ly-failure.htm I'll change my power unit and let the results be known! |
I changed the power supply unit and tried again.
Still the same problem. So I guess I have a rare motherboard issue. My next option is to replace the motherboard and see... |
Good luck! |