LinuxQuestions.org

mintee 11-03-2003 02:34 PM

Hard Drive Problems: timeout waiting for DMA; error waiting for DMA
 
OK, I've driven myself sick with this problem. I've been living off Google answers for the past 4 days, with no luck.

First I'll describe the problem; at the end are my lspci -vvv output and dmesg.

I have an AMD Athlon 700 on some mobo (I don't know which, and it shouldn't matter). Currently I have one 20GB HD as the boot drive and six 120GB drives in a software RAID5, running a self-compiled 2.4.22 kernel. Earlier I was using a self-compiled 2.4.20 kernel on the same hardware and there was never a problem. Four of the six 120GB drives are on an Adaptec 1200A ATA100 IDE RAID controller, which uses a HighPoint driver.

Anyway, since compiling the new kernel I get many, many DMA timeout errors. It takes 15 minutes to boot the machine now (it used to take 45 seconds). The errors look like this:

blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff)
hdg1
hdh:<4>hdh: dma_timer_expiry: dma status == 0x61
hdh: 0 bytes in FIFO
hdh: timeout waiting for DMA
hdh: error waiting for DMA
hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }
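
A note on reading the two status values in these lines. The sketch below decodes them in Python, assuming the standard ATA status register layout (whose bit names the kernel prints in braces) and the usual bus-master IDE status layout for the "dma status" value. The bus-master bit labels are descriptive names made up for this sketch, not kernel identifiers.

```python
# Bit names for the ATA status register, using the labels the IDE driver
# prints in braces (e.g. "{ DriveReady SeekComplete DataRequest }").
ATA_STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Bit names for the bus-master IDE status register ("dma status" in the log).
# Assumption: the common bus-master layout; labels are illustrative only.
BM_STATUS_BITS = {
    0x80: "simplex_only",
    0x40: "drive1_dma_capable",
    0x20: "drive0_dma_capable",
    0x04: "interrupt",
    0x02: "error",
    0x01: "active",
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(table.items(), reverse=True) if value & mask]


print(decode(0x58, ATA_STATUS_BITS))
print(decode(0x61, BM_STATUS_BITS))
```

Under those assumptions, 0x61 has the active bit set with the interrupt and error bits clear, which would be consistent with a transfer the controller started but never saw complete. That reading is an interpretation, not something the log itself states.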


This is the drive on the motherboard's IDE port:

hdparm -tT /dev/hda:

/dev/hda:
Timing buffer-cache reads: 128 MB in 0.80 seconds =160.00 MB/sec
Timing buffered disk reads: 64 MB in 3.97 seconds = 16.12 MB/sec


This is the RAID device.

hdparm -tT /dev/md0:

/dev/md0:
Timing buffer-cache reads: 128 MB in 0.84 seconds =152.38 MB/sec
Timing buffered disk reads: 64 MB in 14.83 seconds = 4.32 MB/sec


The speeds are horrible, and I've done everything I can think of. I've set the drive parameters with hdparm in multiple different ways with no luck. I've recompiled the kernel using both the default PCI IDE driver and the chipset-specific drivers. I just don't know what else to do. I've gone over the 2.4.20 kernel config I had again and again, but it's no use.
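
The hdparm figures above can be rechecked with a few lines of Python. This is only a sketch: the regular expression is written against the timing lines quoted in this post, and the helper name `throughput` is made up for illustration.

```python
import re

# Matches lines such as:
#   "Timing buffered disk reads: 64 MB in 3.97 seconds = 16.12 MB/sec"
TIMING = re.compile(
    r"Timing (?P<kind>[\w\- ]+?) reads:\s+(?P<mb>\d+) MB in\s+(?P<sec>[\d.]+) seconds"
)


def throughput(line):
    """Return (kind, MB/s) for one hdparm timing line, or None if it does not match."""
    m = TIMING.search(line)
    if m is None:
        return None
    return m.group("kind"), int(m.group("mb")) / float(m.group("sec"))


# The two buffered disk read lines quoted in the post.
hda_disk = "Timing buffered disk reads: 64 MB in 3.97 seconds = 16.12 MB/sec"
md0_disk = "Timing buffered disk reads: 64 MB in 14.83 seconds = 4.32 MB/sec"

_, hda_rate = throughput(hda_disk)
_, md0_rate = throughput(md0_disk)
print(f"hda: {hda_rate:.2f} MB/s, md0: {md0_rate:.2f} MB/s, ratio: {hda_rate / md0_rate:.1f}x")
```

On these numbers the single drive reads a bit under four times faster than the md0 array, which is the opposite of what one would normally hope for from a six-drive array.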

Anyone who can help, please email me at mint@freshstation.org or reply to this post.

Thanks in advance.



*************************************
********** lspci -vvv ***************
*************************************
00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-751 [Irongate] System Controller (rev 25)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 64
Region 0: Memory at d0000000 (32-bit, prefetchable) [size=128M]
Region 1: Memory at df002000 (32-bit, prefetchable) [size=4K]
Region 2: I/O ports at d000 [disabled] [size=4]
Capabilities: [a0] AGP version 1.0
Status: RQ=15 SBA+ 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA- AGP+ 64bit- FW- Rate=<none>

00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-751 [Irongate] AGP Bridge (rev 01) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: 0000c000-0000cfff
Memory behind bridge: dc000000-ddffffff
Prefetchable memory behind bridge: d8000000-dbffffff
BridgeCtl: Parity- SERR+ NoISA+ VGA+ MAbort- >Reset- FastB2B-

00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-756 [Viper] ISA (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0

00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-756 [Viper] IDE (rev 03) (prog-if 8a [Master SecP PriP])
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Region 4: I/O ports at f000 [size=16]

00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-756 [Viper] ACPI (rev 03)
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:07.4 USB Controller: Advanced Micro Devices [AMD] AMD-756 [Viper] USB (rev 06) (prog-if 10 [OHCI])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 16 (20000ns max), cache line size 08
Interrupt: pin D routed to IRQ 11
Region 0: Memory at df000000 (32-bit, non-prefetchable) [size=4K]

00:08.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 78)
Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (2500ns min, 2500ns max), cache line size 08
Interrupt: pin A routed to IRQ 5
Region 0: I/O ports at d400 [size=128]
Region 1: Memory at df001000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-

00:09.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 78)
Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (2500ns min, 2500ns max), cache line size 08
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at d800 [size=128]
Region 1: Memory at df003000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-

00:0b.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 03)
Subsystem: Triones Technologies, Inc.: Unknown device 0005
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 120 (2000ns min, 2000ns max), cache line size 08
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at dc00 [size=8]
Region 1: I/O ports at e000 [size=4]
Region 2: I/O ports at e400 [size=8]
Region 3: I/O ports at e800 [size=4]
Region 4: I/O ports at ec00 [size=256]
Expansion ROM at <unassigned> [disabled] [size=128K]


01:05.0 VGA compatible controller: ATI Technologies Inc: Unknown device 5446 (prog-if 00 [VGA])
Subsystem: ATI Technologies Inc: Unknown device 0408
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (2000ns min), cache line size 08
Interrupt: pin A routed to IRQ 10
Region 0: Memory at d8000000 (32-bit, prefetchable) [size=64M]
Region 1: I/O ports at c000 [size=256]
Region 2: Memory at dd000000 (32-bit, non-prefetchable) [size=16K]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [50] AGP version 2.0
Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA+ AGP- 64bit- FW- Rate=<none>
Capabilities: [5c] Power Management version 2
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-




***********************************
********** dmesg *******************
***********************************

It's a big output, so I placed it here: freshstation.org/dmesg.txt

But here are some of the important parts:

Partition check:
hda: hda1 hda2 hda3 hda4
hdb: hdb1
hdd: hdd1
hde:<4>hde: dma_timer_expiry: dma status == 0x61
hde: 0 bytes in FIFO
hde: timeout waiting for DMA
hde: error waiting for DMA
hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff)
hde1
hdf:<4>hdf: dma_timer_expiry: dma status == 0x61
hdf: 0 bytes in FIFO
hdf: timeout waiting for DMA
hdf: error waiting for DMA
hdf: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff)
hdf1
hdg:<4>hdg: dma_timer_expiry: dma status == 0x61
hdg: 0 bytes in FIFO
hdg: timeout waiting for DMA
hdg: error waiting for DMA
hdg: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff)
hdg1
hdh:<4>hdh: dma_timer_expiry: dma status == 0x61
hdh: 0 bytes in FIFO
hdh: timeout waiting for DMA
hdh: error waiting for DMA
hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c03df038, I/O limit 4095Mb (mask 0xffffffff)
hdh1
Highpoint HPT370 Softwareraid driver for linux version 0.02
hde: dma_timer_expiry: dma status == 0x61
hde: 0 bytes in FIFO
hde: timeout waiting for DMA
hde: error waiting for DMA
hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff)
hdf: dma_timer_expiry: dma status == 0x61
hdf: 0 bytes in FIFO
hdf: timeout waiting for DMA
hdf: error waiting for DMA
hdf: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff)
hdg: dma_timer_expiry: dma status == 0x61
hdg: 0 bytes in FIFO
hdg: timeout waiting for DMA
hdg: error waiting for DMA
hdg: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff)
hdh: dma_timer_expiry: dma status == 0x61
hdh: 0 bytes in FIFO
hdh: timeout waiting for DMA
hdh: error waiting for DMA
hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c03de790, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03df038, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03de33c, I/O limit 4095Mb (mask 0xffffffff)
hde: dma_timer_expiry: dma status == 0x61
hdh: dma_timer_expiry: dma status == 0x61
hde: 0 bytes in FIFO
hde: timeout waiting for DMA
hde: error waiting for DMA
hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

hdh: 0 bytes in FIFO
hdh: timeout waiting for DMA
hdh: error waiting for DMA
hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

hdf: 0 bytes in FIFO
hdf: timeout waiting for DMA
hdf: status error: status=0x58 { DriveReady SeekComplete DataRequest }

hdf: drive not ready for command
hdf: dma_timer_expiry: dma status == 0x41
hdf: 0 bytes in FIFO
hdf: timeout waiting for DMA
hdf: error waiting for DMA
hdf: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c03de790, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03df038, I/O limit 4095Mb (mask 0xffffffff)
hde: dma_timer_expiry: dma status == 0x61
hdh: dma_timer_expiry: dma status == 0x61
hde: 0 bytes in FIFO
hde: timeout waiting for DMA
hde: error waiting for DMA
hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

hdh: 0 bytes in FIFO
hdh: timeout waiting for DMA
hdh: error waiting for DMA
hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

hdf: 0 bytes in FIFO
hdf: timeout waiting for DMA
hdf: status error: status=0x58 { DriveReady SeekComplete DataRequest }

hdf: drive not ready for command
hdf: dma_timer_expiry: dma status == 0x41
hdf: 0 bytes in FIFO
hdf: timeout waiting for DMA
hdf: error waiting for DMA
hdf: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c03de200, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03de33c, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03de790, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c03df038, I/O limit 4095Mb (mask 0xffffffff)
hde: dma_timer_expiry: dma status == 0x61
hdh: dma_timer_expiry: dma status == 0x61
hde: 0 bytes in FIFO
hde: timeout waiting for DMA
hde: error waiting for DMA
hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

hdh: 0 bytes in FIFO
hdh: timeout waiting for DMA
hdh: error waiting for DMA
hdh: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

hdf: 0 bytes in FIFO
hdf: timeout waiting for DMA
hdf: status error: status=0x58 { DriveReady SeekComplete DataRequest }

hdf: drive not ready for command
hdf: dma_timer_expiry: dma status == 0x41
hdf: 0 bytes in FIFO
hdf: timeout waiting for DMA
hdf: error waiting for DMA
hdf: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c03de33c, I/O limit 4095Mb (mask 0xffffffff)
hdb: DMA disabled
blk: queue c03de790, I/O limit 4095Mb (mask 0xffffffff)
hdd: DMA disabled
blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff)
hde: DMA disabled
blk: queue c03debe4, I/O limit 4095Mb (mask 0xffffffff)
hdf: DMA disabled
blk: queue c03deefc, I/O limit 4095Mb (mask 0xffffffff)
hdg: DMA disabled
blk: queue c03df038, I/O limit 4095Mb (mask 0xffffffff)
hdh: DMA disabled
blk: queue c03de200, I/O limit 4095Mb (mask 0xffffffff)
hda: DMA disabled
blk: queue c03deaa8, I/O limit 4095Mb (mask 0xffffffff)
hde: dma_timer_expiry: dma status == 0x21
hde: 0 bytes in FIFO
hde: timeout waiting for DMA
hde: error waiting for DMA
hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }
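
For a log this repetitive, a per-drive tally is easier to read than the raw output. The following is a minimal Python sketch, assuming the dmesg text has been saved to a string; the function name `count_dma_timeouts` and the short sample are illustrative, with the sample lines copied from the excerpt above.

```python
import re
from collections import Counter

# One "timeout waiting for DMA" line per failed transfer, prefixed by the drive name.
TIMEOUT = re.compile(r"^(hd[a-z]): timeout waiting for DMA$", re.MULTILINE)


def count_dma_timeouts(dmesg_text):
    """Count 'timeout waiting for DMA' lines per drive in a dmesg dump."""
    return Counter(TIMEOUT.findall(dmesg_text))


# A few lines copied from the excerpt above, just to exercise the function.
sample = """\
hde: timeout waiting for DMA
hde: error waiting for DMA
hdf: timeout waiting for DMA
hdh:<4>hdh: dma_timer_expiry: dma status == 0x61
hdh: timeout waiting for DMA
hda: DMA disabled
"""

for drive, hits in sorted(count_dma_timeouts(sample).items()):
    print(drive, hits)
```

In the excerpt posted, the timeout lines name only hde through hdh, while hda, hdb and hdd show nothing but "DMA disabled". That may point at the four drives on the add-in controller, though the post does not say which drives sit on which controller.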

unSpawn 11-10-2003 07:05 PM

For completeness, could you include your hdparm settings for the drives?
That's "hdparm <device>", no flags.

mintee 11-12-2003 11:43 AM

root@nebula:~# hdparm /dev/hde

/dev/hde:
multcount = 0 (off)
I/O support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 0 (off)
keepsettings = 1 (on)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 2 (on)
geometry = 14593/255/63, sectors = 234441648, start = 0



All the other drives show the same settings.
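
The line that matters most in this output is probably using_dma = 0 (off), which matches the "DMA disabled" messages in the dmesg excerpt. A small Python sketch for pulling the settings out of such output follows; the regular expression is written against the output quoted in this post, and the name `parse_hdparm` is made up for illustration.

```python
import re

# Matches "name = number (label)" lines; the geometry line has no
# parenthesised label directly after its number and is deliberately skipped.
SETTING = re.compile(r"^\s*(?P<key>[\w/ ]+?)\s*=\s*(?P<value>\d+)\s*\((?P<label>[^)]*)\)")


def parse_hdparm(text):
    """Return {setting: (value, label)} for each simple setting line."""
    settings = {}
    for line in text.splitlines():
        m = SETTING.match(line)
        if m:
            settings[m.group("key")] = (int(m.group("value")), m.group("label"))
    return settings


# The output quoted in the post.
output = """\
/dev/hde:
multcount = 0 (off)
I/O support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 0 (off)
keepsettings = 1 (on)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 2 (on)
geometry = 14593/255/63, sectors = 234441648, start = 0
"""

settings = parse_hdparm(output)
if settings["using_dma"][0] == 0:
    print("DMA is off for this drive")
```

With DMA off the drive would be running in a PIO mode, which could by itself explain much of the slow throughput reported earlier in the thread.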

schultzm 05-18-2004 05:05 AM

I had nearly the same problem. On my SuSE 9.0 system the solution was simple in the end:

Changing "/etc/sysconfig/hardware"

from:
DEVICES_FORCE_IDE_DMA="/dev/hda:udma5 /dev/hdb:udma2 /dev/hdd:on"

to:
DEVICES_FORCE_IDE_DMA=""

and everything is working (and booting) fine again.
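
Judging only from the value shown in this post, the setting looks like a space-separated list of device:mode pairs. The Python sketch below splits it on that assumption; `parse_force_dma` is a made-up name, and nothing here reflects how SuSE's own scripts actually read the file.

```python
def parse_force_dma(value):
    """Split a DEVICES_FORCE_IDE_DMA value into a {device: mode} dict."""
    result = {}
    for entry in value.split():
        # Each entry is assumed to look like "/dev/hda:udma5".
        device, _, mode = entry.partition(":")
        result[device] = mode
    return result


before = parse_force_dma("/dev/hda:udma5 /dev/hdb:udma2 /dev/hdd:on")
after = parse_force_dma("")

print(before)
print(after)
```

Seen this way, the change removes every forced mode, so the drives are left with whatever mode the kernel or BIOS picks.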

Hope it helps you.

schultzm

tidalbobo 09-18-2007 11:17 PM

What you have suggested is to TURN OFF DMA. This is not, in my view, the proper corrective action.
Of course, if you do not use DMA, then there will be no DMA timeouts!!

ghostdancer 09-19-2007 12:33 AM

AFAIK, this kind of error is usually an indication of faulty hardware. It may be the disk, or it may be the power supply.

If you have swapped in a few disks and still get these error messages, then I suggest you look for a good power supply for your system. At least, this was what happened to me.

tidalbobo 09-19-2007 11:37 PM

Let me describe my problem, which gives me the same error.

I have
Quantum Fireball 30GB HDD - master on ide1 - DMA mode
CDROM - slave on ide1 - PIO mode

I can boot from my CDROM (live CD, bootable disks, etc.) with no problem at all. This might be due to the use of PIO mode. However, when the HDD is accessed (that is, using DMA mode), I keep getting the same errors reported by mintee.

I tried several HDDs on the system, but got the same error consistently.

I came to the conclusion that there might be a problem in the DMA hardware on my motherboard, and I am now about to replace it. I haven't done so yet, however.

ghostdancer, what makes you say it is related to the power supply?

Based on my setup, since the CDROM uses PIO and works fine, I concluded it might be the DMA controller. Both the CDROM and the HDD use the same power unit.

ghostdancer 09-20-2007 02:15 AM

Quote:

Originally Posted by tidalbobo (Post 2897509)
...
ghostdancer, what makes you say it is related to the power supply?
...

That is my experience. A lot of the time we forget about the power supply unit (PSU) when we troubleshoot a system. As far as I know (I work at a company that sells Linux appliance systems), the quality of a PSU may degrade over its period of operation. PSUs come in different grades. A better-grade PSU can basically run longer than a lower-grade one under normal conditions.

However, a system with a better-grade PSU that is run beyond its design limits can also suffer PSU failure. For example, if a particular hardware spec is designed to run with only one hard drive, an additional hard drive or external USB component will consume more power, and the PSU may then be running beyond its capacity. You should also be aware that an active drive (e.g. loading applications, swapping memory, or other activity) uses more power than an idle one. Once this happens, some components in the system may get less power than they need, and some may start to fail.

In our case, we bought a few systems from a Taiwanese manufacturer (these days they have a lot of small-form-factor and 1U systems). We set these systems up as appliances for sending and receiving SMS via a USB GSM modem. When we started selling them, we noticed that a few series of hardware always failed after a period of operation, regardless of whether we changed the main board or the hard drive. After some investigation, we realised the problem was related to the USB GSM modem we were using. It seems that when the modem is transmitting or receiving SMS, it uses more power (similar to a normal mobile phone: when idle it can last a few days without recharging, but once we start making calls or sending and receiving SMSes it needs recharging frequently because it consumes more power). The systems we bought were not designed to operate under such conditions and, worse, used a lower-grade PSU. The symptoms were a flickering front panel display (our system has a front LED display for system status and a simple menu) and DMA errors from the drive. In some situations there were even random reboots (basically, the PSU had degraded to a state where it could not provide stable power). After we upgraded the systems to a better-grade PSU, they ran fine with nothing else changed (although sometimes we needed to change the drive as well, since it may already have been damaged during operation).

A main board fault is possible, but in my experience it is quite rare (of the many boxes we have sold, I think it accounts for fewer than 10). If it really is a main board issue, the usual symptoms are that it emits certain sounds, simply won't start, or has intermittent start-up failures. Unless you bought a cheap or non-branded main board?

Anyway, this is just my experience. Your situation may be different.

tidalbobo 09-20-2007 02:55 AM

Mmm, yes, I begin to see what you mean, ghostdancer.
FYI: see http://www.z-a-recovery.com/art-powe...ly-failure.htm

I'll change my power unit and let the results be known!

tidalbobo 09-21-2007 05:00 AM

I changed the power supply unit and tried again.
Still the same problem.
So I guess I have a rare motherboard issue.
My next option is to replace the motherboard and see...

ghostdancer 09-21-2007 05:06 AM

Quote:

Originally Posted by tidalbobo (Post 2898935)
I changed the power supply unit and tried again.
Still the same problem.
So I guess I have a rare motherboard issue.
My next option is to replace the motherboard and see...

Oh...

Good luck!


All times are GMT -5.