LinuxQuestions.org
Old 07-15-2003, 03:15 PM   #1
geos
LQ Newbie
 
Registered: Jul 2003
Posts: 1

Rep: Reputation: 0
Adaptec aic7xxx ABORT -- need help!!!


Hello,

I think we have an issue with the drivers or BIOS on our two Adaptec 39160 cards (please see PROBLEM and SETUP below). Looking around through various Google searches, this appears to be a problem many others have experienced, but I've been unable to get any clear answers on exactly what to do to fix it. Most of what I read pointed to older versions of Linux, where an older version of the Adaptec driver was the culprit, but we're running RedHat 7.3, so we should have the patched version of the driver, which makes this a big concern to me. Anyway, I thought this listing would be relevant to this subject.

I did not install Linux on this host, but I will try to provide as much information as possible. I was hoping someone on this listing might be able to help me fix this very annoying problem. I need advice on what to do here so I don't screw things up. I'm not very knowledgeable about Linux.

PROBLEM: We regularly receive ABORT messages from the aic7xxx driver on our Linux box. We're running Legato (no snickering, please) with two attached tape libraries that use these cards. I've provided some sample output from the /var/log/messages file below. These errors seem to occur once every few days or so and, as near as I can tell, only when the cards are in use. When these errors happen, there is a reasonable likelihood that one or more nsrjb processes (these perform mounting, loading, unloading, etc. of tapes) will hang, resulting in frozen backup operations. Sometimes the affected tape will be prematurely marked full by the software, no doubt a nasty side effect of this phenomenon. Communication from the host to the affected devices (tape drives) is often terminated, but other times it is unaffected. In either case, the syntax of the messages does not seem to differ. Sometimes the host itself locks up and must be cold booted, but normally the machine is fine, and the worst-case scenario is that no further communication with the attached devices is possible until the machine is rebooted.

SETUP: The host in question is a Dell PowerEdge 2550, BIOS Revision A06.
'uname -a' shows: 2.4.20-13.7smp #1 SMP Mon May 12 12:31:27 EDT 2003 i686 unknown

On bootup, I see:

Adaptec SCSI Card 39160 BIOS, (c) 2000 Adaptec, Inc.
v2.57.2S2

for the card that manages our P1000 tape library and

v 3.10.0 (c) 2001

for the card managing the Storagetek library.

We're running one Storagetek L80 LTO tape library on one of the cards. In this case both channels are being used. Specifically, the picker and two drives on the library are daisy chained and attach to channel A on the SCSI card via one LVD cable, and the other two drives are daisy chained and attach to channel B on the SCSI card. The other library is an ATL P1000 tape library with two SDLT drives and connects to the other Adaptec card, using only one channel. Both libraries are terminated properly, and all cables have been checked. As I said, communication is restored once the host is rebooted.

I did check Adaptec's page, and it appears that we have the latest firmware release for the second card, but we're behind on the other one. I could download the newer release for that card and flash its BIOS, but this seems to be a one-shot "better know you really want to do this" deal. I'm not sure if I should do this, but I was thinking that maybe the current driver won't work reliably with the older BIOS, so maybe this is part of the problem. I'd read that many changes were included in the new Adaptec driver, but I don't know how to determine the version we're using. I'm wondering if we can patch the driver and, if so, how? I thought that Linux just uses a built-in driver, and that the only way to get the next version is to upgrade the OS. Anyway, I think we have all the right settings in the config for the cards, but maybe we're using the wrong version of the driver, or the older BIOS is the problem. Maybe it's kludging up the other card. When communication is terminated, it appears to affect both libraries, but I've not tried running just one library. I could do that, but I've seen so many reports about these cards that seem to match our symptoms that I thought I'd start here. I did post something once before to a Legato listing, but that was a while ago, and I think we were waiting to upgrade to 7.3 at the time, since someone commented that the next release of the driver would probably fix things.
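
On the question of how to determine the driver version in use: on 2.4 kernels the aic7xxx driver usually reports it under /proc/scsi/aic7xxx/, one file per SCSI host, on a line beginning "Adaptec AIC7xxx driver version:". A minimal Python sketch that pulls that line out follows. The path and the exact line format are assumptions based on typical 2.4-era output, and the helper name `driver_version` is made up for illustration.

```python
import glob
import re

# Assumed format of the version line in /proc/scsi/aic7xxx/<N> on 2.4 kernels.
VERSION_RE = re.compile(r"Adaptec AIC7xxx driver version:\s*(\S+)")


def driver_version(proc_text):
    """Return the version string found in one /proc dump, or None if absent."""
    match = VERSION_RE.search(proc_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # One file per SCSI host handled by the driver; prints nothing if none exist.
    for path in sorted(glob.glob("/proc/scsi/aic7xxx/*")):
        with open(path, errors="replace") as handle:
            print(path, driver_version(handle.read()))
```

If the directory is empty or missing, the driver may not be loaded, or this kernel may expose the information elsewhere (the boot messages in dmesg are another place to look).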

Would appreciate any help.

George

<<< /var/log/messages >>>
Jul 7 04:53:30 hostname kernel: scsi0:0:3:0: Attempting to queue an ABORT message
Jul 7 04:53:30 hostname kernel: scsi0: Dumping Card State in Command phase, at SEQADDR 0x168
Jul 7 04:53:30 hostname kernel: ACCUM = 0x80, SINDEX = 0xa0, DINDEX = 0xe4, ARG_2 = 0x0
Jul 7 04:53:30 hostname kernel: HCNT = 0x0 SCBPTR = 0x0
Jul 7 04:53:30 hostname kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Jul 7 04:53:30 hostname kernel: DFCNTRL = 0x4, DFSTATUS = 0x89
Jul 7 04:53:30 hostname kernel: LASTPHASE = 0x80, SCSISIGI = 0x44, SXFRCTL0 = 0x88
Jul 7 04:53:30 hostname kernel: SSTAT0 = 0x7, SSTAT1 = 0x11
Jul 7 04:53:30 hostname kernel: SCSIPHASE = 0x2
Jul 7 04:53:30 hostname kernel: STACK == 0x175, 0x160, 0xe7, 0x34
Jul 7 04:53:30 hostname kernel: SCB count = 4
Jul 7 04:53:30 hostname kernel: Kernel NEXTQSCB = 0
Jul 7 04:53:30 hostname kernel: Card NEXTQSCB = 0
Jul 7 04:53:30 hostname kernel: QINFIFO entries:
Jul 7 04:53:30 hostname kernel: Waiting Queue entries:
Jul 7 04:53:30 hostname kernel: Disconnected Queue entries:
Jul 7 04:53:30 hostname kernel: QOUTFIFO entries:
Jul 7 04:53:30 hostname kernel: Sequencer Free SCB List: 2 1 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Jul 7 04:53:30 hostname kernel: Sequencer SCB Info: 0(c 0x40, s 0x37, l 0, t 0x1) 1(c 0x40, s 0x37, l 0, t 0xff) 2(c 0x40, s 0x27, l 0, t 0xff) 3(c 0x0, s 0xff, l 255, t 0xff) 4(c 0x0, s 0xff, l 255, t 0xff) 5(c 0x0, s 0xff, l 255, t 0xff) 6(c 0x0, s 0xff, l 255, t 0xff) 7(c 0x0, s 0xff, l 255, t 0xff) 8(c 0x0, s 0xff, l 255, t 0xff) 9(c 0x0, s 0xff, l 255, t 0xff) 10(c 0x0, s 0xff, l 255, t 0xff) 11(c 0x0, s 0xff, l 255, t 0xff) 12(c 0x0, s 0xff, l 255, t 0xff) 13(c 0x0, s 0xff, l 255, t 0xff) 14(c 0x0, s 0xff, l 255, t 0xff) 15(c 0x0, s 0xff, l 255, t 0xff) 16(c 0x0, s 0xff, l 255, t 0xff) 17(c 0x0, s 0xff, l 255, t 0xff) 18(c 0x0, s 0xff, l 255, t 0xff) 19(c 0x0, s 0xff, l 255, t 0xff) 20(c 0x0, s 0xff, l 255, t 0xff) 21(c 0x0, s 0xff, l 255, t 0xff) 22(c 0x0, s 0xff, l 255, t 0xff) 23(c 0x0, s 0xff, l 255, t 0xff) 24(c 0x0, s 0xff, l 255, t 0xff) 25(c 0x0, s 0xff, l 255, t 0xff) 26(c 0x0, s 0xff, l 255, t 0xff) 27(c 0x0, s 0xff, l 255, t 0xff) 28(c 0x0, s 0xff, l 255, t 0xff) 29(c 0x0, s 0xff, l 255, t 0xff)
Jul 7 04:53:30 hostname kernel: 30(c 0x0, s 0xff, l 255, t 0xff) 31(c 0x0, s 0xff, l 255, t 0xff)
Jul 7 04:53:30 hostname kernel: Pending list: 1(c 0x40, s 0x37, l 0)
Jul 7 04:53:30 hostname kernel: Kernel Free SCB list: 2 3
Jul 7 04:53:30 hostname kernel: Untagged Q(3): 1
Jul 7 04:53:30 hostname kernel: DevQ(0:0:0): 0 waiting
Jul 7 04:53:30 hostname kernel: DevQ(0:2:0): 0 waiting
Jul 7 04:53:30 hostname kernel: DevQ(0:3:0): 0 waiting
Jul 7 04:53:30 hostname kernel: scsi0:0:3:0: Device is active, asserting ATN
Jul 7 04:53:30 hostname kernel: Recovery code sleeping
Jul 7 04:53:30 hostname kernel: (scsi0:A:3:0): Abort Message Sent
Jul 7 04:53:30 hostname kernel: (scsi0:A:3:0): SCB 1 - Abort Completed.
Jul 7 04:53:30 hostname kernel: Recovery SCB completes
Jul 7 04:53:30 hostname kernel: Recovery code awake
Jul 7 04:53:30 hostname kernel: aic7xxx_abort returns 0x2002
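
Since each dump starts with an "Attempting to queue an ABORT message" line naming the host, channel, target and LUN, it may help to tally how often each device shows up before deciding which card or drive to suspect. A rough Python sketch, assuming every dump begins with a line shaped like the first one above; the function name `count_aborts` is hypothetical.

```python
import os
import re
from collections import Counter

# Matches e.g. "... kernel: scsi0:0:3:0: Attempting to queue an ABORT message"
ABORT_RE = re.compile(
    r"kernel:\s+scsi(\d+):(\d+):(\d+):(\d+):\s+"
    r"Attempting to queue an ABORT message"
)


def count_aborts(lines):
    """Count ABORT attempts per (host, channel, target, lun) tuple."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[tuple(int(field) for field in match.groups())] += 1
    return counts


if __name__ == "__main__":
    path = "/var/log/messages"
    # Only read the log if it exists and is readable (normally requires root).
    if os.access(path, os.R_OK):
        with open(path, errors="replace") as log:
            for device, total in sorted(count_aborts(log).items()):
                print("scsi%d:%d:%d:%d  %d abort(s)" % (device + (total,)))
```

If the aborts cluster on a single target, that points more toward one drive or its cabling; if they are spread across both cards, the driver or BIOS looks more likely.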
 
Old 07-15-2003, 08:26 PM   #2
finegan
LQ Guru
 
Registered: Aug 2001
Location: Dublin, Ireland
Distribution: Slackware
Posts: 5,700

Rep: Reputation: 72
'uname -a' shows: 2.4.20-13.7smp #1 SMP Mon May 12 12:31:27 EDT 2003 i686 unknown


Evidently you're running up2date, which got you a newer kernel rev. If you upgrade RedHat, you're still going to end up with just another build of the same Adaptec driver from 2.4.20.

There have been some major changes to the driver recently, and if aic7xxx is built as a module, you could probably download the new driver code, compile it against your kernel and replace the module. If not, it's easiest to just compile a custom kernel for the box. Unfortunately, the new drivers have only been added in the -pre series for 2.4.22. I'm running one now; this far along in the 2.4.x development cycle, most of the add-ons in -pre are just backports of stable stuff from the development kernel series, 2.5.x.
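
To see whether aic7xxx is running as a module on a given box, one option is to look for it in /proc/modules, where the first field of each line is a loaded module's name. A small Python sketch of that check; the function name `is_loaded_module` is made up for illustration, and a missing entry is ambiguous (the driver may be compiled into the kernel, or simply not loaded).

```python
import os


def is_loaded_module(proc_modules_text, name="aic7xxx"):
    """True if `name` is the first field of some line in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    path = "/proc/modules"
    # Skip quietly on systems without /proc/modules.
    if os.path.exists(path):
        with open(path) as handle:
            loaded = is_loaded_module(handle.read())
        print("aic7xxx is a loaded module" if loaded
              else "aic7xxx not listed: built in, or not loaded")
```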

Cheers,

Finegan
 
  


All times are GMT -5.
