RHEL3 attached to a SAN

snapbean · 02-15-2007, 11:47 AM

Guys... any help here would be greatly appreciated. I have 2 RHEL 3 servers attached to our SAN; they each have their own separate space on the SAN, 30GB each. These 2 servers are our PRODUCTION and TEST servers, and my question is this. Today we tried to fail over these servers, meaning if our prod server fails I would like the Test server to use the PROD server's space on the SAN. When we swapped LUNs on each HOST, each server had issues mounting the other's SAN space. Any ideas? The two servers are identical, as is the space allocated on the SAN, and the space is mounted identically on each server with the same mount points and dev names. We got an error about mounting, saying it possibly was not an ext2 partition, which it is. Should I have to mount this space differently? Any suggestions are welcome!

Thanks

MensaWater · 02-15-2007, 02:05 PM

You're doing this for everything including the boot/root filesystems?

I can see problems with this - for example the MAC addresses you have stored in config files for your NICs won't be the same.

There are clustering utilities designed to do what you're talking about. Are you using any of those?

You may want to have a look at the following:
http://lcic.org/

snapbean · 02-15-2007, 02:27 PM

Thanks for the reply.
No, we are not doing this for all the root filesystems... this actually is just a separate partition for the software these servers run. In the event we lose our production server by some chance (hardware), we would like to take the SAN partition for that server and attach it to the test server. So on our SAN we would, in theory, just move the LUNs to the appropriate HOST... As we tried today, the Test server would not boot with the PROD server's SAN space attached. It claimed that the space was not ext2 and needed fsck run...

They are not in a clustered environment either; the test server really is there as a test environment to build new services and test before moving to production. Really they are completely separate. Even as the SAN sees them, they are separate HOSTs with separate LUNs carved for SAN space.

Thanks

MensaWater · 02-15-2007, 02:50 PM

Haven't done this on Linux except with OCFS from Oracle, which allows you to actually share the filesystems on two servers at the same time (it has its own locking mechanism). However, I wouldn't recommend using that unless it's for a database (it was designed for Oracle RAC).

On HP-UX we have Veritas Volume Manager (VxVM) and do this kind of disk move frequently. That requires it to be "deported" from the original server and "imported" on the target.

You should be able to do it with LVM. On HP-UX at another job we were able to do that with the HP LVM, which is similar to the one in Linux. We were even able to mount the volumes Read Only on the other server so we could see them even while they were active Read/Write on the first server. Just had a look, and the Linux vgexport/vgimport commands appear to have the same function.

man vgexport
man vgimport

Of course you'd have to put the SAN volumes in a LVM Volume Group (VG). Have a look at:

man lvm
for an overview. This will point you to the commands used to do that.

snapbean · 02-16-2007, 07:26 AM

I'm not understanding, though, why I couldn't unmount each SAN allocation from each server, then on the SAN move the LUNs to the other server and just mount the new space? Each server is basically the same, and PowerPath names the devices the same, so my mount points are the same with the same names. I just get an error about an incorrect file system or something... and to run fsck. When I moved everything back, all was good. Is there specific data about the current server in that space that keeps it from mounting on another server?

MensaWater · 02-16-2007, 10:08 AM

Ah PowerPath - just went through this in migrating servers from one Clariion to another.

One thing to keep in mind - Linux may redo your device names in discovery order. That is to say if you have an sda, sdb, sdc and sdd then add another 4 devices there is no guarantee the original 4 will be the same sda, sdb, sdc and sdd - it may be they become sde, sdf, sdg and sdh and the new 4 become the sda, sdb, sdc and sdd. This is unlike HP-UX for example which simply creates new device entries for the added devices.
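
If the device names may have shifted, one way to check what a given device really holds before mounting it is to read the ext2/ext3 superblock directly. The following is only a rough sketch: the function name is made up for illustration, it assumes you can read the raw device (normally root only), and it relies on the standard ext2 on-disk layout (superblock at byte 1024, magic 0xEF53, UUID and volume label inside it). A "not ext2" result can simply mean you are looking at the wrong device, or at the whole device instead of the partition.

```python
import struct
import uuid

EXT_SUPERBLOCK_OFFSET = 1024   # the superblock starts 1 KiB into the partition
EXT_MAGIC = 0xEF53             # s_magic value used by ext2 and ext3

def read_ext_identity(path):
    """Return (label, uuid) if `path` carries an ext2/ext3 superblock, else None.

    Hypothetical helper for illustration; it only reads, never writes.
    """
    with open(path, "rb") as dev:
        dev.seek(EXT_SUPERBLOCK_OFFSET)
        sb = dev.read(1024)
    if len(sb) < 136:
        return None                                  # too short to hold the fields we need
    (magic,) = struct.unpack_from("<H", sb, 56)      # s_magic, little-endian
    if magic != EXT_MAGIC:
        return None
    fs_uuid = uuid.UUID(bytes=sb[104:120])           # s_uuid
    label = sb[120:136].split(b"\0", 1)[0].decode("ascii", "replace")  # s_volume_name
    return label, str(fs_uuid)
```

Running it against each candidate partition on both servers (for example the partition you expect to hold the PROD filesystem) and comparing the label and UUID tells you whether the same name really points at the same filesystem after the LUN move.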

We went for a fairly involved process for moving volumes. As steps in doing that we stopped PowerPath and also made it recreate its files by doing:
Rename PowerPath configuration files
(in /etc: emcp_devicesDB.dat, emcp_devicesDB.idx, powermt.custom)
NOTE: mv NOT cp

On restart it would recreate these files with the new device order.
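
The rename step could be scripted along these lines. This is just a sketch with a made-up helper name: it takes the directory and the list of file names as arguments (check the exact names on your own system), assumes PowerPath is already stopped, and renames rather than copies so the original names are gone when PowerPath restarts.

```python
import os
import time

def move_aside(directory, names, suffix=None):
    """Rename each existing file to <name>.<suffix>; return (old, new) pairs.

    Hypothetical helper: files that do not exist are skipped, and an
    existing target is treated as an error rather than overwritten.
    """
    suffix = suffix or time.strftime("%Y%m%d%H%M%S")
    moved = []
    for name in names:
        old = os.path.join(directory, name)
        if not os.path.exists(old):
            continue                      # nothing to move for this name
        new = old + "." + suffix
        if os.path.exists(new):
            raise FileExistsError(new)    # never clobber an earlier backup
        os.rename(old, new)               # a move, not a copy
        moved.append((old, new))
    return moved
```

Keeping the renamed files around gives you a way back: stop PowerPath again and rename them to their original names if the regenerated configuration is not what you wanted.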

You can use powermt display dev=all to get a good view of which sd? devices are associated with which PowerPath pseudodevices.

Also for each add we ran Qlogic's ql-scan-lun.sh utility to make it see the drives. You may wish to get that if you're using Qlogic fibre HBAs or see if the vendor you use has an equivalent. There was another scsi_scan utility we had used in testing that worked fine on one server but on another gave very odd results. The Qlogic utility didn't have the same issues.

Since you're not doing exactly what we were doing I'd recommend you test this on non-critical systems first.

snapbean · 02-19-2007, 07:32 AM

Thanks,
That does make sense to me, and I will have to try it and see if the dev names have changed. I'll let you know. Thanks for the info.

Thanks

MensaWater · 02-19-2007, 08:09 AM

No problem.

If you're using PowerPath, does that mean you're using Symmetrix or Clariion for the SAN storage? If the latter, make sure you stop/start Naviagent when making changes as well.

snapbean · 02-19-2007, 10:18 AM

Each server is running PowerPath, and our SAN environment currently is an EMC CX600, moving soon to the new CX3-40. In our initial test we powered down each server, moved the LUNs and powered back on. I do believe you may be right that the drives, or the devs each server sees, have changed. I'll have to try again and see what the server sees on boot or remount.

Thanks.

MensaWater · 02-19-2007, 10:26 AM

OK CX600 and CX3 are Clariion. If you're running Navisphere on the Linux client (you should be) then remember to stop/start Naviagent as mentioned above.

We use a CX500 for Exchange, and have 2 CX700s and a CX3 as well as a Symmetrix DMX 3, though we don't use the Symm on Linux. The changes I mentioned we'd done were on the CX700s.

born4linux · 02-23-2007, 02:24 AM

are you using LVM? what's your SAN server and HBA?

MensaWater · 02-23-2007, 09:08 AM

No LVM on the SAN attached stuff.

HBA is Qlogic 2340 (shows up as QLA2312 in lspci but that's OK).

Fibre switches are EMC Connectrix (OEM'ed McData)

Storage is Clariion CX700.

We use Navisphere (naviagent) on the host to talk to the Clariion.

We use EMC PowerPath to do multipathing to the devices (2 Fibre HBAs in each host x 2 SPs on the Clariion = 4 paths for each drive).

So the drives as mounted are:

/dev/emcpowera1  300G  153G  148G  51%  /db
/dev/emcpowerb1   30G  7.1G   23G  24%  /db/thisone
/dev/emcpowerc1   10G  7.9G  2.2G  79%  /db/thistwo
/dev/emcpowerd1   10G  7.9G  2.2G  79%  /db/thisthree

So for example - /dev/emcpowerd would be the EMC pseudodevice using the 4 real devices /dev/sdc, /dev/sdh, /dev/sdm & /dev/sdr. The relationship can be seen with "powermt display dev=all" as shown below:

Pseudo name=emcpowerd
CLARiiON ID=APM00011111111 [Production RAC]
Logical device ID=60060160827618004A0EE396866ADB11 [LUN 7]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
---------------- Host --------------- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
2 QLogic Fibre Channel 2300 sdc SP A1 active alive 0 0
2 QLogic Fibre Channel 2300 sdh SP B1 active alive 0 0
3 QLogic Fibre Channel 2300 sdm SP A0 active alive 0 0
3 QLogic Fibre Channel 2300 sdr SP B0 active alive 0 0

Pseudo name=emcpowere
CLARiiON ID=APM00011111111 [Production RAC]
Logical device ID=6006016082761800647983C1866ADB11 [LUN 6]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP A, current=SP A
==============================================================================
---------------- Host --------------- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
2 QLogic Fibre Channel 2300 sdb SP A1 active alive 0 0
2 QLogic Fibre Channel 2300 sdg SP B1 active alive 0 0
3 QLogic Fibre Channel 2300 sdl SP A0 active alive 0 0
3 QLogic Fibre Channel 2300 sdq SP B0 active alive 0 0

Pseudo name=emcpowerc
CLARiiON ID=APM00011111111 [Production RAC]
Logical device ID=6006016082761800727C8D83866ADB11 [LUN 8]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP A, current=SP A
==============================================================================
---------------- Host --------------- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
2 QLogic Fibre Channel 2300 sdd SP A1 active alive 0 0
2 QLogic Fibre Channel 2300 sdi SP B1 active alive 0 0
3 QLogic Fibre Channel 2300 sdn SP A0 active alive 0 0
3 QLogic Fibre Channel 2300 sds SP B0 active alive 0 0

Pseudo name=emcpowerb
CLARiiON ID=APM00011111111 [Production RAC]
Logical device ID=60060160827618008C0F6C6B866ADB11 [LUN 9]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
---------------- Host --------------- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
2 QLogic Fibre Channel 2300 sde SP A1 active alive 0 0
2 QLogic Fibre Channel 2300 sdj SP B1 active alive 0 0
3 QLogic Fibre Channel 2300 sdo SP A0 active alive 0 0
3 QLogic Fibre Channel 2300 sdt SP B0 active alive 0 0

Pseudo name=emcpowera
CLARiiON ID=APM00011111111 [Production RAC]
Logical device ID=6006016082761800B2BF2A54866ADB11 [LUN 10]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP A, current=SP A
==============================================================================
---------------- Host --------------- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
2 QLogic Fibre Channel 2300 sdf SP A1 active alive 0 0
2 QLogic Fibre Channel 2300 sdk SP B1 active alive 0 0
3 QLogic Fibre Channel 2300 sdp SP A0 active alive 0 0
3 QLogic Fibre Channel 2300 sdu SP B0 active alive 0 0
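
Output in this shape is easy to turn into a pseudo device -> LUN -> sd device map for comparing servers. The parser below is a minimal sketch written against the layout pasted above only; other PowerPath versions or array types may print different fields, and the function name and dictionary keys are invented for the example.

```python
import re

# One stanza copied from the output above, used as sample input.
SAMPLE = """\
Pseudo name=emcpowerd
CLARiiON ID=APM00011111111 [Production RAC]
Logical device ID=60060160827618004A0EE396866ADB11 [LUN 7]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
2 QLogic Fibre Channel 2300 sdc SP A1 active alive 0 0
2 QLogic Fibre Channel 2300 sdh SP B1 active alive 0 0
3 QLogic Fibre Channel 2300 sdm SP A0 active alive 0 0
3 QLogic Fibre Channel 2300 sdr SP B0 active alive 0 0
"""

def parse_powermt(text):
    """Map pseudo device name -> {"lun_id", "lun_label", "paths"}."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Pseudo name=(\S+)", line)
        if m:
            # Start of a new stanza
            current = devices.setdefault(
                m.group(1), {"lun_id": None, "lun_label": None, "paths": []})
            continue
        if current is None:
            continue
        m = re.match(r"Logical device ID=(\S+)(?:\s+\[(.*)\])?", line)
        if m:
            current["lun_id"], current["lun_label"] = m.group(1), m.group(2)
            continue
        # Path rows: "<hba#> <HW path text> sdX SP A1 active alive 0 0"
        m = re.match(r"\d+\s+.*?\b(sd[a-z]+)\s+(SP [AB]\d+)\s+(\S+)\s+(\S+)", line)
        if m:
            current["paths"].append({"dev": m.group(1), "port": m.group(2),
                                     "mode": m.group(3), "state": m.group(4)})
    return devices
```

Feeding it the saved output from two hosts and comparing the `lun_id` values per pseudo name shows quickly whether, say, emcpowerd is the same LUN on both.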

As a final note: We don't boot from the Clariion storage - we have Dell PERC cards so the OS resides on internal RAID disks. We use the SAN storage for the database.

rahulk · 02-26-2007, 08:07 PM

I agree with what you are saying, but there is one issue with PowerPath which I am facing. I have two servers with identical hardware running RHEL 2.1 AS. I have Oracle running on them and use EMC Symmetrix for providing the LUNs.

I agree that Linux does not keep /dev/sd[a-z] in a stable order, and hence we use PowerPath to create the pseudo devices /dev/emcpower[a-z].

Just two days back, EMC provided me a 64 GB LUN which is visible from both the servers. I rebooted the boxes and found out that the new LUN had different Pseudo devices.
On server1 ---> /dev/emcpowers
On server2 ---> /dev/emcpowert

I had a careful look and found that the /etc/opt/emcpower/emcpower.conf file on the second server had an entry for emcpowers, but its vid was pointing to a device which is available on server 1. On doing fdisk, it was not able to open the device.

So the issue is that server 2 unexpectedly skipped /dev/emcpowers. It's an issue since I would like the pseudo device names to be the same on both servers for the same LUN. Could I do something here?

Please help!!!
Thanks.

MensaWater · 02-27-2007, 09:43 AM

You could try moving the PowerPath config files aside (renaming them) as I mentioned in an earlier post. You'd of course stop PowerPath first, then restart it. It sounds like it may have had a pseudo device once that it no longer has.

We did the above for our Oracle RAC shared storage on the CX700 running RHEL 3. I'm not sure how RHEL 2.1 would treat it differently.

We were ultra cautious in doing all this because what we didn't want to have happen is for Oracle to start up with the wrong pseudo devices. As can be seen, you can get the actual LUN information from your Clariion in the powermt display dev=all output. You need to make sure you are NOT autostarting the Oracle DB until after you've ensured your pseudodevices and related mounts are exactly what you think they are.
If Oracle attempts to start and has the wrong devices it will likely corrupt things. At a minimum you'd want to do a cold backup of the database before doing this.

So - repeated caveat: PROCEED AT YOUR OWN RISK.
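
The "make sure the pseudodevices are what you think they are" check can be written down as a simple comparison. This is a sketch under assumptions: both dictionaries map a pseudo device name to its Logical device ID, the expected one is recorded by hand while things are known good, and the actual one is collected however you like from powermt display dev=all. The function name is hypothetical.

```python
def mapping_mismatches(expected, actual):
    """Compare {pseudo_name: lun_id} dicts and list every difference.

    An empty result means every pseudo device points at the LUN you expect.
    """
    problems = []
    for name in sorted(set(expected) | set(actual)):
        want, have = expected.get(name), actual.get(name)
        if want is None:
            problems.append(f"{name}: unexpected device (LUN {have})")
        elif have is None:
            problems.append(f"{name}: missing (expected LUN {want})")
        elif want.lower() != have.lower():   # IDs are hex, so ignore case
            problems.append(f"{name}: LUN {have}, expected {want}")
    return problems
```

A startup wrapper could refuse to mount anything or start Oracle unless this returns an empty list, which also catches a skipped pseudo device name such as the emcpowers/emcpowert case described above.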

rahulk · 02-27-2007, 10:07 AM

yup that is right. We don't start the oracle services unless we ensure that all the devices are properly mapped.

Thanks for the reply.