Need help: two-node cluster, RHEL 6 High Availability Add-On, with Oracle over NFS
We have set up a two-node cluster, with the Oracle datafiles on an NFS-mounted /data.
Failover works for a database crash or a power failure.
However, a loss of connectivity on eth0 causes the following problems:
1. The /data mount is never detected as failed or hung. The netfs.sh agent we reference in cluster.conf does not detect this and try to unmount it.
2. The cluster doesn't know eth0 is dead.
clustat reports everything as normal throughout, so nothing happens. Additionally, because /data is essentially hung, manual failover via clusvcadm also fails.
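For what it's worth, we think the status check never returns because the mount uses the hard option: any I/O against an unreachable NFS server (including a simple stat) blocks indefinitely. A rough way to make the hang observable from the shell (the 5-second timeout and the check_mount name are just our sketch, not anything from the cluster tooling):

```shell
#!/bin/sh
# Sketch: probe a possibly hung NFS mount without blocking forever.
# A hard-mounted NFS filesystem makes stat() hang while the server is
# unreachable; wrapping the probe in timeout(1) turns the hang into a
# detectable failure. The 5s value is arbitrary.

check_mount() {
    # Return 0 if the path answers within 5 seconds, non-zero otherwise.
    timeout 5 stat -t "$1" >/dev/null 2>&1
}

if check_mount /data; then
    echo "/data is responsive"
else
    echo "/data appears hung or missing (stat timed out or failed)"
fi
```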
Here is our cluster.conf file. Can anyone help?
<?xml version="1.0"?>
<cluster config_version="35" name="cluster1">
  <fence_daemon post_fail_delay="0"/>
  <clusternodes>
    <clusternode name="test1.private" nodeid="1">
      <fence>
        <method name="manual">
          <device name="manual"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="test2.private" nodeid="2">
      <fence>
        <method name="manual">
          <device name="manual"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="manual"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <service autostart="1" name="oracle" recovery="relocate">
      <netfs ref="data_mount"/>
      <script ref="oracle_resource"/>
      <ip address="192.168.1.86" monitor_link="eth0"/>
    </service>
    <resources>
      <script file="/usr/local/bin/test.sh" name="oracle_resource"/>
      <netfs export="/data/dbcluster" force_unmount="1" fstype="nfs" host="test.main.example.com" mountpoint="/data" name="data_mount" options="rw,bg,hard,nointr,tcp,nfsvers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0"/>
    </resources>
  </rm>
</cluster>
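One workaround we are considering, in case it helps the discussion: an extra script resource in the service whose status check fails when eth0 loses carrier, so rgmanager itself would trigger recovery. This is an untested sketch; the interface name, the sysfs carrier path, and the script name are our assumptions, not anything rgmanager ships:

```shell
#!/bin/sh
# Sketch of a rgmanager <script> resource, e.g. /usr/local/bin/check_eth0.sh,
# that only exists to monitor link state. rgmanager calls it with
# start/stop/status; a non-zero status return makes the service recover.

IFACE=eth0   # assumption: the interface carrying the service IP

link_up() {
    # /sys/class/net/<iface>/carrier reads "1" while the link is up
    [ "$(cat "/sys/class/net/$1/carrier" 2>/dev/null)" = "1" ]
}

agent() {
    case "$1" in
        start|stop) return 0 ;;           # nothing to start or stop
        status)     link_up "$IFACE" ;;   # non-zero here => recovery
        *)          echo "usage: $0 {start|stop|status}" ;;
    esac
}

agent "$@"
```

It would then be referenced alongside the existing resources, e.g. <script file="/usr/local/bin/check_eth0.sh" name="link_check"/> in the resources block and <script ref="link_check"/> inside the service (names are placeholders).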