1486555456] Do failover for: joadbtest02 <------- the failover fails at this point.
2013-08-09 17:28:31.399: [ CRSRES][1513838912] startRunnable: setting CLI values
2013-08-09 17:28:31.414: [ CRSRES][1513838912] Attempting to start `ora.joadbtest02.vip` on member `maa01` <--- attempts to fail the VIP over to node 1
2013-08-09 17:28:31.421: [ CRSRES][1530632512] startRunnable: setting CLI values
2013-08-09 17:28:31.434: [ CRSRES][1530632512] Attempting to start `ora.rac.db` on member `maa01`
2013-08-09 17:28:31.542: [ CRSRES][1530632512] Start of `ora.rac.db` on member `maa01` succeeded.
2013-08-09 17:28:37.863: [ CRSAPP][1513838912] StartResource error for ora.joadbtest02.vip error code = 1
2013-08-09 17:28:41.057: [ CRSRES][1513838912] Start of `ora.joadbtest02.vip` on member `maa01` failed. <---------VIP failover failed.
2013-08-09 17:28:41.085: [ CRSEVT][1486555456] Post recovery done evmd event for: joadbtest02
2013-08-09 17:28:41.085: [ CRSD][1486555456] SM: recoveryDone: 0
2013-08-09 17:28:41.098: [ CRSEVT][1486555456] Processing RecoveryDone
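On a busy cluster the CRSD log is much longer than this excerpt, so it can help to pull out only the start results. The following is a minimal sketch, assuming the log lines follow the same `Start of ... on member ... succeeded/failed.` wording shown above; the function name `start_outcomes` and the `sample` list are hypothetical and used only for illustration.

```python
import re

# Matches CRSD lines such as:
#   ... [ CRSRES][1513838912] Start of `ora.joadbtest02.vip` on member `maa01` failed.
# Assumption: the wording is the same as in the excerpt above.
START_RESULT = re.compile(
    r"Start of `(?P<resource>[^`]+)` on member `(?P<member>[^`]+)` "
    r"(?P<outcome>succeeded|failed)\."
)


def start_outcomes(lines):
    """Return (resource, member, outcome) for every start result found in lines."""
    results = []
    for line in lines:
        match = START_RESULT.search(line)
        if match:
            results.append(
                (match.group("resource"), match.group("member"), match.group("outcome"))
            )
    return results


# Two lines copied from the CRSD excerpt above.
sample = [
    "2013-08-09 17:28:31.542: [ CRSRES][1530632512] Start of `ora.rac.db` on member `maa01` succeeded.",
    "2013-08-09 17:28:41.057: [ CRSRES][1513838912] Start of `ora.joadbtest02.vip` on member `maa01` failed.",
]

for resource, member, outcome in start_outcomes(sample):
    print(resource, member, outcome)
```

Filtering the output for `failed` shows at once that only the VIP resource failed to start on `maa01`, while `ora.rac.db` started normally.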
Next, look at the log file for ora.joadbtest02.vip:
ora.joadbtest02.vip:
2013-08-09 17:28:34.723: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: checkIf: interface eth2 is down <--- this is the clue
Invalid parameters, or failed to bring up VIP (host=MAA01)
2013-08-09 17:28:34.729: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip start joadbtest02
2013-08-09 17:28:34.729: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: rc = 1, time = 3.150s
2013-08-09 17:28:37.861: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip check joadbtest02
2013-08-09 17:28:37.861: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: rc = 1, time = 3.130s
2013-08-09 17:28:37.861: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: end for resource = ora.joadbtest02.vip, action = start, status = 1, time = 6.350s
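The clue is visible here: the problem appears to lie with the network interface. On node 1 the public IP is on eth0, but for some reason on node 2 the public IP is on eth2, so the VIP start on node 1 checks eth2 and finds it down.
The customer did not keep the earlier messages logs,
and the older Oracle and cluster logs are gone as well, so it is not known why the public IP interface differs between the two nodes.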
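To find this kind of clue quickly in a resource log, the `checkIf` message can be extracted with a short script. This is only a sketch, assuming the message format matches the `checkIf: interface eth2 is down` line in the excerpt above; `interfaces_reported_down` and `sample` are hypothetical names introduced for illustration.

```python
import re

# Matches the diagnostic seen in the VIP resource log above:
#   ... [ora.joadbtest02.vip]: checkIf: interface eth2 is down
# Assumption: other occurrences use the same wording.
IF_DOWN = re.compile(
    r"\[(?P<resource>ora\.[^\]]+)\]: checkIf: interface (?P<iface>\S+) is down"
)


def interfaces_reported_down(lines):
    """Return (resource, interface) pairs for every 'interface ... is down' line."""
    results = []
    for line in lines:
        match = IF_DOWN.search(line)
        if match:
            results.append((match.group("resource"), match.group("iface")))
    return results


# One line copied from the log excerpt above, plus one that should not match.
sample = [
    "2013-08-09 17:28:34.723: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: checkIf: interface eth2 is down",
    "2013-08-09 17:28:34.729: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: rc = 1, time = 3.150s",
]

for resource, iface in interfaces_reported_down(sample):
    print(f"{resource} expects interface {iface}, which is down on this node")
```

If the interface named in the output is not the one that carries the public IP on the surviving node, the interface names are mismatched, as in this case.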
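Solution:
Configure the public IP on the same network interface on both nodes. For the detailed steps, see an article I wrote earlier:
VIP fails to start normally and reports CRS-1006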