buf = STATE=ONLINE on xxxxdb02
2012-05-01 20:36:23.115: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf =
2012-05-01 20:36:23.115: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcqryapi:
resname = ora.xxxxdb02.vip, host = NULL,time = 0.004s
2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:Tue May 1
20:36:23 BEIST 2012 [ 921812 ] Checking interface existance
Tue May 1 20:36:23 BEIST 2012 [ 921812 ] Calling getifbyip
Tue May 1 20:36:23 BEIST 2012 [ 921812 ] getifbyip: started for xxx.xxx.xxx.4
2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:Tue May 1
20:36:23 BEIST 2012 [ 921812 ] getifbyip: checking if failover is happening ()
Tue May 1 20:36:23 BEIST 2012 [ 921812 ] getifbyip: failover is not happening()
Tue May 1 20:36:23 BEIST 2012 [ 921812 ] Completed getifbyip
2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:Tue May 1
20:36:23 BEIST 2012 [ 921812 ] Completed with initial interface test
2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcexecut: env ORACLE_CONFIG_HOME=/oracle/product/10.2.0/crs_1
2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcexecut: cmd = /oracle/product/10.2.0/crs_1/bin/racgeut -e _USR_ORA_DEBUG=5 54
/oracle/product/10.2.0/crs_1/bin/racgvipstop xxxxdb02
2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcexecut: rc = 0, time = 0.204s
2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcposthaevt: reason = failure
2012-05-01 20:36:23.285: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:clsrccln:
exiting ora.xxxxdb02.vip refcount=1
2012-05-01 20:36:23.286: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcprsrgter:gctx->prsrcfgref_clsrcgctx = 0
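Solution
Based on the analysis, we believe that CRS in 10.2.0.5 is overly sensitive to
the network, so network latency has a significant impact on the database
cluster. Given the current situation, we recommend the following: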
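1. Investigate the network thoroughly. Very occasional packet loss or latency
is common at the network level. The cause could be cabling, or it could be
the switch, the server NICs, or the network configuration, so the network
needs a detailed check.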
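2. Relax the overly sensitive CRS configuration by setting the ping timeout to
3 seconds (the value used before 10.2.0.5).
In the $ORA_CRS_HOME/bin/racgvip script, find the following section: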
# timeout of ping in number of loops
PING_TIMEOUT=" -c 1 -w 1"
Change it to:
# timeout of ping in number of loops
PING_TIMEOUT=" -c 1 -w 3"
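3. Because Bug 6955040 is triggered only after the VIP has failed, the priority
for now is to resolve the VIP failure; the bug itself can be ignored for the
time being.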
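
The first two recommendations can be sketched in shell. This is an illustrative sketch only, not an Oracle-supplied procedure: the function names count_ping_failures and patch_ping_timeout and the PING_CMD override are hypothetical, the ping flags are copied from the PING_TIMEOUT line in racgvip (their exact meaning depends on the platform's ping), and the sed expression assumes the PING_TIMEOUT line looks exactly as quoted above.

```shell
# Assumption: PING_CMD is a hypothetical override so the loop can be dry-run
# without touching the network; by default the system ping is used.
PING_CMD=${PING_CMD:-ping}

# Recommendation 1: count how many single-packet pings miss the wait limit.
# Usage: count_ping_failures <target> <attempts> [wait_seconds]
count_ping_failures() {
    target="$1"; attempts="$2"; wait_secs="${3:-1}"
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Same flags as PING_TIMEOUT in racgvip: one packet, fixed wait.
        if ! $PING_CMD -c 1 -w "$wait_secs" "$target" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Recommendation 2: change "-w 1" to "-w 3" on the PING_TIMEOUT line only.
# Usage: patch_ping_timeout <path-to-racgvip>
patch_ping_timeout() {
    script="$1"
    # Keep a copy of the original so the change can be rolled back.
    cp -p "$script" "$script.bak" || return 1
    # Avoid sed -i, which is not available on every platform.
    sed '/^[[:space:]]*PING_TIMEOUT=/s/-w 1"/-w 3"/' "$script.bak" > "$script.tmp" || return 1
    # Writing through cat keeps the original file's owner and permissions.
    cat "$script.tmp" > "$script" && rm -f "$script.tmp"
    # Print the resulting line so the change can be verified by eye.
    grep 'PING_TIMEOUT=' "$script"
}
```

For example, `count_ping_failures xxx.xxx.xxx.4 100 1` would report how many of 100 probes failed with the 1-second wait, and running it again with a wait of 3 shows whether the longer timeout would have absorbed the delays. `patch_ping_timeout "$ORA_CRS_HOME/bin/racgvip"` applies the edit and leaves the original as racgvip.bak.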