Analysis of the VIP Failure on Database Node 2 (Part 3)

2015-02-03 04:25:33
tatcb:

buf = STATE=ONLINE on xxxxdb02

2012-05-01 20:36:23.115: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:

buf =

2012-05-01 20:36:23.115: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcqryapi:

resname = ora.xxxxdb02.vip, host = NULL,time = 0.004s

2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:Tue May 1 20:36:23 BEIST 2012 [ 921812 ] Checking interface existance

Tue May 1 20:36:23 BEIST 2012 [ 921812 ] Calling getifbyip

Tue May 1 20:36:23 BEIST 2012 [ 921812 ] getifbyip: started for xxx.xxx.xxx.4

2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:Tue May 1 20:36:23 BEIST 2012 [ 921812 ] getifbyip: checking if failover is happening ()

Tue May 1 20:36:23 BEIST 2012 [ 921812 ] getifbyip: failover is not happening()

Tue May 1 20:36:23 BEIST 2012 [ 921812 ] Completed getifbyip

2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:Tue May 1 20:36:23 BEIST 2012 [ 921812 ] Completed with initial interface test

2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:

clsrcexecut: env ORACLE_CONFIG_HOME=/oracle/product/10.2.0/crs_1

2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:

clsrcexecut: cmd = /oracle/product/10.2.0/crs_1/bin/racgeut -e _USR_ORA_DEBUG=5 54 /oracle/product/10.2.0/crs_1/bin/racgvip stop xxxxdb02

2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:

clsrcexecut: rc = 0, time = 0.204s

2012-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:

clsrcposthaevt: reason = failure

2012-05-01 20:36:23.285: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:clsrccln:

exiting ora.xxxxdb02.vip refcount=1

2012-05-01 20:36:23.286: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:

clsrcprsrgter:gctx->prsrcfgref_clsrcgctx = 0
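The excerpt above ends with the VIP being stopped and a `clsrcposthaevt: reason = failure` event being posted. To see how often this happens, the failure timestamps can be pulled out of the VIP resource log. The following is only a minimal sketch: the function name `list_vip_failures` is hypothetical, the log file path is passed in by the caller, and the sketch assumes the timestamp header sits either on the same line as the message or on a line before it, as in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper: list_vip_failures LOGFILE
# Prints the timestamp of every "clsrcposthaevt: reason = failure" entry.
# Assumption: each entry starts with a "YYYY-MM-DD HH:MM:SS.mmm: [" header,
# and the message is on that line or on a following line.
list_vip_failures() {
    awk '
        # Remember the most recent timestamp header seen.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9:.]+: \[/ {
            ts = $1 " " $2
            sub(/:$/, "", ts)   # drop the trailing colon after the time
        }
        # Report it when a failure event is posted.
        /clsrcposthaevt: reason = failure/ { print ts }
    ' "$1"
}
```

It would be called with the path to the VIP resource log of the affected node, for example `list_vip_failures /path/to/vip.log`, where the path is a placeholder.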

Solution

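Based on the analysis, we believe that CRS in 10.2.0.5 is overly sensitive to the network: when network latency occurs, the database cluster is significantly affected. Given the current situation, we recommend the following: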

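1. Investigate the network thoroughly. Very occasional packet loss or latency is common at the network level. The cause may be cabling, or it may lie in the switch, the server NICs, or the network configuration, so the network needs a detailed check.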

2. Relax the overly sensitive CRS configuration by setting the ping timeout to 3 seconds (the value used before 10.2.0.5):

Change the following part of the $ORA_CRS_HOME/bin/racgvip script:

# timeout of ping in number of loops

PING_TIMEOUT=" -c 1 -w 1"

so that it reads:

# timeout of ping in number of loops

PING_TIMEOUT=" -c 1 -w 3"

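3. Because Bug 6955040 is triggered only after the VIP failure has occurred, the priority for now is to resolve the VIP failure itself; this bug can be ignored.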
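The script edit in recommendation 2 could be applied along the following lines. This is a sketch rather than a vetted procedure: the function name `patch_ping_timeout` and the `.bak` backup suffix are hypothetical, and it assumes the line in racgvip matches the form quoted above exactly. It avoids `sed -i`, which is not available on every platform.

```shell
#!/bin/sh
# Hypothetical helper: patch_ping_timeout FILE
# Changes PING_TIMEOUT=" -c 1 -w 1" to PING_TIMEOUT=" -c 1 -w 3" in FILE
# and keeps the original as FILE.bak.
patch_ping_timeout() {
    script="$1"
    [ -f "$script" ] || { echo "not found: $script" >&2; return 1; }

    # Keep a backup so the change can be rolled back.
    cp -p "$script" "$script.bak" || return 1

    # Edit into a temporary file, then write the result back in place so
    # the ownership and permissions of the original file are kept.
    tmp="$script.tmp.$$"
    sed 's/^\([[:space:]]*PING_TIMEOUT=" -c 1 -w \)1"/\13"/' \
        "$script.bak" > "$tmp" || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$script" && rm -f "$tmp"

    # Show the resulting setting for a visual check.
    grep -n 'PING_TIMEOUT=' "$script"
}
```

It would be run as `patch_ping_timeout "$ORA_CRS_HOME/bin/racgvip"`, presumably on each cluster node. If the script defines PING_TIMEOUT in a different form, the `sed` expression matches nothing and the file is left unchanged, which the final `grep` output makes visible.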