Analysis of an Abnormal VIP Failure on Database Node 2 (Part 1)

2015-02-03 04:25:33

System Environment

Hardware platform: IBM 570
Operating system version: AIX 5.3
Physical memory: 32 GB
Oracle product and version: Oracle 10.2.0.5 RAC
Workload type: OLTP

Background

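On xx/xx, the node 2 VIP of the trading system went offline abnormally, taking the database services on node 2 down with it. As soon as we received the request, we connected remotely to handle it. The failure occurred at around 3 a.m., and the AIX log (errpt), the Oracle Database log (alert.log), and the CRS logs (crsd.log, ocssd.log, vip.log, core dumps) contained little useful information, which made the situation very complex.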

Detailed Diagnosis and Analysis

1. Checking the errpt log:

No errors were logged on node 2 when its VIP went offline.
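As an illustration of this check, the sketch below scans captured `errpt` summary output for entries inside a time window around the failure. It assumes the default summary layout (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION) with TIMESTAMP in MMDDhhmmYY form; the function name `errpt_in_window`, the window bounds, and the sample rows are hypothetical and do not come from the affected system. An empty result corresponds to what the article reports for node 2.

```python
from datetime import datetime


def errpt_in_window(text, start, end):
    """Return (timestamp, identifier, type, class, resource, description) for
    errpt summary rows whose TIMESTAMP falls within [start, end].

    Assumption: default errpt summary layout, TIMESTAMP formatted MMDDhhmmYY.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        # Skip the column header and anything that is not a 6-column summary row.
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue
        ident, stamp, etype, eclass, resource, desc = fields
        try:
            ts = datetime.strptime(stamp, "%m%d%H%M%y")
        except ValueError:
            continue  # not a summary row after all
        if start <= ts <= end:
            rows.append((ts, ident, etype, eclass, resource, desc))
    return rows


# Hypothetical capture of `errpt` output; identifiers and resources are made up.
SAMPLE = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
0123ABCD   0428221012 I O hyp_res0       HYPOTHETICAL INFORMATIONAL ENTRY
89EF4567   0427101512 P H hyp_res1       HYPOTHETICAL HARDWARE ENTRY
"""

# Hypothetical window around the 03:41 VIP failure on 2012-04-29.
window = (datetime(2012, 4, 29, 3, 30), datetime(2012, 4, 29, 3, 50))
hits = errpt_in_window(SAMPLE, *window)
print(hits or "no errpt entries in the failure window")
```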

2. Checking the CRS logs:

Node 1:

2012-04-29 03:42:33.180: [ CRSRES][11376]32startRunnable: setting CLI values

Note: On node 1, only this single line shows that CRS executed a command during the failure.
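Because node 1 contributed only one relevant line, it helps to cut each node's crsd.log down to the failure window before reading it. This is a minimal sketch, assuming every entry begins with the 23-character `YYYY-MM-DD hh:mm:ss.mmm` stamp seen in the excerpts; the function name `lines_in_window`, the window bounds, and the first sample line are hypothetical.

```python
from datetime import datetime

# Timestamp layout of the crsd.log lines quoted in this article.
STAMP_FMT = "%Y-%m-%d %H:%M:%S.%f"


def lines_in_window(lines, start, end):
    """Return crsd.log lines whose leading timestamp falls within [start, end]."""
    hits = []
    for line in lines:
        # Assumption: each entry starts with "YYYY-MM-DD hh:mm:ss.mmm" (23 chars).
        try:
            ts = datetime.strptime(line[:23], STAMP_FMT)
        except ValueError:
            continue  # wrapped continuation lines or text without a timestamp
        if start <= ts <= end:
            hits.append(line)
    return hits


node1_log = [
    # Hypothetical earlier entry, outside the failure window.
    "2012-04-29 01:15:00.000: [ CRSRES][11376]32hypothetical earlier entry",
    # The only node 1 line quoted in the article.
    "2012-04-29 03:42:33.180: [ CRSRES][11376]32startRunnable: setting CLI values",
]

# Hypothetical window bounds around the failure.
window_start = datetime(2012, 4, 29, 3, 35)
window_end = datetime(2012, 4, 29, 3, 50)
matches = lines_in_window(node1_log, window_start, window_end)
for line in matches:
    print(line)
```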

Node 2:

2012-04-29 03:41:12.308: [ CRSAPP][11263]32CheckResource error for ora.xxxxdb02.vip error code = 1
2012-04-29 03:41:12.335: [ CRSRES][11263]32In stateChanged, ora.xxxxdb02.vip target is ONLINE
2012-04-29 03:41:12.335: [ CRSRES][11263]32ora.xxxxdb02.vip on xxxxdb02 went OFFLINE unexpectedly
2012-04-29 03:41:12.335: [ CRSRES][11263]32StopResource: setting CLI values
2012-04-29 03:41:12.340: [ CRSRES][11263]32Attempting to stop `ora.xxxxdb02.vip` on member `xxxxdb02`
2012-04-29 03:41:12.893: [ CRSRES][11269]32In stateChanged, ora.xxxxdb.xxxxdb1.xxxxdb2.srv target is ONLINE
2012-04-29 03:41:12.894: [ CRSRES][11269]32ora.xxxxdb.xxxxdb1.xxxxdb2.srv on xxxxdb02 went OFFLINE unexpectedly
2012-04-29 03:41:12.894: [ CRSRES][11269]32StopResource: setting CLI values
2012-04-29 03:41:12.899: [ CRSRES][11269]32Attempting to stop `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02`
2012-04-29 03:41:12.958: [ CRSRES][11263]32Stop of `ora.xxxxdb02.vip` on member `xxxxdb02` succeeded.
2012-04-29 03:41:12.971: [ CRSRES][11263]32ora.xxxxdb02.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
2012-04-29 03:41:12.976: [ CRSRES][11263]32ora.xxxxdb02.vip failed on xxxxdb02 relocating.
2012-04-29 03:41:13.025: [ CRSRES][11263]32StopResource: setting CLI values
2012-04-29 03:41:13.029: [ CRSRES][11263]32Attempting to stop `ora.xxxxdb02.LISTENER_XXXXDB02.lsnr` on member `xxxxdb02`
2012-04-29 03:41:13.146: [ CRSRES][11269]32Stop of `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02` succeeded.
2012-04-29 03:41:13.146: [ CRSRES][11269]32ora.xxxxdb.xxxxdb1.xxxxdb2.srv RESTART_COUNT=0 RESTART_ATTEMPTS=1
2012-04-29 03:41:13.159: [ CRSRES][11269]32Restarting ora.xxxxdb.xxxxdb1.xxxxdb2.srv on xxxxdb02
2012-04-29 03:41:13.164: [ CRSRES][11269]32startRunnable: setting CLI values
2012-04-29 03:41:13.164: [ CRSRES][11269]32Attempting to start `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02`
2012-04-29 03:41:45.618: [ CRSAPP][11269]32StartResource error for ora.xxxxdb.xxxxdb1.xxxxdb2.srv error code = 1
2012-04-29 03:41:45.799: [ CRSRES][11269]32Start of `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02` failed.
2012-04-29 03:41:45.820: [ CRSRES][11269]32ora.xxxxdb.xxxxdb1.xxxxdb2.srv failed on xxxxdb02 relocating.
2012-04-29 03:41:45.885: [ CRSRES][11269]32Cannot relocate ora.xxxxdb.xxxxdb1.xxxxdb2.srv Stopping dependents
2012-04-29 03:41:45.897: [ CRSRES][11269]32StopResource: setting CLI values
2012-04-29 03:42:29.483: [ CRSRES][11263]32Stop of `ora.xxxxdb02.LISTENER_XXXXDB02.lsnr` on member `xxxxdb02` succeeded.
2012-04-29 03:42:29.496: [ CRSRES][11263]32Attempting to start `ora.xxxxdb02.vip` on member `xxdb01np5`
2012-04-29 03:42:32.036: [ CRSRES][11263]32Start of `ora.xxxxdb02.vip` on member `xxdb01np5` succeeded.
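To make this sequence easier to follow, the sketch below parses crsd.log lines of the form quoted here (timestamp, `[ component][thread id]`, a numeric prefix, then the message), prints a timeline relative to the first VIP check failure, and measures two intervals. The input is a subset of the node 2 excerpt; the regular expression, the helper names `parse_crsd` and `first_time`, and the treatment of the digits after the thread id as an opaque prefix are assumptions of this sketch, not a documented Oracle log format.

```python
import re
from datetime import datetime

# Assumed crsd.log layout, as seen in the excerpt:
#   "2012-04-29 03:41:12.308: [ CRSAPP][11263]32<message>"
# Groups: timestamp, component, thread id, message. The digits right after
# the thread id are skipped, not interpreted.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(\w+)\]\[(\d+)\]\d*(.*)$"
)


def parse_crsd(text):
    """Return (timestamp, component, thread, message) tuples for each log line."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m.group(2), m.group(3), m.group(4).strip()))
    return events


def first_time(events, needle):
    """Timestamp of the first message containing `needle`."""
    for ts, _component, _thread, msg in events:
        if needle in msg:
            return ts
    raise LookupError(f"no message contains {needle!r}")


# Subset of the node 2 crsd.log excerpt.
NODE2_LOG = """\
2012-04-29 03:41:12.308: [ CRSAPP][11263]32CheckResource error for ora.xxxxdb02.vip error code = 1
2012-04-29 03:41:12.958: [ CRSRES][11263]32Stop of `ora.xxxxdb02.vip` on member `xxxxdb02` succeeded.
2012-04-29 03:41:13.029: [ CRSRES][11263]32Attempting to stop `ora.xxxxdb02.LISTENER_XXXXDB02.lsnr` on member `xxxxdb02`
2012-04-29 03:41:13.164: [ CRSRES][11269]32Attempting to start `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02`
2012-04-29 03:41:45.618: [ CRSAPP][11269]32StartResource error for ora.xxxxdb.xxxxdb1.xxxxdb2.srv error code = 1
2012-04-29 03:42:29.483: [ CRSRES][11263]32Stop of `ora.xxxxdb02.LISTENER_XXXXDB02.lsnr` on member `xxxxdb02` succeeded.
2012-04-29 03:42:32.036: [ CRSRES][11263]32Start of `ora.xxxxdb02.vip` on member `xxdb01np5` succeeded.
"""

events = parse_crsd(NODE2_LOG)
t0 = first_time(events, "CheckResource error for ora.xxxxdb02.vip")
# Timeline relative to the first VIP check failure, tagged by component and thread.
for ts, component, thread, msg in events:
    print(f"+{(ts - t0).total_seconds():7.3f}s [{component}][{thread}] {msg}")

lsnr_stop = (first_time(events, "Stop of `ora.xxxxdb02.LISTENER")
             - first_time(events, "Attempting to stop `ora.xxxxdb02.LISTENER"))
vip_failover = first_time(events, "Start of `ora.xxxxdb02.vip` on member `xxdb01np5`") - t0
print(f"listener stop took {lsnr_stop.total_seconds():.3f}s")
print(f"VIP check failure to VIP online on xxdb01np5: {vip_failover.total_seconds():.3f}s")
```

On this excerpt the listener stop (03:41:13.029 to 03:42:29.483) accounts for about 76.5 s of the roughly 79.7 s between the VIP check failure and the VIP coming online on xxdb01np5, which suggests the relocation spent most of its time waiting for the dependent listener to stop. The thread ids also separate the VIP/listener chain (11263) from the service chain (11269).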

Note: Node 2 is the failed node. Its crsd.log messages show that an abnormal VIP check caused the VIP on node 2 to be forced OF