System Environment
Hardware platform: IBM 570
Operating system version: AIX 5.3
Physical memory: 32 GB
Oracle product and version: 10.2.0.5 RAC
Business type: OLTP
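Background
On xx/xx, the VIP on node 2 of the trading system went offline unexpectedly, which made the database service on node 2 unavailable. We connected remotely as soon as the request came in. The failure occurred in the early morning, at around 3 a.m., and AIX (errpt), the Oracle database (alert.log), and CRS (crsd.log, ocssd.log, vip.log, core dumps) all recorded very little useful information, which made the diagnosis difficult.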
Detailed Diagnosis and Analysis
1. Check the errpt log:
No errors were logged on node 2 when its VIP went offline unexpectedly.
2. Check the CRS logs:
Node 1:
2012-04-29 03:42:33.180: [ CRSRES][11376]32startRunnable: setting CLI values
Note: node 1 logged only this one line, which shows that CRS on node 1 executed a command during the failure window.
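When a log contains only a handful of relevant entries, it can help to filter it mechanically by time window instead of reading it by eye. The following is a minimal sketch, assuming each crsd.log entry sits on a single line in the 10.2-style layout shown above; the names `LINE_RE`, `parse_line`, and `in_window` are hypothetical helpers introduced for illustration, not part of any Oracle tool.

```python
import re
from datetime import datetime

# Assumed layout of one crsd.log entry (taken from the excerpt above):
# 2012-04-29 03:42:33.180: [ CRSRES][11376]32startRunnable: setting CLI values
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(?P<component>\w+)\]\[(?P<thread>\d+)\](?P<message>.*)$"
)


def parse_line(line):
    """Return (timestamp, component, thread id, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    # The digits directly after the thread id ("32" above) are kept as part of the message.
    return ts, m.group("component"), int(m.group("thread")), m.group("message")


def in_window(lines, start, end):
    """Keep only the parsed entries whose timestamp lies in [start, end]."""
    selected = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and start <= parsed[0] <= end:
            selected.append(parsed)
    return selected
```

Calling `in_window(open(path), start, end)` on each node's crsd.log with the same window (where `path` is wherever the log was copied to) gives a side-by-side view of what each node recorded during the failure.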
Node 2:
2012-04-29 03:41:12.308: [ CRSAPP][11263]32CheckResource error for ora.xxxxdb02.vip error code = 1
2012-04-29 03:41:12.335: [ CRSRES][11263]32In stateChanged, ora.xxxxdb02.vip target is ONLINE
2012-04-29 03:41:12.335: [ CRSRES][11263]32ora.xxxxdb02.vip on xxxxdb02 went OFFLINE unexpectedly
2012-04-29 03:41:12.335: [ CRSRES][11263]32StopResource: setting CLI values
2012-04-29 03:41:12.340: [ CRSRES][11263]32Attempting to stop `ora.xxxxdb02.vip` on member `xxxxdb02`
2012-04-29 03:41:12.893: [ CRSRES][11269]32In stateChanged, ora.xxxxdb.xxxxdb1.xxxxdb2.srv target is ONLINE
2012-04-29 03:41:12.894: [ CRSRES][11269]32ora.xxxxdb.xxxxdb1.xxxxdb2.srv on xxxxdb02 went OFFLINE unexpectedly
2012-04-29 03:41:12.894: [ CRSRES][11269]32StopResource: setting CLI values
2012-04-29 03:41:12.899: [ CRSRES][11269]32Attempting to stop `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02`
2012-04-29 03:41:12.958: [ CRSRES][11263]32Stop of `ora.xxxxdb02.vip` on member `xxxxdb02` succeeded.
2012-04-29 03:41:12.971: [ CRSRES][11263]32ora.xxxxdb02.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
2012-04-29 03:41:12.976: [ CRSRES][11263]32ora.xxxxdb02.vip failed on xxxxdb02 relocating.
2012-04-29 03:41:13.025: [ CRSRES][11263]32StopResource: setting CLI values
2012-04-29 03:41:13.029: [ CRSRES][11263]32Attempting to stop `ora.xxxxdb02.LISTENER_XXXXDB02.lsnr` on member `xxxxdb02`
2012-04-29 03:41:13.146: [ CRSRES][11269]32Stop of `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02` succeeded.
2012-04-29 03:41:13.146: [ CRSRES][11269]32ora.xxxxdb.xxxxdb1.xxxxdb2.srv RESTART_COUNT=0 RESTART_ATTEMPTS=1
2012-04-29 03:41:13.159: [ CRSRES][11269]32Restarting ora.xxxxdb.xxxxdb1.xxxxdb2.srv on xxxxdb02
2012-04-29 03:41:13.164: [ CRSRES][11269]32startRunnable: setting CLI values
2012-04-29 03:41:13.164: [ CRSRES][11269]32Attempting to start `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02`
2012-04-29 03:41:45.618: [ CRSAPP][11269]32StartResource error for ora.xxxxdb.xxxxdb1.xxxxdb2.srv error code = 1
2012-04-29 03:41:45.799: [ CRSRES][11269]32Start of `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02` failed.
2012-04-29 03:41:45.820: [ CRSRES][11269]32ora.xxxxdb.xxxxdb1.xxxxdb2.srv failed on xxxxdb02 relocating.
2012-04-29 03:41:45.885: [ CRSRES][11269]32Cannot relocate ora.xxxxdb.xxxxdb1.xxxxdb2.srv Stopping dependents
2012-04-29 03:41:45.897: [ CRSRES][11269]32StopResource: setting CLI values
2012-04-29 03:42:29.483: [ CRSRES][11263]32Stop of `ora.xxxxdb02.LISTENER_XXXXDB02.lsnr` on member `xxxxdb02` succeeded.
2012-04-29 03:42:29.496: [ CRSRES][11263]32Attempting to start `ora.xxxxdb02.vip` on member `xxdb01np5`
2012-04-29 03:42:32.036: [ CRSRES][11263]32Start of `ora.xxxxdb02.vip` on member `xxdb01np5` succeeded.
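The node 2 entries interleave several resources, so it is easy to miss how long each stop or start actually took. As a rough aid, the sketch below pairs every "Attempting to start/stop" entry with the matching "Start/Stop of ... succeeded/failed" entry and reports the elapsed time. It assumes each log entry has been joined onto a single line and follows the layout of the excerpt above; `ENTRY_RE`, `ATTEMPT_RE`, `RESULT_RE`, and `action_durations` are hypothetical names used only for illustration.

```python
import re
from datetime import datetime

# Timestamp, [component][thread], optional digits ("32" in this excerpt), then the message.
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): \[\s*\w+\]\[\d+\]\d*(.*)$"
)
ATTEMPT_RE = re.compile(r"Attempting to (start|stop) `([^`]+)` on member `([^`]+)`")
RESULT_RE = re.compile(r"(Start|Stop) of `([^`]+)` on member `([^`]+)` (succeeded|failed)")


def action_durations(lines):
    """Return (action, resource, member, outcome, seconds) for each attempt/result pair."""
    pending = {}   # (action, resource, member) -> time the attempt was logged
    results = []
    for line in lines:
        entry = ENTRY_RE.match(line.strip())
        if entry is None:
            continue  # skip anything that is not a recognizable log entry
        ts = datetime.strptime(entry.group(1), "%Y-%m-%d %H:%M:%S.%f")
        message = entry.group(2)
        attempt = ATTEMPT_RE.search(message)
        if attempt:
            pending[attempt.groups()] = ts
            continue
        result = RESULT_RE.search(message)
        if result:
            key = (result.group(1).lower(), result.group(2), result.group(3))
            started = pending.pop(key, None)
            if started is not None:
                elapsed = (ts - started).total_seconds()
                results.append((key[0], key[1], key[2], result.group(4), elapsed))
    return results
```

Applied to the entries above, the timestamps give roughly 0.6 seconds for stopping the VIP, about 32.6 seconds for the failed service start, and about 76.5 seconds for stopping the listener (03:41:13.029 to 03:42:29.483).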
Note: node 2 is the failed node. The entries above show that a failed VIP check caused the VIP on node 2 to be forced OF