Oracle 11g Control File Corruption Analysis (Part 1)

2014-11-24 18:59:02

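The system in question was an Oracle 11g RAC database. When the problem occurred, the instance could be started in NOMOUNT, but mounting the control file failed with the following error: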


ORA-3113 "end of file on communication channel"


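The sqlplus connection was then terminated and had to be re-established. When the mount stage cannot complete, the usual cause is corruption in the control file itself. For a professional, though, stopping at that assumption is not good enough, so the problem needs further analysis.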


The ASM alert log shows the following:


Tue Mar 27 13:35:11 2012
NOTE: client PROD1:PROD registered, osid 6726, mbr 0x1
Tue Mar 27 13:35:24 2012
NOTE: ASM client PROD1:PROD disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_6726.trc
Tue Mar 27 13:40:35 2012
NOTE: client PROD1:PROD registered, osid 7477, mbr 0x1
Tue Mar 27 13:41:45 2012
NOTE: ASM client PROD1:PROD disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_7477.trc
Tue Mar 27 13:41:47 2012
NOTE: client PROD1:PROD registered, osid 7736, mbr 0x1
Tue Mar 27 13:42:01 2012
NOTE: ASM client PROD1:PROD disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_7736.tr
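
The pattern in this excerpt (the client registers, then disconnects unexpectedly a short time later) is easier to see when each pair of entries is reduced to a duration. The following is a minimal Python sketch, not part of the original investigation; the function names `parse_timestamp` and `client_sessions` are hypothetical, and the regular expressions assume the exact message wording shown in the excerpt above.

```python
import re
from datetime import datetime

# Timestamp lines in the excerpt look like "Tue Mar 27 13:35:11 2012".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
# Message wording is taken from the excerpt; other versions may differ.
REGISTERED = re.compile(r"NOTE: client (\S+) registered, osid (\d+)")
DISCONNECTED = re.compile(r"NOTE: ASM client (\S+) disconnected unexpectedly")


def parse_timestamp(line):
    """Return a datetime if the line is an alert-log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def client_sessions(lines):
    """Pair each 'registered' note with the next unexpected disconnect.

    Returns a list of (client, osid, seconds_connected) tuples.
    """
    current_ts = None
    open_sessions = {}  # client name -> (osid, registration time)
    sessions = []
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current_ts = ts
            continue
        m = REGISTERED.search(line)
        if m and current_ts is not None:
            open_sessions[m.group(1)] = (int(m.group(2)), current_ts)
            continue
        m = DISCONNECTED.search(line)
        if m and m.group(1) in open_sessions:
            osid, started = open_sessions.pop(m.group(1))
            seconds = int((current_ts - started).total_seconds())
            sessions.append((m.group(1), osid, seconds))
    return sessions
```

Applied to the excerpt above, this gives connection times of 13, 70, and 14 seconds for osid 6726, 7477, and 7736, which suggests that the database instance fails shortly after each attempt to connect to ASM.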


The generated trace files contain only the following information:


2012-03-27 13:41:08.022438 :802EEFE8:KFNS:kfn.c@702:kfnDispatch(): calling server stub for KFNOP=5
2012-03-27 13:41:13.027006 :802EF0F4:KFNU:kfns.c@1924:kfnsBackground(): kfnsBackground completed in 5 seconds (KFNPM=0)
2012-03-27 13:41:13.027012 :802EF0F5:KFNS:kfn.c@729:kfnDispatch(): completed KFNOP=5
2012-03-27 13:41:13.027122 :802EF0F6:KFNS:kfn.c@702:kfnDispatch(): calling server stub for KFNOP=5


This is clearly of little use for diagnosing the problem, and the fault most likely still lies on the database side.


So the next step was to check the database instance's alert log. The entries written when alter database mount was executed are as follows:


Tue Mar 27 11:42:01 2012
alter database mount
This instance was first to mount
Tue Mar 27 11:42:01 2012
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Loaded library: System
Tue Mar 27 11:42:01 2012
SUCCESS: diskgroup PRODDATA was mounted
Tue Mar 27 11:42:01 2012
NOTE: dependency between database PROD and diskgroup resource ora.PRODDATA.dg is established
USER (ospid: 26774): terminating the instance
Tue Mar 27 11:42:07 2012
System state dump requested by (instance=1, osid=26774), summary=[abnormal instance termination].
System State dumped to trace file /d01/oracle/11.2.0/admin/PROD1_db01/diag/rdbms/prod/PROD1/trace/PROD1_diag_26656.trc
Dumping diagnostic data in directory=[cdmp_20120327114207], requested by (instance=1, osid=26774), summary=[abnormal instance termination].
Instance terminated by USER, pid = 26774
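
When an alert log is long, the two things worth pulling out of an excerpt like this are the trace files it points to and the line recording who terminated the instance. The following is a rough Python sketch under the assumption that the messages are worded as in the excerpt above; `summarize_termination` is a hypothetical helper name, not an Oracle utility.

```python
import re

# Both patterns assume the message wording seen in the excerpt above.
TRACE_PATH = re.compile(r"trace file (/\S+\.trc)")
TERMINATED = re.compile(r"Instance terminated by (\w+), pid = (\d+)")


def summarize_termination(lines):
    """Collect referenced trace files and the last termination record."""
    trace_files = []
    terminated_by = None
    for line in lines:
        m = TRACE_PATH.search(line)
        if m and m.group(1) not in trace_files:
            trace_files.append(m.group(1))
        m = TERMINATED.search(line)
        if m:
            # (who terminated the instance, operating system pid)
            terminated_by = (m.group(1), int(m.group(2)))
    return {"trace_files": trace_files, "terminated_by": terminated_by}
```

For the excerpt above, this reports the single trace file PROD1_diag_26656.trc and a termination by USER with pid 26774.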


The log still gives no clear indication of the cause. The trace file it points to, /d01/oracle/11.2.0/admin/PROD1_db01/diag/rdbms/prod/PROD1/trace/PROD1_diag_26656.trc, contains no detailed information either.


Only after tracing the mount process with event 10046 did more detailed information appear:


alter session set events='10046 trace name context forever,level 12';
Trace file /d01/oracle/11.2.0/admin/PROD1_db01/diag/rdbms/prod/PROD1/trace/PROD1_ora_7764.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /d01/oracle/11.2.0
System name: Linux
Node name: db01.clc.com
Release: 2.6.18-238.el5
Version: #1 SMP Sun Dec 19 14:22:44 EST 2010
Machine: x86_64
Instance name: PROD1
Redo thread mounted by this instance: 0
Oracle process number: 31
Unix process pid: 7764, image: oracle@db01.clc.com (TNS V