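1. Version information
Operating system version: AIX 61009
Database version: 11.2.0.3.11 (RAC)
2. Error description
A check found that both instances of a RAC database using ASM report an ORA-32701 error roughly once an hour. The relevant error messages from the alert log are excerpted below: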
Sat Dec 06 09:44:00 2014
Errors in file /oracle/app/oracle/diag/rdbms/egmmdb/egmmdb2/trace/egmmdb2_dia0_13500888.trc  (incident=1041128):
ORA-32701: Possible hangs up to hang ID=0 detected
Incident details in: /oracle/app/oracle/diag/rdbms/egmmdb/egmmdb2/incident/incdir_1041128/egmmdb2_dia0_13500888_i1041128.trc
DIA0 terminating blocker (ospid: 15335610 sid: 1299 ser#: 5849) of hang with ID = 3
    requested by master DIA0 process on instance 1
    Hang Resolution Reason: Although the number of affected sessions did not
    justify automatic hang resolution initially, this previously ignored
    hang was automatically resolved.
    by terminating session sid: 1299 ospid: 15335610
Sat Dec 06 09:44:01 2014
Sweep [inc][1041128]: completed
Sweep [inc2][1041128]: completed
DIA0 successfully terminated session sid:1299 ospid:15335610 with status 31.
Sat Dec 06 09:45:35 2014
Errors in file /oracle/app/oracle/diag/rdbms/egmmdb/egmmdb2/trace/egmmdb2_dia0_13500888.trc  (incident=1041129):
ORA-32701: Possible hangs up to hang ID=0 detected
Incident details in: /oracle/app/oracle/diag/rdbms/egmmdb/egmmdb2/incident/incdir_1041129/egmmdb2_dia0_13500888_i1041129.trc
DIA0 terminating blocker (ospid: 15335610 sid: 1299 ser#: 5849) of hang with ID = 3
    requested by master DIA0 process on instance 1
    Hang Resolution Reason: Although the number of affected sessions did not
    justify automatic hang resolution initially, this previously ignored
    hang was automatically resolved.
    by terminating the process
DIA0 successfully terminated process ospid:15335610.
Sat Dec 06 09:45:37 2014
Sweep [inc][1041129]: completed
Sweep [inc2][1041129]: completed
Sat Dec 06 10:45:12 2014
Errors in file /oracle/app/oracle/diag/rdbms/egmmdb/egmmdb2/trace/egmmdb2_dia0_13500888.trc  (incident=1041130):
ORA-32701: Possible hangs up to hang ID=0 detected
Incident details in: /oracle/app/oracle/diag/rdbms/egmmdb/egmmdb2/incident/incdir_1041130/egmmdb2_dia0_13500888_i1041130.trc
Sat Dec 06 10:45:13 2014
Sweep [inc][1041130]: completed
Sweep [inc2][1041130]: completed
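The "roughly once an hour" pattern can be checked mechanically rather than by eye. The following is only a minimal sketch, assuming the alert log has one entry per line in the usual layout (a timestamp line, then the message lines). The function names `find_ora_32701` and `per_hour` are made up for this illustration and are not part of any Oracle tool; the sample text is taken from the excerpt above.

```python
import re
from collections import Counter

# Alert-log timestamp lines look like "Sat Dec 06 09:44:00 2014".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")
# The incident number appears as "(incident=NNN)" on the "Errors in file" line.
INCIDENT_RE = re.compile(r"\(incident=(\d+)\)")


def find_ora_32701(lines):
    """Return (timestamp, incident_id) pairs, one per ORA-32701 entry.

    Hypothetical helper: incident_id is None if no "(incident=...)" was
    seen since the last timestamp line.
    """
    hits = []
    current_ts = None
    incident = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            current_ts = line
            incident = None
            continue
        m = INCIDENT_RE.search(line)
        if m:
            incident = m.group(1)
        if line.startswith("ORA-32701") and current_ts is not None:
            hits.append((current_ts, incident))
    return hits


def per_hour(hits):
    """Count hits per 'Mon DD HH' bucket, e.g. 'Dec 06 09'."""
    return Counter(ts[4:13] for ts, _ in hits)


TRACE = "/oracle/app/oracle/diag/rdbms/egmmdb/egmmdb2/trace/egmmdb2_dia0_13500888.trc"
SAMPLE = f"""\
Sat Dec 06 09:44:00 2014
Errors in file {TRACE}  (incident=1041128):
ORA-32701: Possible hangs up to hang ID=0 detected
Sat Dec 06 09:44:01 2014
Sweep [inc][1041128]: completed
Sat Dec 06 09:45:35 2014
Errors in file {TRACE}  (incident=1041129):
ORA-32701: Possible hangs up to hang ID=0 detected
Sat Dec 06 10:45:12 2014
Errors in file {TRACE}  (incident=1041130):
ORA-32701: Possible hangs up to hang ID=0 detected
"""

if __name__ == "__main__":
    hits = find_ora_32701(SAMPLE.splitlines())
    for ts, inc in hits:
        print(ts, inc)
    print(dict(per_hour(hits)))
```

Run against a full alert log (read with `open(path).read().splitlines()`), the per-hour counts make it easy to see whether the error really recurs on an hourly cycle.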
The following is excerpted from egmmdb2_dia0_13500888_i1041129.trc:
*** 2014-12-06 09:45:35.770
Resolvable Hangs in the System
                      Root       Chain Total               Hang
   Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution
     ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action
  ----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
      3 HANG RSLNPEND    2  1299     2     2   HIGH GLOBAL Terminate Process
  Hang Resolution Reason: Although the number of affected sessions did not
    justify automatic hang resolution initially, this previously ignored
    hang was automatically resolved.

      inst# SessId  Ser#     OSPID PrcNm Event
      ----- ------ ----- --------- ----- -----
          1   1444  7855  10420452  M000 enq: FU - contention
          2   1299  5849  15335610  M000 not in wait <<<<==== This shows that one M00* process is blocking another M00* process. (As the alert log above shows, Hang Manager resolved the hang by killing the blocking session 1299.)

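The blocker/waiter reading of the session listing above can also be sketched in code. This is only an illustration under stated assumptions: the rows follow the column layout shown (inst#, SessId, Ser#, OSPID, PrcNm, Event), and a session reported as "not in wait" is treated as the blocker, which matches this trace (Root Sess is 1299) but is a heuristic, not a general rule. The names `ChainEntry`, `parse_chain` and `split_chain` are hypothetical.

```python
import re
from dataclasses import dataclass

# Five whitespace-separated columns, then the event text to end of line.
ROW_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+?)\s*$")


@dataclass
class ChainEntry:
    inst: int
    sid: int
    serial: int
    ospid: int
    process: str
    event: str


def parse_chain(lines):
    """Parse data rows of the session listing; header and rule lines are skipped."""
    entries = []
    for line in lines:
        m = ROW_RE.match(line)
        if m:
            inst, sid, serial, ospid, proc, event = m.groups()
            entries.append(
                ChainEntry(int(inst), int(sid), int(serial), int(ospid), proc, event)
            )
    return entries


def split_chain(entries):
    """Heuristic (assumption): 'not in wait' sessions are blockers, the rest waiters."""
    blockers = [e for e in entries if e.event == "not in wait"]
    waiters = [e for e in entries if e.event != "not in wait"]
    return blockers, waiters


SAMPLE_ROWS = """\
      inst# SessId  Ser#     OSPID PrcNm Event
      ----- ------ ----- --------- ----- -----
          1   1444  7855  10420452  M000 enq: FU - contention
          2   1299  5849  15335610  M000 not in wait
"""

if __name__ == "__main__":
    blockers, waiters = split_chain(parse_chain(SAMPLE_ROWS.splitlines()))
    for b in blockers:
        print("blocker:", b)
    for w in waiters:
        print("waiter: ", w)
```

For this trace the sketch reports session 1299 on instance 2 as the blocker and session 1444 on instance 1, waiting on "enq: FU - contention", as the waiter.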
Dumping process info of pid[155.15335610] (sid:1299, ser#:5849)
    requested by master DIA0 process on instance 1.

*** 2014-12-06 09:45:35.770
Process diagnostic dump for oracle@egmmdb2 (M000), OS id=15335610,
pid: 155, proc_ser: 153, sid: 1299, sess_ser: 5849
-----------------------------------------------------