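Around noon, CPU usage on one of our production databases stayed persistently high.
We used the following SQL to look at the waits and contention in the database at that time: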
select s.SID,
s.SERIAL#,
'kill -9 ' || p.SPID,
s.MACHINE,
s.OSUSER,
s.PROGRAM,
s.USERNAME,
s.last_call_et,
a.SQL_ID,
s.LOGON_TIME,
a.SQL_TEXT,
a.SQL_FULLTEXT,
w.EVENT,
a.DISK_READS,
a.BUFFER_GETS
from v$process p, v$session s, v$sqlarea a, v$session_wait w
where p.ADDR = s.PADDR
and s.SQL_ID = a.sql_id
and s.sid = w.SID
and s.STATUS = 'ACTIVE'
order by s.last_call_et desc;
The EVENT column shows that latch contention is the cause.
The following SQL shows which latch is involved:
select * from v$session_wait where event like 'latch free';
P2 is the number (latch#) of the latch being waited on; the v$latchname view maps it to the specific latch name:
1:45:55 PM SQL> select * from v$latchname where latch#=164;
    LATCH# NAME                                                                   HASH
---------- ---------------------------------------------------------------- ----------
       164 simulator hash latch                                             2233208730
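The two lookups above can also be combined into a single query. This is only a sketch, assuming the sessions are still waiting on the generic 'latch free' event; the column aliases (latch_addr, latch_no, tries) are illustrative names, not part of the original queries:

```sql
-- Resolve the latch name for every session currently waiting on 'latch free'.
-- For this event, P1 is the latch address, P2 the latch number, P3 the number of tries.
select w.sid,
       w.p1raw  as latch_addr,
       w.p2     as latch_no,
       n.name   as latch_name,
       w.p3     as tries
  from v$session_wait w
  join v$latchname n
    on n.latch# = w.p2
 where w.event = 'latch free';
```

Note that from 10g onward some common latches report their own wait events (for example latch: cache buffers chains), so they will not appear under 'latch free'.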
Next, look at the cumulative latch statistics:
2:11:59 PM SQL> select name,gets,misses,sleeps from v$latch where sleeps >0 order by sleeps desc;
NAME                                                                   GETS     MISSES     SLEEPS
---------------------------------------------------------------- ---------- ---------- ----------
simulator hash latch                                             4827860212  135426899   10890947
cache buffers chains                                             1619822817 2850976006    4747728
gc element                                                       4660052091   25748270     175073
resmgr:schema config                                               91872524     153968      95708
ges resource hash list                                            174151449    1070556      55459
Real-time plan statistics latch                                    40953155     651496      44527
call allocation                                                     3301878     265908      43501
row cache objects                                                 336300485    4970324      19366
By sleeps, simulator hash latch is clearly the dominant latch here.
eygle has an article on his site about the simulator latches:
http://www.eygle.com/archives/2011/11/simulator_lru_latch.html
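"Simulator" means simulation: while Oracle processes data blocks in memory, it also records related information, such as the DBA (data block address), in a pre-allocated buffer. After a block has been aged out, if a later read requests data that still exists in the simulator memory, then continuing to cache that block would have been worthwhile. By monitoring and simulating these operations, gathering statistics, and weighting the results, Oracle can produce memory tuning advice.
The simulation itself is also protected by latches; the relevant ones include simulator lru latch and simulator hash latch.
As far as the buffer cache is concerned, if this kind of contention is severe, consider turning off db_cache_advice to eliminate the performance impact of this internal bookkeeping.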
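Before changing anything, it may be worth confirming that the advisory is actually enabled and seeing what it produces. A minimal sketch, assuming SELECT access to v$parameter and v$db_cache_advice:

```sql
-- Current setting of the advisory: ON, READY, or OFF
select name, value
  from v$parameter
 where name = 'db_cache_advice';

-- Output of the simulation: estimated physical reads for a range of
-- candidate cache sizes, per buffer pool and block size
select name,
       block_size,
       advice_status,
       size_for_estimate,
       size_factor,
       estd_physical_read_factor,
       estd_physical_reads
  from v$db_cache_advice
 order by name, block_size, size_for_estimate;
```

If nobody uses this output for sizing the buffer cache, there is little to lose by turning the advisory off.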
Below is a related bug, in which having DB_CACHE_ADVICE enabled caused severe simulator lru latch contention:
Bug 5918642 Heavy latch contention with DB_CACHE_ADVICE on
This note gives a brief overview of bug 5918642.
The content was last updated on: 01-APR-2008
Affects:
Product (Component): Oracle Server (Rdbms)
Range of versions believed to be affected: Versions < 11.2
Versions confirmed as being affected:
- 10.2.0.3
Platforms affected: Generic (all / most platforms affected)
Fixed:
This issue is fixed in
- 11.2 (Future Release)
- 10.2.0.4 (Server Patch Set)
- 11.1.0.7 (Server Patch Set)
Symptoms:
Related To:
- Latch Contention
- Waits for "latch free"
- Performance Monitoring
- DB_CACHE_ADVICE
Description:
High simulator lru latch contention can occur when db_cache_advice is set to ON if there is a large buffer cache.
Workaround:
Set db_cache_advice to OFF
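The workaround above could be applied roughly as follows. This is a sketch, assuming an account with the ALTER SYSTEM privilege; whether to add a SCOPE clause depends on whether the instance uses an spfile, which the original does not say:

```sql
-- db_cache_advice is a dynamic parameter, so no restart is needed.
alter system set db_cache_advice = off;

-- Confirm the new value.
select name, value
  from v$parameter
 where name = 'db_cache_advice';
```

According to the Oracle reference, OFF also releases the memory used by the advisory, so switching straight back to ON later may fail; READY turns the advisory off but keeps that memory allocated.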
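Of course, this only treats the symptom, not the cause. The latch contention is just what shows on the surface; the root problem is still the SQL statement itself.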
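When a data block is read into the SGA, its buffer header is placed on a linked list (hash chain) attached to a hash bucket. This memory structure is protected by a set of cache buffers chains child latches (also known as hash latches or CBC latches). To select, update, insert, or delete a block in the buffer cache, a session must first acquire the corresponding cache buffers chains child latch, which guarantees exclusive access to the chain. If contention occurs during this step, the session waits on the latch: cache buffers chains event.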
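To see whether the contention is concentrated on a few child latches (which usually points to a few hot chains) or spread across many, the child latch statistics can be ranked. A minimal sketch, assuming SELECT access to v$latch_children; the limit of 10 rows is arbitrary:

```sql
-- Top 10 cache buffers chains child latches by sleeps
select *
  from (select addr, child#, gets, misses, sleeps
          from v$latch_children
         where name = 'cache buffers chains'
         order by sleeps desc)
 where rownum <= 10;
```

A handful of children with far more sleeps than the rest suggests hot blocks; an even spread suggests generally excessive logical reads.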
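Causes:
1. Inefficient SQL statements (mainly excessive logical reads)
In some environments, the application opens multiple concurrent sessions that execute the same inefficient SQL statements, all trying to get the same data set. SQL statements with high BUFFER_GETS (logical reads) per execution are the main cause. Conversely, fewer logical reads mean fewer latch gets, which reduces latch contention and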