Library Cache Latch Contention due to Multiple Version Count – Day 2 of AIOUG
September 24, 2009 2 Comments
Day 2 of “All India Oracle User Group” was continuation of where I left on Day 1 – Library Cache and Multiple Version Count. On Day 2, I had a real life example on the issue that I presented. On Day 1, the conceptual knowledge of Library Cache, Parent & Child Cursors and some of the fundamental, but not to be ignored, causes of Multiple Child Cursors was shared.
One of the major cause of Multiple Version for an SQL is CURSOR_SHARING. When cursor_sharing is set to SIMILAR, optimizer checks for either of these :
Is the Operator NEITHER of ‘=’ or ‘!=’
OR
Are there Column Histograms present on the Columns
When either of these are true, then, irrespective of whether the execution plan of the new child cursor is already stored, a new Child cursor is created. The only exception is, if the runtime value of the bind variable matches that of an already stored child cursor. We will see this with an example.
Before we begin, as a prerequisite, we will create our test case that will be used for both the cases mentioned above. The data in this table is populated in a skewed manner and we create an Index on the skewed column.
PREREQUISITE
exec dbms_random.seed(0);
drop table test_hist;
create table test_hist as
with n as
(select /*+ materialize */ round(dbms_random.normal,0) n1 from all_objects where rownum<=10000)
select decode(sign(n1),-1,n1+9,n1) empid,
round(dbms_random.value(1000,2000),0) sal,
case when mod(round(dbms_random.value(101,110),0),2)=0 then 'M' else 'F' end gender
from n
where rownum<=10000;
create index test_hist_idx on test_hist(empid);
select empid, cnt, round(cnt/final_total*100,2) percentage from (
select empid, cnt, sum(cnt) over() final_total
from (
select empid, count(*) over(partition by empid) cnt,
row_number() over (partition by empid order by empid) rno
from test_hist)
where rno=1);
EMPID CNT PERCENTAGE
---------- ---------- ----------
0 3937 39.37
1 2289 22.89
2 630 6.3
3 55 .55
4 4 .04
5 4 .04
6 64 .64
7 646 6.46
8 2371 23.71
9 rows selected.
Once the test case is ready, we will walk through the two issues, mentioned above, introduced by Cursor_Sharing set to SIMILAR. For this purpose, we will generate statistics in two different ways: With & Without Histograms.
TEST CASE 1 : With Histogram and Equality Predicates
Gather Stats on this table with method_opt as ‘FOR COLUMNS EMPID SIZE 74’. With this, we generate Frequency Histogram on Empid column. Since the column has only 9 distinct values, creating histogram will generate different execution plans for the values with less data.
Combination of Cursor_Sharing to SIMILAR and unwanted Histograms can cause a major performance bottleneck, with contention on Library Cache related waits. Therefore, it is always recommended to generate Histograms only when required. Oracle 10g changed the default value of method_opt from ‘FOR ALL COLUMNS SIZE 1’ to ‘FOR ALL COLUMNS SIZE AUTO’ and with this change, any database with CURSOR_SHARING set to SIMILAR, would face a severe latch contention issue after upgrading to 10g, as unwanted histograms are generated on columns with large distinct values. As you will see in this test case, these unwanted histograms introduce more child cursors, further causing a hash chain to grow longer. We all know that latches are to be held for brief duration, but, if number of child cursors grow, inspecting a hash chain for a matching cursor can take significant amount of time preventing others to get the latch.
alter system set cursor_sharing=similar scope=memory; select /*+ VIVEK */ sal from test_hist where empid=0; select /*+ VIVEK */ sal from test_hist where empid=2; select /*+ VIVEK */ sal from test_hist where empid=4; select /*+ VIVEK */ sal from test_hist where empid=7; select /*+ VIVEK */ sal from test_hist where empid=8;
Set CURSOR_SHARING to SIMILAR and execute 5 queries with different values passed to empid. Once these are run, check against v$sqlarea, v$sql and finally, v$sql_shared_cursor to get the cause of multiple versions. The output would be similar to that of mine, except for the plan_hash_value. In my case, for all the cases above, the optimizer opted for Full Table Scan. In your case, the plans would differ. Even though the execution plans may or may not match, it is irrelevant for this discussion. After execution of the 5 queries, the output would look like:
SQL> select sql_id, sql_text, executions, child_latch, version_count, plan_hash_value, hash_value fr
om v$sqlarea where sql_text like 'select /*+ VIVEK%';
SQL_ID SQL_TEXT EXECUTIONS CHILD_LATCH VERSION_COUNT PLAN_HASH_VALUE HASH_VALUE
------------- ------------------------------ ---------- ----------- ------------- --------------- --
baz378z2m5q42 select /*+ VIVEK */ sal from t 5 2 5 3529380458 3308443778
est_hist where empid=:"SYS_B_0
"
Elapsed: 00:00:00.10
SQL>select sql_id, sql_text, executions, child_latch, plan_hash_value, child_number, child_address from v$sql where sql_text like 'select /*+ VIVEK%';
SQL_ID SQL_TEXT EXECUTIONS CHILD_LATCH PLAN_HASH_VALUE CHILD_NUMBER CHILD_AD
------------- ------------------------------ ---------- ----------- --------------- ------------ ---
baz378z2m5q42 select /*+ VIVEK */ sal from t 1 2 3529380458 0 699F048C
est_hist where empid=:"SYS_B_0
"
baz378z2m5q42 select /*+ VIVEK */ sal from t 1 2 3529380458 1 68A65B38
est_hist where empid=:"SYS_B_0
"
baz378z2m5q42 select /*+ VIVEK */ sal from t 1 2 3529380458 2 68A47B00
est_hist where empid=:"SYS_B_0
"
baz378z2m5q42 select /*+ VIVEK */ sal from t 1 2 3529380458 3 69519264
est_hist where empid=:"SYS_B_0
"
baz378z2m5q42 select /*+ VIVEK */ sal from t 1 2 3529380458 4 6A3B1AE0
est_hist where empid=:"SYS_B_0
"
Elapsed: 00:00:00.25
SQL> select * from v$sql_shared_cursor where sql_id='baz378z2m5q42';
SQL_ID ADDRESS CHILD_AD CHILD_NUMBER U S O O S L S E B P I S T A B D L T R I I R L I O S M U
------------- -------- -------- ------------ - - - - - - - - - - - - - - - - - - - - - - - - - - - -
M B M R O P M F L
- - - - - - - - -
baz378z2m5q42 684B2D44 699F048C 0 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N
N N N N N N N N N
baz378z2m5q42 684B2D44 68A65B38 1 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N
N N N N N N N N N
baz378z2m5q42 684B2D44 68A47B00 2 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N
N N N N N N N N N
baz378z2m5q42 684B2D44 69519264 3 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N
N N N N N N N N N
baz378z2m5q42 684B2D44 6A3B1AE0 4 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N
N N N N N N N N N
Surprisingly, v$sql_shared_cursor does not provide a justification for the cause of multiple child cursors. If you again run a query with EMPID set to any of the values passed earlier (eg empid=0), the version_count will not be incremented. This means, a child cursor, in this case, will only be shared if a value passed to the FILTER Predicate matches any of the bind value previously executed.
TEST CASE 2 : Without Histogram and Range Predicates
For this test case, regather the statistics on the TEST_HIST table with method_opt changed to ‘FOR ALL COLUMNS SIZE 1’, which means, no histograms. Without Histograms, multiple child cursors will be created for any queries that does not contain EQUALITY (=) or NOT EQUAL (!=) predicates. For demonstration, make sure the histograms are removed and then execute following queries:
select /*+ VIVEK */ empid, sal from test_hist where sal between 1000 and 1100; select /*+ VIVEK */ empid, sal from test_hist where sal between 1200 and 1500; select /*+ VIVEK */ empid, sal from test_hist where sal between 1700 and 1800; select /*+ VIVEK */ empid, sal from test_hist where sal between 1500 and 2000;
The output should look like:
SQL> select sql_id, sql_text, executions, child_latch, version_count, plan_hash_value, hash_value fr
om v$sqlarea where sql_text like 'select /*+ VIVEK%';
SQL_ID SQL_TEXT EXECUTIONS CHILD_LATCH VERSION_COUNT PLAN_HASH_VALUE HASH_VALUE
------------- ------------------------------ ---------- ----------- ------------- --------------- --
2xk8mywyq0z69 select /*+ VIVEK */ empid, sal 4 2 4 1054764382 1029733577
from test_hist where sal betw
een :"SYS_B_0" and :"SYS_B_1"
Elapsed: 00:00:00.04
SQL> select sql_id, sql_text, executions, child_latch, plan_hash_value, child_number, child_address
from v$sql where sql_text like 'select /*+ VIVEK%';
SQL_ID SQL_TEXT EXECUTIONS CHILD_LATCH PLAN_HASH_VALUE CHILD_NUMBER CHILD_AD
------------- ------------------------------ ---------- ----------- --------------- ------------ ---
2xk8mywyq0z69 select /*+ VIVEK */ empid, sal 1 2 1054764382 0 69BCD9A0
from test_hist where sal betw
een :"SYS_B_0" and :"SYS_B_1"
2xk8mywyq0z69 select /*+ VIVEK */ empid, sal 1 2 1054764382 1 69FF769C
from test_hist where sal betw
een :"SYS_B_0" and :"SYS_B_1"
2xk8mywyq0z69 select /*+ VIVEK */ empid, sal 1 2 1054764382 2 693BED94
from test_hist where sal betw
een :"SYS_B_0" and :"SYS_B_1"
2xk8mywyq0z69 select /*+ VIVEK */ empid, sal 1 2 1054764382 3 68F50694
from test_hist where sal betw
een :"SYS_B_0" and :"SYS_B_1"
Again, in this case, if you re-execute a query with sal between 1200 and 1500, the version_count will not be incremented as a shared version of a query with same bind values, is already stored in the library cache. In both the cases, v$sql_shared_cursor does not provide the justification for the cause of multiple child cursors and therefore, it is very difficult to diagnose this.
These two test cases should help you understand the side effect of setting Cursor_Sharing parameter to SIMILAR. With this setting, even if you take utmost care about not generating unwanted histograms, any range predicate queries can cause Library Cache contention issue. I am sure that while Histograms are avoidable, Range Based Queries are unaviodable. 🙂
A Child Library Cache Latch protects multiple objects (cursors). This means, two or more entirely different queries can be hashed to a same child latch. Time Consuming match for a query with several child cursors can also cause other sessions to wait on Library Cache Latches for other queries, protected by this same latch, with 1 child cursor. Therefore, Library Cache Latch contention are also called as False Contention and utmost care has to be taken while diagnosing contention on these latches.
Real Life Example
During the AIOUG event and after the completion of these test cases, I presented on an Real Life Example. One of my customer complained for severe performance issues and one of the cause of this slow performance was identified as Contention on Library Cache Latches. These are some of the statistics, before and after eliminating Library Cache Latch Contention.
BEFORE

ISSUE
The combination of Database Version 10.2.0.4 and Cursor_Sharing=SIMILAR was enough to predict the issue which was due to multiple version count. The table below shows the list of top 20 Queries, with Executions and their Version Counts.
select * from ( select sql_id, version_count, executions, child_latch from v$sqlarea order by 2 desc) where rownum<=20; SQL_ID VERSION_COUNT EXECUTIONS CHILD_LATCH ------------- ------------- ---------- ----------- 5w93sms4xr70b 25775 59303 42 4upsrgs9r0z8h 19982 41372 10 cpk0xaggwc9q7 15316 41929 56 fx0662kfns5mj 12060 23058 17 c8brg2z6ytxjk 8991 19637 21 6t87d75m0hby0 8863 46765 37 cnsttwq7y79tc 8549 42953 7 76hw2b3ujar2z 8290 988 61 dh8yhs5pjh0dt 7670 10890 28 8xjwbmfsn0s2y 7057 996 53 dfp62t3qy92fm 6176 192357 26 2d1p0dmyaapv3 4718 16237 64 0a8bvn7zn8u26 4026 9192 39 a2qn47hh1ypqk 3982 7017 8 c1dq1hg17y3xg 3520 2255 62 08q94ja2kmqs3 3304 6143 2 4gm2wn99nk1mk 3190 6566 15 6s2cp8hk4bych 2907 48162 20 8z1b906qc3tnx 2752 6060 47 bg5gftj0r3s4m 2630 10283 52 20 rows selected.
The text of 1st Query from the above list is as under :
SQL_ID : 5w93sms4xr70b Module : GI6_03 Sharable_mem : 337,369,920 SELECT :"SYS_B_0" FROM GIT_LOP_COVER WHERE ROWNUM < :"SYS_B_1" AND LOP_SYS_ID = :"SYS_B_2" AND END_N_IDX = :"SYS_B_3" AND END_S_NO = :"SYS_B_4" AND ATTACH_L_NO = :"SYS_B_5" OWNER PAR NUM_ROWS BLOCKS LAST_ANALYZED GLO T ----------------- --- ---------- ---------- -------------------- --- - LOP_PROD NO 86222715 4334060 02-Sep-2008 10:05:26 YES N COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM ----------------- ------------ ---------- ---------- --------------- LOP_SYS_ID 12529993 0 1.8931E-07 HEIGHT BALANCED END_N_IDX 23 0 5.8944E-09 FREQUENCY END_S_NO 1 0 5.8944E-09 FREQUENCY ATTACH_L_NO 3 11880179 6.8543E-09 FREQUENCY
The Query uses Equality Predicates and we have already seen the issue due to Histograms and Cursor_Sharing SIMILAR. To validate this further, check for the column statistics, as shown above, for the queried table and it uses Histograms on the columns used in this query.
SOLUTION
To resolve this latch contention issues, the default method_opt was changed to ‘FOR ALL COLUMNS SIZE 1’ and the statistics were recollected. This gave some relief, but the version count for some of the queries were still high due to Range Predicates. Therefore, CURSOR_SHARING was set to FORCE. After this change, the version_count dropped significantly and the benefit can be seen from the statistics pasted below.

With Multiple Version Counts, Hard Parses were more than 100 per second. After changing from SIMILAR to FORCE, it came down to 60. Further implementation of Bind Variables, in place of Literals, brought this down to less than 10.
Production Statistics, post optimization

In the next blog, I would write on another test case alongwith a real life example on the issue due to Improper Implementation of an Oracle feature that caused a severe Row Cache Latch Contention. This discussion was also a part of my Day 2 presentation at AIOUG.
I did similar testing on 9i for the case of range predicates and no histograms. Oracle does not generate many versions if u collect stats on all the indexes columns only.
Hi Vivek,
I am having a 2-node RAC database.The database is on 10g (10.2.0.3) and stats are gathered with histogram( default method_opt.Cursor_sharing parameter is set to similar.Checking for the version_count, most of the select queries with equality predicate are showing very high version_count value (somewhere in 1000s).The database is having performance issue, though library cache latch contention is not much,neither is top 5 waits.During the peaks hours gc buffer busy is in top 5 events.
Will testing the application with method_opt => ‘FOR ALL COLUMNS SIZE 1’ help in this case??
Regards,
Anand