Recently, got an oppurtunity to work on a Mutex related issue causing database to halt. While the issue started from April 2013, the frequency of this increased since September 2013. This is a Banking Customer and every month, they add few Branches into the system. With the Branches, the concurrency also goes up. This increase in concurrency could be one of the reason of frequent Library Cache Mutex Issue.
In this case, the issue was found to be due to high number of Child Cursors. The top 4 queries, based on number of child cursors, had around 30k child cursors. From the v$sql_shared_cursor, the reason for these many childs was AUTH_CHECK_MISMATCH, which stands for “Authorization Check Mismatch”. Further investigation revealed this to be an Implementation issue.
The Bank has around 580 Branches. For Scalability purpose, the Application Team has implemented 290 Branches on One Database and other 280 Branches on another Database. These are named as BDB1 and BDB2. For each of the Branches, they have individual schema. Therefore, in BDB1 and BDB2 they have 290 and 280 schemas respectively. Each Schema has it’s own table, procedure, functions etc…The name of the tables, their columns etc are same as well across each of these schema’s. For example, each of the schema has a table USER_SESSION_LOGIN_DTL with same number and names of the columns. They only differ in data stored within each of these tables. Further, each schema has around 800+ tables. Therefore, a query executed against Branch 1 is executed against Branch 2, due to the textual similarity, they match in HASH Values and therefore share a common parent. The logic at the application level is that if a user connects to the application, the Branch Name is returned based on which, he or she connects to the schema (based on the branch name) and executes the application level business trasaction. Another user from another branch will open the same module but will connect to different schema. Therefore, while the queries are textually same for each of the Branches, since the underlying objects are different, we see multiple child cursors being populated.
In my view, this is an inefficient Database Design Issue. This will also have a serious manageability issue, as any change in the business logic, will have to be replicated across all the schemas. Any Optimization, i.e. creation of a New Index, will have to be replicated across each of the Schemas. This list can go on…
This is very easy to demonstrate.
## This pl/sql will create 300 Users with names as F1, F2, F3 and so on....
declare
l_statement varchar2(100);
begin
for i in 1..300
loop
l_statement:='drop User F'||i||' cascade';
execute immediate l_statement;
l_statement:='Create User F'||i||' identified by F'||i;
execute immediate l_statement;
l_statement:='grant dba to F'||i;
execute immediate l_statement;
end loop;
end;
/
## Next we create a Table EMP in each of these schemas. In this case, I am inserting same data into these tables.
declare
l_statement varchar2(100);
begin
for i in 1..300
loop
execute immediate 'alter session set current_schema=F'||i;
l_statement:='create table emp as select * from scott.emp';
execute immediate l_statement;
end loop;
end;
/
## Next let us execute a query on each of the emp schemas (randomly). These will be 30000 queries with different Bind Variables.
declare
l_emp number;
l_num number;
l_sch number;
l_statement varchar2(1000);
l_ename varchar2(32);
begin
for i in 1..30000
loop
l_num:=round(dbms_random.value(1,14),0);
l_sch:=round(dbms_random.value(1,300),0);
execute immediate 'alter session set current_schema=F'||l_sch;
select empno into l_emp from (
select empno, row_number() over(order by empno) rno from scott.emp)
where rno=l_num;
l_statement := 'select /*+ vivek1 */ ename from emp where empno=:b1';
execute immediate l_statement into l_ename using l_emp;
end loop;
end;
/
Once the queries are executed, we can query v$sql_shared_cursor to check for the number of child cursors.
SQL> SQL> @sql_text
Enter value for 1: select /*+ vivek1
old 2: from v$sqlarea where sql_text like '&1%'
new 2: from v$sqlarea where sql_text like 'select /*+ vivek1%'
SQL_ID SQL_TEXT BUFFER_GETS EXECUTIONS VERSION_COUNT PLAN_HASH_VALUE
------------- -------------------------------------------------- ----------- ---------- ------------- ---------------
1c9w6s580jtpd select /*+ vivek1 */ ename from emp where empno=:b 5403 1801 30 3956160932
1
SQL> select count(*) from v$sql_shared_cursor where sql_id='1c9w6s580jtpd';
COUNT(*)
----------
8084
SQL> select sql_id, child_number, AUTH_CHECK_MISMATCH, reason from v$sql_shared_cursor where sql_id='1c9w6s580jtpd' and rownum<=5;
SQL_ID CHILD_NUMBER A REASON
------------- ------------ - --------------------------------------------------------------------------------
1c9w6s580jtpd 94 Y 9437Authorization Check f
1c9w6s580jtpd 95 Y 9537Authorization Check f
1c9w6s580jtpd 96 Y 9637Authorization Check f
1c9w6s580jtpd 97 Y 9737Authorization Check f
1c9w6s580jtpd 98 Y 9837Authorization Check f
For 30000 Executions, we have 8000 Child Cursors. This block was executed from a Single Session and no one else working. Imagine the impact of such an implementation in a highly concurrent environment.