Performance | Real Life Database / SQL Experiences : An Oracle Blog from Vivek Sharma

Performance Tuning Day…All India Oracle User Group

August 9, 2015 2 Comments

Concluded a 5 hour session on Performance Optimization for Pune Chapter of All India Oracle User Group. Due to unexpected massive traffic jam, reached the venue late and therefore, the sessions were behind the actual schedule. Had to cancel a session on Query Optimizer. A Big Disappointment for this cancellation. As a speaker, it really disappoints when you have to run few slides and/or cancel a session. Sorry folks. Will check with the Organizer for the Part II of this event, where we can cover this interesting topic.

The crowd, as expected, was interesting. They made the whole event Interactive with interesting questions. As a Speaker, you enjoy if your participants are deeply involved and raise questions to clarify their doubts, which gives you a sense that the crowd is listening to what you are saying. Thanks to all for lighting up the event.

Last but not the least, Hats Off to the Organizers. They worked hard to make this event a grand success. Their meticulous planning is appreciated.

Filed under Uncategorized Tagged with aioug, Performance

AIOUG Performance Tuning Day – Pune

July 15, 2015 2 Comments

I am presenting a full day event on 8th August 2015 at Pune. This is a Performance tuning day that will cover some of the interesting performance issues and solutions. Looking forward to see you all at Pune. Click on the following link to view the schedule and to register.

Performance Tuning Day Link

Filed under Uncategorized Tagged with aioug, Performance

AIOUG Performance Tuning Day ! Hyderabad…

May 20, 2015 2 Comments

AIOUG is hosting a full day Performance Tuning day on 30th May 2015 in Hyderabad. I will be presenting on Oracle Query Optimizer and Performance. These sessions will be a mix of technical demos and real life examples. Hope to see a large gathering. Registrations are open at Performance Tuning Day.

Filed under AIOUG, Optimizer, Performance Tagged with aioug, Optimizer, Performance

Auto DOP ! 11g Feature…

May 12, 2015 Leave a comment

Recently, during one of the Performance POC on an Oracle Engineered System, I got to implement Auto DOP feature for one of their critical Batch Job. This Batch job is a a set of multiple processes. Some of the processes run as a Serial Process, some these are just one process, but the query is hinted to use Oracle Parallelism and some of these are parallel application threads.

For AUTO DOP to work, there are two mandatory settings. These are IO Calibration and setting of parallel_degree_policy (which defaults to MANUAL, means no AUTO DOP). I started with AUTO settings as LIMITED requires changing DEGREE for all (or critical huge) tables. AUTO setting means, Optimizer will calculate and come out with the Degree of Parallelism. Further, this setting also enables two features: Parallel_Statement_Queuing and In-Memory Parallel Execution. In-Memory Parallel execution might disable Exadata Smart Scan, therefore, I modified _parallel_cluster_cache_policy to ADAPTIVE i.e no In-Memory Parallel Execution. IO Calibration was done, so that, optimizer can take the advantage of various values (as can be seen from DBA_RSRC_IO_CALIBRATION) to come out with Degree of Parallelism.

My initial test run was not that successful. While investigating this, I came across a very informative article from Gwen Shapira. As per this article, for AUTO DOP considers MAX_PMBPS value only and this can be set manually to 200. In my case, since the calibration process was executed, the value for MAX_PMBPS was around 3000+. As I understand, this value depicts the throughput and therefore, larger the value, lower the DOP will be. I followed the advice in the article and deleted the rows from resource_io_calibrate$.

delete from resource_io_calibrate$;
insert into resource_io_calibrate$ values(current_timestamp, current_timestamp, 0, 0, 200, 0, 0);
commit;

Post setting this value manually, the database needs a restart. This change did the trick. The performance of many of the processes improved drastically. For some of the queries, we could also see Statement Queuing working effectively i.e. waiting for the resources to be available and taking up the required DOP, rather than downgrading the parallel degree. However, the customer was bit apprehensive about AUTO word and therefore, decided to Implement the optimizer calculations of DOP manually into each of these queries and disabling this feature. This change worked as well.

My observations from this nice feature (and I may be wrong as I did not get much time to investigate or work on this), to name a few are :

Any Query that takes more than 10 Seconds (Optimizer Estimation) will be subject to Auto DOP

A Query with parallel hint will also go through serial execution run time evaluation. This means, if optimizer estimates the runtime of serial execution to be less than 10 seconds, parallel hint will be ignored.

In my opinion, AUTO DOP is one of the best innovation.

Filed under Exadata, Performance Tagged with Auto DOP, Exadata, Parallelism, parallel_degree_policy, Performance, Statement Queuing, _parallel_cluster_cache_policy, _parallel_statement_queing

Intra Block Chaining ! Impact on Logical Reads

November 8, 2012 5 Comments

I presented this during Sangam 12 (User Group Conference) and many participants wanted me to post a Blog on this. It is important to know the impact of Intra Block Chaining, which occurs, if the number of columns in a table exceed 255. If a table has more than 255 columns, the row pieces are stored in multiple row pieces of 255 each. Now, whether it stores columns 1 to 255 in one Row Piece and 256 to the remaining columns in next row piece, is an interesting fact that everyone wants to know. Through this blog, I will try to demonstrate and answer following facts :

If a table consists of 300 Columns, which all columns will constitute the 2 row pieces ? i.e. Row Piece 1 : 1-255 Columns and Row Piece 2 : 256-300 Columns, or will it be something else ?
What will be the performance impact of Intra Block Chaining ?

The reason, I picked this as one of the topic that the Developers and the DBA’s should know, is based on a Real Life Example. One of my customer complained of a Severe Performance issues, post applying an Application Patch. This patch was to introduce certain new functionality for their end-users. Further, this functionality change modified many of the critical tables used by the application. This change was : introduction of new columns. Few tables, which had less than 255 columns, were altered and new columns were added, which grew these numbers to more than 255. This was enough to cause the performance issues, which went unnoticed, during a test run, which was with done with less user load. On production, the user load was way high, and therefore, any increase in the I/O was capable enough to increase the CPU Utilization.

Let us start with a Demonstration, which is an easiest way to understand the facts. We will first create a table, with required number of columns. First, a table with 300 Columns and then I will insert single row into this table. Following two pl/sql blocks uses a Dynamic SQL to accomplish the task.

variable b1 number;
exec :b1:=&number_of_columns; -- Pass the Number of Columns to be required in a Table 

-- Dynamically Create Intra_Block table with required number of rows. This block constructs the CREATE TABLE Statement.

declare
l_statement	long:='create table intra_block (';
begin
  for i in 1..:b1
  loop
    if i<:b1 then
    l_statement:=l_statement||' A'||i||' number(3),';
    else
    l_statement:=l_statement||' A'||i||' number(3))';
    end if;
  end loop;
  execute immediate l_statement;
end;
/

-- Insert Single Row into the table. This block constructs the INSERT Statement.

declare
l_statement	long:='insert into intra_block(';
begin
   for i in 1..:b1
   loop
     if i<:b1 then
     l_statement:=l_statement||'A'||i||',';
     else
     l_statement:=l_statement||'A'||i||') values(';
     end if;
   end loop;
   for i in 1..:b1
   loop
     if i<:b1 then
     l_statement:=l_statement||i||',';
     else
     l_statement:=l_statement||i||')';
     end if;
   end loop;
   execute immediate l_statement;   
   commit;
end;
/
-- Gather Statistics
exec dbms_stats.gather_table_stats(user,'INTRA_BLOCK');

select table_name, blocks, num_rows from dba_tables where table_name='INTRA_BLOCK';

OWNER           PAR   NUM_ROWS     BLOCKS
--------------- --- ---------- ----------
VIVEK           NO           1          5

I created this table with 300 columns and inserted one row. DBA_Tables shows 5 blocks in the table. Next, the following queries show that the entire 300 columns fit into 1 block, block#182 in my case.

select dbms_rowid.rowid_block_number(rowid) bno, count(*) from intra_block group by dbms_rowid.rowid_block_number(rowid);

       BNO   COUNT(*)
---------- ----------
       182          1

Next, we shall execute a Query against the first column, which is A1, and then on the last column, which is A300. We will compare the I/O’s done by each of these queries.

set autot on stat

select a1 from intra_block;

        A1
----------
         1

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          7  consistent gets <--- 7 Logical I/O's

select A300 from intra_block;

      A300
----------
       300

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          8  consistent gets <-- One Additional Logical I/O's

Query on the last column (for a table with more than 255 columns) incurred one Additional Logical I/O. Let us now check, row piece structure. We now know that with 300 columns, the Rows will be inserted into 2 Row Pieces. One Row Piece will incur less I/O as compared to the another Row Piece. I want to check the columns in Row Piece 1 and Columns in Row Piece 2, which will help us know, which column onwards the query will start incurring additional I/O. Following pl/sql executes a query on each of the columns and we shall check for the I/O’s (from buffer_gets column of v$sqlarea).

 declare
l_statement varchar2(132);
l_value number;
begin
  for i in 1..:b1
  loop
    l_statement:='select /*+ intra_'||lpad(i,3,0)||' */ A'||i||' from intra_block';
    execute immediate l_statement into l_value;
  end loop;
end;
/

select sql_id, sql_text, buffer_gets, executions, rows_processed
from v$sqlarea where sql_text like 'select /*+ intra%'
order by to_number(substr(sql_text,instr(sql_text,'i',1)+6,3));

SQL_ID        SQL_TEXT                                           BUFFER_GETS EXECUTIONS ROWS_PROCESSED
------------- -------------------------------------------------- ----------- ---------- --------------
47u9afvh40vqf select /*+ intra_001 */ A1 from intra_block                  7          1              1
cvjvxgnfz7h78 select /*+ intra_002 */ A2 from intra_block                  7          1              1
cbdajswujwnyf select /*+ intra_003 */ A3 from intra_block                  7          1              1
8g18r5syanmbx select /*+ intra_004 */ A4 from intra_block                  7          1              1
d5kyqtgd2d4q3 select /*+ intra_005 */ A5 from intra_block                  7          1              1
c0yvymb9p7gj0 select /*+ intra_006 */ A6 from intra_block                  7          1              1
.....
..... (some entries trimmed)
.....
f1rrwb22u89gf select /*+ intra_029 */ A29 from intra_block                 7          1              1
1556fp9466r4x select /*+ intra_030 */ A30 from intra_block                 7          1              1
3sk2zpnyqanmc select /*+ intra_031 */ A31 from intra_block                 7          1              1
asp7vk16b07sb select /*+ intra_032 */ A32 from intra_block                 7          1              1
9ts2g9w5dra0q select /*+ intra_033 */ A33 from intra_block                 7          1              1
c9dnc235v6w00 select /*+ intra_034 */ A34 from intra_block                 7          1              1
a532dxfj0d0w6 select /*+ intra_035 */ A35 from intra_block                 7          1              1
1p2yzsxax20qm select /*+ intra_036 */ A36 from intra_block                 7          1              1
6zb16tkbyrbhm select /*+ intra_037 */ A37 from intra_block                 7          1              1
38zqdzgvvg70w select /*+ intra_038 */ A38 from intra_block                 7          1              1
51f39r5tnw73d select /*+ intra_039 */ A39 from intra_block                 7          1              1
f1fywzj4avnz0 select /*+ intra_040 */ A40 from intra_block                 7          1              1
46015j3s4trsk select /*+ intra_041 */ A41 from intra_block                 7          1              1
1sdrag4sxbcjh select /*+ intra_042 */ A42 from intra_block                 7          1              1
a0t2nrpb8r83s select /*+ intra_043 */ A43 from intra_block                 7          1              1
aj2w0gvj5zy1g select /*+ intra_044 */ A44 from intra_block                 7          1              1
6nk7x7430k9uk select /*+ intra_045 */ A45 from intra_block                 7          1              1
ahdr7bqsm18x8 select /*+ intra_046 */ A46 from intra_block                 8          1              1
1vsus8qg2rsft select /*+ intra_047 */ A47 from intra_block                 8          1              1
86njp3z8adan3 select /*+ intra_048 */ A48 from intra_block                 8          1              1
6v6u6hp71vtwb select /*+ intra_049 */ A49 from intra_block                 8          1              1
7a9a7gbyhrvkr select /*+ intra_050 */ A50 from intra_block                 8          1              1
8wp83m0fn4kgw select /*+ intra_051 */ A51 from intra_block                 8          1              1
7q464wb184tpu select /*+ intra_052 */ A52 from intra_block                 8          1              1
.....
..... (some entries trimmed)
.....
9dmv986638rk4 select /*+ intra_235 */ A235 from intra_block                8          1              1
d2ks041mqfv1b select /*+ intra_236 */ A236 from intra_block                8          1              1
b0muufu2qqs7y select /*+ intra_237 */ A237 from intra_block                8          1              1
fbd5x0z45udvr select /*+ intra_238 */ A238 from intra_block                8          1              1
c31yuufwr2wzc select /*+ intra_239 */ A239 from intra_block                8          1              1
2aczq7gsf0680 select /*+ intra_240 */ A240 from intra_block                8          1              1
7us824yhm9c4s select /*+ intra_241 */ A241 from intra_block                8          1              1
c26kyzhmm1ad1 select /*+ intra_242 */ A242 from intra_block                8          1              1
dqrp4n7a5uzjp select /*+ intra_243 */ A243 from intra_block                8          1              1
4wzw76dh7sa6h select /*+ intra_244 */ A244 from intra_block                8          1              1
b4qwud4k9j9qg select /*+ intra_245 */ A245 from intra_block                8          1              1
6cyzxr6c7t4km select /*+ intra_246 */ A246 from intra_block                8          1              1
7u8krqjy6aq0k select /*+ intra_247 */ A247 from intra_block                8          1              1
8sxmn5ukxzqad select /*+ intra_248 */ A248 from intra_block                8          1              1
a66frsvy8q8vq select /*+ intra_249 */ A249 from intra_block                8          1              1
9htnkqmfchypc select /*+ intra_250 */ A250 from intra_block                8          1              1
058f3hvkf2y6m select /*+ intra_251 */ A251 from intra_block                8          1              1
1m9wfv39zmn6v select /*+ intra_252 */ A252 from intra_block                8          1              1
23mmdngutgbc6 select /*+ intra_253 */ A253 from intra_block                8          1              1
bnxt1ujafku6s select /*+ intra_254 */ A254 from intra_block                8          1              1
a4gv26ujgjhjv select /*+ intra_255 */ A255 from intra_block                8          1              1
draytdg79hc5c select /*+ intra_256 */ A256 from intra_block                8          1              1
4gr6ntv7zx764 select /*+ intra_257 */ A257 from intra_block                8          1              1
csfz96fkfz2vj select /*+ intra_258 */ A258 from intra_block                8          1              1
1v4sjc8740kdc select /*+ intra_259 */ A259 from intra_block                8          1              1
b38nmwf9bnfv0 select /*+ intra_260 */ A260 from intra_block                8          1              1
b52d8sd512zpr select /*+ intra_261 */ A261 from intra_block                8          1              1
6s8963hfnpt9b select /*+ intra_262 */ A262 from intra_block                8          1              1
5h19qru6dsudz select /*+ intra_263 */ A263 from intra_block                8          1              1
fpnvs4ry55qqg select /*+ intra_264 */ A264 from intra_block                8          1              1
2us1f52hmqz4j select /*+ intra_265 */ A265 from intra_block                8          1              1
g7xbq7nvg119d select /*+ intra_266 */ A266 from intra_block                8          1              1
78z5hjw7nam1n select /*+ intra_267 */ A267 from intra_block                8          1              1
82w5nq265jyfp select /*+ intra_268 */ A268 from intra_block                8          1              1
fh0hnjydm2unq select /*+ intra_269 */ A269 from intra_block                8          1              1
39tqqf1p9jm1s select /*+ intra_270 */ A270 from intra_block                8          1              1
cn4np8vkgk4mc select /*+ intra_271 */ A271 from intra_block                8          1              1
.....
..... (some entries trimmed)
.....
19m0kx9w80afs select /*+ intra_296 */ A296 from intra_block                8          1              1
1q9kg224txpqu select /*+ intra_297 */ A297 from intra_block                8          1              1
36dya8xmc8a0a select /*+ intra_298 */ A298 from intra_block                8          1              1
abn6s5jr2bac3 select /*+ intra_299 */ A299 from intra_block                8          1              1
grnbn11qw3xjk select /*+ intra_300 */ A300 from intra_block                8          1              1

300 rows selected.

From the output above, it can be seen that for 300 Columns, the I/O’s increased from 46th Column (A46), which is 300-255=45. This means, columns A300-A46 (255 columns) are together stored in One Row Piece, whereas, A45-A1 are stored in another Row Piece. This information was a bit interesting. Initially I thought, any query that queries columns A256 and above, would incur additional I/O, but the rows are stored backward and 255 limit also starts from back i.e.last column onwards.

For curiosity, I ran the entire test again for 256 columns. The table was dropped and recreated to have A1..A256 columns. With this, I could see that the I/O’s increased by 1 from column A2 onwards (256-255=1). Again, Columns A2..A256 were stored in one Row Piece and A1 in another. This also means, if you have a table with 255 columns, you add one column and would immediately see I/O’s go up by 1 for any column starting 2nd Column.

You may run this for any number of columns, and should see the same results. The third test I did was on a 600 Column Table. The I/O’s were as under :

Columns A1..A90 --- 7 I/O's
Columns A91..A345 --- 8 I/O's
Columns A346 onwards --- 9 I/O's 

600-255=345
345-255=90

This test proves that the Table Design requires careful planning. There is hardly any reason, why a Table needs to be created with more than 255 columns. However, I have seen many applications that create tables with more number of columns. This blog should be an eye opener for them. Any additional (unwanted) I/O’s that are saved will have dramatic impact on Application performance. This will improve the Scalability of the Application.

Filed under Performance Tagged with chaining, consistent gets, intra block, logical reads, Performance

Query Performance Issue ! SQL*Plus v/s Oracle Reports…

February 27, 2012 2 Comments

Have editted on 1st March to include the Execution Plan of the Modified Query. See at the end.

Recently, got a mail from one of my customer on a Query Performance Issue. The Customer mentioned that the Query, when executed from SQL*Plus, takes around 10 Seconds, whereas, when the pasted in Oracle Reports and run through Concurrent Manager, it runs from around 30-40 minutes and is finally killed. The Customer also confirmed that the Query is exactly same i.e it is purely a Copy-Paste of the Query that was run from SQL*Plus and that the Bind Type & Length too are same. The run time plan for both the executions are totally different, with the plan of SQL*Plus better than the plan generated by Oracle Reports. The Question was, “Why is the plan different between SQL*Plus and Oracle Reports ?”

As usual, I asked for a 10053 trace for both the executions, which is one of the best way in investigating Optimizer Calculations. The 10053 trace revealed some important facts about the Query Performance, and therefore helped resolve the issue. Before we dive into the investigation, let me paste the Query and both the Execution Plans.

This is the Query.

SELECT /*+ INDEX(WDD WSH_DELIVERY_DETAILS_TI4) */
             WDD.ORGANIZATION_ID, 
             COUNT(*) LPNS,
             TRP.TRIP_ID,
             RC.CUSTOMER_NUMBER,
             RC.CUSTOMER_NAME, 
             RAA.ADDRESS1 TO_ADDRESS1, 
             RAA.CITY RAA.STATE TO_CITY_STATE, 
             HLA.DESCRIPTION FROM_DESC, 
             HLA.ADDRESS_LINE_1||' '||HLA.TOWN_OR_CITY||' '||HLA.TOWN_OR_CITY)  FROM_ADDRESS1_CITY, 
             WDA.DELIVERY_ID, 
             WDA.PARENT_DELIVERY_DETAIL_ID,
             WDD.DELIVERY_DETAIL_ID, 
             WDD1.CONTAINER_NAME    CONTAINER_NAME, 
             SUM(WDD.REQUESTED_QUANTITY) REQUESTED_QUANTITY, 
             WDD.REQUESTED_QUANTITY_UOM, 
             SUM(ROUND(WDD.GROSS_WEIGHT,0)) GROSS_WEIGHT, 
             WDD.WEIGHT_UOM_CODE,
             WT.VEHICLE_NUM_PREFIX ||' '||WT.VEHICLE_NUMBER VEHICLE_NUMBER,
             MSI.SEGMENT1 ITEM_CODE, 
	     MSI.ATTRIBUTE10 PRT_DESC 
FROM 	apps.WSH_DELIVERY_DETAILS WDD,
     	apps.WSH_DELIVERY_DETAILS WDD1,
        apps.WSH_DELIVERY_ASSIGNMENTS WDA ,
        apps.WSH_DELIVERY_ASSIGNMENTS WDA1,    
        apps.WSH_TRIPS WT, 
        apps.MTL_SYSTEM_ITEMS MSI, 
	TI_RA_ADDRESSES_V RAA,
	TI_RA_CUSTOMERS_V RC ,
        apps.HR_LOCATIONS_ALL HLA,
    	(SELECT DISTINCT  T.TRIP_ID, WDA.DELIVERY_ID 
    	FROM	apps.WSH_DELIVERY_ASSIGNMENTS WDA, 
          	apps.WSH_DELIVERY_LEGS DL,
          	apps.WSH_TRIP_STOPS PICKUP_STOP,  
             	apps.WSH_TRIPS T
	WHERE	WDA.DELIVERY_ID     = DL.DELIVERY_ID 
        AND     DL.PICK_UP_STOP_ID  = PICKUP_STOP.STOP_ID 
        AND     PICKUP_STOP.TRIP_ID = T.TRIP_ID
        AND     T.TRIP_ID  >=  : P_TRIPID_FR
        AND     T.TRIP_ID  = : P_TRIPID_FR 
AND 	TRP.TRIP_ID <= : P_TRIPID_TO 
GROUP BY WDD.ORGANIZATION_ID , TRP.TRIP_ID , RC.CUSTOMER_NUMBER , RC.CUSTOMER_NAME , RAA.ADDRESS1 , 
RAA.CITY , RAA.STATE , HLA.DESCRIPTION , HLA.ADDRESS_LINE_1 , HLA.TOWN_OR_CITY , WDA.DELIVERY_ID , 
WT.VEHICLE_NUM_PREFIX , WT.VEHICLE_NUMBER , WDA.PARENT_DELIVERY_DETAIL_ID , WDD.DELIVERY_DETAIL_ID , 
WDD1.CONTAINER_NAME , WDD.REQUESTED_QUANTITY_UOM , WDD.WEIGHT_UOM_CODE , MSI.SEGMENT1 , MSI.ATTRIBUTE10

The Runtime Plan of the Query when Executed from SQL*Plus.

-----------------------------------------------------------------------------------------------
| Id  | Operation                                       | Name                        | Rows  |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                |                             |       |
|   1 |  SORT GROUP BY                                  |                             |     1 |
|   2 |   VIEW                                          | VM_NWVW_1                   |     1 |
|   3 |    SORT UNIQUE                                  |                             |     1 |
|*  4 |     FILTER                                      |                             |       |
|   5 |      NESTED LOOPS                               |                             |       |
|   6 |       NESTED LOOPS                              |                             |     1 |
|   7 |        NESTED LOOPS                             |                             |     1 |
|   8 |         NESTED LOOPS                            |                             |     1 |
|   9 |          NESTED LOOPS                           |                             |     5 |
|  10 |           NESTED LOOPS                          |                             |     5 |
|  11 |            NESTED LOOPS                         |                             |     5 |
|  12 |             NESTED LOOPS                        |                             |     5 |
|  13 |              NESTED LOOPS OUTER                 |                             |    27 |
|  14 |               NESTED LOOPS OUTER                |                             |    27 |
|  15 |                NESTED LOOPS                     |                             |    27 |
|  16 |                 NESTED LOOPS                    |                             |  1908 |
|  17 |                  NESTED LOOPS                   |                             |   427 |
|  18 |                   NESTED LOOPS                  |                             |   127 |
|  19 |                    MERGE JOIN                   |                             |    72 |
|  20 |                     TABLE ACCESS BY INDEX ROWID | WSH_TRIP_STOPS              |   187 |
|* 21 |                      INDEX RANGE SCAN           | WSH_TRIP_STOPS_N1           |   187 |
|* 22 |                     SORT JOIN                   |                             |    72 |
|  23 |                      TABLE ACCESS BY INDEX ROWID| WSH_TRIPS                   |    72 |
|* 24 |                       INDEX RANGE SCAN          | WSH_TRIPS_U1                |    72 |
|  25 |                    TABLE ACCESS BY INDEX ROWID  | WSH_DELIVERY_LEGS           |     2 |
|* 26 |                     INDEX RANGE SCAN            | WSH_DELIVERY_LEGS_N2        |     2 |
|* 27 |                   INDEX RANGE SCAN              | WSH_DELIVERY_ASSIGNMENTS_N1 |     3 |
|  28 |                  TABLE ACCESS BY INDEX ROWID    | WSH_DELIVERY_ASSIGNMENTS    |     4 |
|* 29 |                   INDEX RANGE SCAN              | WSH_DELIVERY_ASSIGNMENTS_N1 |     4 |
|* 30 |                 TABLE ACCESS BY INDEX ROWID     | WSH_DELIVERY_DETAILS        |     1 |
|* 31 |                  INDEX RANGE SCAN               | WSH_DELIVERY_DETAILS_TI4    |     1 |
|* 32 |                INDEX RANGE SCAN                 | WSH_DELIVERY_ASSIGNMENTS_N3 |     1 |
|  33 |               TABLE ACCESS BY INDEX ROWID       | WSH_DELIVERY_DETAILS        |     1 |
|* 34 |                INDEX UNIQUE SCAN                | WSH_DELIVERY_DETAILS_U1     |     1 |
|  35 |              TABLE ACCESS BY INDEX ROWID        | MTL_SYSTEM_ITEMS_B          |     1 |
|* 36 |               INDEX UNIQUE SCAN                 | MTL_SYSTEM_ITEMS_B_U1       |     1 |
|  37 |             TABLE ACCESS BY INDEX ROWID         | HZ_CUST_ACCOUNTS            |     1 |
|* 38 |              INDEX UNIQUE SCAN                  | HZ_CUST_ACCOUNTS_U1         |     1 |
|  39 |            TABLE ACCESS BY INDEX ROWID          | HZ_LOCATIONS                |     1 |
|* 40 |             INDEX UNIQUE SCAN                   | HZ_LOCATIONS_U1             |     1 |
|  41 |           TABLE ACCESS BY INDEX ROWID           | HZ_PARTY_SITES              |     1 |
|* 42 |            INDEX RANGE SCAN                     | HZ_PARTY_SITES_N2           |     1 |
|* 43 |          TABLE ACCESS BY INDEX ROWID            | HZ_CUST_ACCT_SITES_ALL      |     1 |
|* 44 |           INDEX RANGE SCAN                      | HZ_CUST_ACCT_SITES_N1       |     1 |
|  45 |         TABLE ACCESS BY INDEX ROWID             | HZ_PARTIES                  |     1 |
|* 46 |          INDEX UNIQUE SCAN                      | HZ_PARTIES_U1               |     1 |
|* 47 |        INDEX UNIQUE SCAN                        | HR_LOCATIONS_PK             |     1 |
|  48 |       TABLE ACCESS BY INDEX ROWID               | HR_LOCATIONS_ALL            |     1 |
-----------------------------------------------------------------------------------------------

Runtime Plan of the Query when Executed from Oracle Reports.

  
  -----------------------------------------------------------------------------------------------
  | Id  | Operation                                       | Name                        | Rows  |
  -----------------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT                                |                             |       |
  |   1 |  SORT ORDER BY                                  |                             |     1 |
  |   2 |   SORT GROUP BY                                 |                             |     1 |
  |   3 |    VIEW                                         | VM_NWVW_1                   |     1 |
  |   4 |     SORT UNIQUE                                 |                             |     1 |
  |*  5 |      FILTER                                     |                             |       |
  |   6 |       NESTED LOOPS                              |                             |       |
  |   7 |        NESTED LOOPS                             |                             |     1 |
  |   8 |         NESTED LOOPS                            |                             |     1 |
  |   9 |          NESTED LOOPS                           |                             |     4 |
  |  10 |           NESTED LOOPS                          |                             |     4 |
  |  11 |            NESTED LOOPS                         |                             |     1 |
  |  12 |             NESTED LOOPS OUTER                  |                             |     1 |
  |  13 |              NESTED LOOPS OUTER                 |                             |     1 |
  |  14 |               NESTED LOOPS                      |                             |     1 |
  |* 15 |                HASH JOIN                        |                             |     1 |
  |  16 |                 TABLE ACCESS FULL               | HZ_CUST_ACCT_SITES_ALL      | 36983 |
  |* 17 |                 HASH JOIN                       |                             | 23952 |
  |  18 |                  TABLE ACCESS FULL              | HZ_PARTY_SITES              | 76710 |
  |* 19 |                  HASH JOIN                      |                             | 22674 |
  |  20 |                   TABLE ACCESS FULL             | HZ_LOCATIONS                | 72598 |
  |* 21 |                   HASH JOIN                     |                             | 22737 |
  |  22 |                    TABLE ACCESS FULL            | HZ_CUST_ACCOUNTS            | 14291 |
  |* 23 |                    HASH JOIN                    |                             | 22801 |
  |  24 |                     TABLE ACCESS BY INDEX ROWID | MTL_SYSTEM_ITEMS_B          | 11073 |
  |* 25 |                      INDEX RANGE SCAN           | MTL_SYSTEM_ITEMS_B_N16      | 11073 |
  |* 26 |                     HASH JOIN                   |                             |   130K|
  |  27 |                      TABLE ACCESS FULL          | HR_LOCATIONS_ALL            |   919 |
  |* 28 |                      TABLE ACCESS BY INDEX ROWID| WSH_DELIVERY_DETAILS        |   130K|
  |* 29 |                       INDEX RANGE SCAN          | WSH_DELIVERY_DETAILS_TI4    |   143K|
  |* 30 |                TABLE ACCESS BY INDEX ROWID      | WSH_DELIVERY_ASSIGNMENTS    |     1 |
  |* 31 |                 INDEX RANGE SCAN                | WSH_DELIVERY_ASSIGNMENTS_N3 |     1 |
  |* 32 |               INDEX RANGE SCAN                  | WSH_DELIVERY_ASSIGNMENTS_N3 |     1 |
  |  33 |              TABLE ACCESS BY INDEX ROWID        | WSH_DELIVERY_DETAILS        |     1 |
  |* 34 |               INDEX UNIQUE SCAN                 | WSH_DELIVERY_DETAILS_U1     |     1 |
  |  35 |             TABLE ACCESS BY INDEX ROWID         | HZ_PARTIES                  |     1 |
  |* 36 |              INDEX UNIQUE SCAN                  | HZ_PARTIES_U1               |     1 |
  |* 37 |            INDEX RANGE SCAN                     | WSH_DELIVERY_ASSIGNMENTS_N1 |     4 |
  |  38 |           TABLE ACCESS BY INDEX ROWID           | WSH_DELIVERY_LEGS           |     1 |
  |* 39 |            INDEX RANGE SCAN                     | WSH_DELIVERY_LEGS_N1        |     1 |
  |* 40 |          TABLE ACCESS BY INDEX ROWID            | WSH_TRIP_STOPS              |     1 |
  |* 41 |           INDEX UNIQUE SCAN                     | WSH_TRIP_STOPS_U1           |     1 |
  |* 42 |         INDEX UNIQUE SCAN                       | WSH_TRIPS_U1                |     1 |
  |  43 |        TABLE ACCESS BY INDEX ROWID              | WSH_TRIPS                   |     1 |
  -----------------------------------------------------------------------------------------------

Since the investigation is around the Cardinality of each of the row source, I have removed other information (irrelevant for this discussion) from the execution plan. The Query contains a Subquery with alias as TRP. The Tables queried in the Subquery Drives the Good Plan, whereas, the Subquery is unnested and joined at a later stage in the Problematic Plan (Step 37-43). In the Problematic Plan, the Driving table is HR_LOCATIONS_ALL, which is hash joined to WSH_DELIVERY_DETAILS. The Investigation was around the change in the Execution Path. One of the primary goal of Optimizer is to calculate the Number of rows to be returned by each of the Filter Predicate and then evaluate the Access Patch, including the Driving Table, based on the predicate that fetches the least number of rows. It then also evaluates the Join Method, which is to consider Nested Loop Join for small set of rows or Hash Join / Sort Merge Join for larger set of rows.

The text of the TRP Subquery is pasted below

    	(SELECT DISTINCT  T.TRIP_ID, WDA.DELIVERY_ID 
    	FROM	apps.WSH_DELIVERY_ASSIGNMENTS WDA, 
          	apps.WSH_DELIVERY_LEGS DL,
          	apps.WSH_TRIP_STOPS PICKUP_STOP,  
             	apps.WSH_TRIPS T
	WHERE	WDA.DELIVERY_ID     = DL.DELIVERY_ID 
        AND     DL.PICK_UP_STOP_ID  = PICKUP_STOP.STOP_ID 
        AND     PICKUP_STOP.TRIP_ID = T.TRIP_ID
        AND     T.TRIP_ID  >=  : P_TRIPID_FR
        AND     T.TRIP_ID  <=  : P_TRIPID_TO )  TRP

The Good Plan for this subquery is a Sort Merge Join on WSH_TRIPS & WSH_TRIP_STOPS. Both the tables are accessed via an Index on TRIP_ID. Note that though the range predicate is on TRIP_ID of WSH_TRIPS, the column is joined to TRIP_ID of WSH_TRIP_STOPS and therefore, the optimizer scans an Index on these columns to get the relevant rows from the table. The Optimizer calculation says 72 rows from WSH_TRIPS and 187 rows from WSH_TRIP_STOPS, which are merged join to get final 72 rows from the two tables. This calculation is based on the Input values that Optimized peeked into during the hard parse. For the same set of input values, we have a 10053 trace of the Bad Plan and therefore, I immediately concentrated on the cardinality calculation for the two tables mentioned here. Surprisingly, the first caveat that I observed in the 10053 trace was missing Bind Peeked Data and this was enough to get into the Optimizer calculations.

The Peeked Bind Values section of the 10053 trace shows following information.

*******************************************
Peeked values of the binds in SQL statement
*******************************************
----- Bind Info (kkscoacd) -----
 Bind#0
  oacdty=02 mxl=22(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000000 frm=00 csi=00 siz=72 off=0
  No bind buffers allocated
 Bind#1
  oacdty=02 mxl=22(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000000 frm=00 csi=00 siz=0 off=24
  No bind buffers allocated
 Bind#2
  oacdty=02 mxl=22(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000000 frm=00 csi=00 siz=0 off=48
  No bind buffers allocated
 Bind#3
  No oacdef for this bind.
 Bind#4
  No oacdef for this bind.

When the Query was executed from SQL*Plus, the values for the Binds were populated and were visible in the Peeked Bind values section of 10053 trace. The missing information means, the Optimizer would fall back to the default calculation for a range predicate. From the trace, the computed cardinality for the two tables, WSH_TRIPS & WSH_TRIP_STOPS, are as under :

Access path analysis for WSH_TRIP_STOPS
***************************************
Access path analysis for WSH_TRIP_STOPS
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for WSH_TRIP_STOPS[PICKUP_STOP] 

  Table: WSH_TRIP_STOPS  Alias: PICKUP_STOP
    Card: Original: 4666397.000000  Rounded: 11666  Computed: 11665.99  Non Adjusted: 11665.99

Access path analysis for WSH_TRIPS
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for WSH_TRIPS[T] 

  Table: WSH_TRIPS  Alias: T
    Card: Original: 1802653.000000  Rounded: 4507  Computed: 4506.63  Non Adjusted: 4506.63

In the absence of the actual values provided to the Bind Variables, the Optimizer computed the Cardinality based on defaults, which in this case is 0.25% (5% for >= and 5% of = : P_TRIPID_FR and TRIP_ID <= : P_TRIPID_TO is too high. Another predicate in the Outer query is ORGANIZATION_ID and based on this predicate, optimizer generates a plan with the smallest rowsource first. The smallest rowsource is HR_LOCATIONS_ALL with 919 rows and is joined to WSH_DELIVERY_DETAILS table, which is accessed via hinted index WSH_DELIVERY_DETAILS_TI4.

To check and confirm this behaviour, I executed an EXPLAIN PLAN FOR for this query, without setting the values for the bind variables, from SQL*Plus. With no values to the Bind, the EXPLAIN PLAN FOR computed the BAD plan from SQL*Plus. Therefore, now I was sure that the problem occurs when the peeked values to the bind are not available. The next concern was “Why Oracle Report does not peek into the Bind Variables before actually generating the plan ?”.

I assume this to be a default behaviour, where the query is parsed first and then binded. Unfortunately, there is not much written on this. A better option is to write a query in an efficient way or hint the query. To resolve the issue, I modified the query and this modified version worked perfectly well on both the environments i.e. SQL*Plus and Oracle Reports. The modification was in the Subquery TRP. The change is pasted below.

    (SELECT TRIP_ID, DELIVERY_ID FROM 
    (SELECT T.TRIP_ID, WDA.DELIVERY_ID, row_number() over(partition by t.trip_id, wda.delivery_id) rnum 
                FROM   apps.WSH_DELIVERY_ASSIGNMENTS WDA, 
                                apps.WSH_DELIVERY_LEGS DL,
                                apps.WSH_TRIP_STOPS PICKUP_STOP,  
                                apps.WSH_TRIPS T
                WHERE   WDA.DELIVERY_ID     = DL.DELIVERY_ID 
        AND     DL.PICK_UP_STOP_ID  = PICKUP_STOP.STOP_ID 
        AND     PICKUP_STOP.TRIP_ID = T.TRIP_ID
        AND     T.TRIP_ID  >=  : P_TRIPID_FR
        AND     T.TRIP_ID  <=  : P_TRIPID_TO)
     WHERE RNUM=1)  TRP

The modification made sure that the TRP subquery becomes the Driving Query. The relevant portion of the execution plan post modification is attached below. The calculation for WSH_TRIPS is still at 0.25% and is therefore computed as 4507, even then, the Optimizer takes this as a Driving Table, which is what we wanted. This again leads to a common argument, which is, Optimizer is a piece of Code that works on Statistics. It is the way we write a query that dominates the Optimizer.


|* 19 |                  WINDOW SORT PUSHED RANK         |                             | 26689 |
|* 20 |                   FILTER                         |                             |       |
|  21 |                    NESTED LOOPS                  |                             | 26689 |
|  22 |                     NESTED LOOPS                 |                             |  7968 |
|  23 |                      NESTED LOOPS                |                             |  4516 |
|* 24 |                       INDEX RANGE SCAN           | WSH_TRIPS_U1                |  4507 |
|  25 |                       TABLE ACCESS BY INDEX ROWID| WSH_TRIP_STOPS              |     1 |
|* 26 |                        INDEX RANGE SCAN          | WSH_TRIP_STOPS_N1           |     1 |
|  27 |                      TABLE ACCESS BY INDEX ROWID | WSH_DELIVERY_LEGS           |     2 |
|* 28 |                       INDEX RANGE SCAN           | WSH_DELIVERY_LEGS_N2        |     2 |
|* 29 |                     INDEX RANGE SCAN             | WSH_DELIVERY_ASSIGNMENTS_N1 |     3 |

  20 - filter(:P_TRIPID_FR=:P_TRIPID_FR AND "T"."TRIP_ID"=:P_TRIPID_FR AND "PICKUP_STOP"."TRIP_ID"<=:P_TRIPID_TO))
  28 - access("DL"."PICK_UP_STOP_ID"="PICKUP_STOP"."STOP_ID")
  29 - access("WDA"."DELIVERY_ID"="DL"."DELIVERY_ID")
       filter("WDA"."DELIVERY_ID" IS NOT NULL)

Filed under Optimizer, Performance Tagged with cardinality, Cost Based Optimizer, Optimizer, Oracle Reports, Performance, Query Optimization

Yet another Query Plan Change ! Cost Based Optimization…

December 6, 2010 7 Comments

Optimizer relies on Statistics to compute an Optimal Plan, which includes : Choice between Index Scan or Table Scan, in case of Index Scan, which index to choose, and more importantly, the Join Order and Join Type. Therefore, it is quite evident that accurate statistics means a good and accurate estimation of Cardinality, which in turn, results in optimal execution plan. But there are cases, when accurate statistics is not good enough for an Optimizer to come out with an Optimal plan. In this blog, I will post a real life issue which is an interesting case, in which, the Optimizer generated plan, impacted the performance of a query whereas the underlying object statistics were accurate. Accurate statistics means the the statistics that reflect the accurate picture of the data in the underlying table.

In many of the events that I presented, one of the common statement that I emphasize is “Differentiate Performance Symptom and Performance Problem”. In this case too, the inhouse team was investigating and focussing more on the Symptom while the problem was somewhere else. We will walk through this as well.

Recently, at a customer site, the performance of one of a critical performance, that runs every few minutes, got impacted. The inhouse adminitrator time could investigate that execution time of one particular query, that usually takes less than a second, changed to 30-40 minutes. Fortunately, the customer maintains Query Baselines and this helped them further delve into the issue and they observed that the plan was changed. While they were investigating the issue and the reason for this plan change, the issue was temporarily fixed using SQL Plan Management. The customer wanted to know the reason for this behaviour, so as to avoid, this in future.

This was a custom process. The original query and the good / bad plan is pasted below.

SELECT WIP_ENTITY_NAME FROM WIP_DJ_CLOSE_TEMP 
WHERE	WIP_ENTITY_ID IN
	(SELECT wdct.WIP_ENTITY_ID            
	FROM	WIP_TRANSACTIONS wt,
		WIP_DJ_CLOSE_TEMP wdct             
	WHERE	wdct.GROUP_ID = :b1             
	AND	wdct.ORGANIZATION_ID = :b2
	AND	wdct.WIP_ENTITY_ID = wt.WIP_ENTITY_ID             
	AND	wt.ORGANIZATION_ID = :b3             
	AND	wt.TRANSACTION_DATE > wdct.ACTUAL_CLOSE_DATE   
	UNION    
	SELECT wdct.WIP_ENTITY_ID            
	FROM	MTL_MATERIAL_TRANSACTIONS mmt,
		WIP_DJ_CLOSE_TEMP wdct             
	WHERE	wdct.GROUP_ID = :b1             
	AND	wdct.ORGANIZATION_ID = :b2
	AND	wdct.WIP_ENTITY_ID = mmt.TRANSACTION_SOURCE_ID            
	AND	mmt.TRANSACTION_SOURCE_TYPE_ID = 5             
	AND	mmt.ORGANIZATION_ID= :b3
	AND mmt.TRANSACTION_DATE > wdct.ACTUAL_CLOSE_DATE)

Good Plan

Plan hash value: 2500423228

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name                         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                              |       |       |  6017 (100)|          |       |       |
|   1 |  NESTED LOOPS                           |                              |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                          |                              |    70 |  1960 |  6017   (1)| 00:00:36 |       |       |
|   3 |    VIEW                                 | VW_NSO_1                     |    70 |   910 |  5947   (1)| 00:00:35 |       |       |
|   4 |     SORT UNIQUE                         |                              |    70 |  3040 |  5947  (92)| 00:00:35 |       |       |
|   5 |      UNION-ALL                          |                              |       |       |            |          |       |       |
|   6 |       NESTED LOOPS                      |                              |       |       |            |          |       |       |
|   7 |        NESTED LOOPS                     |                              |    45 |  1890 |   490   (1)| 00:00:03 |       |       |
|*  8 |         TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    80 |  1920 |     9   (0)| 00:00:01 |       |       |
|*  9 |          INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   797 |       |     2   (0)| 00:00:01 |       |       |
|* 10 |         INDEX RANGE SCAN                | WIP_TRANSACTIONS_N1          |    24 |       |     0   (0)|          |       |       |
|* 11 |        TABLE ACCESS BY INDEX ROWID      | WIP_TRANSACTIONS             |     1 |    18 |     6   (0)| 00:00:01 |       |       |
|  12 |       NESTED LOOPS                      |                              |       |       |            |          |       |       |
|  13 |        NESTED LOOPS                     |                              |    25 |  1150 |  5455   (1)| 00:00:32 |       |       |
|* 14 |         TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    80 |  1920 |     9   (0)| 00:00:01 |       |       |
|* 15 |          INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   797 |       |     2   (0)| 00:00:01 |       |       |
|  16 |         PARTITION RANGE ITERATOR        |                              |     2 |       |    67   (0)| 00:00:01 |   KEY |    36 |
|* 17 |          INDEX RANGE SCAN               | MTL_MATERIAL_TRANSACTIONS_N2 |     2 |       |    67   (0)| 00:00:01 |   KEY |    36 |
|* 18 |        TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    |     1 |    22 |    68   (0)| 00:00:01 |     1 |     1 |
|* 19 |    INDEX UNIQUE SCAN                    | WIP_DJ_CLOSE_TEMP_U1         |     1 |       |     0   (0)|          |       |       |
|  20 |   TABLE ACCESS BY INDEX ROWID           | WIP_DJ_CLOSE_TEMP            |     1 |    15 |     1   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------------------

Bad Plan

Plan hash value: 821054275

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name                         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                              |       |       |  7295 (100)|          |       |       |
|   1 |  NESTED LOOPS                           |                              |    19 |   532 |  7295   (1)| 00:00:43 |       |       |
|   2 |   VIEW                                  | VW_NSO_1                     |    19 |   247 |  7276   (1)| 00:00:43 |       |       |
|   3 |    SORT UNIQUE                          |                              |    19 |   826 |  7276  (88)| 00:00:43 |       |       |
|   4 |     UNION-ALL                           |                              |       |       |            |          |       |       |
|*  5 |      TABLE ACCESS BY INDEX ROWID        | WIP_TRANSACTIONS             |     1 |    18 |     6   (0)| 00:00:01 |       |       |
|   6 |       NESTED LOOPS                      |                              |    12 |   504 |   930   (1)| 00:00:06 |       |       |
|*  7 |        TABLE ACCESS BY INDEX ROWID      | WIP_DJ_CLOSE_TEMP            |   152 |  3648 |    16   (0)| 00:00:01 |       |       |
|*  8 |         INDEX RANGE SCAN                | WIP_DJ_CLOSE_TEMP_N1         |  1522 |       |     4   (0)| 00:00:01 |       |       |
|*  9 |        INDEX RANGE SCAN                 | WIP_TRANSACTIONS_N1          |    24 |       |     0   (0)|          |       |       |
|* 10 |      HASH JOIN                          |                              |     7 |   322 |  6344   (1)| 00:00:37 |       |       |
|* 11 |       TABLE ACCESS BY INDEX ROWID       | WIP_DJ_CLOSE_TEMP            |   152 |  3648 |    16   (0)| 00:00:01 |       |       |
|* 12 |        INDEX RANGE SCAN                 | WIP_DJ_CLOSE_TEMP_N1         |  1522 |       |     4   (0)| 00:00:01 |       |       |
|  13 |       PARTITION RANGE ALL               |                              | 25149 |   540K|  6327   (1)| 00:00:37 |     1 |    36 |
|  14 |        TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    | 25149 |   540K|  6327   (1)| 00:00:37 |     1 |    36 |
|* 15 |         INDEX RANGE SCAN                | MTL_MATERIAL_TRANSACTIONS_N8 | 25149 |       |   207   (1)| 00:00:02 |     1 |    36 |
|  16 |   TABLE ACCESS BY INDEX ROWID           | WIP_DJ_CLOSE_TEMP            |     1 |    15 |     1   (0)| 00:00:01 |       |       |
|* 17 |    INDEX UNIQUE SCAN                    | WIP_DJ_CLOSE_TEMP_U1         |     1 |       |     0   (0)|          |       |       |
----------------------------------------------------------------------------------------------------------------------------------------

Before diving into the analysis, let us compare both the plans and this will be from the customer’s perspective. They compared the plans and noticed that the Tables and Indexes selected by the optimizer, in both the cases, are same except for MTL_MATERIAL_TRANSACTION Table. The good plan uses N2 Index while the bad plan uses N8. Therefore, the customer hinted the query with N2 index and fixed the plan using SPM. Once fixed, they started their investigation on the reason for this change from N2 and N8. As mentioned earlier, this was just a Symptom while the underlying problem was something else. This something triggered the change in Index Selection and we will investigate this next.

Analysis

Comparing both the plans, it was now evident that the problem is with the second part of the Query, therefore, I took this part out for investigation. This is now a simple and small query, as show below :

SELECT wdct.WIP_ENTITY_ID            
FROM	apps.MTL_MATERIAL_TRANSACTIONS mmt,
	wip.WIP_DJ_CLOSE_TEMP wdct             
WHERE	wdct.GROUP_ID = :b1             
AND	wdct.ORGANIZATION_ID = :b2
AND	wdct.WIP_ENTITY_ID = mmt.TRANSACTION_SOURCE_ID            
AND	mmt.TRANSACTION_SOURCE_TYPE_ID = 5             
AND	mmt.ORGANIZATION_ID= :b3             
AND	mmt.TRANSACTION_DATE > wdct.ACTUAL_CLOSE_DATE

The another change that I noticed here was the change of Nested Loop Join to Hash Join. The customer had a clone environment of this database which was 2 months old and I executed this query on the clone enviroment. The Query executed in less than a second on this and therefore, the investigation wa further made simpler for me. The Query plan on the clone and production is pasted below :

Production Plan : BAD
Plan hash value: 3088015387

------------------------------------------------------------------------------------
| Id  | Operation                           | Name                         | Rows  |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                              |       |
|*  1 |  HASH JOIN                          |                              |    49 |
|*  2 |   TABLE ACCESS BY INDEX ROWID       | WIP_DJ_CLOSE_TEMP            |   155 |
|*  3 |    INDEX RANGE SCAN                 | WIP_DJ_CLOSE_TEMP_N1         |  1552 |
|   4 |   PARTITION RANGE ALL               |                              | 24662 |
|   5 |    TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    | 24662 |
|*  6 |     INDEX RANGE SCAN                | MTL_MATERIAL_TRANSACTIONS_N8 | 24662 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("WDCT"."WIP_ENTITY_ID"="MMT"."TRANSACTION_SOURCE_ID")
       filter("MMT"."TRANSACTION_DATE">"WDCT"."ACTUAL_CLOSE_DATE")
   2 - filter("WDCT"."ORGANIZATION_ID"=:ORGANIZATION_ID)
   3 - access("WDCT"."GROUP_ID"=:GROUP_ID)
   6 - access("MMT"."TRANSACTION_SOURCE_TYPE_ID"=5 AND "MMT"."ORGANIZATION_ID"=:ORGANIZATION_ID)

Clone DB Plan : GOOD
Plan hash value: 569619275

-----------------------------------------------------------------------------------
| Id  | Operation                          | Name                         | Rows  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                              |       |
|   1 |  NESTED LOOPS                      |                              |       |
|   2 |   NESTED LOOPS                     |                              |    26 |
|*  3 |    TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    80 |
|*  4 |     INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   798 |
|   5 |    PARTITION RANGE ITERATOR        |                              |     2 |
|*  6 |     INDEX RANGE SCAN               | MTL_MATERIAL_TRANSACTIONS_N2 |     2 |
|*  7 |   TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    |     1 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("WDCT"."ORGANIZATION_ID"=:ORGANIZATION_ID)
   4 - access("WDCT"."GROUP_ID"=:GROUP_ID)
   6 - access("WDCT"."WIP_ENTITY_ID"="MMT"."TRANSACTION_SOURCE_ID" AND "MMT"."ORGANIZATION_ID"=:ORGANIZATION_ID AND
              "MMT"."TRANSACTION_DATE">"WDCT"."ACTUAL_CLOSE_DATE")
   7 - filter("MMT"."TRANSACTION_SOURCE_TYPE_ID"=5)

Notice the estimated rows calculated by the optimizer for the driving Index and Table. In clone db, the optimizer says 798 rows from an Index TEMP_N1 and then 80 Rows from the Table Access component. On production, the optimizer computes 1552 rows from the driving Index TEMP_N1 and then 155 rows from the Table Access. The optimizer further computes that 155 Iteration of MTL_MATERIAL_TRANSACTION (Nested Loop) will be a costly affair and therefore opts for a Hash Join. Since the clone db was 2 months old, the statistics were also old. Once I was convinced that estimated cardinality for the driving table seems to be an issue, I imported the production stats to clone and simulated the issue. I imported the stats of only the WIP_DJ_CLOSE_TEMP table on clone and could simulate the performance impact. The Table and relevant Column Level statistics of WIP_DJ_CLOSE_TEMP is pasted below. I have removed unwanted column stats from this post. This is from the clone and production.

Clone Stats - Performance Optimal

SQL> @table_stats
Enter value for 1: WIP_DJ_CLOSE_TEMP
old   2: where table_name='&1'
new   2: where table_name='WIP_DJ_CLOSE_TEMP'

OWNER                          PAR   NUM_ROWS     BLOCKS LAST_ANAL GLO T
------------------------------ --- ---------- ---------- --------- --- -
WIP                            NO      118836        829 26-SEP-10 YES N

Elapsed: 00:00:00.23
SQL> @column_stats
Enter value for 1: WIP
old   3: where owner='&1'
new   3: where owner='WIP'
Enter value for 2: WIP_DJ_CLOSE_TEMP
old   4: and   table_name='&2'
new   4: and   table_name='WIP_DJ_CLOSE_TEMP'

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------- ---------------
GROUP_ID                                149          0 .006711409 NONE
ORGANIZATION_ID                          10          0         .1 NONE

Production Stats - Performance sub-optimal

OWNER                          PAR   NUM_ROWS     BLOCKS LAST_ANAL GLO
------------------------------ --- ---------- ---------- --------- ---
WIP                            NO      116383        829 28-NOV-10 YES

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------- ---------------
GROUP_ID                                 75          0 .013333333 NONE
ORGANIZATION_ID                          10          0         .1 NONE

Index Details of MTL_MATERIAL_TRANSACTIONS

INDEX_NAME                     COLUMN_NAME                    COLUMN_POSITION
------------------------------ ------------------------------ ---------------
MTL_MATERIAL_TRANSACTIONS_N2   TRANSACTION_SOURCE_ID                        1
                               ORGANIZATION_ID                              2
                               TRANSACTION_DATE                             3

MTL_MATERIAL_TRANSACTIONS_N8   TRANSACTION_SOURCE_TYPE_ID                   1
                               ORGANIZATION_ID                              2
                               TRANSACTION_DATE                             3
                               TRANSACTION_SOURCE_NAME                      4

Issue

As mentioned at the start of this post, while the statistics are relevant for selection between an Index Scan and Table Scan and in case of Index Scan, which Index to select; the most important factor is the Computed Cardinality of the driving table as this has an impact on the Join Order and Join Type. Further, the Join Type dictates the selection of an Index for the other tables. The difference between the Nested Loop Join and Hash Join, in terms of Index Selection is : Nested Loop Join will select rows from the driving table and then use an Index on the Join Column, if it exists (else it will be an hash join), of the second table. Therefore, in case of a good plan, with the Nested Loop Join in place, the optimizer opted for N2 Index of MTL_MATERIAL_TRANSACTION table which is an Index on Join Column. In case of Hash Join, the Optimizer will try to filter out as many rows as possible for both the tables using an Index on filter predicates and once the rows are filtered, will join these using hash function on the Join Columns. Therefore, in case of the Hash Join, the index N8 of MTL_MATERIAL_TRANSACTION is on filter column and then the rows filtered were joined using hash function on the Join Column. See the PREDICATE INFORMATION for both the Good and Bad Plans above and the Index Column details for MTL_MATERIAL_TRANSACTION pasted above.

The Calculation : Clone Old Stats

Let us get into the computed cardinality calculation for the driving table, as this had an impact on the Join Type. On clone the number of Rows are : 118836 and the WHERE predicate for this table is as under :

WHERE	wdct.GROUP_ID = :b1             
AND	wdct.ORGANIZATION_ID = :b2

The Table has an Index on GROUP_ID column and the calculation says 118836*1/149 = 797.55, rounded off to 798. Further, it has to filter based on non-indexed column, which is Organization_ID and the calculation says 798*1/10=79.8 rounded off to 80. The clone db (good) plan shows the same.

The Calculation : Production Stats

On Production, the computed cardinality calculation for the index scan step is 116383*1/75 = 1551.77, rounded off to 1552. Further, the filter based on non-indexed column is 1552*1/10 = 155.2, rounded off to 155. The production plan shows the same.

In the case of production statistics, the statistics were collected on Sunday and this triggered the change in plan. Further, these stats were accurate, even then, the optimizer came out with a plan that was sub-optimal. As mentioned, we could simulate the issue on Clone DB after importing the statistics of the driving table – WIP_DJ_CLOSE_TEMP. As a resolution and to check the impact, we modified the column level statistics of GROUP_ID column on clone db to 200 and the plan changed to good.

As the name of the driving table implies – WIP_DJ_CLOSE_TEMP, this is an interface table and the rows are deleted and inserted into this table. The table usually contain More or Less 0.1 Million Rows. Therefore, as a final action, once the column level statistics for GROUP_ID column was fixed on production, we locked the statistics of this table. This was to ensure that the plan of this query will not be impacted after periodic statistics generation and remains stable.

After manually setting the column level statistics for GROUP_ID column to 200, the plan is as under :

Plan hash value: 569619275

-----------------------------------------------------------------------------------
| Id  | Operation                          | Name                         | Rows  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                              |       |
|   1 |  NESTED LOOPS                      |                              |       |
|   2 |   NESTED LOOPS                     |                              |    18 |
|*  3 |    TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    58 |
|*  4 |     INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   582 |
|   5 |    PARTITION RANGE ITERATOR        |                              |     2 |
|*  6 |     INDEX RANGE SCAN               | MTL_MATERIAL_TRANSACTIONS_N2 |     2 |
|*  7 |   TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    |     1 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("WDCT"."ORGANIZATION_ID"=:ORGANIZATION_ID)
   4 - access("WDCT"."GROUP_ID"=:GROUP_ID)
   6 - access("WDCT"."WIP_ENTITY_ID"="MMT"."TRANSACTION_SOURCE_ID" AND "MMT"."ORGANIZATION_ID"=:ORGANIZATION_ID AND
              "MMT"."TRANSACTION_DATE">"WDCT"."ACTUAL_CLOSE_DATE")
   7 - filter("MMT"."TRANSACTION_SOURCE_TYPE_ID"=5)

Production Stats - Performance sub-optimal

OWNER                          PAR   NUM_ROWS     BLOCKS LAST_ANAL GLO
------------------------------ --- ---------- ---------- --------- ---
WIP                            NO      116383        829 28-NOV-10 YES

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------- ---------------
GROUP_ID                                200          0 .013333333 NONE
ORGANIZATION_ID                          10          0         .1 NONE

The calculation now stands as Index Step : 116383*1/200 = 581.91, rounded off to 582. Table Access : 582*1/10 = 58.2, rounded off to 58. This resolved the issue.

In this scenario, Investigating the statistics on MTL_MATERIAL_TRANSACTIONS table would have resulted futile,as this change was just a symptom due to the problem with stats on the driving table.

Filed under Optimizer, Performance Tagged with cardinality, cbo, Cost Based Optimizer, Optimizer, Performance, Query Optimization

Cost based Optimizer ! Inefficient input yields inefficient Output…

September 4, 2010 Leave a comment

Since the time my old blog went inaccessible, and still is, I have been receiving several mails on my previous blogs that helped some of my readers relate and resolve their performance issues. They wanted access to my previous blogs and I have been requesting them to let me know any particular blog that was of interest and I may publish them right here. There were many requests to publish blogs on Cost Based Optimizer and therefore, thought of publishing some of them. While these blogs may not be relevant for the current versions of Oracle, there are still some production systems that are still on Oracle 8i and 9i, and these blogs may be of help for those instances. Wherever any of the Cost Based Optimization blog that are relevant for certain versions of Oracle, I will specifically mention them for the benefits of the readers.

This blog is relevant for Oracle 8i and 9i (without System Statistics)

Cost Based Optimizer has always been a mystery for most of the Database Administrators and Developers. The calculation made by the Optimizer to come-up with an Execution Plan has been doubted by many of the Oracle users, especially, when (to name a few) :

The Application Performance Degrades after Version Upgrade
Plan of a Query change after gathering of fresh Statistics, and
The Performance of a Query with higher cost is much better that the plan with a least cost

Cost Based Optimizer has improved a lot in previous few versions. Therefore, it can be said that CBO is still undergoing some enhancements. With these enhancements, Oracle Users have accepted the fact that the Application Performance might degrade after version upgrade and leave no room with rigorous testing of the application before production upgrade. It is fact no.2 and 3 that raises a doubt over Cost Based Optimization.

Cost Based Optimizer is a piece of code, wherein, you provide Inputs, by way of Optimizer Parameters, Statistics and it computes and generates an Output which is seen as an Execution Plan. It is not only the Optimizer Parameters and Statistics, which is passed as an Input. There is one more component, which is mostly overlooked, a QUERY. The initial input passed is the Query and based on the way it is written, optimizer does its cost calculation by taking into account the other two : Parameters and Statistics. In this blog, I will address the last two facts stated above, plus, a common misconception that Index Scan is better that Full Table Scan and finally, the issue with the critical first component passed as an input to CBO, A Query.

Recently, was working on a performance issue, related to a critical process. The overall performance of the system was OK, but one critical process started taking 18-20 hours, failing several times with ora-1555. This process, usually takes 1 hour 30 minutes. Therefore, all of a sudden, why this degradation ?, was the question raised by the customer.

The Database version is 8174. This is one of a critical process and a part of seeded application. It connects to a user, dedicated for this process, therefore, when this process is running, it is very easy to get the sid of the session executing this process. While monitoring the performance of this process, by querying v$session_wait for the sid, I could see one particular query doing exorbitant amount of Logical Reads. The text of the query and execution plan is as under :

explain plan for
select CS.CO_ID ,CS.SPCODE ,CS.SNCODE ,CS.TMCODE ,CS.CS_SEQNO ,CS.CS_STAT_CHNG ,
to_char(CS.TRIAL_END_DATE,'YYYYMMDDHH24MISS')
from       CR_SERVICES CS
where      (CS.CO_ID not  in (select CO_ID
                             from CO_ALL
                             where (CO_ID=CS.CO_ID and CUSTOMER_ID=DEALER_ID))
and        CS.SNCODE in (select distinct SNCODE  from MPV
                     where (SNCODE=CS.SNCODE and RATING_IND='Y')));

Explained.

SQL> @?/rdbms/admin/utlxpls

Plan Table
------------------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |                    |     1K|   77K| 141575 |       |       |
|  FILTER                   |                    |       |      |        |       |       |
|   NESTED LOOPS            |                    |     1K|   77K| 141575 |       |       |
|    TABLE ACCESS FULL      |MPV                 |    74 |  444 |     13 |       |       |
|    TABLE ACCESS BY INDEX R|CR_SERVICES         |     3M|  120M|   1913 |       |       |
|     INDEX RANGE SCAN      |I1_CR_SERVICES      |     3M|      |   1616 |       |       |
|   FILTER                  |                    |       |      |        |       |       |
|    INDEX RANGE SCAN       |IDX_CUST_CO_DEALER  |     1 |   14 |      2 |       |       |
--------------------------------------------------------------------------------

There are three tables in the query. The Driving Table, from the explain plan, MPV is a small table with 220 rows and 80 Blocks, while the other two are huge tables with 69 Million and 30 Million Rows. Usually, such situation can crop-up due to following reasons :

Change in Optimizer Parameters
Change in Statistics
Creation of New Index

Initial Investigation revealed that the Statistics of CR_SERVICES table and indexes were gathered the previous night. Once this was confirmed, Cost Based Optimization was at the receiving end. This is quite natural as without any change in the application query, the plan changed for bad, due to CBO. Therefore, thorough investigation was required about the cause of this plan change. Seeing the query and other optimizer statistics, Full Table Scan on CR_SERVICES would have been a better choice and therefore, my investigation was around this. Let us revisit the relevant portion of the Execution Plan, the statistics and the Query.

## Table Statistics of CR_SERVICES

OWNER                          PAR   NUM_ROWS     BLOCKS 
------------------------------ --- ---------- ---------- 
SM                             NO    69875510    1107733 

## Column Stats of the Columns used on the Query

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY
------------------------------ ------------ ---------- ----------
CO_ID                               6863377          0 1.4570E-07
SNCODE                                  370          0 .002702703  @?/rdbms/admin/utlxpls

Plan Table
------------------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
|   NESTED LOOPS            |                    |     1K|   77K| 141575 |       |       |
|    TABLE ACCESS FULL      |MPV                 |    74 |  444 |     13 |       |       |
|    TABLE ACCESS BY INDEX R|CR_SERVICES         |     3M|  120M|   1913 |       |       |
|     INDEX RANGE SCAN      |I1_CR_SERVICES      |     3M|      |   1616 |       |       |

The Query is using an Index on SNCODE column, which is a very low cardinality column. Therefore, this was my primary target and I worked on the Optimizer Calculation to get the possible cause of this Index Scan.

Optimizer Calculation


The Cost Calculation of an Index Scan and Table Scan via Index is

Index Cost = Blevel+ceil(leaf_blocks*ix_selectivity) 
Table Scan = ceil(clustering_factor*table_selectivity) 

select 3+ceil(596745*1/370) "Index Scan Cost", ceil(40527179*1/370) from dual;
Index Cost = 1616 ## This matches the execution plan
Table Scan = 109533
Final Cost = 1616+109533=111149

select 3+ceil(596745*1/370) "Index Scan Cost", ceil(40527179*1/370*1/370) from dual;
Index Cost = 1616
Table Scan = 297
Final Cost = 1616+297=1913 ## This Matches with the Cost calculated by the Optimizer

This additional 1/370, was due to the fact that SNCODE column is used twice in the query. One for Index Filter, therefore considered only once for an index cost, and twice for Table cost due to SNCODE=CS.SNCODE join. The final Nested Loop Cost is 141575 and this is calculated as

NL Cost = Cost of an Outer Table + (Cost of Inner Table * Cardinality of Outer Table)
NL Cost = 13 + (1913*74)
NL Cost = 141575

Cost of Full Table Scan = Num Blocks / 6.59 (8k block size)
Cost of Full Table Scan = 1107733 / 6.59
Cost of Full Table Scan = 168093.02

Clearly, the cost of a Full Table Scan is higher than the Nested Loop Join and optimizer opted for the cheaper costed plan, which is, a Nested Loop Join and this requires an Index Scan of an Inner Table. When I checked the execution plan of this query with a FULL hint on CR_SERVICES, the FTS cost was 168200. I am yet to figure out the difference between my calculation and optimizer calculation, but for investigation purpose, I considered the calculation done by the optimizer. Based on this calculation, I came out with three possible solutions, these are :

Modify the Query (Immediately Not possible, as it is a seeded query)
Change Optimizer Statistics
Change Optimizer Parameters

The first optimization was a better and safe approach, but was not possible immediately. The other two required some testing, as change in Optimizer Statistis can have negative impact on other queries using these objects and, change in optimizer parameters at db level can have an global impact on the overall performance. As mentioned earlier, this process connects to a user dedicated for this process, therefore, ON-LOGON trigger with the Optimizer Parameter change at session level was thought of as a better option without impacting overall application performance. The parameter, I could immediately think of was, OPTIMIZER_INDEX_COST_ADJ. This parameter was default i.e.100. For this query, to force Index Cost be Costlier than FTS, the calculation is OICA = (FTS Cost/NL Cost)*100 = (168200/141575)*100 = 118.80. Any value above this, would compute the NL cost be costlier than Full Table Cost and would opt for FTS and Hash Join. Initially, the Change was done at session level and the plan changed.

Plan Table
------------------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |                    |     1K|   77K| 168219 |       |       |
|  FILTER                   |                    |       |      |        |       |       |
|   HASH JOIN               |                    |     1K|   77K| 168219 |       |       |
|    TABLE ACCESS FULL      |MPV                 |    74 |  444 |     13 |       |       |
|    TABLE ACCESS FULL      |CR_SERVICES         |     3M|  120M| 168200 |       |       |
|   FILTER                  |                    |       |      |        |       |       |
|    INDEX RANGE SCAN       |IDX_CUST_CO_DEALER  |     1 |   14 |      2 |       |       |
--------------------------------------------------------------------------------

Finally, ON-LOGON trigger was created to set this parameter to 120, whenever, this user logged on to the database. The process, when re-executed, completed in 1 Hour 15 minutes and the desired performance was achieved.

Optimizer Calculation with OICA as 120


NL Table Scan Cost = (Current Cost of Inner Table Scan * Outer Table Card)*OICA/100
NL Table Scan Cost = (1913*74)*120/100
NL Table Scan Cost = (1913*74)*120/100
NL Table Scan Cost = 169874.4 > 168200 (FTS)

With this setting, the plan changed. Now for the reverse engineering. I wanted to check for any of the Index Statistics, say Clustering Factor, that would have calculated NL Join costlier than Full Table Scan. The Cost of NL Join with OICA is 169874.4. The cost of Table Scan via Index should be round(169874.4/74), which is 2296. Index Scan Cost is 1616, so the Table Scan cost component should be 2296-1616 is 680. Based on this, I calculated that if the Clustering Factor is changed to 93092000, then without any parameter setting, the query would go for a Full Table Scan. 

SQL> exec dbms_stats.SET_INDEX_STATS('SM','I1_CR_SERVICES',CLSTFCT=>93092000);

PL/SQL procedure successfully completed.

SQL> @index_stats
Enter value for o: SM
old   3: where table_owner='&O'
new   3: where table_owner='SM'
Enter value for t: CR_SERVICES
old   4: and   table_name='&T'
new   4: and   table_name='CR_SERVICES'

INDEX_NAME                       NUM_ROWS     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
------------------------------ ---------- ---------- ----------- ------------- -----------------
I1_CR_SERVICES                   70724121          3      596745      11207514          93092000 

12 rows selected.

explain plan for
select CS.CO_ID ,CS.SPCODE ,CS.SNCODE ,CS.TMCODE ,CS.CS_SEQNO ,CS.CS_STAT_CHNG ,
to_char(CS.TRIAL_END_DATE,'YYYYMMDDHH24MISS')
from       CR_SERVICES CS
where      (CS.CO_ID not  in (select CO_ID
                             from CO_ALL
                             where (CO_ID=CS.CO_ID and CUSTOMER_ID=DEALER_ID))
and        CS.SNCODE in (select distinct SNCODE  from MPV
                     where (SNCODE=CS.SNCODE and RATING_IND='Y')));

Explained.

SQL> @?/rdbms/admin/utlxpls

Plan Table
------------------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |                    |     1K|   77K| 168219 |       |       |
|  FILTER                   |                    |       |      |        |       |       |
|   HASH JOIN               |                    |     1K|   77K| 168219 |       |       |
|    TABLE ACCESS FULL      |MPV                 |    74 |  444 |     13 |       |       |
|    TABLE ACCESS FULL      |CR_SERVICES         |     3M|  120M| 168200 |       |       |
|   FILTER                  |                    |       |      |        |       |       |
|    INDEX RANGE SCAN       |IDX_CUST_CO_DEALER  |     1 |   14 |      2 |       |       |
--------------------------------------------------------------------------------

11 rows selected.

Back to the problem, does it look like to be an Optimizer Issue ? This table has a very high growth rate. While the num_blocks value of a table grows faster, the statistics of Leaf Blocks, BLevel and Clustering Factor does not change significantly. This is enough to blow up the cost of a Full Table Scan. If you see the query and the relevant portion of the query, pasted above, the join on SNCODE is really not required as it uses an IN clause. This unwanted join, forced the optimizer to consider the Density of SNCODE twice in Table Scan cost calculation.

## Relevant portion of the Query and Plan

and        CS.SNCODE in (select distinct SNCODE  from MPV
                     where (SNCODE=CS.SNCODE and RATING_IND='Y')));

## Optimizer Calculation (1/370 is considered twice to get Table Scan cost of 297, thus making it cheaper).
Index Cost = Blevel+ceil(leaf_blocks*ix_selectivity) 
Table Scan = ceil(clustering_factor*table_selectivity) 

select 3+ceil(596745*1/370) "Index Scan Cost", ceil(40527179*1/370*1/370) from dual;
Index Cost = 1616
Table Scan = 297
Final Cost = 1616+297=1913 

If this portion of the query is modified as :


and        CS.SNCODE in (select distinct SNCODE  from MPV
                     where (RATING_IND='Y')));

## Optimizer Calculation, with this change
Index Cost = Blevel+ceil(leaf_blocks*ix_selectivity) 
Table Scan = ceil(clustering_factor*table_selectivity) 

select 3+ceil(596745*1/370) "Index Scan Cost", ceil(40527179*1/370) from dual;
Index Cost = 1616
Table Scan = 109533
Final Cost = 1616+109533= 111149

## With this Cost, the cost of NL Join would have been

NL Cost = Cost of an Outer Table + (Cost of Inner Table * Cardinality of Outer Table)
NL Cost = 13 + (111149*74)
NL Cost = 8225039 > 168200 (FTS)

One unwanted Join Condition made Cost based Optimizer to come up with an in-efficient plan. Therefore, as mentioned earlier, my Initial and Crucial input to the optimizer is a Query and if these are not optimally written, Cost Based Optimizer is bound to compute an sub-optimal plan. The Query, modified by removing unwanted join and no parameter change or statistics change, gave the desired and better plan (see below ).

explain plan for
select CS.CO_ID ,CS.SPCODE ,CS.SNCODE ,CS.TMCODE ,CS.CS_SEQNO ,CS.CS_STAT_CHNG ,
to_char(CS.TRIAL_END_DATE,'YYYYMMDDHH24MISS')
from       CR_SERVICES CS  
where      (CS.CO_ID not  in (select CO_ID
                               from CO_ALL
                               where (CO_ID=CS.CO_ID and CUSTOMER_ID=DEALER_ID))
and        CS.SNCODE in (select distinct SNCODE  from MPV
                       where (RATING_IND='Y')));

Explained.

SQL> @?/rdbms/admin/utlxpls

Plan Table
------------------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |                    |     1K|   77K| 168219 |       |       |
|  FILTER                   |                    |       |      |        |       |       |
|   HASH JOIN               |                    |     1K|   77K| 168219 |       |       |
|    TABLE ACCESS FULL      |MPV                 |    74 |  444 |     13 |       |       |
|    TABLE ACCESS FULL      |CR_SERVICES         |     3M|  120M| 168200 |       |       |
|   FILTER                  |                    |       |      |        |       |       |
|    INDEX RANGE SCAN       |IDX_CUST_CO_DEALER  |     1 |   14 |      2 |       |       |
--------------------------------------------------------------------------------

Many developers are of the Opinion that an Execution Plan with cheaper cost is better than a Plan with higher cost, and Index Scans are better than Full Table Scans. This blog demonstrates that these are just misconceptions and it is the application design and knowledge that should be taken into account while writing an effecient code.

To sum up, it is just not the Optimizer Statistics and Parameter that control the Cost Based Optimizer, Queries play an important role too. A well written code should never bother or doubt about Cost Based Optimization.

Filed under Optimizer, Performance Tagged with cardinality, Cost Based Optimizer, cost based optmizer, optimizer_index_cost_adj, Performance, Query Optimization

PL/SQL Parallelism made easier in 11g

September 3, 2010 4 Comments

Today was the first day of the two day AIOUG Event Sangam 2010. Jonathan Lewis was the Key Speaker this year and I presented a session on “Developing a Scalable Application”. One of my slide mentioned about PL/SQL Parallelism. This is in continuation to my previous blog where I posted about Manual Parallelism that takes the ROWID’s of the table and distributes it across multiple chunks. The limitation with Manual Parallelism is that even though a pl/sql block needs to process 5% to 10% of the rows from a huge table, the Manual Parallelism will select and split the ROWIDs of the entire table. This means, many of the chunks may process very less data or almost zero data.

Oracle Database 11g Release 2 introduced a package dbms_parallel_execute, that makes it easier to split a table into multiple chunks, based on ROWID’s or SQL or by Number Column. During my AIOUG session, I demonstrated the use of this package to split a table into multiple chunks and schedule a batch process in parallel. The best part of this package is that the chunks can be split based on a condition and only the chunks that contain to-be-processed data will be created. Thus the rows to be processed by each of the chunks will have uniform distribution.

As mentioned in my earlier blog, I will reiterate the benefit of pl/sql parallelism, which is, optimal use of available resources. In this blog, I will post the test case that I executed during the AIOUG session and this should help you reproduce this in your environment.

In this example, consider a Nightly Batch Job NIGHTLY_JOB and this job processes the rows from VIVEK_TEST Table using an approach which is widely implemented by the developers or the dba’s. This is single job processing the required rows and thus is a very slow and an inefficient approach. There are 7200 rows in this table to be processed and assuming that the business logic in the pl/sql takes atleast 1 second and then moves on to another row. Which means, at a minimum, each row to be processed will take atleast 1 second, thus the entire process will take atleast 7200 second (2 Hours). This is the code of a batch process :

create table vivek_test pctfree 50 pctused 50 as
select * from all_objects a
where rownum=1000 then
    commit;
    l_limit:=0;
  end if;
  end loop;
end;
/
exec nightly_batch;

Once the job is executed, you can monitor the progress of this batch process from another session. V$SESSION will report the progress in MODULE column, as posted by dbms_application_info procedure.

The changes required to parallize the process is pasted below. To monitor the progress and maintain the log of each of the CHUNKS, I have created a process_status table. One thing worth noticing is the PCTFREE and PCTUSED setting, which is set to 50. The reason I have this setting is that the number of chunks to be created is dependant on the number of extents in the table and I wanted multiple extents for my table. In a production environment, number of extents should not be an issue.

create table process_status (
process_name		varchar2(30),
job_id			number,
s_rowid			rowid,
e_rowid			rowid,
start_time		date,
end_time		date,
processed_rows		number,
completed_yn		varchar2(1),
status			varchar2(1000));

create or replace procedure parallel_job(chks in number) is
l_statement	varchar2(1000);
l_jobno		number;
l_retry		number;
l_status	   number;
l_chunkid	   number;
l_srowid	   rowid;
l_erowid	   rowid;
l_anyrows   	boolean;
l_errcode	   pls_integer;
begin

  execute immediate 'truncate table process_status';
  /* CREATE A PARALLEL TASK */
  dbms_parallel_execute.create_task('NIGHT_JOB');

  dbms_parallel_execute.create_chunks_by_rowid('NIGHT_JOB','VIVEK','VIVEK_TEST',true, chks);
 /* INSERT THE CHUNKS INTO PROCESS_STATUS TO LOG THE STATUS OF EACH JOB */
  insert into process_status(process_name, job_id, s_rowid, e_rowid)
  select 'NIGHT_JOB', chunk_id, start_rowid, end_rowid from user_parallel_execute_chunks
  where task_name='NIGHT_JOB' order by 1;
  commit;
  /* Run the Procedure here */
  LOOP
      -- Get a chunk to process; if there is nothing to process, then exit the 
      -- loop;
      DBMS_PARALLEL_EXECUTE.GET_ROWID_CHUNK('NIGHT_JOB',l_chunkid, l_srowid, l_erowid,l_anyrows);
      IF (l_anyrows = false) THEN EXIT; END IF;
      BEGIN
	l_statement:='nightly_batch('||l_chunkid||','||''''||l_srowid||''''||','||''''||l_erowid||''''||');';
	dbms_job.submit(l_jobno,l_statement);
	update process_status set completed_yn='P', start_time=sysdate where job_id=l_chunkid;
	commit;
        DBMS_PARALLEL_EXECUTE.SET_CHUNK_STATUS('NIGHT_JOB',l_chunkid,DBMS_PARALLEL_EXECUTE.PROCESSED);
      EXCEPTION WHEN OTHERS THEN
        DBMS_PARALLEL_EXECUTE.SET_CHUNK_STATUS('NIGHT_JOB', l_chunkid,
          DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR, SQLCODE, SQLERRM);
	  l_errcode:=sqlcode;
  	update process_status set completed_yn='F', end_time=sysdate, status='FAILED WITH ERROR '||l_errcode||'.' where job_id=l_chunkid;
      END;
  commit;
  END LOOP;
  dbms_parallel_execute.drop_task('NIGHT_JOB');
end;
/

The only change required in our NIGHTLY_BATCH is the introduction of INPUT parameters and update of the process_status at the completion or failure of a chunk.

create or replace procedure nightly_batch(jobid number, low_rowid rowid, high_rowid rowid) is
l_count		number:=0;
l_module	    varchar2(30);
l_limit		number:=0;
l_errcode	    pls_integer;
cursor c_main is
select rowid, owner, object_name, object_type from vivek_test
where nvl(processed_flag,'N')='N'
and	rowid between low_rowid and high_rowid;
begin
  for i in c_main
  loop
  l_count:=l_count+1;
  l_limit:=l_limit+1;
  l_module:=l_count||' out of 7200/40=180';
  dbms_application_info.set_module('PL_SQL',l_module);
  dbms_lock.sleep(1); /* Assuming Processing of a row takes 1 second */
  update vivek_test set processed_flag='Y' where rowid = i.rowid;
  if l_limit>=1000 then
    commit;
    l_limit:=0;
  end if;
  end loop;
  update process_status set completed_yn='Y', end_time=sysdate, processed_rows=l_count 
  where job_id = jobid;
  commit;
  exception when others then
  l_errcode:=sqlcode;
  update process_status set completed_yn='F', status='FAILED WITH '||l_errcode||'.' where job_id = jobid;
  commit;
end;
/
exec parallel_job(100);

Monitor the progress of this job and you would see a significant improvement as the job completes in less than 5 minutes, as against 2 Hours, and utilizes the hardware resources optimally.

Filed under AIOUG, Performance Tagged with Performance, Rowid Parallelism

Manual Parallelism – Rowid Based for Partitioned / Sub Partitioned Tables

August 27, 2010 2 Comments

Currently, I am working on an optimization project where few Scheduler Jobs are Scheduled and Run every night at 9:00 pm using dbms_scheduler. These jobs are then stopped every day morning 9:00 a:m, so that, these do not impact the OLTP sessions. Customer has implemented VPD and each of these job is scheduled from multiple users, so that, the records are processed based on the policy function and the access predicates for that user. The flaw with this logic is that the total rows to be processed across each of the users are not uniform, which means, if a job for all the users takes 45-50 minutes, one user processing takes almost 3-4 hours.

For optimizing these, the first thought came to the mind was to distribute the rows to be processed uniformly across each of the jobs and then to apply vpd predicates at runtime. This required splitting of the main processing cursor into multiple chunks. Therefore, the table involved in the main cursor query was taken out with an intention to use ROWID Manual Parallelism, introduced by Thomas Kyte. This would help us distribute the rows uniformly across each of the jobs with an efficient use of available resources. I demonstrated a case study on one of my previous blog, where I used this to purge data from a table. The link is as under :

I have come across many pl/sql procedures that processes data and take certain amount of time that can be brought down. Moreover, whenever these procedures are scheduled or run, the CPU Utilization is very less. I remember a discussion with a customer, wherein, they mentioned that one of my “Annual Interest Calculation” job is running very slow and the CPU is 90 to 95% IDLE. This is an expected behaviour as a job, when run as a single process, will be scheduled on only one Processor and will not take the advantage of available resources. Therefore, splitting the job into multiple chunks should help in reducing the process completion time by way of better resource utilization.

As an example, a pl/sql block, as mentioned below, can be optimized and run in parallel. If a single job process 10000 Rows and takes 30 minutes, running it into 10 parallel stream should bring it down to 3-5 minutes.

create procedure test_job as
cursor main is
select column1, column2, column3
from	test
where x=1
and   y=2;
begin
  for i in main
   loop
     some processing;
   end loop;
end;


create procedure test_job(job_id number, low_rid rowid, hi_rid rowid) as
cursor main is
select column1, column2, column3
from	test
where x=1
and   y=2
and   rowid between low_rid and hi_rid;
begin
  for i in main
   loop
     some processing;
   end loop;
end;

This change may require minor changes in the procedure which is worth implementing.

Back to the Original case study. As mentioned earlier, for splitting the rows into multiple chunks, I executed the Query on dba_extents and dba_objects, to get the Start and End ROWID based on the number of chunks. This Query usually takes 2-3 minutes, but was taking almost 45-50 minutes and had to be cancelled. I checked the plan of this Query and it was doing a Merge Join Cartesian. Further digging into the issue, I could notice that the table which was to be splitted was a partitioned and subpartitioned table. This was for the first time I faced this issue and therefore had to modify the logic of the Query to accomodate Partitioned and Sub-Partitioned Objects.

I created a pl/sql block to check whether the table is a Non-Partitioned or Partitioned or Sub-Partitioned and based on this, a loop is executed. Below is the code that accomplished this task for me. An explanation on this is as under :

Input Values are Table_Owner, Process_name, Table_name.
L_CHKS, in my case, is a Global Variable declared in Package specification and defaults to the number of chunks to be created.
For a Partitioned or Sub-Partitioned Objects, the number of chunks will be more than L_CHKS, therefore, at the end of the loop, these are updated again using mod function.

  procedure manual_parallelism(t_owner varchar2, p_name varchar2, t_name varchar2) as
   /* l_chks is dynamic and should be based on available CPU's and is standard for all the jobs */
   /* This value can be changed at later stage if the hardware is upgraded */
   l_part	varchar2(3);
   l_subpart	varchar2(1);
   part_name	varchar2(30);
   l_dataobjectid number;
   cursor partitioned(t_owner varchar2, t_name varchar2) is
   select partition_name from dba_tab_partitions
   where table_owner = t_owner
   and   table_name = t_name
   order by partition_position desc;

   cursor sub_partitioned(t_owner varchar2, t_name varchar2, part_name varchar2) is
   select subpartition_name from dba_tab_subpartitions s
   where s.table_owner = t_owner
   and   s.table_name = t_name
   and   s.partition_name = part_name;
  begin
   /* Assumption in this case is that at any given point in time l_chks Jobs can be easily scheduled */
     select partitioned into l_part from dba_tables where owner=t_owner and table_name=t_name;
     if l_part='YES' then
       begin
	   select 'Y' into l_subpart from dba_tab_subpartitions 
	   where	table_owner=t_owner 
	   and		table_name=t_name
	   and		rownum=1;
       exception when no_data_found then
	   l_subpart:='N';
       end;
       if l_subpart='Y' then /* if Sub Partitioned */
	   for i in partitioned(t_owner, t_name)
	   loop
	     for j in sub_partitioned(t_owner, t_name, i.partition_name)
	     loop
	     insert into ast.process_status(process_name, job_id, s_rowid, e_rowid)
	     select p_name, grp,
  		   dbms_rowid.rowid_create( 1, data_object_id, lo_fno, lo_block, 0 ) min_rid,
		   dbms_rowid.rowid_create( 1, data_object_id, hi_fno, hi_block, 10000 ) max_rid
	     from (
		   select distinct grp,
		   first_value(relative_fno) over (partition by grp order by relative_fno, 
		   block_id rows between unbounded preceding and unbounded following) lo_fno,
		   first_value(block_id) over (partition by grp order by relative_fno, 
		   block_id rows between unbounded preceding and unbounded following) lo_block,
		   last_value(relative_fno) over (partition by grp order by relative_fno, 
		   block_id rows between unbounded preceding and unbounded following) hi_fno,
		   last_value(block_id+blocks-1) over (partition by grp order by relative_fno, 
		   block_id rows between unbounded preceding and unbounded following) hi_block,
		   sum(blocks) over (partition by grp) sum_blocks
		   from (
		   select	relative_fno,
				block_id,
				blocks,
				trunc( (sum(blocks) over (order by relative_fno, block_id)-0.01) /
				(sum(blocks) over ()/l_chks) ) grp
		   from		dba_extents
		   where	segment_name = t_name
		   and		owner = t_owner
		   and		partition_name=j.subpartition_name
		   order by block_id)
		   ),
	     (select data_object_id from dba_objects 
	     where	object_name = t_name 
	     and	subobject_name=j.subpartition_name 
	     and	owner=t_owner);
	     end loop; /* End loop of Sub Partitioned cursor */
	   end loop; /* End loop of Partitioned Cursor */
	   commit;
       else /* it means only Partitioned no Sub Partitions */
	   for i in partitioned(t_owner, t_name)
	   loop
	     insert into ast.process_status(process_name, job_id, s_rowid, e_rowid)
	     select p_name, grp,
		   dbms_rowid.rowid_create( 1, data_object_id, lo_fno, lo_block, 0 ) min_rid,
		   dbms_rowid.rowid_create( 1, data_object_id, hi_fno, hi_block, 10000 ) max_rid
	     from (
		   select distinct grp,
		   first_value(relative_fno) over (partition by grp order by relative_fno, 
		   block_id rows between unbounded preceding and unbounded following) lo_fno,
		   first_value(block_id) over (partition by grp order by relative_fno, 
		   block_id rows between unbounded preceding and unbounded following) lo_block,
		   last_value(relative_fno) over (partition by grp order by relative_fno, 
		   block_id rows between unbounded preceding and unbounded following) hi_fno,
		   last_value(block_id+blocks-1) over (partition by grp order by relative_fno, 
		   block_id rows between unbounded preceding and unbounded following) hi_block,
		   sum(blocks) over (partition by grp) sum_blocks
		   from (
		   select	relative_fno,
				block_id,
				blocks,
				trunc( (sum(blocks) over (order by relative_fno, block_id)-0.01) /
				(sum(blocks) over ()/l_chks) ) grp
		   from		dba_extents
		   where	segment_name = t_name
		   and		owner = t_owner
		   and		partition_name=i.partition_name
		   order by block_id)
		   ),
	     (select data_object_id from dba_objects 
	     where	object_name = t_name 
	     and	subobject_name=i.partition_name 
	     and	owner=t_owner);
	   end loop; /* End loop of Partitioned Cursor */
	   commit;
       end if; /* End if of l_Subpart='Y' */
     else /* Insert Statement for Non-Partitioned Object */ 
  	   insert into process_status(process_name, job_id, s_rowid, e_rowid)
	   select p_name, grp, 
		   dbms_rowid.rowid_create(1,data_object_id, lo_fno, lo_block,0) min_rid,
		   dbms_rowid.rowid_create(1,data_object_id, hi_fno, hi_block,0) max_rid
	   from (
	   select 	distinct grp,
			first_value(relative_fno) over (partition by grp order by relative_fno, 
			block_id rows between unbounded preceding and unbounded following) lo_fno,
			first_value(block_id) over (partition by grp order by relative_fno, 
			block_id rows between unbounded preceding and unbounded following) lo_block,
			last_value(relative_fno) over (partition by grp order by relative_fno, 
			block_id rows between unbounded preceding and unbounded following) hi_fno,
			last_value(block_id) over (partition by grp order by relative_fno, block_id 
			rows between unbounded preceding and unbounded following) hi_block,
			sum(blocks) over (partition by grp) sum_blocks from (
	   select 	relative_fno, block_id, blocks, 
			sum(blocks) over (order by relative_fno, block_id) cum_blocks,
			sum(blocks) over () tot_blocks,
			trunc((sum(blocks) over (order by relative_fno, block_id)-0.01) /
			(sum(blocks) over ()/l_chks)) grp
	   from		dba_extents
	   where 	segment_name=t_name
	   and		owner=t_owner
	   order by relative_fno, block_id)),
	   (select data_object_id from dba_objects where owner=t_owner and object_name=t_name);
	   commit;
     end if; /* END IF of L_PART='Y' */
     update ast.process_status set job_id=mod(rownum,l_chks) where process_name=p_name;
     commit;
  end manual_parallelism;

You can define the value of L_CHKS in the procedure and the procedure will split the table into multiple chunks with minimum group as 0 and maximum group as LCHKS-1. Once these chunks are created, additional intelligence can be built to schedule these L_CHKS jobs and spawn the new jobs based on the completion status of the running job. The Inputs to the Scheduled procedure will be the the Job_ID, which is GRP in our case, Starting Rowid, Ending Rowid, Process_Name. In my case, I scheduled 80 jobs at a time and wrote a logic at the end of the scheduled procedure wherein the process immediately starts the next job with the same job_id, which means, if a process with job_id 1 completes, it will spawn another job with job_id as 1 and so on. This will ensure that at any given point of time, there are 80 jobs running and no manual intervention is required.

Once this pl/sql parallelism was implemented, the total completion time of the process was around 30-35 minutes. Further, ROWID parallelism also ensured that there was no Block Level Contention between each of the jobs thus improving the performance.

Oracle 11g provides a package to accomplish this very easily using dbms_parallel_execute and developers may leverage the benefit of this readymade package to split a long running pl/sql block in parallel, thus improving the performance.

Filed under Performance Tagged with Parallelism, Performance, Rowid Parallelism

← Older posts

Newer posts →

Real Life Database / SQL Experiences : An Oracle Blog from Vivek Sharma

Performance Tuning Day…All India Oracle User Group

Like this:

AIOUG Performance Tuning Day – Pune

Like this:

AIOUG Performance Tuning Day ! Hyderabad…

Like this:

Auto DOP ! 11g Feature…

Like this:

Intra Block Chaining ! Impact on Logical Reads

Like this:

Query Performance Issue ! SQL*Plus v/s Oracle Reports…

Like this:

Yet another Query Plan Change ! Cost Based Optimization…

Like this:

Cost based Optimizer ! Inefficient input yields inefficient Output…

Like this:

PL/SQL Parallelism made easier in 11g

Like this:

Manual Parallelism – Rowid Based for Partitioned / Sub Partitioned Tables

Like this:

Blog Subscription on Email

Search on this Blog

Recent Posts

Archives

Top Clicks

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

Blog Subscription on Email

Search on this Blog

Recent Posts

Archives

Top Clicks