Optimizer | Real Life Database / SQL Experiences : An Oracle Blog from Vivek Sharma

Oracle Community Yatra 2024 ! AIOUG

June 18, 2024

I am happy to announce that I will be traveling across 6 cities in India, speaking on my favorite topic: Database and Application Performance. The event is organized by the “All India Oracle User Group” and starts on 20th July 2024. Prominent speakers from across the globe will be traveling as well. It is a great opportunity to learn and to build a network. For registration, click on the following link.

https://www.aioug.org/ocyatra/2024

Filed under AIOUG, Optimizer, Performance, SQL Tuning

Oracle Groundbreaker Yatra ! July 2019..

May 19, 2019

After Sangam 2018, it is time for another OTN Yatra, now renamed Oracle Groundbreaker Yatra. Sangam is hosted every year, so far in either Bangalore or Hyderabad, and is a One City Event. Oracle Groundbreaker Yatra, on the other hand, is a Multi-City Tour. I believe last year it was a 6 City Tour. This year, it is scheduled to be hosted in Mumbai, Chennai, Delhi, Bengaluru, Hyderabad, Kolkata, Pune, Ahmedabad, Visakhapatnam and Thiruvananthapuram.

Registration for the Groundbreaker Yatra will open soon. Many prominent speakers will be travelling, and it should be a not-to-be-missed event for the Oracle Database Community (DBAs, Developers, Architects, Data Scientists etc). Sandesh Rao, Connor McDonald, Roy Swonger and Gurmeet Goindi will be speaking on some interesting topics (as always). Roy Swonger from Oracle US will be visiting India for the first time. Every time you attend sessions from these experts, you learn something new, and I am sure that this time as well you will come out with bags full of knowledge.

I am also one of the speakers for the Groundbreaker Yatra and will be travelling to 4 cities apart from Mumbai (my base location). I have opted for locations where I am either travelling for the first time for a User Group event or have visited only once or twice. Now you can guess it :). I will be in Ahmedabad, Kolkata, Visakhapatnam, Thiruvananthapuram and Mumbai. I believe the agenda for some of the locations is already published, and for the other locations it is in the final stage.

Keep a tab on the Groundbreaker Yatra 2019 page, so that you do not miss the registration.

Now, on to my session. I am presenting a session on “Database and Application Performance ! Then and Now.” It is a 1 hour session. I will be covering some of the Performance Challenges that DBAs and Developers used to face, and still face, and how these are automatically taken care of in Autonomous Databases. I will cover Optimizer, Parallelism and Oracle 19c, and walk through some topics around Reinforcement Learning and Data Science. Usually, my sessions are supported by live demos. For a 1 hour session I am not sure whether I will be able to include them, but I shall try to make it interesting. See you all in July 2019.

Filed under AIOUG, Autonomous database, Exadata, Optimizer, Performance, Uncategorized Tagged with aioug, Autonomous database, Connor Mcdonald, Oracle Groundbreaker Yatra, Sandesh Rao

Importance of Constraints ! Know your Reporting Tools

November 14, 2018

Recently, I was working on a customer issue. The issue was a performance comparison between a Competition and an Oracle Database. The performance of the Competition was reported to be better than Oracle. Now, this is a classic case of Bad Schema Design and a Badly Written Query. Working on this issue reminded me of a great learning that I had after reading “Expert Oracle Database Architecture” by Thomas Kyte. He wrote about a classic issue with a PL/SQL Block running in SQL Server and generating wrong results when ported to Oracle Database, due to the way these 2 databases compare NULL values. Each of these databases is different. It also reminded me of one of my responses to a query from a customer on “A count(*) from a Table is doing a Full Table Scan, even though it has a Unique Index on it”.

As Tom mentioned in his book, every database is different and implements its features differently. The third party tools that connect to each of these different data sources generate queries that syntactically work on all of them but may not run optimally. Therefore, it is important to understand our Data and design the Schema with all the required constraints and indexes in place.

In this case, the customer was running a few analytical reports from a Third Party Reporting Tool (Tableau). The customer selected a few columns with a couple of predicates: a Date Range and a product name. The Queries were around 40-50 lines long. Surprisingly, for each of the predicates, Tableau added a few additional predicates, and due to these the run time plan was 2700+ lines long. As an example, say I have a table T1 and have to generate a report selecting object ids and names for either of the 2 TEMPORARY values (Y or N). My query will look like :

select object_id, object_name from t1 where temporary=:b1;

If I have an Index on the TEMPORARY column, the Optimizer will come out with an optimal plan based on its cost calculation. However, when run from Tableau, the generated query was:

select object_id, object_name from t1
where ((temporary=:b1) or (temporary is NULL and :b1 is NULL));

What is the implication of these added OR conditions? Let’s see.

SQL> create table t1 as
  2  select * from all_objects;

Table created.

SQL> exec dbms_stats.gather_table_stats(user,'T1', method_opt=>'for all columns size auto for columns temporary size 100');

PL/SQL procedure successfully completed.

SQL> create index t1_idx on t1(temporary);

Index created.

SQL> select num_rows, blocks from dba_tables where table_name='T1';

  NUM_ROWS     BLOCKS
---------- ----------
     68417       1375

SQL> variable l_temporary varchar2(1);
SQL> exec :l_temporary:='Y';

PL/SQL procedure successfully completed.

SQL> select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL)); 

161 rows selected.

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  7xsahyu4q594y, child number 0
-------------------------------------
select object_id, object_name from t1 where ((temporary=:l_temporary)
or (temporary is NULL and :l_temporary is NULL))

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   375 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   161 |  6923 |   375   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("TEMPORARY"=:L_TEMPORARY OR ("TEMPORARY" IS NULL AND
              :L_TEMPORARY IS NULL)))

The query returns 161 rows out of roughly 68,000, which is less than 1%, so Index Access would have been a better option. At the customer end, the table was huge, with Billions of Rows and multiple such OR Predicates, and the very first observation was the amount of time the Optimizer took to Parse the Query. The Query took around 184 Seconds to run, of which around 175 Seconds were spent on Parsing. This was identified as a BUG in 18c (BUG#28725660), with the primary cause being the change in OR Expansion behaviour in 18c. This BUG is fixed in 19c. Backporting the fix to 18c would have taken some time, so the other fix that we applied was to add NOT NULL Constraints to some of the columns. From the Data, we could see that none of these columns had NULL values. We checked with the developers, and they confirmed that NULL values are never stored in these columns. Therefore, it was safe to add these constraints. Continuing with our example above, let’s add a NOT NULL constraint to our Table T1.

SQL> alter table t1 modify temporary not null;

Table altered.

SQL> select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL));

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Plan hash value: 1775246573

----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |       |       |     5 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |   161 |  6923 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_IDX |   161 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEMPORARY"=:L_TEMPORARY)

The run time plan of the same query now uses an Index Range Scan and is much better than the previous one. This additional input to the Optimizer was enough to transform the query. You can generate a 10053 trace to see the transformation.
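Why the constraint makes the extra OR branch redundant can be illustrated outside Oracle. The following is only a sketch under loud assumptions: it uses Python's built-in sqlite3 module as a stand-in, with a tiny made-up T1, so it says nothing about Oracle plans, costs or OR Expansion. It only shows that the Tableau-style predicate and the simple predicate return the same rows as long as the column holds no NULLs, which is exactly what a NOT NULL constraint guarantees.

```python
import sqlite3

# Simple predicate versus the NULL-safe form generated by the reporting tool.
SIMPLE = "select object_id from t1 where temporary = :b1 order by object_id"
NULL_SAFE = (
    "select object_id from t1 "
    "where (temporary = :b1) or (temporary is null and :b1 is null) "
    "order by object_id"
)


def build(rows):
    """Create an in-memory stand-in for T1 (hypothetical data, not ALL_OBJECTS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t1 (object_id integer, temporary text)")
    conn.executemany("insert into t1 values (?, ?)", rows)
    return conn


def run(conn, sql, bind):
    """Run a query with the bind value and return the matching object ids."""
    return [row[0] for row in conn.execute(sql, {"b1": bind})]


no_nulls = build([(1, "Y"), (2, "N"), (3, "Y")])
with_nulls = build([(1, "Y"), (2, None), (3, "N")])

for bind in ("Y", "N", None):
    same = run(no_nulls, SIMPLE, bind) == run(no_nulls, NULL_SAFE, bind)
    print("no NULLs in column, bind =", bind, "-> same rows:", same)

# Only when the column can hold NULLs does the extra branch change the result.
print(run(with_nulls, SIMPLE, None), run(with_nulls, NULL_SAFE, None))
```

With no NULLs in the column, the second branch can never be true, so an optimizer that knows about the constraint is free to discard it and treat the query as a plain equality predicate.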

A NOT NULL constraint combined with a Unique Index is required to answer a COUNT(*) query from an Index, which is what one of my customers asked about a long time back. Without the constraint, the Optimizer cannot be sure that every row has an entry in the index, because a B-tree index does not store entries whose indexed columns are all NULL. This is another case where a NOT NULL constraint helped the optimizer come up with an optimal plan.

It is important to optimize the Schema and provide all the critical inputs to the Optimizer. Generic reporting tools generate queries that work on all databases. These generated queries may perform well on some databases and poorly on others. Hence the title of this blog, “Know your Reporting Tools” :).

On the real customer issue, after adding the constraints to a few of the columns, the query response time dropped drastically from 184 Seconds to less than 10 seconds, far better than the competition.

Filed under Optimizer, Performance, SQL Tuning Tagged with Cost Based Optimizer, NOT NULL CONSTRAINTS, Performance, Query Transformation

Autonomous Database Tech Day ! Gurgaon

August 19, 2018

#Autonomous #AIOUG I am presenting a Full Day Event on 8th September 2018 for the North India Oracle User Group in Gurgaon. It is on Oracle Autonomous Database, and I will be covering some interesting technical capabilities across the 2 offerings, i.e. Autonomous Data Warehouse and Autonomous Transaction Processing. For registration, click on the following link :

Meraevents Link

This being an Oracle User Group session, the focus will be on the technical capabilities of ADW / ATP, like Parallel Processing, Concurrency, Optimizer Enhancements and Behaviour and, most importantly, Competition.

So, North India Folks : See you all on 8th September 2018.

Filed under AIOUG, Autonomous database, Optimizer, Performance, Uncategorized Tagged with ADW, ATP, Autonomous database, Autonomous DB

Optimizer – Part IV (12c Enhancements)

December 14, 2016

This is the final part of my 4 part series on Optimizer. The previous 3 parts (Part I, Part II and Part III) are available on this blog.

In this blog, I will go through the 12c enhancements. Before that, let me briefly cover the real life example that motivated me to write this series. I covered this in Part III. In that post, I did not paste the relevant output of the 10053 trace file, so let me take this up again here.

The problem query and its run time plan are pasted below.

select count(*) from nca.s_p_attributes a1
WHERE   a1.value='olwassenen';

    COUNT(*)
------------
      591168

SQL> select plan_table_output from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------
SQL_ID  79dfpvydpk710, child number 0
-------------------------------------
select count(*) from nca.s_p_attributes a1 WHERE
a1.value='olwassenen'
-----------------------------------------------
| Id  | Operation         | Name      | Rows  |
-----------------------------------------------
|   0 | SELECT STATEMENT  |           |       |
|   1 |  SORT AGGREGATE   |           |     1 |
|*  2 |   INDEX SKIP SCAN | SP_P_IND3 |     8 |
-----------------------------------------------

As can be seen, the optimizer's estimate is way off (Actuals = 591168 v/s Estimate = 8). Believe me, the column used in the WHERE predicate has a height balanced histogram on it. This is clearly visible in the 10053 trace. The relevant portion of the 10053 trace is as under.

SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for S_P_ATTRIBUTES[A1] 
  Column (#3): 
    NewDensity:0.000000, OldDensity:0.001202 BktCnt:254, PopBktCnt:122, PopValCnt:20, NDV:35078144
  Column (#3): VALUE(
    AvgLen: 11 NDV: 35078144 Nulls: 0 Density: 0.000000
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 153
  Table: S_P_ATTRIBUTES  Alias: A1
    Card: Original: 541600373.000000  Rounded: 8  Computed: 8.02  Non Adjusted: 8.02

This was a Non-Popular value and therefore the selectivity calculation for a Non-Popular value (NewDensity in 10053) is as under, where NPBKTCNT is the number of non-popular buckets, i.e. BKTCNT - POPBKTCNT:

[(NPBKTCNT)/(BKTCNT * (NDV - POPVALCNT))]
[(254-122)/(254 * (35078144-20))] = 132/(254 * 35078124) = 132/8909843496 = 0.0000000148

The calculated selectivity multiplied by the number of rows in the table is the expected cardinality i.e. 0.0000000148*541600373=8.
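The arithmetic above can be re-checked with a few lines of Python. This is only a sketch of the formula as written in this post, fed with the figures from the 10053 trace; the function and variable names are mine, not Oracle's.

```python
def non_popular_selectivity(bkt_cnt, pop_bkt_cnt, ndv, pop_val_cnt):
    """Selectivity of a non-popular value under a height balanced histogram,
    using the formula from this post: NPBKTCNT / (BKTCNT * (NDV - POPVALCNT))."""
    non_popular_buckets = bkt_cnt - pop_bkt_cnt
    non_popular_values = ndv - pop_val_cnt
    return non_popular_buckets / (bkt_cnt * non_popular_values)


# Figures taken from the 10053 trace shown above.
selectivity = non_popular_selectivity(
    bkt_cnt=254, pop_bkt_cnt=122, ndv=35078144, pop_val_cnt=20
)
cardinality = selectivity * 541600373  # "Card: Original" from the trace

print(f"selectivity = {selectivity:.10f}")
print(f"cardinality = {cardinality:.2f} (rounded: {round(cardinality)})")
```

The result agrees with the "Computed: 8.02" and "Rounded: 8" values in the trace, against an actual count of 591168.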

Now, let’s briefly discuss the 12c Enhancements. Oracle Database 12c introduced additional histogram types : Top-N Frequency and Hybrid. There are a few criteria for these histograms to be generated. The criteria are :

Following variables are used -
n - Number of Buckets either explicitly specified or left to default (254)
NDV - Number of Distinct Values in the Column
p - Internal Percentage, which is calculated as (1-(1/n))*100

If Data Skew Observed ?
   If NDV > n ?
      If estimate_percent = auto_sample_size ?
         If % of rows for topn frequent values >= p ?
            generate topn frequency histogram;
         else
            generate hybrid histogram;
         end if;
      else
         generate height balanced histogram;
      end if;
   else
      generate frequency histogram;
   end if;
end if;

From the preceding pseudo-code, it is clear that to generate the 12c specific histograms, estimate_percent must be auto_sample_size, which is the default. So, if your statistics gathering policy specifies a manual estimate percentage, it won’t gather these new histograms. The new auto_sample_size algorithm has improved a lot and therefore, you can safely change the manual estimate to auto.

The value of p requires some explanation. It is calculated as (1-(1/n))*100, where n is the number of buckets specified during stats gathering. Consider the following 4 examples :

Example 1 : PROD_ID has 72 Distinct Values. As the number of buckets specified is 100, which is more than the number of distinct values, a Frequency Histogram will be generated.

execute dbms_stats.gather_table_stats(ownname=>'SH',tabname=>'SALES',method_opt=>'for columns PROD_ID size 100');

Example 2 : PROD_ID has 72 Distinct Values. In this case, the number of buckets (n = 50) is less than the number of distinct values (72). However, the script explicitly sets estimate_percent to 30 and therefore, it will generate a Height Balanced Histogram.

execute dbms_stats.gather_table_stats(ownname=>'SH',tabname=>'SALES',method_opt=>'for columns PROD_ID size 50', estimate_percent=>30);

Example 3 : PROD_ID has 72 Distinct Values. In this case, the number of buckets (n = 50) is less than the number of distinct values (72). No estimate_percent is specified, which means it is left to the default, AUTO_SAMPLE_SIZE. In this case, the decision between Hybrid and Top-N Frequency is based on the value of p. The calculation for p is (1-(1/50))*100 = 98%. This means that if the top 50 distinct PROD_ID values occupy at least 98% of the rows, it will be a Top-N Frequency Histogram, else it will be a Hybrid Histogram. We shall see this in action.

execute dbms_stats.gather_table_stats(ownname=>'SH',tabname=>'SALES',method_opt=>'for columns PROD_ID size 50');
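The pseudo-code and the three examples above can be put together in a small Python sketch. It is my own restatement of the decision tree in this post, not Oracle's internal code: the function names are made up, the returned strings follow the HISTOGRAM column of DBA_TAB_COLUMNS, skew is simply assumed to be present, and the coverage percentages used for Example 3 are hypothetical since the actual SH.SALES distribution is not shown here.

```python
def top_n_threshold(n):
    """Internal percentage p; algebraically the same as (1 - 1/n) * 100."""
    return 100 * (n - 1) / n


def choose_histogram(ndv, n, auto_sample_size, top_n_pct=None, skewed=True):
    """Mirror the IF/ELSE tree from the post.

    ndv              -- number of distinct values in the column
    n                -- number of buckets requested (default 254)
    auto_sample_size -- True when estimate_percent is left at AUTO_SAMPLE_SIZE
    top_n_pct        -- % of rows held by the n most frequent values
    skewed           -- whether data skew was observed (assumed True here)
    """
    if not skewed:
        return "NONE"
    if ndv <= n:
        return "FREQUENCY"
    if not auto_sample_size:
        return "HEIGHT BALANCED"
    if top_n_pct is None:
        raise ValueError("top_n_pct is needed to pick Top-N Frequency or Hybrid")
    if top_n_pct >= top_n_threshold(n):
        return "TOP-FREQUENCY"
    return "HYBRID"


# Example 1: 72 distinct values, 100 buckets.
print(choose_histogram(ndv=72, n=100, auto_sample_size=True))
# Example 2: 72 distinct values, 50 buckets, estimate_percent => 30.
print(choose_histogram(ndv=72, n=50, auto_sample_size=False))
# Example 3: 72 distinct values, 50 buckets, default sampling, p = 98.
# The two coverage figures below are hypothetical.
print(top_n_threshold(50))
print(choose_histogram(ndv=72, n=50, auto_sample_size=True, top_n_pct=99.0))
print(choose_histogram(ndv=72, n=50, auto_sample_size=True, top_n_pct=90.0))
```

Writing p as 100 * (n - 1) / n instead of (1 - 1/n) * 100 is only a choice to keep the floating point result tidy; the value is the same.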

Example 4 : Our own table (TEST_SELECTIVITY) that we created for testing purposes in Optimizer – Part I. In this case, I am generating statistics using method_opt=>'for all columns size AUTO'. The columns that get a histogram are chosen based on the usage recorded in COL_USAGE$, which is populated after the execution of any query against a table and column. Column AMOUNT_SOLD has 636 Distinct values, which is higher than 254 (default buckets). The value of n is 254 and p will be (1-(1/254))*100 = 99.61. This means that if the top 254 distinct values occupy at least 99.61% of the rows, it will be a Top-N Frequency Histogram, else it will be Hybrid.

execute dbms_stats.gather_table_stats(user,'TEST_SELECTIVITY',method_opt=>'FOR ALL COLUMNS SIZE AUTO');

select column_id, column_name, num_distinct, num_nulls,
	density, histogram
from	dba_tab_columns
where	owner='SCOTT'
and	table_name='TEST_SELECTIVITY'
and	column_name='AMOUNT_SOLD'
order by 1;

 COLUMN_ID COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
---------- ------------------------------ ------------ ---------- ---------- -------------------------
         6 AMOUNT_SOLD                             636          0    .000361 HYBRID

SQL> select round((1-(1/254))*100,2) value_p from dual;

   VALUE_P
----------
     99.61

## I have 800000 Rows in the table. So, 99.61% of 800000 is 796850.
## My Statistics Gathering command generated HYBRID, which means, 
## the top 254 distinct values occupy less than 99.61% or less than 796850 rows

SQL> select round((1-(1/254))*800000,0) value_p from dual;

   VALUE_P
----------
    796850
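The same threshold check can be sketched in Python. The figures for n, the row count and the rows held by the top 254 values (776524) are the ones quoted in this post; the helper names and the two small frequency lists at the end are made up purely for illustration.

```python
def top_n_coverage(counts, n):
    """Return (rows held by the n most frequent values, total rows)."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:n]), sum(ordered)


def qualifies_for_top_n(covered_rows, total_rows, n):
    """True when covered/total >= (1 - 1/n), compared in integers
    so that no floating point rounding is involved."""
    return covered_rows * n >= total_rows * (n - 1)


N_BUCKETS = 254
NUM_ROWS = 800000
TOP_254_ROWS = 776524  # rows held by the top 254 values, as quoted in the post

threshold_pct = round(100 * (N_BUCKETS - 1) / N_BUCKETS, 2)
threshold_rows = round(NUM_ROWS * (N_BUCKETS - 1) / N_BUCKETS)
print(threshold_pct, threshold_rows)
print(qualifies_for_top_n(TOP_254_ROWS, NUM_ROWS, N_BUCKETS))  # False means HYBRID

# Made-up frequency lists: one heavily skewed, one uniform.
skewed_counts = [960, 10, 10, 10, 10]
uniform_counts = [10] * 10
print(qualifies_for_top_n(*top_n_coverage(skewed_counts, 2), 2))
print(qualifies_for_top_n(*top_n_coverage(uniform_counts, 5), 5))
```

For AMOUNT_SOLD the top 254 values fall short of the 796850 rows needed, which is consistent with the HYBRID histogram reported by DBA_TAB_COLUMNS.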

## Query to check Running Count and %Age 
## My Top 254 values occupy 776524, which is less than 796850
## My Top 254 values occupy 97.09%, which is less than 99.61%
## Therefore, the script generated HYBRID Histogram.

SQL> select rownum, amount_sold, cnt, running_cnt, running_perc from (
  2  select amount_sold, cnt, running_cnt,
  3                  sum(r_perc) over(order by cnt desc) running_perc from (
  4  select amount_sold, cnt, sum(cnt) over(order by cnt desc) running_cnt,
  5                  round(ratio_to_report(cnt) over()*100,2) r_perc from (
  6  select amount_sold, count(*) cnt from test_selectivity group by amount_sold order by 2 desc))
  7  order by cnt desc);

    ROWNUM AMOUNT_SOLD        CNT RUNNING_CNT RUNNING_PERC
---------- ----------- ---------- ----------- ------------
         1          10      33600       33600          4.2
         2           9      31964       65564          8.2
         3          11      28850       94414        11.81
         4          13      26968      121382        15.18
         5          47      26934      148316        18.55
         6          48      26114      174430        21.81
         7           8      24906      199336        24.92
         8          49      24239      223575        27.95
         9          46      18784      242359         30.3
        10          23      17014      259373        32.43
        11          21      16965      276338        34.55
        12          24      16246      292584        36.58
        13          22      15954      308538        38.57
        14          25      15782      324320        40.54
        15          51      14906      339226         42.4
        16          30      12932      352158        44.02
        17           7      12030      364188        45.52
        18          26      11958      376146        47.01
        19          17      11585      387731        48.46
        20          12      11547      399278         49.9
        21          52      11344      410622        51.32
        22          28      11227      421849        52.72
        23          29      10728      432577        54.06
        24          14      10180      442757        55.33
        25          53      10059      452816        56.59
        26          31       9907      462723        57.83
        27          19       9682      472405        59.04
        28          39       9571      481976        60.24
        29          33       9145      491121        61.38
        30          38       9027      500148        62.51
        31          54       8468      508616        63.57
        32          42       8177      516793        64.59
        33          32       8017      524810        65.59
        34          27       7651      532461        66.55
        35          34       7617      540078         67.5
        36          63       6607      546685        68.33
        37          20       6582      553267        69.15
        38          64       6575      559842        69.97
        39          56       6204      566046        70.75
        40          40       6060      572106        71.51
        41          57       5723      577829        72.23
        42          50       5511      583340        72.92
        43          35       5388      588728        73.59
        44          15       5219      593947        74.24
        45          60       5195      599142        74.89
        46          36       5062      604204        75.52
        47          58       5031      609235        76.15
        48          16       4903      614138        76.76
        49          41       4831      618969        77.36
        50          18       4562      623531        77.93
        51          62       4542      628073         78.5
        52          59       4421      632494        79.05
        53          45       4311      636805        79.59
        54          43       3946      640751        80.08
        55          65       3625      644376        80.53
        56          55       3372      647748        80.95
        57         123       2852      650600        81.31
        58          44       2327      652927         81.6
        59         126       2296      655223        81.89
        60         203       2286      657509        82.18
        61         115       2200      659709        82.46
        62          61       2187      661896        82.73
        63         211       2105      664001        82.99
        64         213       2032      666033        83.24
        65          91       2022      668055        83.49
        66         128       1953      670008        83.73
        67         113       1837      671845        83.96
        68         629       1799      673644        84.18
        69          73       1656      675300        84.39
        70         100       1641      676941         84.6
        71          72       1596      678537         84.8
        72         602       1577      680114           85
        73          37       1547      681661        85.19
        74         101       1533      683194        85.38
        75          66       1450      684644        85.56
        76         127       1402      686046        85.74
        77          71       1333      687379        85.91
        78          74       1314      688693        86.07
        79         163       1243      689936        86.23
        80          94       1237      691173        86.38
        81         210       1214      692387        86.53
        82         121       1139      693526        86.67
        83         160       1127      694653        86.81
        84          69       1111      695764        86.95
        85          70       1106      696870        87.09
        86         225       1101      697971        87.23
        87         168       1097      699068        87.37
        88         116       1088      700156        87.51
        89         307       1069      701225        87.64
        90         152       1063      702288        87.77
        91         134       1055      703343         87.9
        92         170       1034      704377        88.03
        93          77       1020      705397        88.16
        94          97       1012      706409        88.29
        95         151       1011      707420        88.42
        96         594        992      708412        88.54
        97         630        919      709331        88.65
        98          67        906      711143        88.87
        99         214        906      711143        88.87
       100          76        905      712048        88.98
       101          96        897      712945        89.09
       102         180        864      713809         89.2
       103         136        855      714664        89.31
       104         208        847      715511        89.42
       105         114        835      716346        89.52
       106         135        829      717175        89.62
       107         610        809      717984        89.72
       108         167        803      718787        89.82
       109         202        787      719574        89.92
       110         133        784      720358        90.02
       111         600        779      721137        90.12
       112         120        774      721911        90.22
       113          84        771      722682        90.32
       114         178        769      723451        90.42
       115         117        764      724215        90.52
       116          95        761      724976        90.62
       117        1000        758      725734        90.71
       118         306        735      726469         90.8
       119         199        734      727203        90.89
       120         125        718      727921        90.98
       121         303        688      728609        91.07
       122         112        686      729295        91.16
       123         639        679      729974        91.24
       124         179        670      730644        91.32
       125         228        665      731309         91.4
       126         177        663      731972        91.48
       127        1577        646      732618        91.56
       128         159        630      733248        91.64
       129         539        628      733876        91.72
       130         205        623      734499         91.8
       131          89        614      735113        91.88
       132          90        599      735712        91.95
       133         216        582      736294        92.02
       134         547        579      736873        92.09
       135        1050        564      737437        92.16
       136         557        557      737994        92.23
       137         119        552      738546         92.3
       138          99        536      739082        92.37
       139        1016        528      739610        92.44
       140          78        521      740131        92.51
       141          93        510      740641        92.57
       142         217        485      741126        92.63
       143         304        481      741607        92.69
       144        1053        480      742087        92.75
       145        1068        477      742564        92.81
       146        1496        476      743040        92.87
       147        1014        475      743515        92.93
       148         206        474      743989        92.99
       149        1004        468      744457        93.05
       150        1065        462      744919        93.11
       151        1566        461      745380        93.17
       152         156        458      745838        93.23
       153         207        457      746295        93.29
       154         140        452      746747        93.35
       155         142        451      747198        93.41
       156        1260        448      747646        93.47
       157         552        447      748093        93.53
       158        1599        442      748535        93.59
       159         222        433      748968        93.64
       160         212        424      749392        93.69
       161        1698        422      749814        93.74
       162         215        416      750230        93.79
       163        1240        415      750645        93.84
       164          85        408      751053        93.89
       165         155        403      751456        93.94
       166         148        401      751857        93.99
       167         554        399      752256        94.04
       168         900        395      752651        94.09
       169         556        391      753042        94.14
       170        1118        388      753430        94.19
       171          75        384      753814        94.24
       172          88        382      754196        94.29
       173         302        375      754571        94.34
       174        1057        372      754943        94.39
       175        1029        368      755311        94.44
       176        1321        366      755677        94.49
       177        1109        365      756042        94.54
       178          79        356      756398        94.58
       179         102        354      756752        94.62
       180        1729        344      757096        94.66
       181         204        341      757437         94.7
       182         659        339      758454        94.82
       183         195        339      758454        94.82
       184        1551        339      758454        94.82
       185          98        337      758791        94.86
       186         169        323      759114         94.9
       187         562        322      759436        94.94
       188        1195        320      759756        94.98
       189        1215        318      760074        95.02
       190         936        317      760391        95.06
       191          68        313      760704         95.1
       192         187        310      761014        95.14
       193         548        309      761632        95.22
       194        1714        309      761632        95.22
       195        1217        308      761940        95.26
       196         468        307      762554        95.34
       197        1297        307      762554        95.34
       198        1656        305      762859        95.38
       199         608        303      763162        95.42
       200         124        301      763463        95.46
       201         150        299      763762         95.5
       202        1206        294      764056        95.54
       203         181        293      764642        95.62
       204         106        293      764642        95.62
       205         288        288      764930        95.66
       206        1078        286      765216         95.7
       207         158        283      765499        95.74
       208        1633        282      765781        95.78
       209         154        277      766058        95.81
       210        1556        274      766606        95.87
       211        1061        274      766606        95.87
       212         161        273      766879         95.9
       213        1176        270      767149        95.93
       214         914        268      767417        95.96
       215         490        267      767684        95.99
       216         289        261      767945        96.02
       217        1758        256      768713        96.11
       218        1353        256      768713        96.11
       219        1045        256      768713        96.11
       220        1508        255      768968        96.14
       221         196        249      769217        96.17
       222         183        248      769465         96.2
       223        1674        246      769711        96.23
       224        1237        244      769955        96.26
       225        1531        243      770198        96.29
       226        1421        242      771166        96.41
       227         118        242      771166        96.41
       228         544        242      771166        96.41
       229        1501        242      771166        96.41
       230        1304        235      771636        96.47
       231        1076        235      771636        96.47
       232         842        232      771868         96.5
       233         561        230      772098        96.53
       234         143        228      772326        96.56
       235         632        222      772548        96.59
       236         193        216      772980        96.65
       237         305        216      772980        96.65
       238         596        215      773195        96.68
       239         194        209      773613        96.74
       240        1003        209      773613        96.74
       241         131        207      773820        96.77
       242        1553        206      774026         96.8
       243         937        202      774430        96.86
       244         580        202      774430        96.86
       245        1125        201      774631        96.89
       246         200        194      774825        96.91
       247         524        193      775018        96.93
       248         947        192      775594        96.99
       249         141        192      775594        96.99
       250        1709        192      775594        96.99
       251        1488        189      775783        97.01
       252        1334        187      775970        97.03
       253         281        186      776156        97.05
       254        1227        184      776524        97.09 <-- Top 254 Distinct Values
       255        1703        184      776524        97.09
.....
.....
.....
.....
       624        1231          1      800000        99.94
       625        1177          1      800000        99.94
       626        1638          1      800000        99.94
       627        1189          1      800000        99.94
       628         162          1      800000        99.94
       629          81          1      800000        99.94
       630         647          1      800000        99.94
       631         584          1      800000        99.94
       632         619          1      800000        99.94
       633        1036          1      800000        99.94
       634         708          1      800000        99.94
       635         518          1      800000        99.94
       636         522          1      800000        99.94

Next, I gather statistics on the PRODUCTS table under the SH schema. This table has 72 rows with 22 distinct values in the PROD_SUBCATEGORY_ID column. First, I gather statistics using 10 Buckets. The Optimizer will generate a HYBRID Histogram, as the number of rows occupied by the Top 10 Distinct Values is less than the threshold p = (1-(1/10)) = 90% of the rows. Then, I gather statistics using 18 Buckets and this time it will be a TopN Frequency Histogram, as the number of rows occupied by the Top 18 Distinct Values is not less than the threshold p = (1-(1/18)) = 94% of the rows.

SQL> select table_name, num_rows from dba_tables where table_name='PRODUCTS';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
PRODUCTS                               72

SQL> select column_name, num_distinct from dba_tab_columns
  2  where      table_name='PRODUCTS'
  3  and        owner='SH'
  4  and        column_name='PROD_SUBCATEGORY_ID';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
PROD_SUBCATEGORY_ID                      22


SQL> select 1-(1/10) from dual;

  1-(1/10)
----------
        .9

SQL> select round((1-(1/10))*72,0) from dual;

ROUND((1-(1/10))*72,0)
----------------------
                    65


SQL> compute sum of running_sum on report
SQL> break on report
SQL> select rownum, prod_subcategory_id, cnt, running_cnt, running_perc from (
  2  select prod_subcategory_id, cnt,
  3                  sum(cnt) over(order by cnt desc) running_cnt,
  4                  sum(r_perc) over(order by cnt desc) running_perc from (
  5  select prod_subcategory_id, cnt, round(ratio_to_report(cnt) over()*100,2) r_perc from (
  6  select prod_subcategory_id, count(*) cnt from products group by prod_subcategory_id order by 2 desc))
  7  order by cnt desc);

    ROWNUM PROD_SUBCATEGORY_ID        CNT RUNNING_CNT RUNNING_PERC
---------- ------------------- ---------- ----------- ------------
         1                2014          8           8        11.11
         2                2055          7          15        20.83
         3                2032          6          27        37.49
         4                2054          6          27        37.49
         5                2056          5          47        65.25
         6                2051          5          47        65.25
         7                2031          5          47        65.25
         8                2042          5          47        65.25
         9                2036          4          51        70.81
        10                2043          3          54        74.98 <-- Less than 90% or 65 Rows (HYBRID)
        11                2033          2          66        91.66
        12                2035          2          66        91.66
        13                2053          2          66        91.66
        14                2012          2          66        91.66
        15                2013          2          66        91.66
        16                2034          2          66        91.66
        17                2021          1          72          100
        18                2011          1          72          100
        19                2044          1          72          100
        20                2041          1          72          100
        21                2022          1          72          100
        22                2052          1          72          100

22 rows selected.

## STATS WITH BUCKET 10
exec dbms_stats.gather_table_stats(ownname=>'SH',tabname=>'PRODUCTS',method_opt=>'for columns prod_subcategory_id size 10');



SQL> select column_id, column_name, num_distinct, num_nulls,
  2                  density, histogram
  3  from       dba_tab_columns
  4  where      owner='SH'
  5  and        table_name='PRODUCTS'
  6  and             column_name='PROD_SUBCATEGORY_ID';

 COLUMN_ID COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
---------- ------------------------------ ------------ ---------- ---------- -------------------------
         5 PROD_SUBCATEGORY_ID                      22          0    .044976 HYBRID


SQL> select (1-(1/18)) from dual;

(1-(1/18))
----------
.944444444


SQL> select round((1-(1/18))*72,0) from dual;

ROUND((1-(1/18))*72,0)
----------------------
                    68

SQL> select rownum, prod_subcategory_id, cnt, running_cnt, running_perc from (
  2  select prod_subcategory_id, cnt,
  3                  sum(cnt) over(order by cnt desc) running_cnt,
  4                  sum(r_perc) over(order by cnt desc) running_perc from (
  5  select prod_subcategory_id, cnt, round(ratio_to_report(cnt) over()*100,2) r_perc from (
  6  select prod_subcategory_id, count(*) cnt from products group by prod_subcategory_id order by 2 desc))
  7  order by cnt desc);

    ROWNUM PROD_SUBCATEGORY_ID        CNT RUNNING_CNT RUNNING_PERC
---------- ------------------- ---------- ----------- ------------
         1                2014          8           8        11.11
         2                2055          7          15        20.83
         3                2032          6          27        37.49
         4                2054          6          27        37.49
         5                2056          5          47        65.25
         6                2051          5          47        65.25
         7                2031          5          47        65.25
         8                2042          5          47        65.25
         9                2036          4          51        70.81
        10                2043          3          54        74.98
        11                2033          2          66        91.66
        12                2035          2          66        91.66
        13                2053          2          66        91.66
        14                2012          2          66        91.66
        15                2013          2          66        91.66
        16                2034          2          66        91.66
        17                2021          1          72          100
        18                2011          1          72          100 <-- More than 94% or 68 Rows
        19                2044          1          72          100
        20                2041          1          72          100
        21                2022          1          72          100
        22                2052          1          72          100

22 rows selected.

SQL> exec dbms_stats.gather_table_stats(ownname=>'SH',tabname=>'PRODUCTS',method_opt=>'for columns prod_subcategory_id size 18');

PL/SQL procedure successfully completed.

SQL> select column_id, column_name, num_distinct, num_nulls,
  2                  density, histogram
  3  from       dba_tab_columns
  4  where      owner='SH'
  5  and        table_name='PRODUCTS'
  6  and             column_name='PROD_SUBCATEGORY_ID';

 COLUMN_ID COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
---------- ------------------------------ ------------ ---------- ---------- -------------------------
         5 PROD_SUBCATEGORY_ID                      22          0 .006944444 TOP-FREQUENCY
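
The two checks above can also be folded into a single query. This is only a sketch of the rule as described in this post, not the code DBMS_STATS actually runs. It assumes the SH.PRODUCTS table used above, and the inline BUCKETS view and the column aliases are made up for illustration. ROW_NUMBER is used instead of a running SUM so that tied CNT values are counted one at a time (in the listings above, RUNNING_CNT repeats for ties because of the default analytic window).

```sql
-- Sketch: compare the rows covered by the Top N values with the threshold (1 - 1/N) * NUM_ROWS.
-- Assumption: SH.PRODUCTS as used in this post; bucket counts 10 and 18 are hard-coded.
with freq as (
  select prod_subcategory_id,
         count(*) cnt,
         -- rank each distinct value by frequency; ties are broken by the id
         row_number() over (order by count(*) desc, prod_subcategory_id) rn
  from   sh.products
  group  by prod_subcategory_id
),
buckets as (
  select 10 n from dual union all
  select 18 n from dual
)
select b.n                                          buckets,
       sum(case when f.rn <= b.n then f.cnt end)    top_n_rows,
       round((1 - 1/b.n) * sum(f.cnt))              threshold_rows,
       case when sum(case when f.rn <= b.n then f.cnt end)
                 >= round((1 - 1/b.n) * sum(f.cnt))
            then 'TOP-FREQUENCY'
            else 'HYBRID'
       end                                          expected_histogram
from   freq f cross join buckets b
group  by b.n
order  by b.n;
```

With the counts listed above, the Top 10 values cover 54 rows against a threshold of 65, and the Top 18 values cover 68 rows against a threshold of 68, which is in line with the HYBRID and TOP-FREQUENCY results reported by DBA_TAB_COLUMNS.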

In this blog, I have tried to explain the underlying calculations by which the histograms introduced in 12c are generated. I hope the explanation was clear. As always, comments are welcome.

Filed under Optimizer, Performance

Optimizer – Part III (Frequency & Height Balanced)

October 6, 2016 3 Comments

Finally, I got some time to write the third post of this series. Optimizer – Part I and Optimizer – Part II are the best references to read before this post. From Part II, we inferred that:

TIME_ID – Assumptions v/s Actuals √
AMOUNT_SOLD – Assumptions v/s Actuals Χ
PROMO_ID – Assumptions v/s Actuals Χ

However, with a minor change, which was made on a copy of the TEST_SELECTIVITY table, the picture changed to:

TIME_ID – Assumptions v/s Actuals √
TIME_ID – Assumptions v/s Actuals (minor change) Χ
AMOUNT_SOLD – Assumptions v/s Actuals Χ
PROMO_ID – Assumptions v/s Actuals Χ

A small change triggered a mismatch in the cardinality calculation for the TIME_ID column, which was otherwise nearly accurate. For query performance, an optimal execution plan is critical, and for an optimal execution plan it is very important that the Optimizer comes up with an accurate cardinality. As we have seen in the previous blogs, SELECTIVITY is another significant factor and is the starting point for the Optimizer. While Cardinality is calculated by the Optimizer, Selectivity is (in most cases) stored in the data dictionary, by way of statistics gathered using dbms_stats (or any other method provided by some Application Vendors).

The Optimizer is a piece of code. Its default behaviour (at least for a newly created table) is to consider the data distribution as UNIFORM. For example, in our case (before the minor change), the data in the TIME_ID column was uniform and therefore the optimizer calculation was nearly accurate. However, for the other two columns (AMOUNT_SOLD & PROMO_ID), the data was non-uniform and therefore the Optimizer assumptions v/s the actual data distribution were way out. After the table creation, the statistics were gathered automatically (a new feature of 12c). In 11g or earlier versions, you will have to gather the statistics manually, and you should see the same results. The initial statistics were fed to the optimizer as if the data were uniform. See below:

COLUMN_NAME        NUM_DISTINCT  NUM_NULLS    DENSITY
------------------ ------------ ---------- ----------
AMOUNT_SOLD                 636          0 .001572327
CUST_ID                    7056          0 .000141723
PROD_ID                      72          0 .013888889
PROMO_ID                      4          0        .25
QUANTITY_SOLD                 1          0          1
TIME_ID                    1096          0 .000912409

How do we fix the problem of mis-estimates? In this case, the DENSITY column was used as the SELECTIVITY and, for each of the columns, it was calculated as if the data were uniform (DENSITY = 1/NUM_DISTINCT). This mis-calculation resulted in erroneous optimizer estimates. As mentioned, the optimizer is a piece of code and it has to come up with its calculation based on the input provided. In the absence of additional or accurate statistics, the Optimizer will assume a UNIFORM distribution and will mis-calculate the SELECTIVITY and the CARDINALITY, as we have seen with our test cases. We have to provide accurate inputs for the optimizer to come up with nearly accurate estimates, and one way to provide these additional statistics is Histograms.
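
To make the uniform assumption concrete, the sketch below recomputes the density as 1/NUM_DISTINCT and the resulting cardinality for three of the columns listed above. The NUM_DISTINCT values and the 800000 row count are taken from this series. The inline view and the column aliases are only for illustration.

```sql
-- Sketch: uniform-distribution estimate, CARDINALITY = (1 / NUM_DISTINCT) * NUM_ROWS.
-- Values are hard-coded from the statistics shown above; 800000 is the table row count.
select column_name,
       num_distinct,
       round(1 / num_distinct, 9)     uniform_density,
       round(800000 / num_distinct)   uniform_cardinality
from (
  select 'AMOUNT_SOLD' column_name, 636 num_distinct from dual union all
  select 'PROMO_ID',                4                from dual union all
  select 'TIME_ID',                 1096             from dual
)
order by column_name;
```

For PROMO_ID this works out to 200000 rows for every value, irrespective of how many rows a given PROMO_ID really has, which is exactly why skewed columns are mis-estimated.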

Let us regather statistics on the table again and check the change in the DENSITY value for each of the columns.

exec dbms_stats.gather_table_stats(user,'TEST_SELECTIVITY', method_opt=>'for all columns size auto', estimate_percent=>100);

The resultant output is as below:

select column_id, column_name, num_distinct, num_nulls,
	density, histogram
from	dba_tab_columns
where	owner='SCOTT'
and	table_name='TEST_SELECTIVITY'
order by 1;

 COLUMN_ID COLUMN_NAME       NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
---------- ----------------- ------------ ---------- ---------- --------------------
         1 PROD_ID                     72          0 .013888889 NONE
         2 CUST_ID                   7056          0 .000141723 NONE
         3 TIME_ID                   1096          0 .000912409 NONE
         4 PROMO_ID                     4          0 .000000625 FREQUENCY
         5 QUANTITY_SOLD                1          0          1 NONE
         6 AMOUNT_SOLD                636          0   .0018217 HEIGHT BALANCED

The Density for two of the three columns has changed, and the HISTOGRAM column tells us that we now have additional statistics on these two columns.

There are 2 questions here:

Why did the subsequent gathering of statistics generate additional statistics (HISTOGRAM)?
Why are there no additional statistics (HISTOGRAMS) on the other columns?

The answer to the first question is that the queries on each of the tables and each of the columns are tracked in SYS.COL_USAGE$. The subsequent stats gathering job will refer to this table to get the column details on which the additional statistics are required. See below :

exec dbms_stats.flush_database_monitoring_info();

select intcol#, column_name, equality_preds, RANGE_PREDS
from	sys.col_usage$ cu, dba_tab_columns tc
where	obj# = (select object_id from dba_objects
		where owner='SCOTT'
		and   object_name='TEST_SELECTIVITY')
and	cu.intcol# = tc.column_id
and	tc.owner='SCOTT'
and	tc.table_name='TEST_SELECTIVITY';

   INTCOL# COLUMN_NAME       EQUALITY_PREDS RANGE_PREDS
---------- ----------------- -------------- -----------
         6 AMOUNT_SOLD                    1           1
         4 PROMO_ID                       1           0
         3 TIME_ID                        1           1
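
If you would rather not query SYS.COL_USAGE$ directly, DBMS_STATS.REPORT_COL_USAGE returns the recorded column usage as a readable report. A minimal sketch, assuming the SCOTT owner used in this post and a release where this function is available:

```sql
-- Sketch: report the column usage recorded for TEST_SELECTIVITY.
-- SET LONG is needed in SQL*Plus because the function returns a CLOB.
set long 100000
set pagesize 0

select dbms_stats.report_col_usage('SCOTT', 'TEST_SELECTIVITY') from dual;
```

The report lists, per column, the kind of predicates (equality, range and so on) seen so far, which is the same information the statistics gathering job relies on.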

The answer to the second question is that for the other columns (except TIME_ID), no queries were executed, so no information was collected in COL_USAGE$. For TIME_ID, there is no HISTOGRAM even though we executed a few queries (and COL_USAGE$ has an entry). The data in this column is UNIFORM, and this is an additional check that is made internally at the time of gathering statistics. During statistics generation, the data for each of these columns is sampled and validated. If the data is found to be UNIFORM, no histogram is generated, as generating it is a resource intensive process and would not be worth the resources required.

If you recollect from Part II, the minor change to the TIME_ID column was made on another table, TEST_SELECTIVITY_M, which was an exact replica of TEST_SELECTIVITY. Let's gather statistics on TEST_SELECTIVITY_M and see the results.

select column_id, column_name, num_distinct, num_nulls,
	density, histogram
from	dba_tab_columns
where	owner='SCOTT'
and	table_name='TEST_SELECTIVITY_M'
order by 1;

 COLUMN_ID COLUMN_NAME          NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
---------- -------------------- ------------ ---------- ---------- ---------------
         1 PROD_ID                        72          0 .013888889 NONE
         2 CUST_ID                      7056          0 .000141723 NONE
         3 TIME_ID                      1097          0 .000914025 HEIGHT BALANCED
         4 PROMO_ID                        4          0        .25 NONE
         5 QUANTITY_SOLD                   1          0          1 NONE
         6 AMOUNT_SOLD                   636          0 .001572327 NONE

On this table, the query executed was only on TIME_ID column and therefore, the additional statistics were on TIME_ID column.

Coming back to TEST_SELECTIVITY. Now, we have a Frequency Histogram on the PROMO_ID column and a Height Balanced Histogram on the AMOUNT_SOLD column. Until 11g, we had only these 2 types of Histograms. 12c introduced TopN Frequency and Hybrid Histograms, which I will cover in the last part of this series. I am on 12c and therefore, to generate Frequency and Height Balanced Histograms, I had to use estimate_percent as 100 (more on this in the next blog).

Frequency Histograms are generated if the number of distinct values does not exceed the number of Buckets. The number of Buckets, if not specified during statistics gathering, defaults to 254. The PROMO_ID column has 4 distinct values, hence a Frequency Histogram, whereas AMOUNT_SOLD has 636, which is more than 254, hence a Height Balanced Histogram. Let's execute our queries on these 2 columns and check the CARDINALITY estimates.

select column_id, column_name, num_distinct, num_nulls,
	density, histogram
from	dba_tab_columns
where	owner='SCOTT'
and	table_name='TEST_SELECTIVITY'
order by 1;

 COLUMN_ID COLUMN_NAME       NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
---------- ----------------- ------------ ---------- ---------- --------------------
         1 PROD_ID                     72          0 .013888889 NONE
         2 CUST_ID                   7056          0 .000141723 NONE
         3 TIME_ID                   1096          0 .000912409 NONE
         4 PROMO_ID                     4          0 .000000625 FREQUENCY
         5 QUANTITY_SOLD                1          0          1 NONE
         6 AMOUNT_SOLD                636          0   .0018217 HEIGHT BALANCED

Since we have additional statistics, let's check the details from DBA_TAB_HISTOGRAMS for the PROMO_ID column.

SQL> select ENDPOINT_NUMBER, ENDPOINT_VALUE
  2  from dba_tab_histograms
  3  where table_name='TEST_SELECTIVITY'
  4  and   column_name='PROMO_ID'
  5  order by 1;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
           2074             33
          20052            350
          22297            351
         800000            999

For the Frequency Histogram, the data is stored in a cumulative manner. The Endpoint_number stores the cumulative number of rows and the Endpoint_value stores the actual column value. For example, for PROMO_ID=33, we expect 2074 rows, for PROMO_ID=350, we expect 20052-2074=17978 rows, for PROMO_ID=351, we expect 22297-20052=2245 rows and so on. Let's run the queries for each of these PROMO_IDs.
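
The per-value row counts can also be derived in one pass by subtracting the previous ENDPOINT_NUMBER from the current one. The sketch below assumes the SCOTT owner used elsewhere in this post. The column aliases are only for illustration.

```sql
-- Sketch: turn the cumulative ENDPOINT_NUMBER of a Frequency Histogram into per-value row counts.
-- LAG(..., 1, 0) returns 0 for the first endpoint, so the first value keeps its full count.
select endpoint_value                                      promo_id,
       endpoint_number                                     cumulative_rows,
       endpoint_number
         - lag(endpoint_number, 1, 0)
             over (order by endpoint_number)               expected_rows
from   dba_tab_histograms
where  owner       = 'SCOTT'
and    table_name  = 'TEST_SELECTIVITY'
and    column_name = 'PROMO_ID'
order  by endpoint_number;
```

With the four endpoints shown above, EXPECTED_ROWS comes out as 2074, 17978, 2245 and 777703, which are the numbers the queries that follow are checked against.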

SQL> set autot trace
SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=999;

777703 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |   777K|  9873K|   960   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |   777K|  9873K|   960   (2)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_ID"=999)

The Optimizer Calculation for cardinality matches the actual number of rows fetched. For other values too, these were perfectly matching (see below).

SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=350;

17978 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  | 17978 |   228K|   958   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY | 17978 |   228K|   958   (2)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_ID"=350)

SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=33;

2074 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  2074 | 26962 |   958   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |  2074 | 26962 |   958   (2)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_ID"=33)

SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=351;

2245 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  2245 | 29185 |   958   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |  2245 | 29185 |   958   (2)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_ID"=351)

Perfect. The calculation in this case is very simple: take the values from DBA_TAB_HISTOGRAMS and get the accurate CARDINALITY. However, this holds good only for values that exist and are part of the histogram. What if we run a query against a value that doesn't exist in the table, or that had no rows when the stats were gathered but has a few or more rows when the query is executed? Such a value will have no entry in DBA_TAB_HISTOGRAMS. In such cases, will the Optimizer fall back to CARDINALITY = SELECTIVITY x NUM_ROWS, where SELECTIVITY is DENSITY? Let's check.

select column_id, column_name, num_distinct, num_nulls,
	density, histogram
from	dba_tab_columns
where	owner='SCOTT'
and	table_name='TEST_SELECTIVITY'
and	column_name='PROMO_ID';

 COLUMN_ID COLUMN_NAME       NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
---------- ----------------- ------------ ---------- ---------- --------------------
         4 PROMO_ID                     4          0 .000000625 FREQUENCY

SQL> select &&optdensity * 800000 Cardinality from dual;
old   1: select &&optdensity * 800000 Cardinality from dual
new   1: select .000000625 * 800000 Cardinality from dual

CARDINALITY
-----------
         .5

If the Density is considered as the SELECTIVITY, the expected CARDINALITY will be 1 (ceil of 0.5). I will run a query with PROMO_ID=500, which doesn't exist.

SQL> set autot trace
SQL> select * from test_selectivity where promo_id=500;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  1037 | 25925 |   958   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |  1037 | 25925 |   958   (2)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_ID"=500)

The actual number of rows is ZERO, the Optimizer estimated 1037, and the SELECTIVITY (density) based expectation was 1. ZERO v/s 1037 is a huge mis-estimate. Also, we can see that with a histogram in place, the optimizer does not use the DENSITY column. How do we get the calculation? Here, a 10053 trace file comes in handy. Let's generate a 10053 trace for a non-existent value and see the relevant portion that contains the calculation.
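
For readers who have not generated one before, a 10053 trace can be produced along these lines. This is only a sketch: the tracefile identifier is a made-up name, the table alias A is assumed from the trace extract, and EXPLAIN PLAN is used here simply to force the statement to be optimized.

```sql
-- Sketch: enable the optimizer (10053) trace, optimize the statement, then switch the trace off.
alter session set tracefile_identifier = 'promo_500';   -- hypothetical tag to locate the file
alter session set events '10053 trace name context forever, level 1';

explain plan for
select * from test_selectivity a where promo_id = 500;

alter session set events '10053 trace name context off';

-- Location of the trace file written by this session
select value from v$diag_info where name = 'Default Trace File';
```

The statement has to be optimized (hard parsed) while the event is set, otherwise nothing is written to the trace file.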

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for TEST_SELECTIVITY[A]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#4):
    NewDensity:0.001296, OldDensity:0.000001 BktCnt:800000.000000, PopBktCnt:800000.000000, PopValCnt:4, NDV:4
  Column (#4): PROMO_ID(NUMBER)
    AvgLen: 4 NDV: 4 Nulls: 0 Density: 0.001296 Min: 33.000000 Max: 999.000000
    Histogram: Freq  #Bkts: 4  UncompBkts: 800000  EndPtVals: 4  ActualVal: yes
  Table: TEST_SELECTIVITY  Alias: A
    Card: Original: 800000.000000  Rounded: 1037  Computed: 1037.000000  Non Adjusted: 1037.000000

As per the 10053 trace, the Rounded and Computed Cardinality is 1037. The Density is 0.001296. However, the Density from DBA_TAB_COLUMNS is .000000625. There are two additional statistics: NewDensity and OldDensity. OldDensity is 0.000001, which is the rounded-off value of the actual Density stored in DBA_TAB_COLUMNS, i.e. .000000625. What is NewDensity? Its value is used as the final Density to calculate the Cardinality, i.e. 0.001296 x 800000 = 1037. It seems that, for a non-existent value, the Optimizer computes this NewDensity and uses it as the SELECTIVITY to come up with the expected Cardinality.

The NewDensity for a Frequency Histogram is calculated from 50% of the row count of the least frequent value in DBA_TAB_HISTOGRAMS, which is 0.5 x 2074/NUM_ROWS = 0.00129625. So, NewDensity becomes the SELECTIVITY and CARDINALITY is SELECTIVITY x NUM_ROWS, i.e. 0.00129625 x 800000 = 1037 (see below).

select promo_Id, count(*) from test_selectivity group by promo_id order by 2;

  PROMO_ID   COUNT(*)
---------- ----------
        33       2074
       351       2245
       350      17978
       999     777703

SQL> select 0.5*2074/800000 NewDensity from dual;

NEWDENSITY
----------
 .00129625

SQL> select round(&&new_density*800000,0) from dual;
old   1: select round(&&new_density*800000,0) from dual
new   1: select round( .00129625*800000,0) from dual

ROUND(.00129625*800000,0)
-------------------------
                     1037
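
The same arithmetic can be driven straight from the histogram instead of hard-coding 2074 and 800000. This is a sketch of the calculation as observed in this post, not a documented formula. It assumes the SCOTT owner used earlier, and the aliases are only for illustration.

```sql
-- Sketch: NewDensity = 0.5 * (rows of the least frequent value) / (total rows in the histogram).
with per_value as (
  select endpoint_number
           - lag(endpoint_number, 1, 0)
               over (order by endpoint_number)   rows_for_value,
         max(endpoint_number) over ()            total_rows
  from   dba_tab_histograms
  where  owner       = 'SCOTT'
  and    table_name  = 'TEST_SELECTIVITY'
  and    column_name = 'PROMO_ID'
)
select min(rows_for_value)                           least_frequent_rows,
       0.5 * min(rows_for_value) / max(total_rows)   new_density,
       round(0.5 * min(rows_for_value))              estimated_cardinality
from   per_value;
```

Since NUM_ROWS cancels out, the estimate for a value missing from a Frequency Histogram is simply half the row count of the least frequent value, 1037 in this case.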

Before we get into more details, let us check the Height Balanced Histograms. We have a Height Balanced Histogram on Amount_Sold Column.

SQL> select column_id, column_name, num_distinct, num_nulls,
  2                  density, histogram
  3  from       dba_tab_columns
  4  where      owner='SCOTT'
  5  and        table_name='TEST_SELECTIVITY'
  6  and             column_name='AMOUNT_SOLD'
  7  order by 1;

 COLUMN_ID COLUMN_NAME       NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
---------- ----------------- ------------ ---------- ---------- --------------------
         6 AMOUNT_SOLD                636          0   .0018217 HEIGHT BALANCED

We have 636 Distinct Values for this column, which is more than the default of 254 Buckets. The way these histograms are generated is that the rows in the table are divided equally into 254 buckets. The maximum value of each bucket is recorded as its endpoint, and the buckets are then compressed if a value spans more than 1 Bucket. A value that spans more than one bucket is a popular value. I executed a query similar to the one executed by the Optimizer during statistics gathering (see below).

SQL> select bucket, count(*), min(amount_sold) min_amt, max(amount_sold) max_amt from (
  2  select amount_sold, ntile(254) over (order by amount_sold) bucket
  3  from       test_selectivity
  4  order by amount_sold)
  5  group by bucket
  6  order by 1;

    BUCKET   COUNT(*)    MIN_AMT    MAX_AMT
---------- ---------- ---------- ----------
         1       3150          6          7
         2       3150          7          7
         3       3150          7          7 <-- Popular Value (3 Buckets)
         4       3150          7          8
         5       3150          8          8
         6       3150          8          8
         7       3150          8          8
         8       3150          8          8
         9       3150          8          8
        10       3150          8          8
        11       3150          8          8 <-- Popular Value (8 Buckets)
        12       3150          8          9
        13       3150          9          9
        14       3150          9          9
        15       3150          9          9
        16       3150          9          9
        17       3150          9          9
        18       3150          9          9
        19       3150          9          9
        20       3150          9          9
        21       3150          9          9 <-- Popular Value (10 Buckets)
        22       3150          9         10
        23       3150         10         10
        24       3150         10         10
        25       3150         10         10
        26       3150         10         10
        27       3150         10         10
        28       3150         10         10
        29       3150         10         10
        30       3150         10         10
        31       3150         10         10
        32       3150         10         10
        33       3150         10         11
        34       3150         11         11
        35       3150         11         11
        36       3150         11         11
        37       3150         11         11
        38       3150         11         11
        39       3150         11         11
        40       3150         11         11
        41       3150         11         11
        42       3150         11         12
        43       3150         12         12
        44       3150         12         12
        45       3150         12         12
        46       3150         12         13
        47       3150         13         13
        48       3150         13         13
        49       3150         13         13
        50       3150         13         13
        51       3150         13         13
        52       3150         13         13
        53       3150         13         13
        54       3150         13         14
        55       3150         14         14
        56       3150         14         14
        57       3150         14         14
        58       3150         14         15 <-- Non-Popular (Only 1 Bucket)
        59       3150         15         16
        60       3150         16         16
        61       3150         16         17
        62       3150         17         17
        63       3150         17         17
        64       3150         17         17
        65       3150         17         18 <-- Non-Popular (1 Bucket)
        66       3150         18         19
        67       3150         19         19
        68       3150         19         19
        69       3150         19         20
        70       3150         20         20
        71       3150         20         21
        72       3150         21         21
        73       3150         21         21
        74       3150         21         21
        75       3150         21         21
        76       3150         21         21
        77       3150         21         22
        78       3150         22         22
        79       3150         22         22
        80       3150         22         22
        81       3150         22         22
        82       3150         22         23
        83       3150         23         23
        84       3150         23         23
        85       3150         23         23
        86       3150         23         23
        87       3150         23         24
        88       3150         24         24
        89       3150         24         24
        90       3150         24         24
        91       3150         24         24
        92       3150         24         25
        93       3150         25         25
        94       3150         25         25
        95       3150         25         25
        96       3150         25         25
        97       3150         25         26
        98       3150         26         26
        99       3150         26         26
       100       3150         26         26
       101       3150         26         27
       102       3150         27         27
       103       3150         27         28
       104       3150         28         28
       105       3150         28         28
       106       3150         28         28
       107       3150         28         29
       108       3150         29         29
       109       3150         29         29
       110       3150         29         30
       111       3150         30         30
       112       3150         30         30
       113       3150         30         30
       114       3150         30         30
       115       3150         30         31
       116       3150         31         31
       117       3150         31         31
       118       3150         31         32
       119       3150         32         32
       120       3150         32         33
       121       3150         33         33
       122       3150         33         33
       123       3150         33         34
       124       3150         34         34
       125       3150         34         34
       126       3150         34         35
       127       3150         35         36
       128       3150         36         36
       129       3150         36         38
       130       3150         38         38
       131       3150         38         38
       132       3150         38         39
       133       3150         39         39
       134       3150         39         39
       135       3150         39         40
       136       3150         40         40
       137       3150         40         41
       138       3150         41         41
       139       3150         41         42
       140       3150         42         42
       141       3150         42         43
       142       3150         43         43
       143       3150         43         45
       144       3150         45         45
       145       3150         45         46
       146       3150         46         46
       147       3150         46         46
       148       3150         46         46
       149       3150         46         46
       150       3150         46         46
       151       3150         46         47
       152       3150         47         47
       153       3150         47         47
       154       3150         47         47
       155       3149         47         47
       156       3149         47         47
       157       3149         47         47
       158       3149         47         47
       159       3149         47         48
       160       3149         48         48
       161       3149         48         48
       162       3149         48         48
       163       3149         48         48
       164       3149         48         48
       165       3149         48         48
       166       3149         48         48
       167       3149         48         49
       168       3149         49         49
       169       3149         49         49
       170       3149         49         49
       171       3149         49         49
       172       3149         49         49
       173       3149         49         49
       174       3149         49         49
       175       3149         49         50
       176       3149         50         50
       177       3149         50         51
       178       3149         51         51
       179       3149         51         51
       180       3149         51         51
       181       3149         51         51
       182       3149         51         52
       183       3149         52         52
       184       3149         52         52
       185       3149         52         53
       186       3149         53         53
       187       3149         53         53
       188       3149         53         54
       189       3149         54         54
       190       3149         54         54
       191       3149         54         55
       192       3149         55         56
       193       3149         56         56
       194       3149         56         57
       195       3149         57         57
       196       3149         57         58
       197       3149         58         58
       198       3149         58         59
       199       3149         59         60
       200       3149         60         60
       201       3149         60         62
       202       3149         62         62
       203       3149         62         63
       204       3149         63         63
       205       3149         63         64
       206       3149         64         64
       207       3149         64         65
       208       3149         65         66
       209       3149         66         70
       210       3149         70         72
       211       3149         72         74
       212       3149         74         79
       213       3149         79         90
       214       3149         90         94
       215       3149         94         97
       216       3149         97        101
       217       3149        101        113
       218       3149        113        115
       219       3149        115        117
       220       3149        117        123
       221       3149        123        125
       222       3149        125        127
       223       3149        127        131
       224       3149        131        136
       225       3149        136        151
       226       3149        151        158
       227       3149        158        163
       228       3149        163        170
       229       3149        170        180
       230       3149        180        199
       231       3149        199        203
       232       3149        203        208
       233       3149        208        211
       234       3149        211        214
       235       3149        214        225
       236       3149        225        302
       237       3149        302        307
       238       3149        307        531
       239       3149        531        552
       240       3149        552        594
       241       3149        594        602
       242       3149        602        629
       243       3149        629        900
       244       3149        900        973
       245       3149        973       1016
       246       3149       1016       1054
       247       3149       1054       1093
       248       3149       1093       1192
       249       3149       1192       1237
       250       3149       1237       1301
       251       3149       1301       1463
       252       3149       1463       1546
       253       3149       1546       1639
       254       3149       1639       1783

254 rows selected.

The table has 800000 rows; divided across 254 Buckets, that is about 3149.6 rows per bucket, so each bucket holds 3149 or 3150 rows, as the output above shows. There are some popular and some non-popular values. For example, 7, 8 and 9 are Popular (there are other popular values as well) and 15 and 18 are Non-Popular (there are other non-popular values as well). Popular values are values that span 2 or more Buckets. Non-Popular values are values that span at most 1 Bucket. Finally, when the histogram is generated, the popular buckets are compressed to save dictionary space, and the resulting output from DBA_TAB_HISTOGRAMS is as under.

SQL> select ENDPOINT_NUMBER, ENDPOINT_VALUE
  2  from dba_tab_histograms
  3  where table_name='TEST_SELECTIVITY'
  4  and   column_name='AMOUNT_SOLD'
  5  order by 1;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              0              6 <-- Minimum Value (Endpoint 0)
              3              7 <-- Popular Value (3-0=3 Buckets)
             11              8 <-- Popular Value (11-3=8 Buckets)
             21              9 <-- Popular Value (21-11=10 Buckets)
             32             10
             41             11
             45             12
             53             13
             57             14
             58             15 <-- Non-Popular Value (58-57=1 Bucket)
             60             16
             64             17
             65             18
             68             19
             70             20
             76             21
             81             22
             86             23
             91             24
             96             25
            100             26
            102             27
            106             28
            109             29
            114             30
            117             31
            119             32
            122             33
            125             34
            126             35
            128             36
            131             38
            134             39
            136             40
            138             41
            140             42
            142             43
            144             45
            150             46
            158             47
            166             48
            174             49
            176             50
            181             51
            184             52
            187             53
            190             54
            191             55
            193             56
            195             57
            197             58
            198             59
            200             60
            202             62
            204             63
            206             64
            207             65
            208             66
            209             70
            210             72
            211             74
            212             79
            213             90
            214             94
            215             97
            216            101
            217            113
            218            115
            219            117
            220            123
            221            125
            222            127
            223            131
            224            136
            225            151
            226            158
            227            163
            228            170
            229            180
            230            199
            231            203
            232            208
            233            211
            234            214
            235            225
            236            302
            237            307
            238            531
            239            552
            240            594
            241            602
            242            629
            243            895
            244            973
            245           1016
            246           1054
            247           1093
            248           1192
            249           1237
            250           1301
            251           1463
            252           1546
            253           1639
            254           1783

104 rows selected.
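
The compression of the bucket listing into the endpoint listing can be sketched in a few lines of Python. This is only an illustration of the idea, not Oracle's internal code: the function names compress_buckets and bucket_spans are mine, and the sample input (low value 6, followed by the maximum value of each of the first 21 buckets) is assumed from the two listings above.

```python
def compress_buckets(low_value, bucket_max):
    """Collapse per-bucket maximum values into (endpoint_number, endpoint_value) rows.

    Endpoint 0 carries the low value of the column. Consecutive buckets that
    end on the same value are merged, keeping only the last bucket number.
    """
    endpoints = [(0, low_value)]
    for bucket_no, value in enumerate(bucket_max, start=1):
        last_no, last_value = endpoints[-1]
        if last_no != 0 and last_value == value:
            endpoints[-1] = (bucket_no, value)   # same value: extend the run
        else:
            endpoints.append((bucket_no, value))  # new value: new endpoint row
    return endpoints


def bucket_spans(endpoints):
    """Number of buckets each endpoint value spans (current number - previous number)."""
    return {
        value: number - prev_number
        for (prev_number, _), (number, value) in zip(endpoints, endpoints[1:])
    }


# Assumed sample: buckets 1-3 end on 7, buckets 4-11 on 8, buckets 12-21 on 9.
sample = [7] * 3 + [8] * 8 + [9] * 10
rows = compress_buckets(6, sample)
print(rows)                # endpoint rows for the first 21 buckets
print(bucket_spans(rows))  # buckets spanned by each value
```

A value whose span is 2 or more is Popular; a span of 1 means Non-Popular. Applying bucket_spans to the rows (57,14), (58,15), (60,16) from the listing gives a span of 1 for value 15, which is why it is flagged as Non-Popular.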

The 254 Buckets are compressed into 104 rows. The CARDINALITY calculation in these cases is very simple. For a Popular Value, it is the number of rows in each bucket (800000/254, roughly 3149.6) multiplied by the number of Buckets the value spans. Let us run the queries and see the results.
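
As a quick cross-check of the popular-value arithmetic, here is a minimal Python sketch. It assumes the estimate is the unrounded rows-per-bucket ratio (NUM_ROWS / 254) multiplied by the bucket span and then rounded to the nearest row; the function name popular_cardinality and the rounding rule are my assumptions, not something taken from Oracle documentation.

```python
NUM_ROWS = 800_000   # rows in TEST_SELECTIVITY
BUCKETS = 254        # buckets in the height balanced histogram


def popular_cardinality(bucket_span, num_rows=NUM_ROWS, buckets=BUCKETS):
    """Estimated rows for a popular value spanning `bucket_span` buckets."""
    return round(num_rows * bucket_span / buckets)


print(popular_cardinality(2))    # value spanning 2 buckets (amount_sold = 56)
print(popular_cardinality(10))   # value spanning 10 buckets (amount_sold = 9)
```

Under these assumptions the two figures come out as 6299 and 31496, which is what the Rows column of the two plans that follow reports.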

## For 2 Buckets

SQL> select * from test_selectivity where amount_sold=56;

6204 rows selected.

Elapsed: 00:00:00.16

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  6299 |   153K|   960   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |  6299 |   153K|   960   (2)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("AMOUNT_SOLD"=56)

## For 10 Buckets

SQL> select * from test_selectivity where amount_sold=9;

31964 rows selected.

Elapsed: 00:00:00.64

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  | 31496 |   768K|   960   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY | 31496 |   768K|   960   (2)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("AMOUNT_SOLD"=9)

What about Non-Popular or Non-Existent values?
Will it be DENSITY x NUM_ROWS, i.e. 0.0018217 x 800000 = 1457? Let's run the query to check this.

SQL> select column_id, column_name, num_distinct, num_nulls,
  2                  density, histogram
  3  from       dba_tab_columns
  4  where      owner='SCOTT'
  5  and        table_name='TEST_SELECTIVITY'
  6  and             column_name='AMOUNT_SOLD'
  7  order by 1;

 COLUMN_ID COLUMN_NAME       NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
---------- ----------------- ------------ ---------- ---------- --------------------
         6 AMOUNT_SOLD                636          0   .0018217 HEIGHT BALANCED

SQL> select &&densit*800000 from dual;
old   1: select &&densit*800000 from dual
new   1: select   .0018217*800000 from dual

.0018217*800000
---------------
        1457.36

The Cardinality for a non-popular value, as can be seen after executing the query, is as under.

SQL> select * from test_selectivity where amount_sold=55;

3372 rows selected.

Elapsed: 00:00:00.11

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |   285 |  7125 |   960   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |   285 |  7125 |   960   (2)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("AMOUNT_SOLD"=55)

Value 55 is a Non-Popular Value. We expected a cardinality of 1457, but the estimate is 285. Let us generate a 10053 trace for this query and check it.

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for TEST_SELECTIVITY[A]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#6):
    NewDensity:0.000356, OldDensity:0.001822 BktCnt:254.000000, PopBktCnt:201.000000, PopValCnt:50, NDV:636
  Column (#6): AMOUNT_SOLD(NUMBER)
    AvgLen: 4 NDV: 636 Nulls: 0 Density: 0.000356 Min: 6.000000 Max: 1783.000000
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 104  ActualVal: yes
  Table: TEST_SELECTIVITY  Alias: A
    Card: Original: 800000.000000  Rounded: 285  Computed: 284.862003  Non Adjusted: 284.862003

We see a similar pattern here. NewDensity is used as the SELECTIVITY to compute the CARDINALITY (0.000356 x 800000 = 285). How is this NewDensity calculated for Height Balanced Histograms? It is computed as:

[(NPBKTCNT)/(BKTCNT * (NDV - POPVALCNT))]

From the 10053 trace, we can get the values of each of these. BKTCNT (Bucket Count) is 254 and POPBKTCNT (Popular Bucket Count) is 201. This makes NPBKTCNT (Non-Popular Bucket Count) 254-201=53. NDV (Number of Distinct Values) is 636 and POPVALCNT (Popular Value Count) is 50. Applying these values, we get [53/(254 * (636-50))] = .000356078

SQL> select (53/(254*(636-50))) newdensity from dual;

NEWDENSITY
----------
.000356078

SQL> select ceil(&&ndensit * 800000) from dual;
old   1: select ceil(&&ndensit * 800000) from dual
new   1: select ceil(.000356078 * 800000) from dual

CEIL(.000356078*800000)
-----------------------
                    285
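
The same calculation can be written as a small Python sketch that takes its inputs straight from the 10053 trace lines (BktCnt, PopBktCnt, PopValCnt, NDV). The function name new_density is mine, and the formula is the one quoted above rather than anything confirmed from Oracle source code.

```python
def new_density(bkt_cnt, pop_bkt_cnt, pop_val_cnt, ndv):
    """NewDensity = non-popular buckets / (buckets * non-popular distinct values)."""
    non_pop_bkt_cnt = bkt_cnt - pop_bkt_cnt
    return non_pop_bkt_cnt / (bkt_cnt * (ndv - pop_val_cnt))


NUM_ROWS = 800_000
OLD_DENSITY = 0.0018217   # DENSITY from DBA_TAB_COLUMNS

density = new_density(bkt_cnt=254, pop_bkt_cnt=201, pop_val_cnt=50, ndv=636)
computed = density * NUM_ROWS

print(f"NewDensity        : {density:.9f}")
print(f"Computed card     : {computed:.6f}")
print(f"Rounded card      : {round(computed)}")
print(f"Old density card  : {round(OLD_DENSITY * NUM_ROWS)}")
```

The computed value lines up with the "Computed: 284.862003" figure in the trace, while the old density gives the 1457 we originally expected.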

NewDensity, I assume, was introduced in 11g, but has been backported to 10.2.0.4 as well. It was introduced as a Bug Fix. However, in our case, it is actually causing a mis-estimation. How do we disable this fix? The solution is to turn off fix_control 5483301 and set _optimizer_enable_density_improvements to FALSE. Both need to be set together. We will set these at the session level and see the results for a Non-Existent value in a Frequency Histogram and a Non-Popular value in a Height Balanced Histogram.

SQL> alter session set "_fix_control"='5483301:off';
SQL> alter session set "_optimizer_enable_density_improvements"=false;

SQL> set autot trace
SQL> select * from test_selectivity where promo_id=500;
no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |     1 |    25 |   958   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |     1 |    25 |   958   (2)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_ID"=500)

SQL> select * from test_selectivity where amount_sold=55;

3372 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  1457 | 36425 |   960   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |  1457 | 36425 |   960   (2)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("AMOUNT_SOLD"=55)

With these 2 settings, the Optimizer falls back to its Original Calculation of DENSITY x NUM_ROWS for Cardinality calculation.

This has been a long post; however, I felt it was necessary because many people still don’t know about NewDensity. I was surprised myself when I was working on a real-life issue and came across this mis-estimation. A 10053 trace revealed NewDensity, which was new to me as well. For the real-life example, see below.

select count(*) from nca.s_p_attributes a1
WHERE   a1.value='olwassenen';

    COUNT(*)
------------
      591168

SQL> select plan_table_output from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------
SQL_ID  79dfpvydpk710, child number 0
-------------------------------------
select count(*) from nca.s_p_attributes a1 WHERE
a1.value='olwassenen'
-----------------------------------------------
| Id  | Operation         | Name      | Rows  |
-----------------------------------------------
|   0 | SELECT STATEMENT  |           |       |
|   1 |  SORT AGGREGATE   |           |     1 |
|*  2 |   INDEX SKIP SCAN | SP_P_IND3 |     8 |
-----------------------------------------------

The estimated and actual row counts are way off: 8 Rows v/s 591168 Rows. At this point, I requested a 10053 trace, which pointed me to the NewDensity value. The issue was resolved by disabling the fix_control and setting _optimizer_enable_density_improvements to FALSE.

Filed under AIOUG, Optimizer, Performance, Uncategorized

Optimizer – Part II (Cardinality – Actuals v/s Assumed)

August 12, 2016

This post continues my previous one, Optimizer – Part I of this series. In Part I, we covered the mathematical formulas used by the Optimizer. In this post, we shall see these calculations in action. For this, we will create a sample table and use it throughout to observe the Optimizer's behaviour. So, let's create our table TEST_SELECTIVITY from the SALES table under the SH Schema. It is very critical to know your data. Therefore, while creating the table, I have manipulated the data to demonstrate the behaviour against different data distributions.

exec dbms_random.seed(0);

create table test_selectivity as
select 	a.prod_id, 
	a.cust_id,
        trunc(sysdate)-round(dbms_random.value(0,1095),0) time_id,
        a.promo_id, 
        a.quantity_sold,
        round(a.amount_sold,0) amount_sold
from 	sh.sales a
where 	rownum<=8e5;

The table has 800k rows. The columns of interest for our demonstrations are TIME_ID (populated with 3 years of data), PROMO_ID and AMOUNT_SOLD. Because the table is created with CREATE TABLE AS SELECT, Optimizer Statistics are gathered automatically as part of the load (online statistics gathering, Oracle 12c and above). Let’s query all the relevant statistics.

select table_name, num_rows, blocks from dba_tables
where table_name='TEST_SELECTIVITY';

TABLE_NAME                       NUM_ROWS     BLOCKS
------------------------------ ---------- ----------
TEST_SELECTIVITY                   800000       3478

select 	column_name, 
	num_distinct, 
	num_nulls, 
	density,
	histogram
from	dba_tab_columns
where	owner='SCOTT'
and	table_name='TEST_SELECTIVITY'
order by 1;

COLUMN_NAME        NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------ ------------ ---------- ---------- --------------------
AMOUNT_SOLD                 636          0 .001572327 NONE
CUST_ID                    7056          0 .000141723 NONE
PROD_ID                      72          0 .013888889 NONE
PROMO_ID                      4          0        .25 NONE
QUANTITY_SOLD                 1          0          1 NONE
TIME_ID                    1096          0 .000912409 NONE

6 rows selected.

As mentioned earlier, for our demonstration, we will query the table on the three columns. AMOUNT_SOLD has 636 Distinct Values, PROMO_ID has 4 and TIME_ID has 1096. In my previous blog (Part I), we discussed SELECTIVITY, which in this case (no histograms) is 1/NDV for each of the columns in the table; this matches the DENSITY column above. Selectivity is very critical, as it drives the Access Path and is used to calculate the Cardinality, which drives the Access Order. Therefore, accurate calculation of Selectivity is essential for the Optimizer.

Now, let us run our queries against each of these three columns and check the Optimizer calculation of Expected Rows against the Actual Rows. The queries will use EQUALITY, LESS THAN and GREATER THAN predicates. Please refer to my previous blog for the calculation of SELECTIVITY for each of these predicate types. The effective CARDINALITY = SELECTIVITY x NUM_ROWS. Here we go with the first column (TIME_ID).

First, let's check the Low_Value and High_Value for the TIME_ID column. These values are used for Range Predicate queries to calculate the Available Range (High_Value – Low_Value).

with function get_date(n_raw in raw) return date
as
	l_date        date;
begin
	dbms_stats.convert_raw_value(n_raw,l_date);
	return l_date;
end;
select	column_name,
	get_date(low_value) lo_value,
	get_date(high_value) hi_value
from	dba_tab_columns
where	owner='SCOTT'
and	table_name='TEST_SELECTIVITY'
and	data_type='DATE'
order by 1;
/

COLUMN_NAME        LO_VALUE             HI_VALUE
------------------ -------------------- --------------------
TIME_ID            13-AUG-2013 00:00:00 12-AUG-2016 00:00:00

Function in the WITH clause is a 12c new feature. For Oracle Database versions prior to 12c, create the function using a CREATE FUNCTION statement and then use it in the query.

For the Equality Predicate, SELECTIVITY is 1/Num_Distinct and CARDINALITY = SELECTIVITY x NUM_ROWS. After calculating these, we will then run the query on this table to validate the actual number of rows.
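
For the curious, the RAW value that the function converts can also be decoded outside the database. The sketch below assumes Oracle's 7-byte internal DATE layout (century+100, year+100, month, day, hour+1, minute+1, second+1). The function name decode_oracle_date is mine, and the two hex strings are derived by hand from the dates shown above; they are not copied from the data dictionary.

```python
from datetime import datetime


def decode_oracle_date(raw_hex):
    """Decode a 7-byte Oracle DATE (as hex) into a Python datetime (AD dates only)."""
    raw = bytes.fromhex(raw_hex)
    if len(raw) != 7:
        raise ValueError("expected 7 bytes for an Oracle DATE")
    century, year, month, day, hour, minute, second = raw
    return datetime(
        (century - 100) * 100 + (year - 100),  # century and year are stored +100
        month,
        day,
        hour - 1,      # time components are stored +1
        minute - 1,
        second - 1,
    )


# Hand-derived encodings of 13-AUG-2013 and 12-AUG-2016 (assumed, see lead-in).
print(decode_oracle_date("7871080D010101"))
print(decode_oracle_date("7874080C010101"))
```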

## EQUALITY PREDICATE
SQL> select 1/&&ndv Selectivity from dual;
old   1: select 1/&&ndv Selectivity from dual
new   1: select 1/      1096 Selectivity from dual

SELECTIVITY
-----------
 .000912409

SQL> select round(&&selective*800000,0) cardinality from dual;
old   1: select round(&&selective*800000,0) cardinality from dual
new   1: select round(.000912409*800000,0) cardinality from dual

CARDINALITY
-----------
        730

SQL> set autot trace
SQL> select cust_id, amount_sold, promo_id from test_selectivity
  2  where time_id=to_date('11-DEC-2015','DD-MON-YYYY');

704 rows selected.

Elapsed: 00:00:00.99

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |   730 | 15330 |  1088  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |   730 | 15330 |  1088  (20)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TIME_ID"=TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))


SQL> set autot off

Assumption – 730 Rows and Actual – 704 Rows (Nearly Accurate).
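
The equality estimate above is simple enough to reproduce outside the database. A minimal Python sketch, assuming no histogram on the column so that selectivity is just 1/NDV (the function name equality_cardinality is mine):

```python
def equality_cardinality(num_rows, ndv):
    """Return (selectivity, estimated rows) for an equality predicate without a histogram."""
    selectivity = 1 / ndv
    return selectivity, round(selectivity * num_rows)


selectivity, estimate = equality_cardinality(num_rows=800_000, ndv=1096)
actual = 704  # rows returned by the query on TIME_ID above

print(f"selectivity = {selectivity:.9f}")
print(f"estimate    = {estimate}")
print(f"actual      = {actual} (estimate is off by {estimate - actual} rows)")
```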

Next, we run a query with a Less Than Predicate. The SELECTIVITY in this case will be Required_Range/Available_Range (see Part I for the exact formula; Required_Range will be computed as Required_Date – Low_Value). CARDINALITY is again SELECTIVITY x NUM_ROWS.

## LESS THAN PREDICATE

## Required_Range - 850 Days
SQL> select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') req_range
  2  from       dual;
old   1: select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') req_range
new   1: select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') req_range

 REQ_RANGE
----------
       850

## Available_Range - 1095 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
  2  from       dual;
old   1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
new   1: select to_date('12-AUG-2016 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') avl_range

 AVL_RANGE
----------
      1095


## Selectivity - Required_Range/Available_Range
SQL> select &&r_range/&&a_range Selectivity from dual;
old   1: select &&r_range/&&a_range Selectivity from dual
new   1: select        850/      1095 Selectivity from dual

SELECTIVITY
-----------
 .776255708

## Assumed Cardinality
SQL> select round(&&selective*800000,0) cardinality from dual;
old   1: select round(&&selective*800000,0) cardinality from dual
new   1: select round(.776255708*800000,0) cardinality from dual

CARDINALITY
-----------
     621005


SQL> select cust_id, amount_sold, promo_id from test_selectivity
  2  where time_id<to_date('11-DEC-2015','DD-MON-YYYY');

620764 rows selected.

Elapsed: 00:00:06.06

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |   621K|    12M|  1125  (23)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |   621K|    12M|  1125  (23)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TIME_ID"<TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

Assumption – 621k Rows and Actual – 620k Rows (Nearly Accurate).

Next, the query with Greater Than Predicate and the SELECTIVITY will be again Required_Range/Available_Range. The difference, in this case, will be that High_Value will be used to calculate the Required_Range.

## GREATER THAN PREDICATE

## Required_Range - 245 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range
  2  from       dual;
old   1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range
new   1: select to_date('12-AUG-2016 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range

 REQ_RANGE
----------
       245

## Available_Range - 1095 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
  2  from       dual;
old   1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
new   1: select to_date('12-AUG-2016 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') avl_range

 AVL_RANGE
----------
      1095

## Selectivity - Required_Range/Available_Range
SQL> select &&r_range/&&a_range Selectivity from dual;
old   1: select &&r_range/&&a_range Selectivity from dual
new   1: select        245/      1095 Selectivity from dual

SELECTIVITY
-----------
 .223744292

## Assumed Cardinality 
SQL> select round(&&selective*800000,0) cardinality from dual;
old   1: select round(&&selective*800000,0) cardinality from dual
new   1: select round(.223744292*800000,0) cardinality from dual

CARDINALITY
-----------
     178995

SQL> set autot trace
SQL> select cust_id, amount_sold, promo_id from test_selectivity
  2  where time_id>to_date('11-DEC-2015','DD-MON-YYYY');

178532 rows selected.

Elapsed: 00:00:02.01

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |   178K|  3670K|  1099  (21)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |   178K|  3670K|  1099  (21)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TIME_ID">TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

Assumption – 178k Rows and Actual – 178k Rows (Accurate).
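
The range arithmetic for both TIME_ID queries can also be reproduced outside the database. The Python sketch below is only an illustration of the Required_Range/Available_Range formula, using the low value, high value and row count from the transcript above. The function name range_cardinality is made up for this example, and the code is not a description of how the Optimizer is implemented internally.

```python
from datetime import date

NUM_ROWS = 800_000                                  # rows in TEST_SELECTIVITY
LOW, HIGH = date(2013, 8, 13), date(2016, 8, 12)    # low_value / high_value of TIME_ID


def range_cardinality(value, op, low=LOW, high=HIGH, num_rows=NUM_ROWS):
    """Estimated rows for `col < value` or `col > value` when no histogram exists."""
    available = (high - low).days          # Available_Range in days
    if op == "<":
        required = (value - low).days      # Required_Range measured from the low value
    elif op == ">":
        required = (high - value).days     # Required_Range measured from the high value
    else:
        raise ValueError("op must be '<' or '>'")
    return round(required / available * num_rows)


cutoff = date(2015, 12, 11)
print("less than   :", range_cardinality(cutoff, "<"))
print("greater than:", range_cardinality(cutoff, ">"))
```

The two printed figures should match the assumed cardinalities calculated with SQL above.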

For the TIME_ID column, the expected and actual cardinalities were nearly identical. Let’s shift our focus to the next column, AMOUNT_SOLD. We will run the same three kinds of queries: Equality, Less Than and Greater Than.

Before we execute the queries against this column, let’s check the statistics (Density, Low_Value and High_Value).

## FOR AMOUNT_SOLD COLUMN

select	column_name, 
		num_distinct, 
		num_nulls,
		density, 
		histogram
from	dba_tab_columns
where	owner='SCOTT'
and		table_name='TEST_SELECTIVITY'
and		column_name='AMOUNT_SOLD';

COLUMN_NAME        NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------ ------------ ---------- ---------- --------------------
AMOUNT_SOLD                 636          0 .001572327 NONE

with function get_number(n_raw in raw) return number
as
	l_number        number;
begin
	dbms_stats.convert_raw_value(n_raw,l_number);
	return l_number;
end;
select	column_name,
	get_number(low_value) lo_value,
	get_number(high_value) hi_value
from	dba_tab_columns
where	owner='SCOTT'
and	table_name='TEST_SELECTIVITY'
and	column_name='AMOUNT_SOLD';
/

COLUMN_NAME          LO_VALUE   HI_VALUE
------------------ ---------- ----------
AMOUNT_SOLD                 6       1783

All the calculations are the same for this column as well.

## EQUALITY PREDICATE

SQL> select 1/&&ndv Selectivity from dual;
old   1: select 1/&&ndv Selectivity from dual
new   1: select 1/       636 Selectivity from dual

SELECTIVITY
-----------
 .001572327

SQL> select round(&&selective*800000,0) cardinality from dual;
old   1: select round(&&selective*800000,0) cardinality from dual
new   1: select round(.001572327*800000,0) cardinality from dual

CARDINALITY
-----------
       1258

SQL> set autot trace
SQL> select cust_id, amount_sold, promo_id from test_selectivity
  2  where amount_sold=1500;

122 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  1258 | 16354 |  1136  (24)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |  1258 | 16354 |  1136  (24)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("AMOUNT_SOLD"=1500)

Assumption – 1258 Rows and Actual – 122 Rows (Out by 10 times).
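
To put a number on "out by 10 times", here is a small Python sketch of the 1/NDV estimate. It assumes nothing beyond the figures shown above (636 distinct values, 800000 rows, 122 actual rows); the function name is hypothetical.

```python
def equality_cardinality(num_distinct, num_rows):
    """Without a histogram, every distinct value is assumed to own an equal share of rows."""
    return round(num_rows / num_distinct)


estimate = equality_cardinality(num_distinct=636, num_rows=800_000)
actual = 122                                # rows returned for amount_sold = 1500
print("estimate:", estimate)
print("over-estimation factor:", round(estimate / actual, 1))
```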

## LESS THAN PREDICATE

## Required_Range
SQL> select (1500-&&min_value) req_range from dual;
old   1: select (1500-&&min_value) req_range from dual
new   1: select (1500-         6) req_range from dual

 REQ_RANGE
----------
      1494

## Available_Range
SQL> select (&&max_value-&&min_value) avl_range from dual;
old   1: select (&&max_value-&&min_value) avl_range from dual
new   1: select (      1783-         6) avl_range from dual

 AVL_RANGE
----------
      1777


## Selectivity - Required_Range/Available_Range
SQL> select &&r_range/&&a_range Selectivity from dual;
old   1: select &&r_range/&&a_range Selectivity from dual
new   1: select       1494/      1777 Selectivity from dual

SELECTIVITY
-----------
 .840742825

## Assumed Cardinality
SQL> select round(&&selective*800000,0) Cardinality from dual;
old   1: select round(&&selective*800000,0) Cardinality from dual
new   1: select round(.840742825*800000,0) Cardinality from dual

CARDINALITY
-----------
     672594

SQL> set autot trace
SQL> select cust_id, amount_sold, promo_id from test_selectivity
  2  where amount_sold<1500;

791950 rows selected.

Elapsed: 00:00:07.57

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |   672K|  8538K|  1136  (24)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |   672K|  8538K|  1136  (24)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("AMOUNT_SOLD"<1500)

Assumption – 672k Rows and Actual – 791k Rows (Inaccurate).

## GREATER THAN PREDICATE


## Required_Range
SQL> select (&&max_value-1500) req_range from dual;
old   1: select (&&max_value-1500) req_range from dual
new   1: select (      1783-1500) req_range from dual

 REQ_RANGE
----------
       283

## Available_range
SQL> select (&&max_value-&&min_value) avl_range from dual;
old   1: select (&&max_value-&&min_value) avl_range from dual
new   1: select (      1783-         6) avl_range from dual

 AVL_RANGE
----------
      1777

## Selectivity - Required_Range/Available_Range
SQL> select &&r_range/&&a_range Selectivity from dual;
old   1: select &&r_range/&&a_range Selectivity from dual
new   1: select        283/      1777 Selectivity from dual

SELECTIVITY
-----------
 .159257175

## Assumed Cardinality
SQL> select round(&&selective*800000,0) Cardinality from dual;
old   1: select round(&&selective*800000,0) Cardinality from dual
new   1: select round(.159257175*800000,0) Cardinality from dual

CARDINALITY
-----------
     127406

SQL> select cust_id, amount_sold, promo_id from test_selectivity
  2  where amount_sold>1500;

7928 rows selected.

Elapsed: 00:00:00.12

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |   127K|  1617K|  1136  (24)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |   127K|  1617K|  1136  (24)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("AMOUNT_SOLD">1500)

Assumption – 127k Rows and Actual – 7928 Rows (Significantly Out).
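
The two range estimates for AMOUNT_SOLD can be placed next to the actual row counts with a few lines of Python. This is a sketch under the same assumptions as the SQL above (low value 6, high value 1783, 800000 rows, no histogram); range_selectivity is a name I made up for the illustration.

```python
NUM_ROWS = 800_000
LOW, HIGH = 6, 1783          # low_value / high_value of AMOUNT_SOLD


def range_selectivity(value, op, low, high):
    """Required_Range / Available_Range for a '<' or '>' predicate."""
    if op == "<":
        return (value - low) / (high - low)
    if op == ">":
        return (high - value) / (high - low)
    raise ValueError("op must be '<' or '>'")


# (operator, actual rows returned by the query above)
for op, actual in (("<", 791_950), (">", 7_928)):
    estimate = round(range_selectivity(1500, op, LOW, HIGH) * NUM_ROWS)
    print(f"amount_sold {op} 1500: estimate={estimate} actual={actual} "
          f"estimate/actual={estimate / actual:.2f}")
```

The formula assumes that the values are spread evenly between the low and the high value, which is why the estimate drifts once most rows sit below 1500.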

Finally, we move to our last column of interest, i.e. PROMO_ID. This column has 4 distinct values and the data distribution is as under:

## FOR PROMO_ID Column

select	promo_id, 
	count(*) cnt,
	round(ratio_to_report(count(*)) over()*100,2) "%age" 
from	test_selectivity
group by promo_id
order by 2;

  PROMO_ID        CNT       %age
---------- ---------- ----------
        33       2074        .26
       351       2245        .28
       350      17978       2.25
       999     777703      97.21

select	column_name, 
	num_distinct, 
	num_nulls, 
	density,
	histogram
from	dba_tab_columns
where	owner='SCOTT'
and	table_name='TEST_SELECTIVITY'
and	column_name='PROMO_ID'
order by 1;

COLUMN_NAME        NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------ ------------ ---------- ---------- --------------------
PROMO_ID                      4          0        .25 NONE

For this column, we will execute 4 queries, each with an EQUALITY predicate. For Equality Predicates, the calculation for SELECTIVITY is simple: 1/NDV or DENSITY. From DBA_TAB_COLUMNS, we can see that the DENSITY for this column is 0.25 (1/4).

## FOR PROMO_ID

## Selectivity
SQL> select 1/&&ndv selectivity from dual;
old   1: select 1/&&ndv selectivity from dual
new   1: select 1/         4 selectivity from dual

SELECTIVITY
-----------
        .25

## Assumed Cardinality
SQL> select round(&&selective*800000,0) cardinality from dual;
old   1: select round(&&selective*800000,0) cardinality from dual
new   1: select round(       .25*800000,0) cardinality from dual

CARDINALITY
-----------
     200000

## VALUE 999
SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=999;

777703 rows selected.

Elapsed: 00:00:06.89

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |   200K|  2539K|  1112  (22)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |   200K|  2539K|  1112  (22)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_ID"=999)

Assumption – 200k Rows and Actual – 777k Rows (Out by 4 Times).

## Value 350
SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=350;

17978 rows selected.

Elapsed: 00:00:00.27

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |   200K|  2539K|  1112  (22)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |   200K|  2539K|  1112  (22)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_ID"=350)

Assumption – 200k Rows and Actual – 17978 Rows (Significantly Out).

## VALUE 33
SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=33;

2074 rows selected.

Elapsed: 00:00:00.22

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |   200K|  2539K|  1112  (22)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |   200K|  2539K|  1112  (22)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_ID"=33)

Assumption – 200k Rows and Actual – 2074 Rows (Significantly Out).

## VALUE 351
SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=351;

2245 rows selected.

Elapsed: 00:00:00.11

Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |   200K|  2539K|  1112  (22)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY |   200K|  2539K|  1112  (22)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_ID"=351)

Assumption – 200k Rows and Actual – 2245 Rows (Significantly Out).
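
The four PROMO_ID results can be summarised in one pass. The Python sketch below takes the actual counts from the distribution query above and compares each one with the single 1/NDV estimate; the names are hypothetical and the code only restates the arithmetic.

```python
# Actual row counts per PROMO_ID, copied from the distribution query above
ACTUAL_COUNTS = {33: 2074, 351: 2245, 350: 17978, 999: 777703}


def uniform_estimate(counts):
    """Rows that the 1/NDV formula assigns to every distinct value."""
    return round(sum(counts.values()) / len(counts))


estimate = uniform_estimate(ACTUAL_COUNTS)
for promo_id, actual in sorted(ACTUAL_COUNTS.items(), key=lambda kv: kv[1]):
    factor = max(estimate, actual) / min(estimate, actual)
    print(f"promo_id={promo_id:>3} estimate={estimate} actual={actual} off by {factor:.1f}x")
```

One estimate is used for every value, so the more skewed the data, the further the rare and the popular values drift from it.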

Summary – So far

TIME_ID – Assumptions v/s Actuals √
AMOUNT_SOLD – Assumptions v/s Actuals Χ
PROMO_ID – Assumptions v/s Actuals Χ

Let’s make a simple change in the TIME_ID column. For this, I will create another table, which will be a replica of TEST_SELECTIVITY. We will make the change in the new table so that we do not disturb the original table.

create table test_selectivity_m as
select * from test_selectivity;

update test_selectivity_m set time_id=to_date('31-Dec-2050','DD-MON-YYYY')
where rownum=1;

exec dbms_stats.gather_table_stats(user,'TEST_SELECTIVITY_M');

select table_name, num_rows, blocks, partitioned from dba_tables
where table_name in ('TEST_SELECTIVITY','TEST_SELECTIVITY_M');

TABLE_NAME                       NUM_ROWS     BLOCKS PAR
------------------------------ ---------- ---------- ---
TEST_SELECTIVITY_M                 800000       3478 NO
TEST_SELECTIVITY                   800000       3478 NO

I created another table, TEST_SELECTIVITY_M, and updated a single row with a future date, i.e. 31st December 2050. Let’s see whether this minor change has any impact on the Optimizer Assumptions v/s Actuals.

with function get_date(n_raw in raw) return date
as
	l_date        date;
begin
	dbms_stats.convert_raw_value(n_raw,l_date);
	return l_date;
end;
select	column_name,
	num_distinct,
	num_nulls,
	density,
	histogram,
	get_date(low_value) lo_value,
	get_date(high_value) hi_value
from	dba_tab_columns
where	owner='SCOTT'
and	table_name='TEST_SELECTIVITY_M'
and	data_type='DATE'
order by 1;
/
COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM            LO_VALUE             HI_VALUE
------------------------------ ------------ ---------- ---------- -------------------- -------------------- --------------------
TIME_ID                                1097          0 .000911577 NONE                 13-AUG-2013 00:00:00 31-DEC-2050 00:00:00

Now, let’s run the three queries on the TIME_ID column against this table and see the results.

## EQUALITY PREDICATE

## Selectivity
SQL> select 1/&&ndv Selectivity from dual;
old   1: select 1/&&ndv Selectivity from dual
new   1: select 1/      1097 Selectivity from dual

SELECTIVITY
-----------
 .000911577

## Assumed Cardinality
SQL> select round(&&selective*800000,0) cardinality from dual;
old   1: select round(&&selective*800000,0) cardinality from dual
new   1: select round(.000911577*800000,0) cardinality from dual

CARDINALITY
-----------
        729

SQL> set autot trace
SQL> select cust_id, promo_id, amount_sold from test_selectivity_m
  2  where time_id=to_date('11-DEC-2015','DD-MON-YYYY');

704 rows selected.

Elapsed: 00:00:00.57

Execution Plan
----------------------------------------------------------
Plan hash value: 3843949181

----------------------------------------------------------------------------------------
| Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                    |   729 | 15309 |  1088  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY_M |   729 | 15309 |  1088  (20)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TIME_ID"=TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

Assumption – 729 Rows and Actual – 704 Rows (Nearly Accurate).

## LESS THAN PREDICATE

## Required_Range - 850 Days
SQL> select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') req_range
  2  from       dual;
old   1: select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') req_range
new   1: select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') req_range

 REQ_RANGE
----------
       850

## Available_Range - 13654 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
  2  from       dual;
old   1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
new   1: select to_date('31-DEC-2050 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') avl_range

 AVL_RANGE
----------
     13654

## Selectivity - Required_Range/Available_Range
SQL> select &&r_range/&&a_range Selectivity from dual;
old   1: select &&r_range/&&a_range Selectivity from dual
new   1: select        850/     13654 Selectivity from dual

SELECTIVITY
-----------
  .06225282

## Assumed Cardinality
SQL> select round(&&selective*800000,0) cardinality from dual;
old   1: select round(&&selective*800000,0) cardinality from dual
new   1: select round( .06225282*800000,0) cardinality from dual

CARDINALITY
-----------
      49802

SQL> select cust_id, promo_id, amount_sold from test_selectivity_m
  2  where time_id<to_date('11-DEC-2015','DD-MON-YYYY');

620764 rows selected.

Elapsed: 00:00:06.41

Execution Plan
----------------------------------------------------------
Plan hash value: 3843949181

----------------------------------------------------------------------------------------
| Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                    | 49802 |  1021K|  1091  (21)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY_M | 49802 |  1021K|  1091  (21)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TIME_ID"<TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

Assumption – 49k Rows and Actual – 620k Rows (Significantly Out).

## GREATER THAN PREDICATE

## Required_Range - 12804 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range
  2  from       dual;
old   1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range
new   1: select to_date('31-DEC-2050 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range

 REQ_RANGE
----------
     12804

## Available_range - 13654 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
  2  from       dual;
old   1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
new   1: select to_date('31-DEC-2050 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') avl_range

 AVL_RANGE
----------
     13654

## Selectivity - Required_range/Available_range
SQL> select &&r_range/&&a_range Selectivity from dual;
old   1: select &&r_range/&&a_range Selectivity from dual
new   1: select      12804/     13654 Selectivity from dual

SELECTIVITY
-----------
  .93774718

## Assumed Cardinality
SQL> select round(&&selective*800000,0) cardinality from dual;
old   1: select round(&&selective*800000,0) cardinality from dual
new   1: select round( .93774718*800000,0) cardinality from dual

CARDINALITY
-----------
     750198

SQL> select cust_id, promo_id, amount_sold from test_selectivity_m
  2  where time_id>to_date('11-DEC-2015','DD-MON-YYYY');

178532 rows selected.

Elapsed: 00:00:02.15

Execution Plan
----------------------------------------------------------
Plan hash value: 3843949181

----------------------------------------------------------------------------------------
| Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                    |   750K|    15M|  1133  (24)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_SELECTIVITY_M |   750K|    15M|  1133  (24)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TIME_ID">TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

Assumption – 750k Rows and Actual – 178k Rows (Out by 4 Times).
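
The effect of that single future-dated row is easy to isolate: only the high value changes, yet the Available_Range grows from 1095 to 13654 days. The Python sketch below recomputes the Less Than estimate for both high values. It is an illustration under the figures shown above, with a hypothetical function name.

```python
from datetime import date

NUM_ROWS = 800_000
LOW = date(2013, 8, 13)
CUTOFF = date(2015, 12, 11)


def less_than_cardinality(high, low=LOW, value=CUTOFF, num_rows=NUM_ROWS):
    """Estimate for `time_id < value` given the column's low and high values."""
    return round((value - low).days / (high - low).days * num_rows)


# Same data, same predicate; only the recorded high value differs
for label, high in (("original high value", date(2016, 8, 12)),
                    ("with 31-DEC-2050 row", date(2050, 12, 31))):
    print(f"{label}: available range={(high - LOW).days} days, "
          f"estimate={less_than_cardinality(high)}")
```

In both cases the query returns the same 620764 rows, so the whole difference in the estimate comes from the stretched Available_Range.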

Final Summary:

TIME_ID – Assumptions v/s Actuals √
TIME_ID – Assumptions v/s Actuals (minor change) Χ
AMOUNT_SOLD – Assumptions v/s Actuals Χ
PROMO_ID – Assumptions v/s Actuals Χ

A minor change in the data has changed our Summary for the TIME_ID column. For the TEST_SELECTIVITY table, the Optimizer estimate for TIME_ID was nearly accurate, whereas for the other 2 columns it was way out. What could be the reason? Remember, accurate Selectivity and Cardinality are critical, as they drive the Access Path and the Access Order respectively. Any discrepancy can cause a sub-optimal plan. The Optimizer is a piece of code and depends completely on the Statistics that we gather and provide to it as input. Therefore, the solution to these discrepancies has to come from us. In the next blog, which will be Part III of this series, we will cover the problem and the fix for this discrepancy. That fix causes problems in a few cases, and we shall cover those as well.


Optimizer – Part I

June 28, 2016

Just concluded a Full Day Event on Performance in Chandigarh for the North India Chapter of “All India Oracle User Group”. As committed in my earlier user group update blog, I thought that posting the technical details on the Optimizer, especially histograms, would benefit the readers. Another reason this is important is that these sessions are attended by the Developer community as well, and grasping everything in one session is very difficult. This writeup will help them understand this critical piece.

I usually blog on real life challenges. The motivation behind this blog as well is an issue that I was working on and the fix that I applied to resolve it. This will be discussed at a relevant time and in a relevant part of this series.

I will publish a 4 part series, starting with the basics of the Optimizer and its formulae, followed by a few examples with the various calculations. The 4 parts are:

Optimizer Basics & Calculations
Cardinality – Actuals v/s Assumed
Histograms – Frequency & Height Balanced
12c Enhancements – Hybrid and TopN Frequency Histograms

In the context of the Optimizer, the two commonly used terms are SELECTIVITY and CARDINALITY.

SELECTIVITY: It is the fraction of rows that would be returned from, or filtered out of, a row set. Thus, the Selectivity of a predicate indicates how many rows pass a predicate test. Selectivity ranges from 0.0 to 1.0. A Selectivity of 0.0 means no rows are selected from a row set, whereas a Selectivity of 1.0 means all the rows are selected. A predicate becomes more selective as the value approaches 0.0 and less selective as it approaches 1.0. Selectivity drives the Access Path, for example a Table Scan or an Index Scan. With no Histograms, the Selectivity of an equality predicate on a column is computed as 1/NUM_DISTINCT or DENSITY.

CARDINALITY: Cardinality is the estimated number of rows returned by each operation in an Execution Plan. The Optimizer determines the cardinality for each operation based on a complex set of formulas that use table and column level statistics, or dynamic statistics. Cardinality estimates must be as accurate as possible because they influence all aspects of an execution plan. Cardinality is important when the Optimizer determines the cost of a Join. For example, in a Nested Loop Join between an EMP and a DEPT table, the number of rows returned by the EMP table determines how often the DEPT table will be probed. Cardinality drives the Access Order.

Just to simplify, for a Gender column with M & F (2 Distinct Values), the selectivity will be 1/2 or 0.5. If this table has around 100 rows, then the Cardinality will be Selectivity X Num_Rows, which is 0.5 x 100 = 50.

For a 100 row table with 2 columns, each with distinct values as 4 and 2, the combined selectivity (for AND predicate) will be 0.5 X 0.25 = 0.125 and the Cardinality will be 0.125 X 100 = 12.5 rounded off to 13.
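
The two worked examples above can be checked with a short Python sketch. It assumes equality predicates only, combined with AND, and it rounds half up so that 12.5 becomes 13 as in the text (Python's built-in round() would return 12 here). The function name is hypothetical.

```python
import math
from fractions import Fraction


def cardinality(num_rows, *ndvs):
    """Rows expected when equality predicates on columns with the given NDVs are ANDed."""
    selectivity = Fraction(1)
    for ndv in ndvs:
        selectivity *= Fraction(1, ndv)      # 1/NDV per column, multiplied for AND
    # Round half up to match the figure quoted in the text
    return math.floor(selectivity * num_rows + Fraction(1, 2))


print("Gender column only :", cardinality(100, 2))
print("NDV 2 AND NDV 4    :", cardinality(100, 2, 4))
```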

Selectivity Calculation

Assume Column C
NDV is the Number of Distinct Values
minv is the Minimum Value for C
maxv is the Maximum Value for C

The formulae for Selectivity Calculation would be as under :

SEL(=C) = 1/NDV or DENSITY
SEL(<C) = (C-minv)/(maxv-minv)
SEL(<=C) = SEL(=C) + SEL(<C)
SEL(>C) = (maxv-C)/(maxv-minv)
SEL(>=C) = SEL(=C) + SEL(>C)

In case of a Range Predicate (<, <=, >, >=), the Numerator is called the Required Range and the Denominator is called the Available Range.

Once the Selectivity is derived, the Cardinality is the Selectivity multiplied by the Number of Rows. When multiple predicates are involved, the Selectivity of each one is derived using the above formulas and then combined according to the AND or OR operator. For example:
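
The five formulas translate directly into code. The Python sketch below is a plain transcription of them for a numeric column; the column figures used in the example (NDV 100, minimum 0, maximum 200) are invented for illustration, and the sketch does not try to handle values outside the minv to maxv range.

```python
def selectivity(op, c, ndv, minv, maxv):
    """Selectivity of `column <op> c` without a histogram, per the formulas above."""
    eq = 1 / ndv                              # SEL(=C)
    lt = (c - minv) / (maxv - minv)           # SEL(<C): Required_Range / Available_Range
    gt = (maxv - c) / (maxv - minv)           # SEL(>C)
    formulas = {
        "=": eq,
        "<": lt,
        "<=": eq + lt,                        # SEL(=C) + SEL(<C)
        ">": gt,
        ">=": eq + gt,                        # SEL(=C) + SEL(>C)
    }
    return formulas[op]


# Hypothetical column: 100 distinct values between 0 and 200
for op in ("=", "<", "<=", ">", ">="):
    print(f"SEL({op}50) = {selectivity(op, 50, ndv=100, minv=0, maxv=200):.2f}")
```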

WHERE A=:b1 and B=:b2 = SEL(=A) X SEL(=B)
WHERE A=:b1 and B>=:b2 = SEL(=A) X (SEL(=B) + SEL(>B))
WHERE A=:b1 or B=:b2 = SEL(=A) + SEL(=B) – (SEL(=A) X SEL(=B))
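
As a quick check of the three combinations above, here is a Python sketch. The selectivity values fed into it (0.25 and 0.5, plus a range selectivity of 0.25 for the second example) are hypothetical; only the AND and OR rules come from the formulas.

```python
def sel_and(sel_a, sel_b):
    """A AND B: multiply the individual selectivities."""
    return sel_a * sel_b


def sel_or(sel_a, sel_b):
    """A OR B: add the selectivities and subtract the overlap counted twice."""
    return sel_a + sel_b - sel_a * sel_b


sel_eq_a = 0.25      # hypothetical SEL(=A), e.g. 4 distinct values
sel_eq_b = 0.5       # hypothetical SEL(=B), e.g. 2 distinct values
sel_gt_b = 0.25      # hypothetical SEL(>B) from the range formula

print("A=:b1 and B=:b2  ->", sel_and(sel_eq_a, sel_eq_b))
print("A=:b1 and B>=:b2 ->", sel_and(sel_eq_a, sel_eq_b + sel_gt_b))
print("A=:b1 or  B=:b2  ->", sel_or(sel_eq_a, sel_eq_b))
```

Multiplying the selectivities for AND treats the two columns as independent of each other, which is worth keeping in mind when the columns are correlated.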

This was the first part of this series. In the next part, we will create a sample table and run through each of these formulas.


AIOUG Performance Tuning Day ! Hyderabad…

May 20, 2015

AIOUG is hosting a full day Performance Tuning day on 30th May 2015 in Hyderabad. I will be presenting on Oracle Query Optimizer and Performance. These sessions will be a mix of technical demos and real life examples. Hope to see a large gathering. Registrations are open at Performance Tuning Day.


Issue with Autotrace ! Criticality of Using a Right tool…..

November 7, 2013

As mentioned in my previous blog, my visit to Oracle Open World was very exciting and enriching. Every session I attended, I got to know something new and therefore thought of sharing this with my readers. I actually wanted to demonstrate this during my Sangam 2013 presentation. However, due to some official commitments, I had to pull out from the event.

In many of my presentations, especially for Developers, I have covered some concepts (along with a Demonstration) on Bind Peeking and the issues with “Explain Plan” or Autotrace, as these are not bind-aware. This means that if a query contains a bind variable and the execution plan of that query depends on the value provided to the bind, then the execution plan displayed (or generated) by “explain plan” / “autotrace” is not guaranteed to be the same as the one used by the runtime optimizer. Even today, I see many developers (and DBA’s as well) use either the “explain plan for” or the “autotrace” utility to check the execution plan of the queries they are working on.

I attended a very good session that was jointly presented by Maria Colgan and Jonathan Lewis. Maria mentioned an issue with “set autot trace explain”, which has the potential to create a sub-optimal plan and cause a performance bottleneck, as this sub-optimal plan can then be shared by other users.

Randolf Geist has already published a very good note on this issue, explained with good examples, which should help my readers understand it. However, since I was working on the demonstration to be presented during Sangam 2013, I thought of sharing it here.

The demonstration is as under :


## Table Creation

drop table t1;

create table t1 as
select a.* from all_objects a, all_objects b
where rownum<=1e6;

create index t1_idx on t1(temporary);

select temporary, count(*) from t1 group by temporary;

exec dbms_stats.gather_table_stats(user,'T1',method_opt=>'for all columns size 1, for columns temporary size 100');

We created a table t1 with 1 Million Rows and an Index on the TEMPORARY column. This Index has been created to demonstrate the difference in the plan when we query data on the TEMPORARY column. Further, in order for the optimizer to generate different plans, we gather a histogram on this column. This additional statistic ensures that the optimizer computes the cardinality and the plan based on the input value.

Next, we execute two queries, one with a Literal Value of ‘Y’ and one with a Bind Variable with Y passed as the value to the Bind. In both cases, the Optimizer computes a nearly accurate cardinality and chooses an Index Scan Access Path. For the execution with the Bind, the optimizer peeked into the Bind to come up with the cardinality and the Index Access Path.

select /*+ vivek_y */ OWNER, OBJECT_NAME from t1 where temporary='Y';

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID  7zfa7mstt63gv, child number 0
-------------------------------------
select /*+ vivek_y */ OWNER, OBJECT_NAME from t1 where temporary='Y'

Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |       |       |    47 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |  2680 | 88440 |    47   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IDX |  2680 |       |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEMPORARY"='Y')

variable b1 varchar2(32);
exec :b1:='Y';

select /*+ vivek_bind_y */ OWNER, OBJECT_NAME from t1 where temporary=:b1;

SQL_ID  9c5pp1gt64s7q, child number 0
-------------------------------------
select /*+ vivek_bind_y */ OWNER, OBJECT_NAME from t1 where
temporary=:b1

Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |       |       |    47 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |  2680 | 88440 |    47   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IDX |  2680 |       |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - :B1 (VARCHAR2(30), CSID=178): 'Y'

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEMPORARY"=:B1)

Let us check the behaviour of the “explain plan for” and “autotrace” utilities. Since neither of them peeks at bind values, the plan they generate and display is a Full Table Scan. The computed cardinality is based on #Rows/NDV, which is 1000000/2 = 500000. Also, note the I/Os (consistent gets) reported by autotrace.

explain plan for
select /*+ vivek_bind_y */ OWNER, OBJECT_NAME from t1 where temporary=:b1;

SQL> @utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500K|    15M|  3968   (2)| 00:00:48 |
|*  1 |  TABLE ACCESS FULL| T1   |   500K|    15M|  3968   (2)| 00:00:48 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TEMPORARY"=:B1)
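
The 500K estimate in the plan above can be reproduced from the dictionary statistics. This is a minimal sketch, assuming the statistics gathered earlier are still in place and that, without a peeked bind, the estimate is simply NUM_ROWS divided by NUM_DISTINCT as described above; the column alias est_rows_no_peek is made up for illustration.

```sql
-- Reproduce the #Rows/NDV estimate used when the bind value is unknown.
-- est_rows_no_peek is an illustrative alias, not an Oracle column.
select t.num_rows,
       c.num_distinct,
       round(t.num_rows / c.num_distinct) as est_rows_no_peek
from   user_tables t
join   user_tab_col_statistics c
on     c.table_name = t.table_name
where  t.table_name  = 'T1'
and    c.column_name = 'TEMPORARY';
```

For the table in this test (1000000 rows, 2 distinct values) the last column should come out as 500000, matching the Rows column of the Full Table Scan plan.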

set autot trace
select /*+ vivek_bind_y */ OWNER, OBJECT_NAME from t1 where temporary=:b1;

2327 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500K|    15M|  3968   (2)| 00:00:48 |
|*  1 |  TABLE ACCESS FULL| T1   |   500K|    15M|  3968   (2)| 00:00:48 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TEMPORARY"=:B1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        841  consistent gets <--- Note the I/O
          0  physical reads
          0  redo size

set autot trace

Last, we run the query with “set autot trace explain” and then run the same query with “set autot off”. In this case, even though the value passed to the bind is ‘Y’, the optimizer re-uses the plan (Full Table Scan) generated under “set autot trace explain”, and this is where the problem starts. The plan generated under “set autot trace explain” is stored in the Shared Pool, where the same query run from an application will share it, and this can be a problem. Assuming the query executed next is part of an application and a developer runs exactly the same query using “set autot trace explain”, the plan is generated without bind peeking and will be sub-optimal.

set autot trace explain
select /*+ with_y */ OWNER, OBJECT_NAME from t1 where temporary=:b1;


SQL> @ap

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  cbf51h7v1276h, child number 0
-------------------------------------
select /*+ with_y */ OWNER, OBJECT_NAME from t1 where temporary=:b1

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |  3968 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   500K|    15M|  3968   (2)| 00:00:48 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TEMPORARY"=:B1)


18 rows selected.

set autot off
select /*+ with_y */ OWNER, OBJECT_NAME from t1 where temporary=:b1;

select plan_table_output from table(dbms_xplan.display_cursor(format=>'typical +peeked_binds'));

SQL_ID  cbf51h7v1276h, child number 0
-------------------------------------
select /*+ with_y */ OWNER, OBJECT_NAME from t1 where temporary=:b1

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |  3967 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   500K|    15M|  3967   (2)| 00:00:48 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TEMPORARY"=:B1)

select sql_id, sql_text, executions, buffer_gets, elapsed_time, rows_processed from v$sqlarea where sql_id='cbf51h7v1276h';

SQL_ID        SQL_TEXT                                           EXECUTIONS BUFFER_GETS ELAPSED_TIME ROWS_PROCESSED
------------- -------------------------------------------------- ---------- ----------- ------------ --------------
cbf51h7v1276h select /*+ with_y */ OWNER, OBJECT_NAME from t1 wh          1       14393      3563761           2327
              ere temporary=:b1

The last query is exactly the same as the one that was run with “set autot trace explain”, and it shared the plan generated under “set autot trace explain”. That plan is a Full Table Scan, which is not the plan that should have been generated. Interestingly, check the I/Os of this last Full Table Scan query. It did 14393 logical reads, whereas the earlier execution with “set autot trace” reported only 841 consistent gets (while its displayed plan also said Full Table Scan). Why is there a difference between the I/Os of the two Full Table Scans? For an explanation, read the blog from Randolf Geist.
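
To see for yourself that the application run re-used the cursor created under autotrace, the child cursors for the statement can be listed. This is a minimal sketch, assuming the session has select privilege on V$SQL and that the cursor is still in the Shared Pool; the SQL_ID is the one from the test above.

```sql
-- List every child cursor for the statement, with its plan and workload.
-- A single child carrying the Full Table Scan plan hash value means the
-- later execution shared the plan that was parsed without bind peeking.
select sql_id,
       child_number,
       plan_hash_value,
       executions,
       buffer_gets,
       rows_processed
from   v$sql
where  sql_id = 'cbf51h7v1276h';
```

In this test, a single row with PLAN_HASH_VALUE 3617692013 would be consistent with the sharing described above; a second child with the index plan (546753835) would instead suggest the statement was re-optimized.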

Filed under Optimizer, Performance Tagged with autotrace, explain plan
