Optimizer – Part II (Cardinality – Actuals v/s Assumed)
August 12, 2016 9 Comments
This is in continuation to my previous post, Optimizer – Part I of this series. In Part I, we covered the mathematical formulas used by the Optimizer. In this post, we shall see these calculations in action. For this, we will create a sample table and use this through out to see optimizer behaviour. So, lets create our table TEST_SELECTIVITY from SALES table under SH Schema. It is very critical to know your data. Therefore, while creating the table, I have manipulated the data to demonstrate the behaviour against the different data distribution.
exec dbms_random.seed(0);
create table test_selectivity as
select a.prod_id,
a.cust_id,
trunc(sysdate)-round(dbms_random.value(0,1095),0) time_id,
a.promo_id,
a.quantity_sold,
round(a.amount_sold,0) amount_sold
from sh.sales a
where rownum<=8e5;
The table has 800k rows. The columns of interest for our demonstrations are TIME_ID, which is populated with 3 years of data, PROMO_ID and AMOUNT_SOLD. Once the table is created, Optimizer Statistics are automatically gathered on the table (Oracle 11g and above). Let’s query all the relevant statistics.
select table_name, num_rows, blocks from dba_tables where table_name='TEST_SELECTIVITY'; TABLE_NAME NUM_ROWS BLOCKS ------------------------------ ---------- ---------- TEST_SELECTIVITY 800000 3478 select column_name, num_distinct, num_nulls, density, histogram from dba_tab_columns where owner='SCOTT' and table_name='TEST_SELECTIVITY' order by 1; COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM ------------------ ------------ ---------- ---------- -------------------- AMOUNT_SOLD 636 0 .001572327 NONE CUST_ID 7056 0 .000141723 NONE PROD_ID 72 0 .013888889 NONE PROMO_ID 4 0 .25 NONE QUANTITY_SOLD 1 0 1 NONE TIME_ID 1096 0 .000912409 NONE 6 rows selected.
As mentioned earlier, for our demonstration, we will query the table on the three columns. AMOUNT_SOLD has 636 Distinct Values, PROMO_ID has 4 distinct values and TIME_ID has 1096 Distinct Values. In my previous blog (Part I), we discussed about SELECTIVITY, which in this case is 1/NDV for each of the columns in the table. Selectivity is very critical, as it drives the Access Path and is used to calculate the Cardinality, which drives the Access Order. Therefore, accurate calculation of Selectivity is very critical for the Optimizer.
Now, let us run our queries against each of these three columns and check the Optimizer calculation of Expected Rows against the Actual Rows. The queries will be on EQUALITY, LESS THAN and GREATER THAN predicated. Please refer to my previous blog for the calculation of SELECTIVITY for each of these predicate types. The effective CARDINALITY = SELECTIVITY X NUM_ROWS. Here we go with the first column (TIME_ID).
First lets check the Low_Value and High_Value for the TIME_ID column. These values are used for Range Predicate queries to calculate the Available Range (High_Value – Low_Value).
with function get_date(n_raw in raw) return date as l_date date; begin dbms_stats.convert_raw_value(n_raw,l_date); return l_date; end; select column_name, get_date(low_value) lo_value, get_date(high_value) hi_value from dba_tab_columns where owner='SCOTT' and table_name='TEST_SELECTIVITY' and data_type='DATE' order by 1; / COLUMN_NAME LO_VALUE HI_VALUE ------------------ -------------------- -------------------- TIME_ID 13-AUG-2013 00:00:00 12-AUG-2016 00:00:00
Function in WITH clause is a 12c new feature. For Oracle Database versions prior to 12c, create the function using CREATE FUNCTION clause and then used it in the query.
For the Equality Predicate, SELECTIVITY is 1/Num_Distinct and CARDINALITY = SELECTIVITY X NUM_ROWS. After calculating these, we will then run the query on this table to validate the actual number or rows.
## EQUALITY PREDICATE
SQL> select 1/&&ndv Selectivity from dual;
old 1: select 1/&&ndv Selectivity from dual
new 1: select 1/ 1096 Selectivity from dual
SELECTIVITY
-----------
.000912409
SQL> select round(&&selective*800000,0) cardinality from dual;
old 1: select round(&&selective*800000,0) cardinality from dual
new 1: select round(.000912409*800000,0) cardinality from dual
CARDINALITY
-----------
730
SQL> set autot trace
SQL> select cust_id, amount_sold, promo_id from test_selectivity
2 where time_id=to_date('11-DEC-2015','DD-MON-YYYY');
704 rows selected.
Elapsed: 00:00:00.99
Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 730 | 15330 | 1088 (20)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY | 730 | 15330 | 1088 (20)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIME_ID"=TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
SQL> set autot off
Assumption – 730 Rows and Actual – 704 Rows (Nearly Accurate).
Next, we run a query with Less Than Predicate. The SELECTIVITY in this case will be Required_Range/Available_Range (see part I for the exact formula – Required_Range will be computed as Required_Date – Low_Value). CARDINALITY is again SELECTIVITY x NUM_ROWS
## LESS THAN PREDICATE
## Required_Range - 850 Days
SQL> select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') req_range
2 from dual;
old 1: select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') req_range
new 1: select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') req_range
REQ_RANGE
----------
850
## Available_Range - 1095 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
2 from dual;
old 1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
new 1: select to_date('12-AUG-2016 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') avl_range
AVL_RANGE
----------
1095
## Selectivity - Required_Range/Available_Range
SQL> select &&r_range/&&a_range Selectivity from dual;
old 1: select &&r_range/&&a_range Selectivity from dual
new 1: select 850/ 1095 Selectivity from dual
SELECTIVITY
-----------
.776255708
## Assumed Cardinality
SQL> select round(&&selective*800000,0) cardinality from dual;
old 1: select round(&&selective*800000,0) cardinality from dual
new 1: select round(.776255708*800000,0) cardinality from dual
CARDINALITY
-----------
621005
SQL> select cust_id, amount_sold, promo_id from test_selectivity
2 where time_id<to_date('11-DEC-2015','DD-MON-YYYY');
620764 rows selected.
Elapsed: 00:00:06.06
Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 621K| 12M| 1125 (23)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY | 621K| 12M| 1125 (23)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIME_ID"<TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
Assumption – 621k Rows and Actual – 620k Rows (Nearly Accurate).
Next, the query with Greater Than Predicate and the SELECTIVITY will be again Required_Range/Available_Range. The difference, in this case, will be that High_Value will be used to calculate the Required_Range.
## GREATER THAN PREDICATE
## Required_Range - 245 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range
2 from dual;
old 1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range
new 1: select to_date('12-AUG-2016 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range
REQ_RANGE
----------
245
## Available_Range - 1095 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
2 from dual;
old 1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
new 1: select to_date('12-AUG-2016 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') avl_range
AVL_RANGE
----------
1095
## Selectivity - Required_Range/Available_Range
SQL> select &&r_range/&&a_range Selectivity from dual;
old 1: select &&r_range/&&a_range Selectivity from dual
new 1: select 245/ 1095 Selectivity from dual
SELECTIVITY
-----------
.223744292
## Assumed Cardinality
SQL> select round(&&selective*800000,0) cardinality from dual;
old 1: select round(&&selective*800000,0) cardinality from dual
new 1: select round(.223744292*800000,0) cardinality from dual
CARDINALITY
-----------
178995
SQL> set autot trace
SQL> select cust_id, amount_sold, promo_id from test_selectivity
2 where time_id>to_date('11-DEC-2015','DD-MON-YYYY');
178532 rows selected.
Elapsed: 00:00:02.01
Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 178K| 3670K| 1099 (21)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY | 178K| 3670K| 1099 (21)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIME_ID">TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
Assumption – 178k Rows and Actual – 178k Rows (Accurate).
For Time_ID Column, the expected and actual cardinality were nearly accurate. Let’s shift our focus to the other column – AMOUNT_SOLD. We will run similar three queries – Equality, Less Than and Greater Than.
Before we execute the queries against this column, lets check the statistics (Density, Low_Value and High_Value).
## FOR AMOUNT_SOLD COLUMN select column_name, num_distinct, num_nulls, density, histogram from dba_tab_columns where owner='SCOTT' and table_name='TEST_SELECTIVITY' and column_name='AMOUNT_SOLD'; COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM ------------------ ------------ ---------- ---------- -------------------- AMOUNT_SOLD 636 0 .001572327 NONE with function get_number(n_raw in raw) return number as l_number number; begin dbms_stats.convert_raw_value(n_raw,l_number); return l_number; end; select column_name, get_number(low_value) lo_value, get_number(high_value) hi_value from dba_tab_columns where owner='SCOTT' and table_name='TEST_SELECTIVITY' and column_name='AMOUNT_SOLD'; / COLUMN_NAME LO_VALUE HI_VALUE ------------------ ---------- ---------- AMOUNT_SOLD 6 1783
All the calculation are same for this column as well.
## EQUALITY PREDICATE
SQL> select 1/&&ndv Selectivity from dual;
old 1: select 1/&&ndv Selectivity from dual
new 1: select 1/ 636 Selectivity from dual
SELECTIVITY
-----------
.001572327
SQL> select round(&&selective*800000,0) cardinality from dual;
old 1: select round(&&selective*800000,0) cardinality from dual
new 1: select round(.001572327*800000,0) cardinality from dual
CARDINALITY
-----------
1258
SQL> set autot trace
SQL> select cust_id, amount_sold, promo_id from test_selectivity
2 where amount_sold=1500;
122 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1258 | 16354 | 1136 (24)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY | 1258 | 16354 | 1136 (24)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("AMOUNT_SOLD"=1500)
Assumption – 1258 Rows and Actual – 122 Rows (Out by 10 times).
## LESS THAN PREDICATE
## Required_Range
SQL> select (1500-&&min_value) req_range from dual;
old 1: select (1500-&&min_value) req_range from dual
new 1: select (1500- 6) req_range from dual
REQ_RANGE
----------
1494
## Available_Range
SQL> select (&&max_value-&&min_value) avl_range from dual;
old 1: select (&&max_value-&&min_value) avl_range from dual
new 1: select ( 1783- 6) avl_range from dual
AVL_RANGE
----------
1777
## Selectivity - Required_Range/Available_Range
SQL> select &&r_range/&&a_range Selectivity from dual;
old 1: select &&r_range/&&a_range Selectivity from dual
new 1: select 1494/ 1777 Selectivity from dual
SELECTIVITY
-----------
.840742825
## Assumed Cardinality
SQL> select round(&&selective*800000,0) Cardinality from dual;
old 1: select round(&&selective*800000,0) Cardinality from dual
new 1: select round(.840742825*800000,0) Cardinality from dual
CARDINALITY
-----------
672594
SQL> set autot trace
SQL> select cust_id, amount_sold, promo_id from test_selectivity
2 where amount_sold<1500;
791950 rows selected.
Elapsed: 00:00:07.57
Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 672K| 8538K| 1136 (24)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY | 672K| 8538K| 1136 (24)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("AMOUNT_SOLD"<1500)
Assumption – 672k Rows and Actual – 791k Rows (Inaccurate).
## GREATER THAN PREDICATE
## Required_Range
SQL> select (&&max_value-1500) req_range from dual;
old 1: select (&&max_value-1500) req_range from dual
new 1: select ( 1783-1500) req_range from dual
REQ_RANGE
----------
283
## Available_range
SQL> select (&&max_value-&&min_value) avl_range from dual;
old 1: select (&&max_value-&&min_value) avl_range from dual
new 1: select ( 1783- 6) avl_range from dual
AVL_RANGE
----------
1777
## Selectivity - Required_Range/Available_Range
SQL> select &&r_range/&&a_range Selectivity from dual;
old 1: select &&r_range/&&a_range Selectivity from dual
new 1: select 283/ 1777 Selectivity from dual
SELECTIVITY
-----------
.159257175
## Assumed Cardinality
SQL> select round(&&selective*800000,0) Cardinality from dual;
old 1: select round(&&selective*800000,0) Cardinality from dual
new 1: select round(.159257175*800000,0) Cardinality from dual
CARDINALITY
-----------
127406
SQL> select cust_id, amount_sold, promo_id from test_selectivity
2 where amount_sold>1500;
7928 rows selected.
Elapsed: 00:00:00.12
Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 127K| 1617K| 1136 (24)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY | 127K| 1617K| 1136 (24)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("AMOUNT_SOLD">1500)
Assumption – 127k Rows and Actual – 7928 Rows (Significantly Out).
Finally, we move to our last column of interest i.e. PROMO_ID. The distinct values in this column are 4 and the data distribution is as under:
## FOR PROMO_ID Column
select promo_id,
count(*) cnt,
round(ratio_to_report(count(*)) over()*100,2) "%age"
from test_selectivity
group by promo_id
order by 2;
PROMO_ID CNT %age
---------- ---------- ----------
33 2074 .26
351 2245 .28
350 17978 2.25
999 777703 97.21
select column_name,
num_distinct,
num_nulls,
density,
histogram
from dba_tab_columns
where owner='SCOTT'
and table_name='TEST_SELECTIVITY'
and column_name='PROMO_ID'
order by 1;
COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM
------------------ ------------ ---------- ---------- --------------------
PROMO_ID 4 0 .25 NONE
For this column, the we will execute 4 queries and each of these will be EQUALITY Predicates. For Equality Predicates, the calculation for SELECTIVITY is simple, which is 1/NDV or DENSITY. From DBA_TAB_COLUMNS, we can see that the DENSITY for this column is 0.25 (1/4).
## FOR PROMO_ID
## Selectivity
SQL> select 1/&&ndv selectivity from dual;
old 1: select 1/&&ndv selectivity from dual
new 1: select 1/ 4 selectivity from dual
SELECTIVITY
-----------
.25
## Assumed Cardinality
SQL> select round(&&selective*800000,0) cardinality from dual;
old 1: select round(&&selective*800000,0) cardinality from dual
new 1: select round( .25*800000,0) cardinality from dual
CARDINALITY
-----------
200000
## VALUE 999
SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=999;
777703 rows selected.
Elapsed: 00:00:06.89
Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200K| 2539K| 1112 (22)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY | 200K| 2539K| 1112 (22)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("PROMO_ID"=999)
Assumption – 200k Rows and Actual – 777k Rows (Out by 4 time).
## Value 350
SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=350;
17978 rows selected.
Elapsed: 00:00:00.27
Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200K| 2539K| 1112 (22)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY | 200K| 2539K| 1112 (22)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("PROMO_ID"=350)
Assumption – 200k Rows and Actual – 17978 Rows (Significantly Out).
## VALUE 33
SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=33;
2074 rows selected.
Elapsed: 00:00:00.22
Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200K| 2539K| 1112 (22)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY | 200K| 2539K| 1112 (22)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("PROMO_ID"=33)
Assumption – 200k Rows and Actual – 2074 Rows (Significantly Out).
## VALUE 351
SQL> select cust_id, amount_sold, promo_id from test_selectivity where promo_id=351;
2245 rows selected.
Elapsed: 00:00:00.11
Execution Plan
----------------------------------------------------------
Plan hash value: 4083831454
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200K| 2539K| 1112 (22)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY | 200K| 2539K| 1112 (22)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("PROMO_ID"=351)
Assumption – 200k Rows and Actual – 2245 Rows (Significantly Out).
Summary – So far
- TIME_ID – Assumptions v/s Actuals √
- AMOUNT_SOLD – Assumptions v/s Actuals Χ
- PROMO_ID – Assumptions v/s Actuals Χ
Let’s make a simple change in the TIME_ID column. For this, I will create another table, which will be a replica of TEST_SELECTIVITY. We will make this change in the new table, so that, we do not disturb the Original Table.
create table test_selectivity_m as
select * from test_selectivity;
update test_selectivity_m set time_id=to_date('31-Dec-2050','DD-MON-YYYY')
where rownum=1;
exec dbms_stats.gather_table_stats(user,'TEST_SELECTIVITY_M');
select table_name, num_rows, blocks, partitioned from dba_tables
where table_name in ('TEST_SELECTIVITY','TEST_SELECTIVITY_M');
TABLE_NAME NUM_ROWS BLOCKS PAR
------------------------------ ---------- ---------- ---
TEST_SELECTIVITY_M 800000 3478 NO
TEST_SELECTIVITY 800000 3478 NO
I created another table TEST_SELECTIVITY_M and updated a single row with a future date i.e.31st December 2050. Lets see, whether this minor change has any impact on the Optimizer Assumptions v/s Actuals.
with function get_date(n_raw in raw) return date as l_date date; begin dbms_stats.convert_raw_value(n_raw,l_date); return l_date; end; select column_name, num_distinct, num_nulls, density, histogram, get_date(low_value) lo_value, get_date(high_value) hi_value from dba_tab_columns where owner='SCOTT' and table_name='TEST_SELECTIVITY_M' and data_type='DATE' order by 1; / COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM LO_VALUE HI_VALUE ------------------------------ ------------ ---------- ---------- -------------------- -------------------- -------------------- TIME_ID 1097 0 .000911577 NONE 13-AUG-2013 00:00:00 31-DEC-2050 00:00:00
Now, lets run the three queries on TIME_ID column against this table and see the results.
## EQUALITY PREDICATE
## Selectivity
SQL> select 1/&&ndv Selectivity from dual;
old 1: select 1/&&ndv Selectivity from dual
new 1: select 1/ 1097 Selectivity from dual
SELECTIVITY
-----------
.000911577
## Assumed Cardinality
SQL> select round(&&selective*800000,0) cardinality from dual;
old 1: select round(&&selective*800000,0) cardinality from dual
new 1: select round(.000911577*800000,0) cardinality from dual
CARDINALITY
-----------
729
SQL> set autot trace
SQL> select cust_id, promo_id, amount_sold from test_selectivity_m
2 where time_id=to_date('11-DEC-2015','DD-MON-YYYY');
704 rows selected.
Elapsed: 00:00:00.57
Execution Plan
----------------------------------------------------------
Plan hash value: 3843949181
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 729 | 15309 | 1088 (20)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY_M | 729 | 15309 | 1088 (20)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIME_ID"=TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
Assumption – 729 Rows and Actual – 704 Rows (Nearly Accurate).
## LESS THAN PREDICATE
## Required_Range - 850 Days
SQL> select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') req_range
2 from dual;
old 1: select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') req_range
new 1: select to_date('11-DEC-2015','DD-MON-YYYY')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') req_range
REQ_RANGE
----------
850
## Available_Range - 13654 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
2 from dual;
old 1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
new 1: select to_date('31-DEC-2050 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') avl_range
AVL_RANGE
----------
13654
## Selectivity - Required_Range/Available_Range
SQL> select &&r_range/&&a_range Selectivity from dual;
old 1: select &&r_range/&&a_range Selectivity from dual
new 1: select 850/ 13654 Selectivity from dual
SELECTIVITY
-----------
.06225282
## Assumed Cardinality
SQL> select round(&&selective*800000,0) cardinality from dual;
old 1: select round(&&selective*800000,0) cardinality from dual
new 1: select round( .06225282*800000,0) cardinality from dual
CARDINALITY
-----------
49802
SQL> select cust_id, promo_id, amount_sold from test_selectivity_m
2 where time_id<to_date('11-DEC-2015','DD-MON-YYYY');
620764 rows selected.
Elapsed: 00:00:06.41
Execution Plan
----------------------------------------------------------
Plan hash value: 3843949181
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 49802 | 1021K| 1091 (21)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY_M | 49802 | 1021K| 1091 (21)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIME_ID"<TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
Assumption – 49k Rows and Actual – 620k Rows (Significantly Out).
## GREATER THAN PREDICATE
## Required_Range - 12804 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range
2 from dual;
old 1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range
new 1: select to_date('31-DEC-2050 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('11-DEC-2015','DD-MON-YYYY') req_range
REQ_RANGE
----------
12804
## Available_range - 13654 Days
SQL> select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
2 from dual;
old 1: select to_date('&&max_value','DD-MON-YYYY HH24:MI:SS')-to_date('&&min_value','DD-MON-YYYY HH24:MI:SS') avl_range
new 1: select to_date('31-DEC-2050 00:00:00','DD-MON-YYYY HH24:MI:SS')-to_date('13-AUG-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') avl_range
AVL_RANGE
----------
13654
## Selectivity - Required_range/Available_range
SQL> select &&r_range/&&a_range Selectivity from dual;
old 1: select &&r_range/&&a_range Selectivity from dual
new 1: select 12804/ 13654 Selectivity from dual
SELECTIVITY
-----------
.93774718
## Assumed Cardinality
SQL> select round(&&selective*800000,0) cardinality from dual;
old 1: select round(&&selective*800000,0) cardinality from dual
new 1: select round( .93774718*800000,0) cardinality from dual
CARDINALITY
-----------
750198
SQL> select cust_id, promo_id, amount_sold from test_selectivity_m
2 where time_id>to_date('11-DEC-2015','DD-MON-YYYY');
178532 rows selected.
Elapsed: 00:00:02.15
Execution Plan
----------------------------------------------------------
Plan hash value: 3843949181
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 750K| 15M| 1133 (24)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_SELECTIVITY_M | 750K| 15M| 1133 (24)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIME_ID">TO_DATE(' 2015-12-11 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
Assumption – 750k Rows and Actual – 178k Rows (Out by 4 Times).
Final Summary:
- TIME_ID – Assumptions v/s Actuals √
- TIME_ID – Assumptions v/s Actuals (minor change) Χ
- AMOUNT_SOLD – Assumptions v/s Actuals Χ
- PROMO_ID – Assumptions v/s Actuals Χ
A minor change in the data had changed our Summary on Time_ID column. In case of our TEST_SELECTIVITY Table, the Optimizer estimation for Time_ID was nearly accurate, whereas, for other 2 columns, it was way out. What could be the reason ? Remember, accurate Selectivity and Cardinality is critical as it can have an impact on Access Path and Access Order respectively. Any discrepancy can cause a sub-optimal plan. Optimizer is a piece of code and completely depends on the Statistics that we gather and provide to it as an input. Therefore, the solution to these discrepancies has to be with us. In the next blog, which will be Part III of this series, we will cover the problem and the solution to fix this discrepancy, but that fix will cause problems for few cases and we shall cover those as well.
Hello Vivek sir,
Amazingly demonstrated that how the optimizer calculates the cardinality in the plan (which I always wanted to know) thanks you for that.
Had one minor doubt though that in the blog while finding the low and high value for TEST_SELECTIVITY_M table the predicate table_name in the select query of DBA_tab_columns is (table_name=’TEST_SELECTIVITY’)
Shouldn’t it be TEST_SELECTIVITY_M
Also,
While updating the table TEST_SELECTIVITY_M , you’ve used (
where rownum exec dbms_stats.gather_table_stats(user,’TEST_SELECTIVITY_M’);) , I assume this gathers stats while updating each row on the table so no need to gather stats on the table again?
And, i was wondering will generating histogram help in this case to solve the cardinality estimation ?
Yes…My mistake.it should be TEST_SELECTIVITY_M..have modified it
Vivek sir,
where rownum exec dbms_stats.gather_table_stats(user,’ TEST_SELECTIVITY_M’);
This structure can you please explain, shed some light on it please. As mentioned the update is been done on only one row. How this limits to only one row?
Where it comes to mind using rownum <=1 can be the predicate to update one row, but this new structure of using GATHER_TABLE_STATS how it will limit the update to one row in where clause along with rownum? And how the stats of table will update.
Thank you for sharing your knowledge.
Regards,
Paras
I tried it in 11g, but didn’t work for me , getting an “ORA-00920: invalid relational operator” [for exec]. Not sure what i’m doing wrong.
Thanks 🙂
I have rectified the issue here. It was an issue with html replacing the predicates.
Hi vivek,
Thank you so much for the blog , Excellent Demonstration. I had so many doubts when i listen to your session at HYD aioug . But after reading this blog Part I and Part II many of them were clarified .
Waiting for Next Part III. Hope very soon.
Thanks once again.
I thought so too 🙂 later tried to delete my stupid comment but couldn’t do so 🙂
Pingback: Optimizer – Part III (Frequency & Height Balanced) | Real Life Database / SQL Experiences : An Oracle Blog from Vivek Sharma
Pingback: Optimizer – Part IV (12c Enhancements) | Real Life Database / SQL Experiences : An Oracle Blog from Vivek Sharma