External Partition ! Oracle Autonomous Database
July 4, 2022 1 Comment
Autonomous Database was released around March 2018. The first was a Data Warehouse Implementation that supports business Intelligence and Analytical Workload. It was Autonomous Data Warehouse. Next, the transaction workload was added to this family, called as Autonomous Transaction Processing followed by Autonomous JSON and Autonomous APEX.
One of the coolest feature of Autonomous Data Warehouse (ADW) is easy integration with Data Lake, which means, while we can store our data natively within our Autonomous Database – for faster processing, this feature allows us to store and access our data from an external storage. On Cloud, when we talk about external storage, we usually refer to an Object Storage. This is by way of creating an External Table. In this, the Table Metadata is stored in the Data Dictionary and the actual data is stored externally. This allows us to save on Storage cost. Refer to Oracle Documentation for more details on this feature.
External Table can be Non-Partitioned, Partitioned and Hybrid Partitioned. As with Oracle Partitioning that improves Query performance by pruning the partitions that are not needed for our Query. This same improvement applies to External Tables as well. With Partitioned External Table, you get the benefit of Partition Pruning thus scanning only required external data. Remember, Queries on External Data will not be as fast as queries on database tables. However, if I have an Archived Table that is in-frequently accessed and is not performance sensitive, I can always move these to External Storage. And, to improve the query performance on External Storage, External Partitioning is highly recommended. As mentioned earlier, the other benefit of this feature is COST. Hybrid Partitioning is basically mix of both i.e. I can have few historical partitions on an External Storage and latest performance sensitive data within the database.
In this blog, I will talk about External Partition Tables, and subsequent blogs would take this topic further.
Assuming you have a large table with around 5 year of Data. Users frequently query latest data. The table is currently non-partitioned. As the data is growing, I want to ensure consistent performance. How do I do that?
There are multiple strategies, like :
- Archive the Data to a New Table and Purge it from the Original Table. Reports requiring latest data would run on the Original Table and Reports on the Historical Data would access the Archived Table.
- Partition the Table and leverage the benefits of Partition Pruning. This is the most common strategy.
My topic for today is on Strategy#1 i.e. Archiving to a New Table. This solution can also be applied if a table has only historical data. Anyways, many customers do implement strategy#1 and move the table to some other database, called as Archival Database. With Autonomous Database, We don’t need to do that and we will see this in action. My subsequent blogs will cover both the strategies that can be optimized further on Autonomous Databases.
For this demonstration, I will use LINEORDER table which is under SSB (Sample Star Benchmark) Schema. This schema is created by-default when you provision an Autonomous Database. It’s a huge table. Therefore, I will create another table as Sample 2% and will use it as my Source table. The very first step would be to identify the partition strategy. In this case, monthly partition on LO_ORDERDATE column is a first choice. However, the other requirement is to have sub-partition it on LO_TAX column, based on Tax Codes. So, what we need here is an External Table with Monthly partitions and Sub-Partitions on LO_TAX.
OT OWNER NAME NUM_ROWS BLOCKS --- -------------------- ------------------------------ ---------- ---------- 1T ADMIN LINEORDER 119993317 644744
Next step is to move the data on to the External Storage (an Object Storage). This data will be moved keeping our partitioning strategy in mind. Before I move to the next step, it is important to know that Autonomous Database support many different file formats and structure. On Autonomous Database you can create partitioned external tables from HIVE style partitioned data or from simple folder partitioned data stored on your Cloud Object Store. For example, a Hive format partitioned file will look like :
table/partition1=partition1_value/partition2=partition2_value/data_file.csv
The advantage of this approach is the partitioned information is available in the data file path name itself and therefore, the partitioned columns need not be part of the actual data. Partition pruning is done directly based on the path name. So, in our case the path name would be :
LINEORDER/MONTH_YEAR=JUN1993/TAX=2/filename.gz
In this case, the files will be compressed so that these occupy less space. MONTH_YEAR is a derived column, but TAX is LO_TAX column of LINEORDER Table. Since these columns and column values are part of my file path name, my CSV files need not contain these columns and column values. With Simple Folder partitioned data, the columns and column values of the partitioned columns have to be in the data files. So, in this case since I am using HIVE style partitioned data, I will dynamically create folders with column name and column values.
In order to access my Object Storage, the very first mandatory requirements is to create a credentials. Alternatively, you can use OCI Resource Principals. I am assuming you have all the required credentials. If not, please follow the documentation.
Next, I will use dbms_cloud.export_data api to export the data into compressed CSV files on to my Object Storage. To speed up the process, I created a CONTROL_TABLE with distinct MONYYYY values from LINEORDER and then scheduled a pl/sql block from three different sessions. This pl/sql block will export the data for a specified Month and a Tax Code. The data inserted into the control table has a condition to_char(lo_orderdate,’MONYYYY’)!=’AUG1998′. I will explain this later.
create table control_table (
partition_name varchar2(100),
partition_number number,
status char(1)
);
insert into control_table(partition_name, status)
SELECT DISTINCT TO_CHAR(LO_ORDERDATE,'MONYYYY') MONTH_YEAR, 'N' FROM LINEORDER WHERE to_char(lo_orderdate,'MONYYYY')!='AUG1998';
COMMIT;
UPDATE control_table set partition_number=rownum;
commit;
DECLARE
FILE_URI_O VARCHAR2(1000);
FILE_URI_F VARCHAR2(128);
FILE_URI VARCHAR2(1000);
L_STATEMENT VARCHAR2(1000);
BEGIN
FOR OUTER_LOOP IN (SELECT PARTITION_NAME MONTH_YEAR, PARTITION_NUMBER FROM CONTROL_TABLE WHERE PARTITION_NUMBER>=1 AND PARTITION_NUMBER<=25 AND STATUS='N' ORDER BY PARTITION_NUMBER)
LOOP
FOR INNER_LOOP IN (SELECT DISTINCT LO_TAX TAX FROM LINEORDER WHERE TO_CHAR(LO_ORDERDATE,'MONYYYY')=OUTER_LOOP.MONTH_YEAR)
LOOP
L_STATEMENT:='select lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_suppkey, lo_orderdate, lo_orderpriority, lo_shippriority, lo_quantity, lo_extendedprice, lo_ordtotalprice, lo_discount, lo_revenue, lo_supplycost, lo_commitdate, lo_shipmode from lineorder where to_char(lo_orderdate,''MONYYYY'')='||''''||OUTER_LOOP.MONTH_YEAR||''''||' and lo_tax='||INNER_LOOP.TAX||'';
FILE_URI_F:='MONTH_YEAR='||OUTER_LOOP.MONTH_YEAR||'/TAX='||INNER_LOOP.TAX||'/';
FILE_URI_O:='https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_tenancy/b/External_Partitions/o/LINEORDER/'||FILE_URI_F;
FILE_URI:=FILE_URI_O||OUTER_LOOP.MONTH_YEAR||'_'||INNER_LOOP.TAX;
DBMS_CLOUD.EXPORT_DATA(CREDENTIAL_NAME => 'MY_CREDENTIAL', FILE_URI_LIST => FILE_URI, QUERY=>L_STATEMENT, FORMAT=> JSON_OBJECT('TYPE' VALUE 'CSV', 'quote' value '"', 'COMPRESSION' VALUE 'GZIP', 'MAXFILESIZE' VALUE '20485760'));
END LOOP;
UPDATE CONTROL_TABLE SET STATUS='Y' WHERE PARTITION_NUMBER=OUTER_LOOP.PARTITION_NUMBER;
COMMIT;
END LOOP;
END;
/
L_STATEMENT variable has the query that is used to export the data. I have the required columns in the SELECT Clause alongwith the relevant WHERE predicate. Each run of the statement will create a CSV file for a required partition and sub-partition. Further, I dynamically create the folders. If the folders doesn’t exist, the api will create it for us else it will create the relevant file within it’s folder. The folder name has the partition column name and it’s value. For example, I will have folder names as:

Now, I have all the data on to my Object Storage and these are HIVE Format. I have compressed CSV files for each Month and Tax. I will now create an External Partition Table. This is done using dbms_cloud.create_external_part_table api.
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE (
table_name => 'LINEORDER_EXT_PART',
credential_name => 'MY_CREDENTIAL',
file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_tenancy/b/External_Partitions/o/LINEORDER/*.gz',
column_list =>
'LO_ORDERKEY NUMBER,
LO_LINENUMBER number,
LO_CUSTKEY NUMBER,
LO_PARTKEY NUMBER,
LO_SUPPKEY NUMBER,
LO_ORDERDATE DATE,
LO_ORDERPRIORITY CHAR(15),
LO_SHIPPRIORITY CHAR(1),
LO_QUANTITY NUMBER,
LO_EXTENDEDPRICE NUMBER,
LO_ORDTOTALPRICE NUMBER,
LO_DISCOUNT NUMBER,
LO_REVENUE NUMBER,
LO_SUPPLYCOST NUMBER,
LO_COMMITDATE NUMBER,
LO_SHIPMODE CHAR(10),
MONTH_YEAR VARCHAR2(12),
TAX NUMBER',
format => '{"type":"csv", "skipheaders":0, "dateformat":"DD-MON-YYYY HH24:MI:SS", "compression":"gzip", "partition_columns":["MONTH_YEAR","TAX"]}');
END;
/
This creates our external partition table. The data resides on our Object Storage and the metadata is stored in the data dictionary. Now, you can run your queries against this table and the data will be retrieved from the Object Storage.
Time to run some queries and check the run time plan. Run time pan will help us validate whether the queries are able to leverage the benefits of partition pruning. We will join this external partitioned table to another table with data stored natively and from the run time plan, we see that Adaptive Optimization kicks in as well.
# Query 1
select lo_custkey, lo_orderdate, lo_shipmode, lo_quantity from lineorder_ext_part where month_year='AUG1993' and tax=2 and lo_orderkey=3517992324;
LO_CUSTKEY LO_ORDERDATE LO_SHIPMOD LO_QUANTITY
---------- -------------------- ---------- -----------
29663183 10-AUG-1993 00:00:00 MAIL 9
Elapsed: 00:00:01.75
SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST -bytes +predicate +note +cost +adaptive'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID dmak3yzpw71a2, child number 0
-------------------------------------
select lo_custkey, lo_orderdate, lo_shipmode, lo_quantity from
lineorder_ext_part where month_year='AUG1993' and tax=2 and
lo_orderkey=3517992324
Plan hash value: 1982497926
--------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | Cost (%CPU)|
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 3 (100)|
| 1 | PARTITION LIST SINGLE | | 1 | 3 (0)|
|* 2 | EXTERNAL TABLE ACCESS FULL| LINEORDER_EXT_PART | 1 | 3 (0)|
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("MONTH_YEAR"='AUG1993' AND "TAX"=2 AND
"LO_ORDERKEY"=3517992324))
# Query 2
select lo_custkey, c_name, lo_orderdate, lo_shipmode, lo_quantity
from lineorder_ext_part,
vivek.customer
where month_year='DEC1997'
and tax=0
and lo_orderkey=3187415872
and c_custkey = lo_custkey;
LO_CUSTKEY C_NAME LO_ORDERDATE LO_SHIPMOD LO_QUANTITY
---------- ------------------------- -------------------- ---------- -----------
11640322 Customer#011640322 04-DEC-1997 00:00:00 FOB 6
Elapsed: 00:00:02.27
SQL> SQL> @apc
SQL> set echo on
SQL> set lines 200
SQL> set pages 1000
SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST -bytes +predicate +note +cost +adaptive'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6pn53c37c2dfb, child number 0
-------------------------------------
select lo_custkey, c_name, lo_orderdate, lo_shipmode, lo_quantity from
lineorder_ext_part, vivek.customer where month_year='DEC1997' and
tax=0 and lo_orderkey=3187415872 and c_custkey = lo_custkey
Plan hash value: 1739419959
--------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | Cost (%CPU)|
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 5 (100)|
|- * 1 | HASH JOIN | | 1 | 5 (0)|
| 2 | NESTED LOOPS | | 1 | 5 (0)|
| 3 | NESTED LOOPS | | 1 | 5 (0)|
|- 4 | STATISTICS COLLECTOR | | | |
| 5 | PARTITION LIST SINGLE | | 1 | 3 (0)|
| * 6 | EXTERNAL TABLE ACCESS FULL| LINEORDER_EXT_PART | 1 | 3 (0)|
| * 7 | INDEX UNIQUE SCAN | CUSTOMER_PK | 1 | 1 (0)|
| 8 | TABLE ACCESS BY INDEX ROWID | CUSTOMER | 1 | 2 (0)|
|- 9 | TABLE ACCESS FULL | CUSTOMER | 1 | 2 (0)|
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("C_CUSTKEY"="LO_CUSTKEY")
6 - filter(("MONTH_YEAR"='DEC1997' AND "TAX"=0 AND
"LO_ORDERKEY"=3187415872))
7 - access("C_CUSTKEY"="LO_CUSTKEY")
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1
- this is an adaptive plan (rows marked '-' are inactive)
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
A query against all the sub-partitions or partitions works as well.
SQL> select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from lineorder_ext_part where month_year='AUG1993' and lo_custkey=29663183;
LO_ORDERKEY LO_ORDERDATE LO_SHIPMOD LO_QUANTITY
----------- -------------------- ---------- -----------
3517992324 10-AUG-1993 00:00:00 MAIL 9
2251158885 21-AUG-1993 00:00:00 REG AIR 29
Elapsed: 00:00:12.28
SQL> @apc
SQL> set echo on
SQL> set lines 200
SQL> set pages 1000
SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST -bytes +predicate +note +cost +adaptive'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5m709tc22ua7u, child number 0
-------------------------------------
select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from
lineorder_ext_part where month_year='AUG1993' and lo_custkey=29663183
Plan hash value: 146054153
--------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | Cost (%CPU)|
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 14 (100)|
| 1 | PARTITION LIST ITERATOR | | 7 | 14 (15)|
|* 2 | EXTERNAL TABLE ACCESS FULL| LINEORDER_EXT_PART | 7 | 14 (15)|
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("MONTH_YEAR"='AUG1993' AND "LO_CUSTKEY"=29663183))
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
27 rows selected.
From the queries that we executed against this External Partitioned Table, we can clearly see that for the application it doesn’t matter whether the data is fetched from within the database storage or from an External Storage. It is completely transparent to the application. One important point to note here is that these External Tables are READ ONLY and the Data is not managed by the Database. Now, lets add some more data to this External Partition Table. While generating thh CSV files, I excluded the data for AUGUST 1998. I will not generate the CSV files for AUG1998 and we will see whether I am able to query the data.
DECLARE
FILE_URI_O VARCHAR2(1000);
FILE_URI_F VARCHAR2(128);
FILE_URI VARCHAR2(1000);
L_STATEMENT VARCHAR2(1000);
BEGIN
FOR INNER_LOOP IN (SELECT DISTINCT LO_TAX TAX FROM LINEORDER WHERE TO_CHAR(LO_ORDERDATE,'MONYYYY')='AUG1998')
LOOP
L_STATEMENT:='select lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_suppkey, lo_orderdate, lo_orderpriority, lo_shippriority, lo_quantity, lo_extendedprice, lo_ordtotalprice, lo_discount, lo_revenue, lo_supplycost, lo_commitdate, lo_shipmode from lineorder where to_char(lo_orderdate,''MONYYYY'')='||''''||'AUG1998'||''''||' and lo_tax='||INNER_LOOP.TAX||'';
FILE_URI_F:='MONTH_YEAR=AUG1998/TAX='||INNER_LOOP.TAX||'/';
FILE_URI_O:='https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_tenancy/b/External_Partitions/o/LINEORDER/'||FILE_URI_F;
FILE_URI:=FILE_URI_O||'AUG1998_'||INNER_LOOP.TAX;
DBMS_CLOUD.EXPORT_DATA(CREDENTIAL_NAME => 'MY_CREDENTIAL', FILE_URI_LIST => FILE_URI, QUERY=>L_STATEMENT, FORMAT=> JSON_OBJECT('TYPE' VALUE 'CSV', 'quote' value '"', 'COMPRESSION' VALUE 'GZIP', 'MAXFILESIZE' VALUE '20485760'));
END LOOP;
END;
/
Now, let me run a query against this newly added CSV file. While the files are uploaded and available in the correct HIVE format folders, you can see that Oracle could not access the data and the run time plan says PARTITION LIST EMPTY.
SQL> select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from lineorder_ext_part where month_year='AUG1998' and tax=6 and lo_custkey=16287940;
no rows selected
Elapsed: 00:00:00.11
SQL> @apc
SQL> set lines 200
SQL> set pages 1000
SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST -bytes +predicate +note +cost +adaptive'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5s7wagh8za9bx, child number 1
-------------------------------------
select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from
lineorder_ext_part where month_year='AUG1998' and tax=6 and
lo_custkey=16287940
Plan hash value: 3670407871
--------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | Cost (%CPU)|
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 2 (100)|
| 1 | PARTITION LIST EMPTY | | 1 | 2 (0)|
|* 2 | EXTERNAL TABLE ACCESS FULL| LINEORDER_EXT_PART | 1 | 2 (0)|
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("MONTH_YEAR"='AUG1998' AND "TAX"=6 AND
"LO_CUSTKEY"=16287940))
SQL> select count(*) from lineorder_ext_part where month_year='AUG1998';
COUNT(*)
----------
0
As I said earlier, External Tables are READ Only and the Data is not managed by Oracle Database. For this, we need to Refresh the External Table using SYNC_EXTERNAL_PART_TABLE procedure of dbms_cloud. Once done, the newly added CSV files would be considered as a new partition and the data can then be queried.
BEGIN
DBMS_CLOUD.SYNC_EXTERNAL_PART_TABLE(table_name => 'LINEORDER_EXT_PART');
END;
/
SQL> select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from lineorder_ext_part where month_year='AUG1998' and tax=6 and lo_custkey=16287940;
LO_ORDERKEY LO_ORDERDATE LO_SHIPMOD LO_QUANTITY
----------- -------------------- ---------- -----------
3004306725 02-AUG-1998 00:00:00 REG AIR 14
Elapsed: 00:00:42.34
SQL> @apc
SQL> set echo on
SQL> set lines 200
SQL> set pages 1000
SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST -bytes +predicate +note +cost +adaptive'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5s7wagh8za9bx, child number 1
-------------------------------------
select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from
lineorder_ext_part where month_year='AUG1998' and tax=6 and
lo_custkey=16287940
Plan hash value: 917046205
----------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | Cost (%CPU)|
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 2 (100)|
| 1 | PX COORDINATOR | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 2 (0)|
| 3 | PX BLOCK ITERATOR | | 1 | 2 (0)|
|* 4 | EXTERNAL TABLE ACCESS FULL| LINEORDER_EXT_PART | 1 | 2 (0)|
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(("MONTH_YEAR"='AUG1998' AND "TAX"=6 AND
"LO_CUSTKEY"=16287940))
Note
-----
- automatic DOP: Computed Degree of Parallelism is 16 because of degree limit
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
31 rows selected.
Elapsed: 00:00:00.08
SQL> select count(*) from lineorder_ext_part where month_year='AUG1998';
COUNT(*)
----------
99688
Elapsed: 00:00:01.44
SQL>
Wow! a cool feature and easy to use. Isn’t it? Last, with the latest enhancement to this feature, two new hidden columns are added to the External Table. These are file$path and file$name. These columns can be queried to check the Source File Path and Name.
SQL> select lo_orderkey, file$path, file$name from lineorder_ext_part where month_year='AUG1993' and tax=2 and lo_custkey=29663183; LO_ORDERKEY FILE$PATH FILE$NAME ----------- -------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------- 3517992324 https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_tenancy/b/External_Partitions/o/LINEORDER/MONTH_YEAR=AUG1993/TAX=2 AUG1993_2_10_20220704T111509Z.csv.gz
This is very easy to implement on Autonomous Database and is a Cost Saving option. Performance insensitive and historical data can be easily moved to an external storage. Accessing this data is as simple as accessing natively stored data. Do try it out and let me know how it works.