Migrate/Move Data to Autonomous AI Database

Assume following scenario’s:

  1. I want to refresh my UAT Database from an existing Production Data.
  2. I want to move an Oracle Database to an Autonomous AI Database. This Oracle Database could be anything – an OnPremise or on any Cloud.
  3. I want to move a Non-Oracle Database to an Autonomous AI Database

For 1 above, Autonomous AI Database provides Database Cloning option. It is just few clicks and the job is done. For 2, there are multiple methods and all of these are documented in Autonomous AI Database Documentation. Each of these methods are based on the migration scenario, database size etc. Few months back, Oracle also released a much awaited Tablespace migration option. For 3, there are multiple migration tools available that takes the data from Non-Oracle Sources, converts the data into Oracle format and then loads. All well…

But, what if the scenario is changed a bit i.e.What if, I have to refresh only a subset of Data from Production? Say, my Production is 50 TB and I just need few schema’s that are around 5 TB in size? Cloning an entire 50 TB doesn’t make sense.

One of my customers, a SaaS provider, with 20 to 30 Schema’s, had a requirement to refresh only few schema’s. They were testing few application changes and wanted to test these changes only for these schema’s. Cloning was not a solution here, as they wanted to avoid overriding other schema’s. A new clone and then dropping other schema’s means additional cost, time and efforts. The option for them was to – Take an EXPDP of these Schema’s on to Object Storage and then Import it on UAT. Manual effort and continuous monitoring.

How about a simplified way of achieving this, with an option to stop and resume from where it left? This blog is about a feature released few days back that achieves this, with just few steps. No manual efforts required. No continuous monitoring required and top of it, resilient to failures. What else do we need? Click here for the Documentation Link to know more about this feature.

In this blog, I will walk you through the steps that achieve this with just few steps. Underneath, it uses DBMS_PARALLEL_EXECUTE API to distribute an object into multiple chunks, and process these chunks individually in parallel sessions simulating parallelism. In the past, I have written many blogs about Manual Parallelism using DBMS_PARALLEL_EXECUTE. I call it manual parallelism, as I have a control on the number of threads, I can increase or reduce the number of threads at run-time and I also have an option to just process a failed thread instead of processing an entire object again. Just search on dbms_parallel_execute or manual parallelism on this blog search and you would find much of my challenging tasks accomplished using this API.

With Autonomous AI Database, you just need to run few steps and internally, all the steps needed to create multiple chunks, scheduling these are done automatically for you. So, lets get into the steps.

In this case, I want to move a Schema called MAKROOCT25 from one of my Autonomous AI Database to another Autonomous Database. The schema is in ATPAI Database (Source) and is to be moved to ATPMAY2026 Database (Target).

The steps are :

  1. On my Target, I created a Directory called ATPAI_WALLET. I will download cwallet.sso file of ATPAI Database into this directory.
  2. Then, created a credential that stores the Username and Password of a privileged user from my Source. My process (on Target) will connect to this User via Database Link, to process the chunks.
  3. Finally, we then run an API DBMS_CLOUD_IMPORT and pass the relevant parameters. Internally, this API will create a Database Link for us, connect to specified service, read the remote object(s) and then load the data into our Target.

From Oracle to Oracle, the process will create the required Indexes and Constraints as well.

create directory ATPAI_WALLET as ‘ATPAI_WALLET_DIR’;

begin
dbms_cloud.get_object(
credential_name =>’VIVEK_OBJ_CRED’,
object_uri =>'<Object_Storage_URL>/cwallet.sso’,
directory_name=>’ATPAI_WALLET’);
end;
/

SELECT
‘OBJECT_NAME=’ || OBJECT_NAME || CHR(10) ||
‘BYTES=’ || BYTES || CHR(10) ||
‘CREATED=’ || CREATED || CHR(10) ||
‘LAST_MODIFIED’ || LAST_MODIFIED OUTPUT
FROM DBMS_CLOUD.LIST_FILES(‘ATPAI_WALLET’);

OUTPUT

OBJECT_NAME=cwallet.sso
BYTES=6349
CREATED=17-MAY-26 02.35.37.000000 PM +00:00
LAST_MODIFIED17-MAY-26 02.35.37.000000 PM +00:00

Create the Credentials with the Username and Password of the Source Database.

BEGIN
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => ‘DB_LINK_ATPAI’,
username => ‘VIVEK’,
password => ‘<User_Password>’);
END;
/

Create an Import Task to Import the data. In this case, I am importing entire MAKROOCT25 Schema from ATPAI to ATPMAY2026.

BEGIN
DBMS_CLOUD_IMPORT.CREATE_IMPORT_TASK(
task_name => ‘MAKROOCT25_4M_ATPAI’,
hostname => ‘host_name_get_it_from_tnsnames.ora’,
port => ‘1522’,
directory_name => ‘ATPAI_WALLET’,
service_name => ‘service_name_get_it_from_tnsnames.ora’,
ssl_server_cert_dn => NULL,
credential_name => ‘DB_LINK_ATPAI’,
schema_list => ‘[“MAKROOCT25”]’
);
END;
/

HostName and ServiceName can be extracted from OCI Console (Under Connections Tab for the Source Database) or from TNSNAMES.ORA.

Once this block DBMS_CLOUD_IMPORT is executed, the API will create a database link MAKROOCT25_4M_ATPAI_DB_LINK, connect to the Username that is provided as a part of DBMS_CLOUD.CREATE_CREDENTIAL and then will schedule an Import Job on Target.

What I have observed is, even if I specify HIGH or MEDIUM Service, the data is read using LOW Service Name and this makes sense. DBMS_PARALLEL_EXECUTE itself creates multiple chunks and processes these chunks using parallel threads. This means, a table is read and loaded by these parallel threads. For each parallel thread, a database link network channel is opened. More the number of CPUs, more the number of parallel threads.

You can monitor the progress from DBA_DATA_IMPORT_TASK_STATUS. Sample output is attached.

SELECT
‘TASK_NAME=’ || TASK_NAME || CHR(10) ||
‘JOB_STATUS=’ || JOB_STATUS || CHR(10) ||
‘SUMMARY=’ || SUMMARY || CHR(10) ||
‘ELLAPSED_TIME’ ||ELLAPSED_TIME || chr(10) ||
‘CURRENTLY_LOADING’ || CURRENTLY_LOADING
FROM DBA_DATA_IMPORT_TASK_STATUS;

TASK_NAME=MAKROOCT25_4M_ATPAI
JOB_STATUS=RUNNING
SUMMARY=
ELLAPSED_TIME0d 0h 0m

TASK_NAME=MAKROOCT25_4M_ATPAI
JOB_STATUS=RUNNING
SUMMARY={“total_schemas”:1,”completed_schemas”:0,”schema_completion_percentage”:
0,”total_tables”:64,”completed_tables”:17,”table_completion_percentage”:26.56,”l
oading_tables”:1,”failed_tables”:0,”chunking_tables”:0,”fetching_stats_tables”:0
}
ELLAPSED_TIME0d 0h 5m 59.963216s
CURRENTLY_LOADING[{“schema”:”MAKROOCT25″,”table”:”GLP_HIST_ACTIVE_SESS_HISTORY”,
“task_name”:”629331387″,”total_chunks”:118,”completed_chunks”:95,”completion_per
centage”:80.51}]

TASK_NAME=MAKROOCT25_4M_ATPAI
JOB_STATUS=RUNNING
SUMMARY={“total_schemas”:1,”completed_schemas”:0,”schema_completion_percentage”:
0,”total_tables”:64,”completed_tables”:28,”table_completion_percentage”:43.75,”l
oading_tables”:1,”failed_tables”:0,”chunking_tables”:0,”fetching_stats_tables”:0
}
ELLAPSED_TIME0d 0h 7m 2.779188s
CURRENTLY_LOADING[{“schema”:”MAKROOCT25″,”table”:”GLP_HIST_SQLTEXT”,”task_name”:
“3497203576”,”total_chunks”:24,”completed_chunks”:0,”completion_percentage”:0}]

TASK_NAME=MAKROOCT25_4M_ATPAI
JOB_STATUS=SUCCEEDED
SUMMARY={“total_schemas”:1,”completed_schemas”:1,”schema_completion_percentage”:
100,”total_tables”:64,”completed_tables”:64,”table_completion_percentage”:100,”l
oading_tables”:0,”failed_tables”:0,”chunking_tables”:0,”fetching_stats_tables”:0
}
ELLAPSED_TIME0d 0h 12m 5.7349288s
CURRENTLY_LOADING[]

In case the job fails or doesn’t start, you can also validate whether the Database Link that was created implicitly by the API is working as expected. For this, you can run the following query. The DBLink is created as <TASKNAME>_DB_LINK.

select sysdate from dual@MAKROOCT25_4M_ATPAI_DB_LINK;

If the job fails in-between, you can check it from


SELECT CHUNK_ID, TASK_OWNER, TASK_NAME, STATUS, START_ROWID, END_ROWID, ERROR_MESSAGE FROM DBA_PARALLEL_EXECUTE_CHUNKS;

The best part of this approach is, you can suspend the job and resume it later. For example, if you are moving some large data and it coincides with some critical batch job, you can suspend it temporarily and resume it later. The job will resume from where it left. It is also resilient to failures, meaning, by any chance if the instance crashes or is terminated, the failed job can be restarted and it will load the data from the point it failed.

I don’t want to make this blog too long, hence would conclude here. The idea here was to post about this brilliant feature and the underneath logic that it uses. Is it restricted to only Autonomous AI Database? Answer is NO. You can very well use this logic on Non-Autonomous Databases as well. I wrote about DBMS_PARALLEL_EXECUTE in the past. You can use it for processing large amount of data, in a controlled, efficient manner and with same resiliency.

As a final note, please note that this feature does more than why I posted. You can Migrate/Move data from Non-Oracle Databases to Autonomous AI Databases as well. Autonomous Database will do all the Data-type conversion. You do not need any migration tool to accomplish this. At the time of writing this blog, this feature supports MySQL, PostgreSQL and Amazon Redshift Databases. I am sure, more Non-Oracle Databases would be added to this list. Read the Documentation for more information about this feature.

This is a personal blog. The thoughts and opinions expressed here are entirely my own and do not necessarily represent those of my employer, team, or any organization associated with me. Any code, scripts, or technical examples are shared for learning purposes and should be used with appropriate caution.

dbms_parallel_execute ! still unused feature of Oracle Database 11g

I still find many developers unaware of dbms_parallel_execute, which is a better and efficient way of distributing the data across multiple chunks. Recently, I came across one such implementation, where the Developers introduced their own way of creating multiple chunks using ROWNUM pseudo column and then distributing these records into required streams. In this case, the Developers intended to spawn 64 Streams. Further, they created an Index on the some of the columns to Update the records that were processed. The drawback of this implementation was :

  1. Block Level Contention for Tables, as each Streams can request and process Blocks from other Streams
  2. Index Block Contention, as multiple Streams would update same Block. A Better way would be to select ROWID and UPDATE based on ROWID.

This Implemention required around 50 Million Rows to be processed in parallel streams of 64. Due to the issues mentioned above, the process took abnormal time and was kept on hold for production movement. Our advice was sought on the Optimization and we could see that the Developers still unware of a fantastic and readymade approach, called dbms_parallel_execute. I had one of my previous Blog on the same topic and wanted them to go through it for some examples. We also explained them the benefit of this approach over their current implementation. They readily accepted this and thought of giving it a try before moving it on production. I was expecting some queries from the Developers, as this was an overall new concept for them and Yes, the first concern raised was over the CHUNK_SIZE=>64 value in dbms_parallel_execute.create_chunks_by_rowid proceedure. The Developers expected this procedure to create 64 Chunks of the entire table, which can then be processed by 64 parallel jobs. In this case, since the table was huge, it created around 5000+ Chunks. This was a bit surprising for them. Further, even if it created more number of chunks, the Challenge here was building of an additional logic to run only 64 Streams at any given point in time. Further, the logic should also take care of the finished jobs and to spawn a new one automatically.

First, let us address and explain the CHUNK_SIZE argument of create_chunks_by_rowid procedure. This value works with BY_ROWS argument. If BY_ROWS is specified as TRUE, the number of chunks are created based on the approximate number of rows divided into each stream. For example, in case of CHUNK_SIZE=>64 and BY_ROWS=>TRUE, each stream will have approximately 64 or less rows. With CHUNK_SIZE=>64 and BY_ROWS=>FALSE, the division is based on BLOCKS, with each stream containing approximately 64 or less blocks. Further, each stream will not span across multiple extents and therefore, if a table has smaller extents, you may see a high count of CHUNKS created (as 5000+ in the case of above mentioned example).

The Start & End ROWID of each of the Chunks are stored in user_parallel_execute_chunks view. Since this is a SYS Object and we do not have much control over this. I usually create my own table, where I Insert the data from USER_PARALLEL_EXECUTE_CHUNKS. Once the data into our own table is Inserted, it is easier to update our table with additional information required for analysis. Let us walk through a simple example.

## Create our own table to track the status of each job.

create table process_status (
process_name            varchar2(30),
job_id                  number,
s_rowid                 rowid,
e_rowid                 rowid,
start_time              date,
end_time                date,
processed_rows          number,
completed_yn            varchar2(1),
status                  varchar2(1000));

## Create a Parallel Job Task
exec  dbms_parallel_execute.create_task('VIVEK_JOB');

## Create Chunks, with the approximate Blocks / Rows (if BY_ROWS=>FALSE, it will distribute on Blocks else on Rows)
exec  dbms_parallel_execute.create_chunks_by_rowid('VIVEK_JOB','FLEX','CH_OD_AGREEMENT',false, 100);

## Insert the Chunks in our process_status table
insert into process_status(process_name, job_id, s_rowid, e_rowid)
select 'VIVEK_JOB', chunk_id, start_rowid, end_rowid from user_parallel_execute_chunks
where task_name='VIVEK_JOB' order by 1;
commit;

## Our Procedure that will run in parallel. We need to pass start_rowid and end_rowid as the Input parameters

create or replace procedure FLEX.vivek_proc (st_rowid in rowid, ed_rowid in rowid) as
l_cnt      number;
begin
update FLEX.process_status set start_time=sysdate, status='RUNNING' 
 where s_rowid=st_rowid;  
 commit;  
 select count(*) into l_cnt from FLEX.CH_OD_AGREEMENT 
 where rowid between st_rowid and ed_rowid;  
 update FLEX.process_status set status='COMPLETED', end_time=sysdate, processed_rows=l_cnt 
 where s_rowid=st_rowid;  
 commit; 
end; 
/

VIVEK_PROC is the procedure that is required to be run in parallel streams. The Input Parameters are START & END ROWID. I have also modified the procedure to update the PROCESS_STATUS table, with the start_time, status and at the completion of the main business logic, the end_time, Processed_rows and status. Further this data can be maintained as a History (by inserting into history table before the next run). Due to this flexibility, I choose creation of PROCESS_STATUS table.

The total number of chunks created by dbms_parallel_execute for my test table was around 3500 and I wanted to spawn only 64 streams. Many Developers chose restricting job_queue_processes to 64 and schedule all the 3500 jobs together. Restricting job_queue_processes to 64 will ensure only 64 jobs running. This is achievable, but would be a very crude way of doing so. Further, this would only work if there are no other jobs scheduled during this time, else it will also be queued. A best option is again using dbms_parallel_execute. There is a procedure called RUN_TASK and an argument PARALLEL_LEVEL limits the number of parallel jobs to be run at any given point of time. No additional logic is required to check for the completion of a job and scheduling of a new job. Oracle will do it for us.

## Run the Task, with number of parallel job queue processes. This is controlled using PARALLEL_LEVEL

begin
      dbms_parallel_execute.run_task
      ( task_name      => 'VIVEK_JOB',
        sql_stmt       => 'begin FLEX.vivek_proc( :start_id, :end_id ); end;',
        language_flag  => DBMS_SQL.NATIVE,
        parallel_level => 64 );
      commit;
end;
/
## Once the Jobs are done, drop the Job Task
exec dbms_parallel_execute.drop_task('VIVEK_JOB');

DROP_TASK will remove the entries from USER_PARALLEL_EXECUTE_CHUNKS table.