PL/SQL Parallelism made easier in 11g
September 3, 2010 4 Comments
Today was the first day of the two day AIOUG Event Sangam 2010. Jonathan Lewis was the Key Speaker this year and I presented a session on “Developing a Scalable Application”. One of my slide mentioned about PL/SQL Parallelism. This is in continuation to my previous blog where I posted about Manual Parallelism that takes the ROWID’s of the table and distributes it across multiple chunks. The limitation with Manual Parallelism is that even though a pl/sql block needs to process 5% to 10% of the rows from a huge table, the Manual Parallelism will select and split the ROWIDs of the entire table. This means, many of the chunks may process very less data or almost zero data.
Oracle Database 11g Release 2 introduced a package dbms_parallel_execute, that makes it easier to split a table into multiple chunks, based on ROWID’s or SQL or by Number Column. During my AIOUG session, I demonstrated the use of this package to split a table into multiple chunks and schedule a batch process in parallel. The best part of this package is that the chunks can be split based on a condition and only the chunks that contain to-be-processed data will be created. Thus the rows to be processed by each of the chunks will have uniform distribution.
As mentioned in my earlier blog, I will reiterate the benefit of pl/sql parallelism, which is, optimal use of available resources. In this blog, I will post the test case that I executed during the AIOUG session and this should help you reproduce this in your environment.
In this example, consider a Nightly Batch Job NIGHTLY_JOB and this job processes the rows from VIVEK_TEST Table using an approach which is widely implemented by the developers or the dba’s. This is single job processing the required rows and thus is a very slow and an inefficient approach. There are 7200 rows in this table to be processed and assuming that the business logic in the pl/sql takes atleast 1 second and then moves on to another row. Which means, at a minimum, each row to be processed will take atleast 1 second, thus the entire process will take atleast 7200 second (2 Hours). This is the code of a batch process :
create table vivek_test pctfree 50 pctused 50 as
select * from all_objects a
where rownum=1000 then
commit;
l_limit:=0;
end if;
end loop;
end;
/
exec nightly_batch;
Once the job is executed, you can monitor the progress of this batch process from another session. V$SESSION will report the progress in MODULE column, as posted by dbms_application_info procedure.
The changes required to parallize the process is pasted below. To monitor the progress and maintain the log of each of the CHUNKS, I have created a process_status table. One thing worth noticing is the PCTFREE and PCTUSED setting, which is set to 50. The reason I have this setting is that the number of chunks to be created is dependant on the number of extents in the table and I wanted multiple extents for my table. In a production environment, number of extents should not be an issue.
create table process_status (
process_name varchar2(30),
job_id number,
s_rowid rowid,
e_rowid rowid,
start_time date,
end_time date,
processed_rows number,
completed_yn varchar2(1),
status varchar2(1000));
create or replace procedure parallel_job(chks in number) is
l_statement varchar2(1000);
l_jobno number;
l_retry number;
l_status number;
l_chunkid number;
l_srowid rowid;
l_erowid rowid;
l_anyrows boolean;
l_errcode pls_integer;
begin
execute immediate 'truncate table process_status';
/* CREATE A PARALLEL TASK */
dbms_parallel_execute.create_task('NIGHT_JOB');
dbms_parallel_execute.create_chunks_by_rowid('NIGHT_JOB','VIVEK','VIVEK_TEST',true, chks);
/* INSERT THE CHUNKS INTO PROCESS_STATUS TO LOG THE STATUS OF EACH JOB */
insert into process_status(process_name, job_id, s_rowid, e_rowid)
select 'NIGHT_JOB', chunk_id, start_rowid, end_rowid from user_parallel_execute_chunks
where task_name='NIGHT_JOB' order by 1;
commit;
/* Run the Procedure here */
LOOP
-- Get a chunk to process; if there is nothing to process, then exit the
-- loop;
DBMS_PARALLEL_EXECUTE.GET_ROWID_CHUNK('NIGHT_JOB',l_chunkid, l_srowid, l_erowid,l_anyrows);
IF (l_anyrows = false) THEN EXIT; END IF;
BEGIN
l_statement:='nightly_batch('||l_chunkid||','||''''||l_srowid||''''||','||''''||l_erowid||''''||');';
dbms_job.submit(l_jobno,l_statement);
update process_status set completed_yn='P', start_time=sysdate where job_id=l_chunkid;
commit;
DBMS_PARALLEL_EXECUTE.SET_CHUNK_STATUS('NIGHT_JOB',l_chunkid,DBMS_PARALLEL_EXECUTE.PROCESSED);
EXCEPTION WHEN OTHERS THEN
DBMS_PARALLEL_EXECUTE.SET_CHUNK_STATUS('NIGHT_JOB', l_chunkid,
DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR, SQLCODE, SQLERRM);
l_errcode:=sqlcode;
update process_status set completed_yn='F', end_time=sysdate, status='FAILED WITH ERROR '||l_errcode||'.' where job_id=l_chunkid;
END;
commit;
END LOOP;
dbms_parallel_execute.drop_task('NIGHT_JOB');
end;
/
The only change required in our NIGHTLY_BATCH is the introduction of INPUT parameters and update of the process_status at the completion or failure of a chunk.
create or replace procedure nightly_batch(jobid number, low_rowid rowid, high_rowid rowid) is
l_count number:=0;
l_module varchar2(30);
l_limit number:=0;
l_errcode pls_integer;
cursor c_main is
select rowid, owner, object_name, object_type from vivek_test
where nvl(processed_flag,'N')='N'
and rowid between low_rowid and high_rowid;
begin
for i in c_main
loop
l_count:=l_count+1;
l_limit:=l_limit+1;
l_module:=l_count||' out of 7200/40=180';
dbms_application_info.set_module('PL_SQL',l_module);
dbms_lock.sleep(1); /* Assuming Processing of a row takes 1 second */
update vivek_test set processed_flag='Y' where rowid = i.rowid;
if l_limit>=1000 then
commit;
l_limit:=0;
end if;
end loop;
update process_status set completed_yn='Y', end_time=sysdate, processed_rows=l_count
where job_id = jobid;
commit;
exception when others then
l_errcode:=sqlcode;
update process_status set completed_yn='F', status='FAILED WITH '||l_errcode||'.' where job_id = jobid;
commit;
end;
/
exec parallel_job(100);
Monitor the progress of this job and you would see a significant improvement as the job completes in less than 5 minutes, as against 2 Hours, and utilizes the hardware resources optimally.
Hi Vivek,
It was nice to see you in action @ Sangam. I wish you get more hours during such seminars, i have already written heavily on this in feedback.
I have 2 questions on this DBMS_PARALLEL_EXECUTE
1)Can you please let me know whether there are any operating system level requirements to exploit this DBMS_PARALLEL_EXECUTE. i.e In case to enable parallel DML in 10g and before the Operating system should have more than 1 CPU and parallel processing ability. Do this feature also has such restrictions please let us know.
2) Is this available as part of Standard Edition of 11g Release 2 or only in Enterprise edition.
Thanks in advance.
Hello Vivek,
Thanks for the above solution.
I have two questions.
1. Can we run this code during Business hours in Production Environment?
2. Am trying to use this code. But am getting junk values in start_rowid and end_rowid. Because of this, my job is failing. Can you please help me in this?
Thanks in advance!!!
Hi Gnana,
What is your Oracle Version ?
Regards
Vivek
Hi Vivek,
Thanks for the great info !
The DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID , indeed provides a very high degree of parallelism.
But I have seen that there are many chunks with zero records .
For example,
If a table has 10K records,
There are 100 chunks being created when the chunk size =>1000 ;
But we have seen that only very few chunks have records in them and the 90% of chunks have 0 records.
I presume that this is because of ROWID address locations scattered .
This affects the load on the parallel jobs , when we have millions of records in a table .
Have you seen this issue before ?
Or is there any strategy where we can perform DBMS_PARALLEL_EXECUTE so that the chunks created have equal number of records.
Thanks