Uncategorized | Real Life Database / SQL Experiences : An Oracle Blog from Vivek Sharma

Time to restart blogging…

May 12, 2026 Leave a comment

It’s been two years since I last blogged. Staying connected with the community is important to me, and blogging helps me stay in touch with my readers.

Over these two years, I’ve had the opportunity to learn several new technologies, and I’d now like to start writing about those learnings as well.

So, it’s time to restart… see you soon with new blogs and new learnings.

Filed under General, Uncategorized

SELECT AI ! Generative AI on Autonomous Database

June 28, 2024 1 Comment

Imagine, we communicate with our Database in Business Language and Database responding with a result that matches the Business criteria. Natural Language Processing (NLP) a branch of Artificial Intelligence has this capability of generating a matching result based on the question asked.

SELECT AI, released early this year on Autonomous Database, allow business users to interact with the database and then convert it into Oracle SQL. Under the hood, it leverages generative AI with Large Language Models (LLM) to covert the user input into SQL.

In this blog, I will not write much about what it is, how to configure it and so on…I would request reading Oracle Autonomous Database Documentation that goes through these details (like, Creating Credentials, Creating Profiles for various AI providers etc.) and is regularly updated.

In this blog, I will walk you through few Interesting examples or analysis on Indian Premier League (IPL). For my Indian Readers, IPL is not something that will need an introduction. Most of the Indians are Cricket Fans. In fact, all Cricket playing nation may know about IPL. If not, then a brief introduction is here.

Before I get into some analysis, I thought of sharing something important i.e.How is SELECT AI able to generate a meaningful query, especially if the column names are a bit cryptic ? Remember, LLM’s are models that we need to train and to train these models, we need to provide it all the relevant details. Let us go through two examples.

In this case, I have a Table TEST_VIVEK with two columns X and Y. X has the Customer Names and Y contains Company Names.

CREATE TABLE TEST_VIVEK (
X VARCHAR2(32),
Y VARCHAR2(64));

INSERT INTO TEST_VIVEK VALUES(‘VIVEK’,’ABC CORPORATION’);
INSERT INTO TEST_VIVEK VALUES(‘TOM’,’XYZ INC.’);
INSERT INTO TEST_VIVEK VALUES(‘DICK’,’LLM PRIVATE LIMITED’);
INSERT INTO TEST_VIVEK VALUES(‘HARRY’,’AI INDIA’);
COMMIT;

Now, How do I provide meaningful names to these columns (X & Y)? One way is to create a View.

CREATE OR REPLACE VIEW TEST_VIVEK_VW AS
SELECT X CUSTOMER, Y COMPANY FROM TEST_VIVEK;

Now, I create a PROFILE using DBMS_CLOUD_AI API.

begin
DBMS_CLOUD_AI.create_profile(
profile_name => ‘MY_DEMO_AI_PROFILE’,
attributes => ‘{“provider”:”OCI”,
“credential_name”:”MY_CREDENTIAL”,
“oci_runtimetype”:”COHERE”,
“object_list”: [{“owner”:”admin”,”name”:”TEST_VIVEK_VW”}]
}’);
end;
/

While creating a profile, I provide TEST_VIVEK_VW as the Object to be used. Let us ask few questions (using SQL Developer).

And the Output

The other method is using an in-built database feature COMMENTS. These basically allows you to provide meaningful description and notes about TABLE(S) and COLUMNS. Marty Gubar has a nice blog with an explanation on how to make use of COMMENTS to ensure LLM’s come-up with right query. See this blog.

Time to get into some Analysis on Indian Premier League. I downloaded the data from one of the cricket websites. The files were in CSV format. I loaded it into my ADB using Data Transform. There are two tables IPL_MATCHES and IPL_DETAILS. IPL_MATCHES is a kind of Primary table with 1 Row per match and IPL_DETAILS is a Child Table with Ball-by-Ball Data. Both these tables has a column MATCH_ID. I created a Primary Key on this column on IPL_MATCHES Table and then a Foreign Key on IPL_DETAILS.

The COMMENTS for both the tables and all the relevant columns:

comment on table IPL_MATCHES is ‘contains IPL Season, City, date, player of the match, stadium, winner, target runs, target overs, match result’;
comment on column IPL_MATCHES.MATCH_ID is ‘Match ID. Join this column to other Tables’;
comment on column IPL_MATCHES.IPL_SEASON is ‘IPL Season. Each IPL season will have multiple matches’;
comment on column IPL_MATCHES.CITY is ‘City where the IPL match was played’;
comment on column IPL_MATCHES.MATCH_DATE is ‘Date when the IPL Match was played’;
comment on column IPL_MATCHES.MATCH_TYPE is ‘Type of the IPL Match’;
comment on column IPL_MATCHES.PLAYER_OF_THE_MATCH is ‘Name of the Player who won the player of the Match award’;
comment on column IPL_MATCHES.VENUE is ‘Stadium where the IPL match was played’;
comment on column IPL_MATCHES.TEAM1 is ‘Team 1 playing the ipl match’;
comment on column IPL_MATCHES.TEAM2 is ‘Team 2 playing the ipl match’;
comment on column IPL_MATCHES.TOSS_WINNER is ‘Team that won the Toss’;
comment on column IPL_MATCHES.TOSS_DECISION is ‘Decision taken by the Team winning the toss’;
comment on column IPL_MATCHES.WINNER is ‘Winner of the IPL Match’;
comment on column IPL_MATCHES.RESULT is ‘Result of the IPL Match’;
comment on column IPL_MATCHES.RESULT_MARGIN is ‘Margin by which a team won the match’;
comment on column IPL_MATCHES.TARGET_RUNS is ‘Target Runs or Runs required to win’;
comment on column IPL_MATCHES.TARGET_OVERS is ‘Target Overs’;
comment on column IPL_MATCHES.SUPER_OVER is ‘super over’;
comment on column IPL_MATCHES.METHOD is ‘Win Type’;
comment on column IPL_MATCHES.UMPIRE1 is ‘Umpire 1’;
comment on column IPL_MATCHES.UMPIRE2 is ‘Umpire 2’;

comment on table IPL_DETAILS is ‘Contains ball by ball Detail of all the ipl matches, Batsman Name, Bowler Name, runs scored, player dismissed’;
comment on column IPL_DETAILS.MATCH_ID is ‘Match ID. Join this column to other Tables’;
comment on column IPL_DETAILS.INNING is ‘Inning Number’;
comment on column IPL_DETAILS.BATTING_TEAM is ‘Team Batting’;
comment on column IPL_DETAILS.BOWLING_TEAM is ‘Team Bowling’;
comment on column IPL_DETAILS.OVER_NUMBER is ‘Over Number’;
comment on column IPL_DETAILS.BALL_NUMBER is ‘Ball Number’;
comment on column IPL_DETAILS.BATTER is ‘Batsman’;
comment on column IPL_DETAILS.BOWLER is ‘Bowler’;
comment on column IPL_DETAILS.NON_STRIKER is ‘Non Striker’;
comment on column IPL_DETAILS.RUNS_OFF_BAT is ‘Runs scored’;
comment on column IPL_DETAILS.EXTRA_RUNS is ‘Extra Runs’;
comment on column IPL_DETAILS.TOTAL_RUNS is ‘Total Runs on a Ball’;
comment on column IPL_DETAILS.EXTRAS_TYPE is ‘Extras’;
comment on column IPL_DETAILS.IS_WICKET is ‘Wicket Taken. 1 is Yes and 0 is No’;
comment on column IPL_DETAILS.PLAYER_DISMISSED is ‘Batsman Dismissed, Player Dismissed’;
comment on column IPL_DETAILS.DISMISSAL_KIND is ‘Dismissal Type’;
comment on column IPL_DETAILS.FIELDER is ‘Fielder’;

Object List in the CREATE_PROFILE API has the two Tables IPL_MATCHES and IPL_DETAILS. One additional parameter that I have is COMMENTS:TRUE. Once done, we can go through some interesting analysis.

begin
DBMS_CLOUD_AI.create_profile(
profile_name => ‘VIVEK_AI_PROFILE’,
attributes => ‘{“provider”:”OCI”,
“credential_name”:”MY_CREDENTIAL”,
“comments”:”true”,
“oci_runtimetype”:”COHERE”,
“object_list”: [{“owner”:”admin”,”name”:”ipl_matches”}, {“owner”:”admin”,”name”:”ipl_details”}]
}’);
end;
/

begin
dbms_cloud_ai.set_profile(
profile_name => ‘VIVEK_AI_PROFILE’
);
end;
/

Indian Premier League started in 2008 and the latest League concluded on 26th May 2024.

First Ever Match was played on 18th April 2008. Latest Season of IPL was 2024 and the final match was played between Kolkata Night Riders and Rajasthan Royals on 26th May 2024.

Next, let us get the count of matches played in each of the IPL Season’s. This is typically a count of matches group by Season.

Number of matches in 2011, 2012 and 2013 were more than the other previous seasons. Then these count dropped before going up in 2022. Any specific reasons for this? Let’s see.

We can now see that the number of Teams in 2011, 2012 and 2013 were more than previous seasons. The format of this league is that each team has to play against each other. Hence, as a new team is added to the league, the number of matches go up. In 2014, number of teams reduced back to 8. Season 2022 onwards, we again see an increase in this number.

Next, let’s see the top 5 Batsman with most runs. This will be computed from IPL_DETAILS table as the data here is ball by ball. This means, we have to take a sum of run scored by a batsman on each ball. Now. if you again scroll up and see the column names in IPL_DETAILS, we do not have a column called BATSMAN, nor we have a column called Runs Scored. We do have BATTER and RUNS_OFF_BAT. We passed the relevant information via COMMENTS.

The Query SELECT AI came up with:

As per the analysis, Virat Kohli (V Kohli) is the top batsman with highest number of runs. Next, let’s check top 5 Bowler’s having dominated and dismissed Virat Kohli the most.

Seasonwise IPL Winners and Player of the Final Match.

Let’s ask a bit complex question. I want to calculate the Batting Strike Rate. This is basically computed as the Sum of Runs Scored divided by the Number of Balls faced. The numbers are per 100 ball i.e.if the strike rate is 150, this means, at an average a Batsman Scores 150 runs in 100 Balls. Now, the way our data is stored, we have Runs_Off_Bat and Ball_Number column. This means, I will have to take a sum of Runs_off_Bat, divide it by Count of Ball_Number and this value will then have to be multiplied by 100. To be honest, I had to play around with this question to get it right and after 3 or 4 attempts, I was able to get it.

In a game of Cricket, Strike Rate doesn’t change every few minutes. It only changes during or after a match. Therefore, the next task that I would want to try is creating a Materialized View with a pre-computed Strike Rate, adding it to the object list and then running a simple query to get the Strike Rate. Once I am done, will post the results as an update to this blog. We can play around with this data and perform some more analysis. I actually did many more, but to make this blog short thought of publishing only few of them. I actually wanted to demonstrate the SELECT AI capabilities and how easily it was able to get us the required details. Hope you enjoy this.

Filed under Autonomous database, Uncategorized Tagged with Autonomous database, Autonomous DB, cricket, database, GENERATIVE AI, india, ipl, SELECT AI, SQL

AIOUG ! Oracle Community Yatra 2022

July 6, 2022 Leave a comment

My session on “Database Performance Analytics using Python”

Last 2 years, we all connected over the Virtual World. It was a great experience. However, when it comes to connecting with the speaker and even for the speaker to connect with the participants, virtual world can’t replace physical, in-person sessions. It gives us an opportunity to collaborate with each other and enhance our social world. The good news here is – All India Oracle User Group is back with physical and in-person events, with Oracle Community Yatra 2022. This is scheduled across 6 cities in India, starting 23rd July 2022. I am speaking as well and to know about my session, please continue reading…

For registration, click here

DBA’s – Don’t Worry about your future. With Databases on Cloud, and especially with Autonomous Database gaining much importance, many DBA’s fear that their value would come down as most of their tasks would be automated. This is true, if you are still stuck in performing regular and mundane jobs. Why not use your existing technical skillset and move to a next level? You had been responsible for maintaining health of your Application and Database. Let’s reuse the same skill and knowledge, add some spice to it and elevate from a Performance Analyst to a Performance Analytics field. In this session, I will walk you through some important concepts of Database Performance and then take you to the field of Analytics. During the session, I will also speak about some real life scenario’s and how this Analytics helps us in providing a global view of the entire database performance. It will be a collaboration between my performance skills and python. See you soon.

Filed under Uncategorized Tagged with aioug, Performance

Database Vault ! Autonomous Database

March 10, 2020 2 Comments

Availability of Database Vault on Autonomous Database (Shared Infrastructure) was announced in February 2020. Database Vault does not require any introduction as this feature has been available since Oracle 10g. As a brief, it provides powerful security controls to help protect sensitive application data from Unauthorised access and comply with privacy and regulatory requirements. This means, a superuser (or a user with DBA privilege) can access the sensitive application data. Oracle Database Vault implements this security at the database layer.

Database Vault provides additional security to the Autonomous Database Family and is just few steps. This blog demonstrates the steps needed to implement Database Vault on Autonomous Data Warehouse. Before I go through the steps, it is important to know a concept of REALMS. Realm is a grouping of database schemas, database objects and roles that must be secured for a given application. It is a kind of a Zone of Protection for your database objects. So, one of the steps required will be to create a Realm.

Every Autonomous Database has an ADMIN user, which is a super user. ADMIN user can query any table within the Database (by default). All the steps required for implementing Database Vault is to be executed using ADMIN user. Once done, ADMIN user will continue to be a super user, but will not be able to query the application data. In my case, assume VIVEK is an application schema containing all the tables accessed by my application and one such table is CREDIT_CARD.


- Connect to ADMIN User and Query the CREDIT_CARD Table. ADMIN is able to query the data.

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

- Check whether Data Vault is Configured and Enabled. It is not configured.

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS FALSE
DV_ENABLE_STATUS    FALSE

- Create Database Vault Owner and Database Vault Account Manager User.

CREATE USER ADB_DBV_OWNER IDENTIFIED BY "xxxxxxxxxxxxx";
CREATE USER ADB_DBV_ACCTMGR IDENTIFIED BY "xxxxxxxxxxxxx";
GRANT CREATE SESSION, SELECT ANY TABLE TO ADB_DBV_OWNER;

- Check whether we can select the data from CREDIT_CARD table, before implementing Database Vault
- Database Audit Vault User has SELECT ANY TABLE privilege.

CONNECT ADB_DBV_OWNER/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00


- Configure and Enable Database Vault

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low


EXEC DBMS_CLOUD_MACADM.CONFIGURE_DATABASE_VAULT('ADB_DBV_OWNER', 'ADB_DBV_ACCTMGR');

- Database Vault Configured, but not Enabled

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS TRUE
DV_ENABLE_STATUS    FALSE

- Enable Database Vault. For this to come into effect, we need to restart the database

EXEC DBMS_CLOUD_MACADM.ENABLE_DATABASE_VAULT;

- Restart the Database. Check after Database Restart
- Database Vault Configured and Enabled

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS TRUE
DV_ENABLE_STATUS    TRUE

- Admin can still Query the Data

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

- Create REALM, Add Object to the Realm 

BEGIN
 DBMS_MACADM.CREATE_REALM(
  realm_name    => 'Vivek Application Schema', 
  description   => 'Realm created for Vivek Schema', 
  enabled       => DBMS_MACUTL.G_YES, 
  audit_options => DBMS_MACUTL.G_REALM_AUDIT_FAIL + DBMS_MACUTL.G_REALM_AUDIT_SUCCESS,
  realm_type    => 1,
  realm_scope   => DBMS_MACUTL.G_SCOPE_LOCAL);
END; 
/

BEGIN
 DBMS_MACADM.ADD_OBJECT_TO_REALM(
  realm_name   => 'Vivek Application Schema', 
  object_owner => 'VIVEK', 
  object_name  => '%', 
  object_type  => '%'); 
END;
/

- Add Authorised User, which is the User that can Query the Application Table
- In this case, VIVEK is the application owner

BEGIN
 DBMS_MACADM.ADD_AUTH_TO_REALM(
  realm_name   => 'Vivek Application Schema', 
  grantee      => 'VIVEK', 
  auth_options => DBMS_MACUTL.G_REALM_AUTH_OWNER);
END;
/

- Check whether ADMIN can Query the sensitive Data from Vivek Schema

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;
                    *
ERROR at line 1:
ORA-01031: insufficient privileges

- Vivek being an Application Owner, can Query the data. No other Users allowed to Query

CONNECT VIVEK/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

As can be seen, the steps are pretty simple. This is a simplified version of Database Vault configuration. Refer to Oracle Documentation on the various configuration options available with Database Vault. Hope this helps.

Filed under Autonomous database, Uncategorized Tagged with Autonomous, Autonomous database, Database Vault

Oracle Groundbreaker Yatra ! July 2019..

May 19, 2019 Leave a comment

After Sangem 2018, now it’s time for another OTN Yatra, now renamed to Oracle Groundbreaker Yatra. Sangam is hosted every year and had been either in Bangalore or Hyderabad. It is a One City Event. Oracle Groundbreaker Yatra on the other hand is a Multi-City Tour. I assume, last year it was a 6 City Tour. This year, it is scheduled to be hosted in Mumbai, Chennai, Delhi, Bengaluru, Hyderabad, Kolkata, Pune, Ahmedabad, Visakhapatnam and Thiruvananthapuram.

Registration for the Groundbreaker would open soon. Many prominent speakers would be travelling and it should be a never-miss event for the Oracle Database Community (DBA’s, Developers, Architects, Data Scientists etc). Sandesh Rao, Connor Mcdonald, Roy Swonger and Gurmeet Goindi would be speaking on some interesting topic (as always). Roy Swonger from Oracle US would be visiting India for the first time. Every time you attend the sessions from these experts, you tend to learn something new and I am sure, this time as well, you will come out with bags full of Knowledge.

I am also one of the speakers for the Groundbreaker Yatra and would be travelling to 4 cities, apart from Mumbai (as it is my base location). I have opted for the locations, where I am either travelling for the first time for an User Group event or have visited only once or twice. Now, you can guess it :). I will be in Ahmedabad, Kolkata, Visakhapatnam, Thiruvananthapuram and Mumbai. I assume the agenda for some of the locations is already published and for the other locations, is in the final stage.

Keep a tab on this link Groundbreaker Yatra 2019, so that you do not miss on the registration.

Now, on my session. I am presenting a Session on “Database and Application Performance ! Then and Now.” It is a 1 hour session. I would be covering some of the Performance Challenges the DBA’s or Developers use to face and still face and how these are automatically taken care in Autonomous Databases. I will cover Optimizer, Parallelism, Oracle 19c and walk through some of the topics around Reinforcement Learning and Data Science. Usually, my sessions are supported by live demo’s. But for a 1 hour session, I am not sure whether I will be able to, but shall try to make it more interesting. See you all in July 2019.

Filed under AIOUG, Autonomous database, Exadata, Optimizer, Performance, Uncategorized Tagged with aioug, Autonomous database, Connor Mcdonald, Oracle Groundbreaker Yatra, Sandesh Rao

Python – Error Installing cx_Oracle on Mac

April 10, 2019 2 Comments

Just wanted to share a solution that worked for me while resolving an issue installing cx_Oracle on my MacBook. I wanted to connect Python to an Oracle Database. I installed Oracle Instant Client 18c from here. I clicked on “Instant Client for Mac OS X (Intel x86)(32 bit and 64 bit). Then I downloaded following :

- Basic Package - All files required to run OCI, OCCI and JDBC-OCI applications
- The other 2 Downloads under Tools - optional packages.

On Mac, by default these downloads are unzipped. I copied all the files into a Single Folder and updated the .bash_profile with ORACLE_HOME. I made sure that I am able to execute sqlplus. Once done, I started the installation of cx_Oracle. Received the following error :

python -m pip install -U cx_oracle
Collecting cx_oracle
  Using cached https://files.pythonhosted.org/packages/4b/aa/99e49d10e56ff0263a8927f4ddb7e8cdd4671019041773f61b3259416043/cx_Oracle-7.1.2.tar.gz
Building wheels for collected packages: cx-oracle
  Building wheel for cx-oracle (setup.py) ... error
  Complete output from command /Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-wheel-5rm6ofqh --python-tag cp36:
  running bdist_wheel
  running build
  running build_ext
  building 'cx_Oracle' extension
  creating build
  creating build/temp.macosx-10.7-x86_64-3.6
  creating build/temp.macosx-10.7-x86_64-3.6/src
  creating build/temp.macosx-10.7-x86_64-3.6/odpi
  creating build/temp.macosx-10.7-x86_64-3.6/odpi/src
  gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/vivek/anaconda3/include -arch x86_64 -I/Users/vivek/anaconda3/include -arch x86_64 -DCXO_BUILD_VERSION=7.1.2 -Iodpi/include -Iodpi/src -I/Users/vivek/anaconda3/include/python3.6m -c src/cxoBuffer.c -o build/temp.macosx-10.7-x86_64-3.6/src/cxoBuffer.o
  xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for cx-oracle
  Running setup.py clean for cx-oracle
Failed to build cx-oracle
Installing collected packages: cx-oracle
  Running setup.py install for cx-oracle ... error
    Complete output from command /Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-record-08000fay/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'cx_Oracle' extension
    creating build
    creating build/temp.macosx-10.7-x86_64-3.6
    creating build/temp.macosx-10.7-x86_64-3.6/src
    creating build/temp.macosx-10.7-x86_64-3.6/odpi
    creating build/temp.macosx-10.7-x86_64-3.6/odpi/src
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/vivek/anaconda3/include -arch x86_64 -I/Users/vivek/anaconda3/include -arch x86_64 -DCXO_BUILD_VERSION=7.1.2 -Iodpi/include -Iodpi/src -I/Users/vivek/anaconda3/include/python3.6m -c src/cxoBuffer.c -o build/temp.macosx-10.7-x86_64-3.6/src/cxoBuffer.o
    xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-record-08000fay/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/

The relevant error here was “xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools)”, which is related to Xcode. To be honest, searched few solutions on google and finally got a solution which was to run download and install Xcode.

xcode-select --install

Just Download, Install Xcode and the installation went smooth. I can now connect python to an Oracle Database. Task accomplished 🙂

Viveks-MacBook-Air:~ vivek$ xcode-select --install
xcode-select: note: install requested for command line developer tools
Viveks-MacBook-Air:~ vivek$ python -m pip install -U cx_oracle
Collecting cx_oracle
  Using cached https://files.pythonhosted.org/packages/4b/aa/99e49d10e56ff0263a8927f4ddb7e8cdd4671019041773f61b3259416043/cx_Oracle-7.1.2.tar.gz
Building wheels for collected packages: cx-oracle
  Building wheel for cx-oracle (setup.py) ... done
  Stored in directory: /Users/vivek/Library/Caches/pip/wheels/5f/b1/97/adf672ad4ab215e227ba939604abccc086e16c91b04cccdcd7
Successfully built cx-oracle
Installing collected packages: cx-oracle
Successfully installed cx-oracle-7.1.2

One important point to remember is that while it says cx_oracle installed, while importing the module in Python Prompt or on Jupyter, it is cx_Oracle.

Filed under Python, Uncategorized Tagged with cx_Oracle, Error, xcode-select --install, xcrun: error: invalid active developer path

Autonomous Database ! ADW and ATP. Why do I need both ?

September 27, 2018 Leave a comment

Autonomous Data Warehouse (ADW) was announced in March 2018 and then Autonomous Transaction Processing (ATP) is the latest addition to the Autonomous Database family. ATP is for Mixed Workload and therefore provides 4 Service Names. These are :

HIGH :Highest level of CPU and IO Resources. Concurrency with this Service Name depends upon the number of OCPUs and will scale accordingly.

MEDIUM :Lowest level of CPU and IO Resources. Concurrency depends on the number of CPU’s and will scale accordingly.

LOW :Least amount of CPU and IO Resources. Again, Concurrency depends on the number of CPU’s and the Scaling is directly proportional to the number of OCPU’s.

ADW provides these three Service Names as well. However, unlike ADW (see my previous blog), the definition of these Services are different. In ADW, HIGH provides highest level of Resources but less concurrency, whereas, in ATP, HIGH do provide highest level of Resources but also provides Highest level of Concurrency. Since ATP is designed for Mixed Workload, it provides an additional Service Name PARALLEL. It provides high level of resources but lowest concurrency and is useful for Batch Processes or Heavy Reporting.

Now, many of the technical folks have raised this question on “Why do we need Both? Can’t we use ATP to take care of Data Warehouse kind of a load?”. ATP very well does that by providing an option to run Mixed Workload. However, there are many implementations that are purely Data Warehouse. The problem is that the amount of data that we generate, Warehouse Databases have grown massive in size supporting large amount of complex queries involving many nested joins and expecting sub-second response time. Further, these Databases house data from many different sources. Optimization Strategy for an OLTP and Data Warehouses are completely different. Therefore, an OLTP kind of a setup cannot (in most of the cases) provide the response time needed for DWH Queries.

Row Store v/s Columnar Store

Now, coming back to ATP v/s ADW argument. In an ADW, the data loaded is by default compressed to Oracle Hybrid Columnar Compression, whereas ATP uses a Row Store Format. What this means is that in an ADW, the data stored at disk is in a Columnar format, which is a proven format for speeding up Analytics kind of a Workload where we query few columns. All Data Warehouse Implementations, like Redshift, Google Big Query etc. store data in Columnar format. On the other hand, ATP stores the data in Row format which is ideal for OLTP that demands Single or few row(s) updates or deletes or queries small number of rows.

Since the data in ADW is stored Compressed in Columnar format, it allows scanning of Billions of Rows per CPU cycle. The data for a particular column is stored contiguously making it possible to scan through large number of rows in a single CPU cycles. This technique is called SIMD (Single Instruction Multiple Dataset) processing. To meet a sub-second response time, this technique is plays a significant role.

Such large scans (scanning billion of rows per second) is not possible with Row Store and one of the reasons for this can be found in one of my blog published in 2015 which is still very much relevant. Skipping of Columns consume CPU Cycles which is demonstrated in the blog post that can be found here.

Through this short blog, I wanted to clear the confusion around ADW and ATP. Both have their own use cases and thus are optimized to manage different workloads. Both of these Autonomous Databases have some exciting features. Stay tuned for more on these.

Filed under Autonomous database, Uncategorized Tagged with ADW, ATP, Autonomous database, Autonomous DB

Resource Manager – Autonomous Data Warehouse

September 14, 2018 4 Comments

I recently presented a Full Day Tech Event on Autonomous databases (ADB) for the North India Chapter of All India Oracle User Group. Since most of the attendees were Database Administrators, it was important to cover technical aspects of ADB. While Oracle doesn’t publicize the Internals of ADB, I spoke about some of my experiences based on many of POC’s.

Many DBA’s still assume that 18c Database is Autonomous? Therefore it is Important to know What is Autonomous Database and What it is not? First thing to note is that 18c is not Autonomous. 18c is just like any other Oracle Database. Autonomous uses the features of Oracle Database 18c, plus 12c, plus 11g, plus 10g and so on. Autonomous Database ensures that all these features work together. So, 18c is one of the Building Blocks of 18c. Autonomous uses all the features that Oracle Developed for almost 2 decades. The other important underlying technology is Oracle Exadata, which is the fastest Oracle Database Machine with redundancy built-in at each layer. Again, Oracle had been working on many features that makes this machine a powerful database machine, in terms of Performance, Reliability and Scalability. The Journey had been long from Smart Scans to Direct-to-wire protocol to Smart Fusion Block Transfer and the enhancements continue. One important thing to note is – Customers do have an option to move to 18c on Exadata within their own Data Centre. Then does it mean, this combination is Autonomous? The answer is NO. The two components are critical, but what makes it Autonomous is the Cloud Automation with Machine Learning.

During the session, I covered the Performance features and the way Oracle controls resource utilization. The topic that came up was Oracle Database Resource Managers. Database Administrators were more inquisitive to know more about this and how it is been implemented in ADB.

Resource Manager-

Resource Managers or Consumer Groups or Service names are the integral part of Autonomous Databases as they manage the workload based on the performance characteristics of the Application. For example, in an Autonomous Data Warehouse there are 3 pre-created Service Names. These are –

HIGH – Highest Resources and Lowest Concurrency. Maximum Queries that can run are 3.

MEDIUM – Less Resources and High Concurrency. Maximum Queries would depend upon the number of OCPU’s.

LOW – Least resources and Highest Concurrency. Maximum Queries – twice the number of OCPU’s and therefore, would depend upon the number of OCPU’s.

Autonomous Transaction Processing (ATP) introduced a new Service – PARALLEL. In the context of ATP, the definition for HIGH, MEDIUM and LOW is different that in ADW. I will cover these in other blog post.

Now, the game becomes more interesting with HIGH and MEDIUM Consumer Groups. While the maximum number of Queries that can run from HIGH is 3 but with concurrency i.e. with concurrent Queries running from MEDIUM and LOW, this number can be less than 3. Same is the case with MEDIUM. With MEDIUM, the maximum numbers would depend on the number of OCPU’s, but actual concurrent queries that can run below the maximum allowed would depend upon the queries running from other consumer groups. For LOW, it is twice the number of OCPU’s and is irrespective of number of queries running from other consumer groups. All these are managed efficiently by pre-configured Resource Managers. Resource Manager, as the name implies, manages the resources. When the load on the system is light, there is very little need for managing the resources. However, as the load increases and system becomes more busy, managing the resources become more important. One of the Primary Goal of Autonomous Database is to ensure consistent performance (always). Therefore, these restrictions. However, as mentioned earlier, other than HIGH, the maximum for other consumer groups depends upon the number of OCPU’s. So, if you have a highly concurrent environment or during peak load with high concurrency, you can scale up more OCPU’s and scale down as well. By the way, Scaling OCPU’s UP and DOWN is Online and is irrespective of the Storage. Let me digress for a moment here. In case of Autonomous Database, if you need additional Storage, you can add it Online and this increase does not mandate adding OCPU’s. Same with the OCPU’s, you don’t need to add Storage. Both are independent. This is not the case with the competition. For example, with AWS Redshift or Microsoft SQL Data Warehouse, since these technologies work on Shared Nothing Architecture, additional Storage means adding more OCPU’s as well. What this means is – Even if you have good amount of CPU resources available and need is to add only the Storage, you will have to add a compute node as well. A storage without a compute is like a DEAD duck. It does nothing on disk unless connected to a Compute Mode.

Coming back to the Resource Manager. As mentioned, resource manager is automatically created and the Application owner has to only make sure that the application queries or the reports are connected to one of the consumer groups based on workload patterns or application requirements.

Why Resource Manager (RM)?

As mentioned earlier, the primary goal of a RM is to Manage System Resources. If resources were unlimited, there would be no need to manage them. Without these, all database connections are considered equal and under heavy load all sessions are impacted equally. There are various components of RM, like Consumer Groups – RM allocate resources to Consumer Groups and not to Sessions, Plan Directives – Resource allocations are actually assigned to Plan Directives and a Consumer group is then assigned a Directive, then finally Resource Plan – a collection of Directives that determines how the resources are to be allocated and managed. Resource Plan is set at the Database or System Level using RESOURCE_MANAGER_PLAN.

In terms of mapping rules, all user sessions (except SYS and SYSTEM) are mapped to a Consumer Group. By default, these are mapped to OTHER_GROUPS, which is a mandatory consumer group. There are 2 types of Session Attributes – Login Attributes and Runtime Attributes. Runtime Attributes is assigned at Runtime and can ne changed anytime during the life cycle of the session. This is basically decided by the application. Login Attributes is tied to the connection and cannot be changed during the lifecycle.

Resource Manager, in ADB uses Login Attribute. Therefore, the application connection is to a particular Service Name has to be pre-decided based on the application requirement.

The maximum number of Queries that can be executed using MEDIUM and LOW Service Names scales linearly. As per Oracle Documentation, for a 16 OCPU Machine, the maximum with MEDIUM is 20 and LOW is 32 (twice the number of OCPU’s). As you scale-up, say to 32 OCPU’s, the maximum for these will Scale as well to 40 for MEDIUM and 64 for LOW.

At present, will restrict this blog post to Resource Manager. Will write on something interesting soon.

Filed under AIOUG, Autonomous database, Uncategorized Tagged with Autonomous database, Autonomous DB, Resource Manager

Autonomous Database Tech Day ! Gurgaon

August 19, 2018 Leave a comment

#Autonomous #AIOUG Presenting a Full Day Event on 8th September 2018 for North India Oracle User Group in Gurgaon. This is on Oracle Autonomous Database. Would be covering some interesting technical capabilities of Autonomous Databases. I am covering the 2 offerings i.e. Autonomous Data Warehouse and Autonomous Transaction Processing. For Registration, click on the following link :

Meraevents Link

This being an Oracle User Group Session, focus would be on the Technical Capabilities of ADW / ATP, like Parallel Processing, Concurrency, Optimizer Enhancements and Behaviour and most importantly, Competition.

So, North India Folks : See you all on 8th September 2018.

Filed under AIOUG, Autonomous database, Optimizer, Performance, Uncategorized Tagged with ADW, ATP, Autonomous database, Autonomous DB

Consistent Gets for an Index Scan

November 15, 2017 2 Comments

This question was raised by a participant during my User Group Session on “Indexing : Fact & Myth Session”. Therefore, I thought of writing about it with the same example that I demonstrated..

I was demonstrating on a Myth that “High Cardinality Column should be a Leading Column of an Index” and for this, I created following table with 2 Indexes.

create table t1 as
select * from all_objects;

exec dbms_stats.gather_table_stats(user,'T1');

SQL> select owner, num_rows, blocks from dba_tables where table_name='T1' and owner='SCOTT';

OWNER                  NUM_ROWS     BLOCKS
-------------------- ---------- ----------
SCOTT                     68605       1377

SQL> select column_name, num_distinct, num_nulls from dba_tab_columns
where   owner='SCOTT'
and     table_name='T1'
and     column_name in ('OBJECT_ID','TEMPORARY')
order by 1;

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS
------------------------------ ------------ ----------
OBJECT_ID                             68605          0
TEMPORARY                                 2          0

SQL> create index t1_ot on t1(object_id, temporary);

Index created.

SQL> create index t1_to on t1(temporary,object_id);

Index created.

SQL> select index_name, blevel, leaf_blocks, clustering_factor from dba_indexes
where   table_name='T1'
order by 1;

INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
T1_OT                                   1         171              1458
T1_TO                                   1         171              1494

So, I have a table with 68605 rows. Object_ID is 100% Distinct and Temporary has 2 Distinct Values. I have 2 indexes on it, which are on object_id & temporary. T1_OT is on (Object_ID, Temporary) and T1_TO is on (Temporary, Object_ID). The naming convention stands for the first letter of the columns in the order they are in the Index. So, for T1_OT O->Object_id and T->Temporary.

The Index Statistics shows that the two indexes are almost same, in terms of Height (BLEVEL), number of Leaf Blocks. A minor different in the Clustering_factor though.

I than executed the queries to demonstrate that the I/O’s done by queries using any of the 2 indexes is exactly same. For this, I executed the query and it used Index T1_OT and and then hinted the query to make use of T1_TO Index. The Cost and IO’s for both the queries are exactly same, which leads to a conclusion that cardinality doesn’t matter when designing a Index Strategy. Index Columns should be based on Application Queries and the Columns. The queries were executed twice to ensure that the consistent gets that we are post the hard parsing.


## IO's for the Query using an Index T1_OT

select owner, object_name from t1
where      object_id = 58
and        temporary='N';

OWNER                OBJECT_NAME
-------------------- ------------------------------
SYS                  I_CCOL2

set autot trace
select owner, object_name from t1
where      object_id = 58
and        temporary='N';

Execution Plan
----------------------------------------------------------
Plan hash value: 3109227855

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    48 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    48 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_OT |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=58 AND "TEMPORARY"='N')

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets 
          0  physical reads
          0  redo size
...
...
          1  rows processed

## IO's with Index T1_TO

SQL> select /*+ index(t1,t1_to) */ owner, object_name from t1
     where      object_id = 58
     and        temporary='N';

OWNER                OBJECT_NAME
-------------------- ------------------------------
SYS                  I_CCOL2

Elapsed: 00:00:00.00
SQL> pause;

set autot trace
select /*+ index(t1,t1_to) */ owner, object_name from t1
where      object_id = 58
and        temporary='N';


Execution Plan
----------------------------------------------------------
Plan hash value: 1129381402

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    48 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    48 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_TO |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEMPORARY"='N' AND "OBJECT_ID"=58)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
...
...
          1  rows processed

While I have 2 Indexes on the same columns only the ordered changed, optimizer chose an Index on Object_ID and Temporary. As my sessions are interactive (and this sometimes mean that my sessions take more time than alloted :)), I asked the participants the reason behind this optimizer decision and there were lot many assumptions. Anju Garg came out with the correct guess. However, will disclose this later.

At this point, the question raised was, why there are 4 Consistent I/O’s? The assumption here was that it should be 3 (BLEVEL + LEAF BLOCK Access + Table Block).

Next, I dropped any one of the Index and re-created it as a Unique Index. Remember, Object_ID is 100% Distinct. So, I will drop and re-create T1_OT.

SQL> drop index t1_ot;

Index dropped.

Elapsed: 00:00:00.04
SQL> create unique index t1_ot_uq on t1(object_id, temporary);

Index created.

SQL> select index_name, blevel, leaf_blocks, clustering_factor, uniqueness from dba_indexes
where   table_name='T1'
order by 1;
  2    3
INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR UNIQUENES
------------------------------ ---------- ----------- ----------------- ---------
T1_OT_UQ                                1         162              1457 UNIQUE
T1_TO                                   1         171              1493 NONUNIQUE

I will then execute the queries to check for the consistent gets. This time, the consistent gets for the execution plan with Unique Index is 3 (as against 4 for the same non-unique index).

select owner, object_name from t1
where      object_id = 58
and        temporary='N';

Execution Plan
----------------------------------------------------------
Plan hash value: 1959391432

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    48 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |    48 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | T1_OT_UQ |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=58 AND "TEMPORARY"='N')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads

There is a difference in the Consistent Gets with Unique and Non-Unique Index. The height of both these Indexes are exactly same. This difference was important to get to original question. So, I generated a 10046 trace for the 2 Queries (with Unique and Without Unique Scan) and the relevant portion from the trace is as under, which will explain the reason behind 4 Consistent Gets.

## 10046 portion for Non-Unique index. Please see the BOLD and UNDERLINED portion. The extra cr=1 for FETCH#18446604434610142176.
## In this case, once the Blocks are fetched from an Index (cr=2) and Table (cr=1) Total cr=3, there is an extra cr immediately after 
## SQL*Net message to client. So, the total cr = 2 + 1 + 1 (extra) = 4.

PARSE #18446604434610142176:c=118,e=119,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1259244381,tim=7093069466
EXEC #18446604434610142176:c=88,e=93,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1259244381,tim=7093070117
WAIT #18446604434610142176: nam='SQL*Net message to client' ela= 8 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=7093070421
WAIT #18446604434610142176: nam='db file scattered read' ela= 2292 file#=12 block#=275312 blocks=8 obj#=77545 tim=7093073241
WAIT #18446604434610142176: nam='db file sequential read' ela= 49 file#=12 block#=255875 blocks=1 obj#=77543 tim=7093073719
FETCH #18446604434610142176:c=2088,e=3043,p=9,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=1259244381,tim=7093073891
WAIT #18446604434610142176: nam='SQL*Net message from client' ela= 624 driver id=1413697536 #bytes=1 p3=0 obj#=77543 tim=7093074878
FETCH #18446604434610142176:c=104,e=104,p=0,cr=1,cu=0,mis=0,r=0,dep=0,og=1,plh=1259244381,tim=7093075164
STAT #18446604434610142176 id=1 cnt=1 pid=0 pos=1 obj=77543 op='TABLE ACCESS BY INDEX ROWID T1 (cr=4 pr=9 pw=0 str=1 time=3109 us cost=2 size=48 card=1)'
STAT #18446604434610142176 id=2 cnt=1 pid=1 pos=1 obj=77545 op='INDEX RANGE SCAN T1_TO (cr=3 pr=8 pw=0 str=1 time=2835 us cost=1 size=0 card=1)'
WAIT #18446604434610142176: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0 obj#=77543 tim=7093076331

## 10046 portion for Unique index. In this case, once the Blocks are fetched from an Index (cr=2) and Table (cr=1), there is no extra cr.

PARSE #18446604434610123376:c=134,e=135,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1959391432,tim=7120639467
EXEC #18446604434610123376:c=89,e=90,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1959391432,tim=7120640009
WAIT #18446604434610123376: nam='SQL*Net message to client' ela= 8 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=7120640248
WAIT #18446604434610123376: nam='db file scattered read' ela= 5875 file#=12 block#=260496 blocks=8 obj#=77546 tim=7120646397
WAIT #18446604434610123376: nam='db file sequential read' ela= 64 file#=12 block#=255875 blocks=1 obj#=77543 tim=7120646786
FETCH #18446604434610123376:c=1407,e=6569,p=9,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=1959391432,tim=7120646947
STAT #18446604434610123376 id=1 cnt=1 pid=0 pos=1 obj=77543 op='TABLE ACCESS BY INDEX ROWID T1 (cr=3 pr=9 pw=0 str=1 time=6549 us cost=2 size=48 card=1)'
STAT #18446604434610123376 id=2 cnt=1 pid=1 pos=1 obj=77546 op='INDEX UNIQUE SCAN T1_OT_UQ (cr=2 pr=8 pw=0 str=1 time=6240 us cost=1 size=0 card=1)'
WAIT #18446604434610123376: nam='SQL*Net message from client' ela= 583 driver id=1413697536 #bytes=1 p3=0 obj#=77543 tim=7120652705
FETCH #18446604434610123376:c=15,e=15,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=1959391432,tim=7120652925
WAIT #18446604434610123376: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=77543 tim=7120653074

From this, I can assume that the steps carried out for a Non-Unique Index is as under (as it an Index-Range Scan).

-	Read the Root Block to get the address of the Leaf Block (IO = 1). 
- 	Read the Leaf Block to check for the matching Object_ID and Temporary Column. Get the ROWID for the table block. (IO = 2).
-	Go to the Table Block to read the other required columns listed in the SELECT list. (IO=3).
-	Go back to the Index Block to check for any other Object_ID and Temporary Values. (IO=4).
-	It is here that it gets to know that there are no more rows.

Bullet Point 4 is not required for a Unique Scan as Oracle is aware that it is a Unique Index and therefore, there will be no additional read required. Further, to confirm this, I executed a query on OBJECT_ID using an Unique Index. Remember, while Object_Id is 100% Unique, but the Unique Index is on the 2 columns and I am not referring the 2nd column in the query, which will change the plan from UNIQUE SCAN to RANGE SCAN.

select owner, object_name from t1
where      object_id = 58;

Execution Plan
----------------------------------------------------------
Plan hash value: 1834913555

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |     1 |    46 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1       |     1 |    46 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_OT_UQ |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=58)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets

So, this answers the question raised during the session on the rationale behind 4 Consistent Reads. Unique and Non-Unique Indexes are internally same with only a difference in the way these are accessed. With Index Range Scan, the behaviour of both the indexes are exactly same.

Now, for the another question on why the optimizer used T1_OT (by default) and not T1_TO ? The reason was a TIE between the 2 indexes, which is usually very rare with Cost Based Optimization and due to the TIE, optimizer preferred an Index in an alphabetical order. TI_OT comes before TI_TO. To demonstrate this, I dropped and recreated T1_TO as T1_1TO and optimized started using T1_1TO by default.

Filed under AIOUG, Performance, Uncategorized Tagged with Index, Performance

← Older posts

Real Life Database / SQL Experiences : An Oracle Blog from Vivek Sharma

Time to restart blogging…

Like this:

SELECT AI ! Generative AI on Autonomous Database

Like this:

AIOUG ! Oracle Community Yatra 2022

Like this:

Database Vault ! Autonomous Database

Like this:

Oracle Groundbreaker Yatra ! July 2019..

Like this:

Python – Error Installing cx_Oracle on Mac

Like this:

Autonomous Database ! ADW and ATP. Why do I need both ?

Like this:

Resource Manager – Autonomous Data Warehouse

Like this:

Autonomous Database Tech Day ! Gurgaon

Like this:

Consistent Gets for an Index Scan

Like this:

Blog Subscription on Email

Search on this Blog

Recent Posts

Archives

Top Clicks

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

@vivek_oracle

Like this:

Blog Subscription on Email

Search on this Blog

Recent Posts

Archives

Top Clicks