SELECT AI ! Generative AI on Autonomous Database

Imagine, we communicate with our Database in Business Language and Database responding with a result that matches the Business criteria. Natural Language Processing (NLP) a branch of Artificial Intelligence has this capability of generating a matching result based on the question asked.

SELECT AI, released early this year on Autonomous Database, allow business users to interact with the database and then convert it into Oracle SQL. Under the hood, it leverages generative AI with Large Language Models (LLM) to covert the user input into SQL.

In this blog, I will not write much about what it is, how to configure it and so on…I would request reading Oracle Autonomous Database Documentation that goes through these details (like, Creating Credentials, Creating Profiles for various AI providers etc.) and is regularly updated.

In this blog, I will walk you through few Interesting examples or analysis on Indian Premier League (IPL). For my Indian Readers, IPL is not something that will need an introduction. Most of the Indians are Cricket Fans. In fact, all Cricket playing nation may know about IPL. If not, then a brief introduction is here.

Before I get into some analysis, I thought of sharing something important i.e.How is SELECT AI able to generate a meaningful query, especially if the column names are a bit cryptic ? Remember, LLM’s are models that we need to train and to train these models, we need to provide it all the relevant details. Let us go through two examples.

In this case, I have a Table TEST_VIVEK with two columns X and Y. X has the Customer Names and Y contains Company Names.

CREATE TABLE TEST_VIVEK (
X VARCHAR2(32),
Y VARCHAR2(64));

INSERT INTO TEST_VIVEK VALUES(‘VIVEK’,’ABC CORPORATION’);
INSERT INTO TEST_VIVEK VALUES(‘TOM’,’XYZ INC.’);
INSERT INTO TEST_VIVEK VALUES(‘DICK’,’LLM PRIVATE LIMITED’);
INSERT INTO TEST_VIVEK VALUES(‘HARRY’,’AI INDIA’);
COMMIT;

Now, How do I provide meaningful names to these columns (X & Y)? One way is to create a View.

CREATE OR REPLACE VIEW TEST_VIVEK_VW AS
SELECT X CUSTOMER, Y COMPANY FROM TEST_VIVEK;

Now, I create a PROFILE using DBMS_CLOUD_AI API.

begin
DBMS_CLOUD_AI.create_profile(
profile_name => ‘MY_DEMO_AI_PROFILE’,
attributes => ‘{“provider”:”OCI”,
“credential_name”:”MY_CREDENTIAL”,
“oci_runtimetype”:”COHERE”,
“object_list”: [{“owner”:”admin”,”name”:”TEST_VIVEK_VW”}]
}’);
end;
/

While creating a profile, I provide TEST_VIVEK_VW as the Object to be used. Let us ask few questions (using SQL Developer).

And the Output

The other method is using an in-built database feature COMMENTS. These basically allows you to provide meaningful description and notes about TABLE(S) and COLUMNS. Marty Gubar has a nice blog with an explanation on how to make use of COMMENTS to ensure LLM’s come-up with right query. See this blog.

Time to get into some Analysis on Indian Premier League. I downloaded the data from one of the cricket websites. The files were in CSV format. I loaded it into my ADB using Data Transform. There are two tables IPL_MATCHES and IPL_DETAILS. IPL_MATCHES is a kind of Primary table with 1 Row per match and IPL_DETAILS is a Child Table with Ball-by-Ball Data. Both these tables has a column MATCH_ID. I created a Primary Key on this column on IPL_MATCHES Table and then a Foreign Key on IPL_DETAILS.

The COMMENTS for both the tables and all the relevant columns:

comment on table IPL_MATCHES is ‘contains IPL Season, City, date, player of the match, stadium, winner, target runs, target overs, match result’;
comment on column IPL_MATCHES.MATCH_ID is ‘Match ID. Join this column to other Tables’;
comment on column IPL_MATCHES.IPL_SEASON is ‘IPL Season. Each IPL season will have multiple matches’;
comment on column IPL_MATCHES.CITY is ‘City where the IPL match was played’;
comment on column IPL_MATCHES.MATCH_DATE is ‘Date when the IPL Match was played’;
comment on column IPL_MATCHES.MATCH_TYPE is ‘Type of the IPL Match’;
comment on column IPL_MATCHES.PLAYER_OF_THE_MATCH is ‘Name of the Player who won the player of the Match award’;
comment on column IPL_MATCHES.VENUE is ‘Stadium where the IPL match was played’;
comment on column IPL_MATCHES.TEAM1 is ‘Team 1 playing the ipl match’;
comment on column IPL_MATCHES.TEAM2 is ‘Team 2 playing the ipl match’;
comment on column IPL_MATCHES.TOSS_WINNER is ‘Team that won the Toss’;
comment on column IPL_MATCHES.TOSS_DECISION is ‘Decision taken by the Team winning the toss’;
comment on column IPL_MATCHES.WINNER is ‘Winner of the IPL Match’;
comment on column IPL_MATCHES.RESULT is ‘Result of the IPL Match’;
comment on column IPL_MATCHES.RESULT_MARGIN is ‘Margin by which a team won the match’;
comment on column IPL_MATCHES.TARGET_RUNS is ‘Target Runs or Runs required to win’;
comment on column IPL_MATCHES.TARGET_OVERS is ‘Target Overs’;
comment on column IPL_MATCHES.SUPER_OVER is ‘super over’;
comment on column IPL_MATCHES.METHOD is ‘Win Type’;
comment on column IPL_MATCHES.UMPIRE1 is ‘Umpire 1’;
comment on column IPL_MATCHES.UMPIRE2 is ‘Umpire 2’;

comment on table IPL_DETAILS is ‘Contains ball by ball Detail of all the ipl matches, Batsman Name, Bowler Name, runs scored, player dismissed’;
comment on column IPL_DETAILS.MATCH_ID is ‘Match ID. Join this column to other Tables’;
comment on column IPL_DETAILS.INNING is ‘Inning Number’;
comment on column IPL_DETAILS.BATTING_TEAM is ‘Team Batting’;
comment on column IPL_DETAILS.BOWLING_TEAM is ‘Team Bowling’;
comment on column IPL_DETAILS.OVER_NUMBER is ‘Over Number’;
comment on column IPL_DETAILS.BALL_NUMBER is ‘Ball Number’;
comment on column IPL_DETAILS.BATTER is ‘Batsman’;
comment on column IPL_DETAILS.BOWLER is ‘Bowler’;
comment on column IPL_DETAILS.NON_STRIKER is ‘Non Striker’;
comment on column IPL_DETAILS.RUNS_OFF_BAT is ‘Runs scored’;
comment on column IPL_DETAILS.EXTRA_RUNS is ‘Extra Runs’;
comment on column IPL_DETAILS.TOTAL_RUNS is ‘Total Runs on a Ball’;
comment on column IPL_DETAILS.EXTRAS_TYPE is ‘Extras’;
comment on column IPL_DETAILS.IS_WICKET is ‘Wicket Taken. 1 is Yes and 0 is No’;
comment on column IPL_DETAILS.PLAYER_DISMISSED is ‘Batsman Dismissed, Player Dismissed’;
comment on column IPL_DETAILS.DISMISSAL_KIND is ‘Dismissal Type’;
comment on column IPL_DETAILS.FIELDER is ‘Fielder’;

Object List in the CREATE_PROFILE API has the two Tables IPL_MATCHES and IPL_DETAILS. One additional parameter that I have is COMMENTS:TRUE. Once done, we can go through some interesting analysis.

begin
DBMS_CLOUD_AI.create_profile(
profile_name => ‘VIVEK_AI_PROFILE’,
attributes => ‘{“provider”:”OCI”,
“credential_name”:”MY_CREDENTIAL”,
“comments”:”true”,
“oci_runtimetype”:”COHERE”,
“object_list”: [{“owner”:”admin”,”name”:”ipl_matches”}, {“owner”:”admin”,”name”:”ipl_details”}]
}’);
end;
/

begin
dbms_cloud_ai.set_profile(
profile_name => ‘VIVEK_AI_PROFILE’
);
end;
/

Indian Premier League started in 2008 and the latest League concluded on 26th May 2024.

First Ever Match was played on 18th April 2008. Latest Season of IPL was 2024 and the final match was played between Kolkata Night Riders and Rajasthan Royals on 26th May 2024.

Next, let us get the count of matches played in each of the IPL Season’s. This is typically a count of matches group by Season.

Number of matches in 2011, 2012 and 2013 were more than the other previous seasons. Then these count dropped before going up in 2022. Any specific reasons for this? Let’s see.

We can now see that the number of Teams in 2011, 2012 and 2013 were more than previous seasons. The format of this league is that each team has to play against each other. Hence, as a new team is added to the league, the number of matches go up. In 2014, number of teams reduced back to 8. Season 2022 onwards, we again see an increase in this number.

Next, let’s see the top 5 Batsman with most runs. This will be computed from IPL_DETAILS table as the data here is ball by ball. This means, we have to take a sum of run scored by a batsman on each ball. Now. if you again scroll up and see the column names in IPL_DETAILS, we do not have a column called BATSMAN, nor we have a column called Runs Scored. We do have BATTER and RUNS_OFF_BAT. We passed the relevant information via COMMENTS.

The Query SELECT AI came up with:

As per the analysis, Virat Kohli (V Kohli) is the top batsman with highest number of runs. Next, let’s check top 5 Bowler’s having dominated and dismissed Virat Kohli the most.

Seasonwise IPL Winners and Player of the Final Match.

Let’s ask a bit complex question. I want to calculate the Batting Strike Rate. This is basically computed as the Sum of Runs Scored divided by the Number of Balls faced. The numbers are per 100 ball i.e.if the strike rate is 150, this means, at an average a Batsman Scores 150 runs in 100 Balls. Now, the way our data is stored, we have Runs_Off_Bat and Ball_Number column. This means, I will have to take a sum of Runs_off_Bat, divide it by Count of Ball_Number and this value will then have to be multiplied by 100. To be honest, I had to play around with this question to get it right and after 3 or 4 attempts, I was able to get it.

In a game of Cricket, Strike Rate doesn’t change every few minutes. It only changes during or after a match. Therefore, the next task that I would want to try is creating a Materialized View with a pre-computed Strike Rate, adding it to the object list and then running a simple query to get the Strike Rate. Once I am done, will post the results as an update to this blog. We can play around with this data and perform some more analysis. I actually did many more, but to make this blog short thought of publishing only few of them. I actually wanted to demonstrate the SELECT AI capabilities and how easily it was able to get us the required details. Hope you enjoy this.

Unknown's avatarAbout Vivek Sharma
I am an Oracle Professional from Mumbai (India). I blog on the issues that I feel is Interesting for my readers. Some of these are my real life examples, which I hope, you would find interesting. Comments are always a welcome. The Technical Observations & Views here are my own and not necessarily those of Oracle or its affiliates. These are purely based on my understandings, learnings and resolutions of various customer issues.

One Response to SELECT AI ! Generative AI on Autonomous Database

  1. Guruswamy Dhandapani's avatar Guruswamy Dhandapani says:

    Hi Vivek,

    Very nice to see the pains you have taken to populate this data and make some good and easy examples most of the readers can relate to from IPL.

    Excellently described and your blog is a motivation for me to read further on this AI and get this implemented in our application.

    Thank you for sharing your knowledge. Great work !!

Leave a Reply

Discover more from Real Life Database / SQL Experiences : An Oracle Blog from Vivek Sharma

Subscribe now to keep reading and get access to the full archive.

Continue reading