Sunday, 21 September 2025

Oracle23ai, sql_history, and some options, including for v19

TL;DR: The V$SQL_HISTORY is Very Cool Idea, but it still has some limitations. Also, it is not available in older versions... so I examined a few primitive workarounds and re-took an old trick setting "module".

(and I would say: Dont Try this in Production, yet. But you might be able to use some of it in testing...)

Image: How to prevent my sql-history, my footsteps, from disappearing...


Background: I had a suspicion, and I have older versions.

In a previous blog (link) I discovered V$HISTORY in Oracle23ai, and how it proved its worth in reporting the "Heaviest" SQL from a (short) program or session. On first attempts, primitive unit-tests against 23ai(free), that worked Nicely. It was like checking v$sqlarea, but for just-my-session.

Even with rather primitive code quickly inserted in some python-programs, I got SQL_IDs from slow-ish programs, and discovered some SQL that had remained "below the radar".

One "adjustment" was necessary: Avoid-Errors when running against version 19.x. But that was fixable with a check on v$version and on v$parameter (I'll put links to the code in appendix below).

More Tricky was: When I ran some non-trivial tests, some SQL that I had expected, that I _knew_ was in there,... Didnt show up. So I had a suspicion that all was not well in sql_history-land.


Problem statement: I want the SQL-History... ;-) 

For my current problem-case, I would like to focus on "What did This Session Do". I would prefer each (python)programs to report on Itself, at the end of its own run, using just SQL and stdout.

I'm deliberately excluding SQL-Trace, and I'm also excluding the running of reports for AWR and ASH against the database". Those are possible, but they represent too much work for programs that run (ir)regularly and frequently, during day and night.

And of course, in future, I will Recommend to save some of the logging-data to tables... But that has to wait for now.


But Some SQL was missing...

After the first Haleluja-feeling with v$sql_history, some things didnt add up.  Here is a typical run of a test-program:

At this point, the program has run 114 loops of something, and it will normally have done 114 inserts of the type /* t3/4 indv */, and many other SQLs. But the report only shows 59 executes? And I happen to know that before these 114-loops, there were a few other, probably heavier statements.

With some old scripts to check on v$sqlarea, I was able to verify: yep, the other, heavier statements were in the shared-pool. And the nr executions of the statements were higher than reported by v$sql_history... 

After a bit of poking around, this was the best summary of my finding:

The Feature of SQL_HISTORY, at this point (version 23.9....), will only show 60 items per SID. And that explained the missing SQL.

Only Sixty Records... A typical program will run 1000s, sometimes millions of executes. But this feature will only "keep" the last 60 ??? 

A very quick python-hack had proven useful in Unit Testing, but with a 60-record limit, this would not help a lot in "Production". And I had wanted to Proudly-Present the "heavy-hitter-SQL" at the end of each run... 


So... Can we think of something ? 

Aim is to have a (python-) program report on the SQL-statements done by that program, and tell the Dev or Ops or whoever is looking, which SQL-stmnts consumed most time.

Finding the missing SQLs in V$SQLAREA or V$SQL was the first clue: Can we link the stmnts in those v$views to the running program or individual session ? 

Spoiler: No, we can not (yet)... But we can try to get close.

Let's examine a few (old...) ideas.

Option1: List All SQL. At the end of a program, just report on the heaviest SQL in the shared-pool. A known trick. And rather similar to running an AWR-report or any SQL*Plus script after the program to see what was happening in the database-instance.

On a test-system with only one or few programs running, this might provide relevant information. On a rather busy production system, the data from "just my run" will be lost in the total.

Option2: Use SET_MODULE (always a good practice). A program can identify itself by setting the MODULE property of the session to the program-name. This module-name will then appear in the V$ views, and in the AWR and ASH reports. And it will identify the statements that were used by that program, provided that program was the first to load them into the Shared_pool.

Option3: SET_MODULE to a Unique Name for the SESSION (typically the pair of SID, SERIAL#). As a result, all statements that got loaded-first by the session got tagged. But statements that were already executed by earlier programs, e.g. statements that were already in the shared_pool, did Not get Identified. The result was that any 2nd run or subsequent run of the same program missed most of the statements in the list-of-my-statements.

These Options can be improved by doing flush shared_pool; at the start of a run. But but in serious Production, the flushing of sh-pool is Not recommended. Not At All.

You can think of several more options. For example, I am a big fan of watermarking queries with a  /* wm42 comment */ in the statement-code. However, marking statements for individual sessions might provide a huge overhead in parsing. Not Recommended either (but nothing will stop you from trying... ). And another problem with /* watermarking */ is that you can not watermark the SQL in code that you dont control.

Others will point out v$active_session_history and ASH-reports. The ASH (link) is a good tool to have! But for this case, the problem is the 1sec sample-interval. Picking SQL from the ASH-view will only find the sampled-ones, and in our case it tends to miss or to under-represent the high-frequency SQL.

.... and I have to stop discussing options at some point, 

I'll just kick the rest of the ideas into an appendix below...


In Concept (this works with 19c as well)... 

After pondering over the options, and revisiting some old ideas, I decided to try one python function to cater for the three options above.

Key tricks would be:

1. Watermark the session by module name.

2. Query v$sqlarea for all SQL with that module_name.

The python driver (doc here - link ) allows for easy setting of the module. And we can provide several levels of detail.

In this case, using python, we can set it in two ways. The simple way:

connection.module = 'some_program_name'

or, we can provide more detail in the module:

connection.module = 'some_program_name' + ':(sid,serial#)'

In the second, more detailed case, the module-value will represent the Unique Session in the instance, e.g. "this very run of the program".

Once the session has the module set, every new SQL that gets hard-parsed by the session will have the module-property set to the value of this session.

Following this, we can query the v$sqlarea. The simplest query, corresponding to Option1 above, would be:

select * from v$sqlarea 

where module like '%' 

order by elapsed_time ;

This will return all SQL-statements in the shared_pool. 
This is what we have been doing for years to find the heavy SQL. But the resulting SQL is instance-wide, it can be a very, very long list, and doesnt focus on our program or session at all. In a busy Prod system this will generally tell you only what you already know from the AWR reports.

To find more specific SQL for the current program, we can query for the program_name:

where s.module like 'some_program_name%'

This will give us the equivalent of Option2 above: all SQL that was ever initiated by some run of this program. This trick goes back to any old versio of oracle where we could and did used to set the module-value to the name of the program.

This option would be my recommended usage.

But to go very specific, we can ask for the SQL that was first-parsed by this very session:

where s.module like 'some_program_name:(SID,SERIAL#)%'

This will only list the SQL that was issued by the current session, and was issues for the first time (e.g. was parsed by this session).

There are pros and cons to each of the three options. But an advantage of this concept is: A Single SQL, with a like-condition, can cover each of the options, depending on what we but in the LIKE-clause.


Let's recap the options...

Option1: By asking for Everything: 

Where s.module like '%'

we will see a long list of SQL. On unit-test runs, or on systems where the shared-pool was flushed, the SQL may be relevant to the running program. But on a true Production-system, it is unlikely that the heaviest-statements happen to come exactly from the running-program.

Option3: When we ask for SQL from precisely this program-session:

where s.module like 'program:(sid,serial)%'

we will get only the statements from this program that were parsed, or issued-first, by this very run of the program. This is mainly useful when the program is running for the 1st-time against this instance of the database. One effect is that on 2nd and subsequent runs, we miss a lot of statements bcse they were parsed by previous runs and carry the module-name from those previous runs.

I came to the conclusion that I prefer Option2,  To list All SQL that was ever initiated by this program.

But you have options to choose from, even though I consider none of them to be "Ideal"... 


In python code (link to code in appedix)...

(note: you also use this concept in SQLcl, Java(Script) or in any other SQL-client, but I'm currently working with python....).

The function that picks up the statements starts like this:

If there is no module specified: the default for the_module='%' and will fetch all of v$sqlarea.

If the module-name is specified, including the (sid,serial#), it picks the very precise statements from This Session.

But in most cases, I would specify just the program name, and take into account that the resultset also contains the statements from previous run by the same module.

The self-test includes all three calls to the function like this:

The first call, INSTANCE, will list the Heaviest Statements from the sh-pool, regardless of which module they belong to. I had to include a limit of 100 stmnts to prevent excessive scrolling...

The second call, MODULE, we give it the module-name, and this will list the stmnts that belong to the module designated as g_module_name with no (sid,session#) specified. I recommend this type of usage as probably the most useful, but your situation may differ.

The third call, SESSION, we add (sid,serial#) to the module-name, and it will only list the statements parsed by this particular session.

On any initial-run of a program, the 2nd and 3rd call will yield more or less identical lists. But on subsequent runs of the same program, the 3rd call, the precise-one, will skip the statements that have already been parsed on earlier run.

Hence the 2nd option, with just the module name, no (sid,seria#), is my recommended way to use this.

But You can Choose which for yourself how you want to use this, if at all...


The Demo (actually a self-test)...

For practical reasons, I limit the demo to Select-only, so you can run it from any user, provided you have access to the dictionary (grant advisor, grant any select any dictionary, see below).

The easiest demo/test you can run is the self-test of the source-file, ora_login.py. Try it:

$ python3 ora_login.py

If you run it more than once, then at the end of the program you should see... 

Notice in this (partial) output: The first list are all SQLs ever done (hard-parsed) by this program. In my case, a total of 7 statements, and your result should be similar. 

But the second list, the search for SQL stmnts "watermarked" by this very session, is empty. That is because all the stmnts had already been parsed in previous runs, and were thus marked by module-names from those previous runs. The current-running module, with name 'ora_login_selftest:(248,64897)' has not added any new stmnts to the shared-pool, and thus that query returned ... 0 stmnts. 

If you run the self-test for the first time, or on a "fresh" instance, or after flush of shared_pool, you will see that the 2nd list actually does contain records. And on DEV/TEST systems, which often re-start, this can bring in some usable results.... 

Now I think you can see why I would use this with just the module, and not use the additional (sid,serial#) to watermark the module name.

But even then, it is quite possible that my program has called on queries that were parsed earlier by other modules. Hence there is no certainty that I catch every SQL done by this program. 

It works, sort of, but The Ideal solution is not yet found... 


The Downsides... 

The Concept of having every program "Report on its own SQL" is not quite feasible, or at least, Not Yet. 

The buffer for v$sql_history is too small to be of use, hence I came up with this old-fashioned and somewhat twisted-looking alternative using V$SQLAREA.

If you decide to try some if this, Notably Beware of the following:

D1. Creating many module names can get in the way of other systems that use the v$sqlares, notably your AWR reports will look more chaotic. Your DBA may get confused.

D2. Excessive queries on v$sqlarea can cost you in CPU-load and latch-contention. I am not sure how bad this still is in latest versions, but beware. Especially on Large (prod) systems with huge Shared_pool.

D3. You may still not catch Exactly the queries you want. Only an SQL-Trace (the 10048) can catch all and exactly what a session does.

... And I'm sure the more knowledgable oracle-gurus can point out other, more detailed problems of tweaking items in the shared-pool.

Note that some of this trickery feels like "back to square one". Again I revert back to the advice of "flush shared_pool", or even  "startup-force the Instance" or restart of the container to have a clean Instance on relevant testing. One of the reason these rather primitive tricks work so well in DEV/OPS environments is because a lot of the DEV databases run in containers and are re-started for nearly every test-cycle. Hence the sh-pool is always "clean" and the top-SQL is often relevant.


Summary: Pick a SQL-History solution for your use-case

If you want a program to report on its SQL, and V$HISTORY does not (yet) work for your situation (bcse you are on v19, or bcse you cannot set the parameter, or bcse you do more than 60 SQL-statements...), You have some Alternatives.

However, short from doing an sql-10048-trace, there is no easy Ideal Solution. None of the options described is able to pick out Exactly the statements done by an individual session (yet). 

A good compromise is to have each program set the MODULE to the name of the program. This will allow to identify statements in the shared-pool, and in AWR reports, and attribute them to specific programs. 

Going one step further, and setting module to a name that is session specific is probably overkill but may be useful in some situations.

You can choose from the above, or devise a variety on the options to suit your situation. And there are some more ideas in the notes in appendix below.

But you will have to realise the shortcomings, and the risks, of each method.

I will end with a shoutout to Big Red: The V$HISTORY feature of Oracle23ai has Great Potential. But it needs a bit of improvement still.


-- -- -- -- -- End of this blogpost, for now -- -- -- -- --

Appendix: links to scripts and components...


ora_login.py :  contains all relevant functions.

duration.py : Time-measurement, needed as "stopwatch" for some functions.

.env : edit to include your own  scott / tiger @ host : port / service


Also needed, these python components:

pip oracledb : install the oracle driver

pip dotenv : install dotenv (used to store credentials)


Additionally: the grants...

To get to your v$ views, ... If you can get it past your DBA and your security-folks, I can recommend these two:

grant ADVISOR to scott ; 

grant SELECT ANY DICTIONARY to scott ; 

Your account still wont be able to see data in other schemas, but you will see the v$stuff, all the relevant views.. 

note: I am close to creating a separate github-repo for each blog... 


-- -- -- -- -- End of this blogpost, keep some notes below -- -- -- -- --

More Ideas (just a collection of notes...): 

i_1. Sql-trace to trace-file, expose via view, examine details... Overhead, extra-work.

i_2: Watermark SQL-stmnts: not practical, no total-control, excessive parsing...

i_3: ASH-run right-after. Not practical in our case. And ash-interval too long.

i_4: More v$sql_history, 60+, backport etc.. ; need to ask Oracle...

i_5: Tweak python driver, produce tracefile  on app-server? complicated ? (too lazy)

i_6: Use set_action as well as module.. put (sid,serial) in set_action?

i_7: Flush shared-pool (Not Recommended on busy systems), keep as option ?

i_8: Insert/merge the contents of v$sql_history to a table. This would require frequent saves as SQL can sometimes be fired at rates of multiple per ms.

i_9: detect the DEV / TEST / PROD databases from the service_name, and act accordingly: on DEV always list the whole sql-area?


pffwt.. You Guessed it: You can experiment and play with this forever...

-- -- -- -- -- End of this blogpost, for Real -- -- -- -- --

No comments: