Sunday, 28 September 2025

Oracle23ai, python, and v$history for your developers...

TL;DR: We got the v$sql_history feature configured for use in development. Python programs can now report on the SQL statements done Exactly by Them.
Will this replace sql_trace one day ? 

Image: (copied from some "History" page...) Keeping History is Important. 


Background: What SQL you did spent your time on...?

After tackling most of the latency-problems, the "suspicion of slow SQL" came back. Not least because most python programs now report their "App-time", "Network-time" and "DB-time" components. So now we need to find the slow SQL again...

From AWR and ASH, we have a fair idea which SQL stmnts are the most time-consuming overall. And those are "known good": they need to be running, and they have little room left for improvement.

But what else is there? Which stmnts remain "under the radar"? Which New stmnts need attention, and which ones might be performing erratically?

Hence the wish: Let Each program report on "What it Did in the Database", and notably, what were the most time-consuming SQL-stmts. 

This is what v$sql_history looked Ideal for...


Earlier attempts...soso.

In a previous blog (link), I described how I hit the limits of sql_history: only on 23ai, and only when the parameter is set. I made a feeble attempt (an old trick) to report on SQL done by specific programs or modules, based on "module" in the session-properties. This worked to a point, but it wasn't what I wanted. Not after I had seen the possibilities of v$sql_history.

Additional note + disclaimer: I've tweaked a lot of Oracle databases, and I have also recently experimented with the active_session_history that is built into Yugabyte. I guess I am in hobby-obsession-territory here (link to sibling blog).

Oracle-wise, I would expect a (hidden) parameter to overcome that limit of 60. But searching the view x$ksppi didn't seem to give me any candidates, so I started asking around...


Friends (from oracle) to the rescue...

Martin Berger, from the Oracle RWP-Group, pointed me to his blog (link): Yes, there was a hidden parameter to help increase the nr of stmnts kept in v$sql_history (and it turned out I had queried for hidden parameters against a v19 database... sigh).

And a week later, Connor also "answered the question" in his office-hours session of 24 Sept 2025. Thx Connor! (link).

Of course you need the (non-default) enabler:

alter system set sql_history_enabled=true;

And then this (hidden) parameter to allow more history than the default of 50:

alter system set "_sql_history_buffers"=10000;

Setting those parameters fixed my problem, up to a point. Development happens mostly on laptops, the Devs' very own machines, using the 23ai containers from Gerald Venzl (link). Hence, I could have the feature enabled and the buffers set for developers fairly easily (those containers have a very easy provision to run scripts at db-create-time and/or at db-startup time. Great tools for simple devs!).


The (python) code.

Firstly, I experimented with SQL*Plus, and the result was the script mysqlhist.sql (link below). I'll probably improve it over time as it gets used. Notice that I report "aggregated" values per SQL. In some cases you might want to report the complete history, all statements in order of execution, but that list can get Very Long Very Quickly... I was more interested in which statements took the most time overall, i.e. accumulated.

But my original search into history was for the python programs...
 
A function got included in my generic py-file with oracle-stuff, ora_login.py. This import-able file began life with just the login-function, and gradually grew to contain my oracle-toolkit. Link below.

I included checks on the version and the parameters, just in case, so that when someone accidentally runs this code against the wrong database, nothing bad happens:
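As a minimal sketch of such a guard (pure logic only; the function name and the exact version/parameter lookups are my assumptions, not the real ora_login.py code):

```python
def history_available(version_banner: str, enabled_param: str) -> bool:
    """Return True only when v$sql_history can safely be queried.

    version_banner : a version string such as '23.9.0.25.07'
                     (real code would fetch this from v$version or v$instance)
    enabled_param  : the value of sql_history_enabled from v$parameter
    Hypothetical helper: the actual ora_login.py may do this differently.
    """
    try:
        major = int(version_banner.split('.')[0])
    except ValueError:
        return False          # unparseable version: play safe, skip the feature
    # Only 23ai (or later) with the parameter set to TRUE qualifies.
    return major >= 23 and str(enabled_param).upper() == 'TRUE'
```

When this returns False, the reporting function can print its friendly message and return, instead of crashing with ORA-00942 on an older database.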


Notice: this test was against v19, and it only produces the friendly 3-line message. The same message will appear if sql_history_enabled is not set.

But when a call to the function ora_sess_hist ( ora_conn ) is done at the end of a python program that is connected to an "enabled" database, it will use the ora_conn to query v$sql_history. The output will look like this:


Notice how, at the end of the run, this program called two functions: the first reports the history, listing the SQL (limited to 100 lines, to keep it readable). We notice 8 different SQLs, one of which was executed 1000 times, totalling some 236495 microsec. But the "biggest SQL" in this run was the one that was executed 5 times, totalling 5.6 seconds.
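For illustration, here is the aggregation idea in pure python (names are mine; the real function presumably does the grouping in SQL, with a group-by on sql_id against v$sql_history):

```python
from collections import defaultdict

def aggregate_history(rows):
    """Aggregate raw history rows into per-statement totals.

    rows: (sql_id, sql_text, elapsed_microsec) tuples, one per execution,
    roughly what a per-row select from v$sql_history would return.
    Returns (sql_id, executions, total_elapsed_us, sql_text) tuples,
    sorted so the heaviest statement comes last (easy to spot on stdout).
    """
    totals = defaultdict(lambda: [0, 0, ''])
    for sql_id, sql_text, elapsed_us in rows:
        entry = totals[sql_id]
        entry[0] += 1              # count executions
        entry[1] += elapsed_us     # accumulate elapsed time
        entry[2] = sql_text
    return sorted(
        ((sid, n, us, txt) for sid, (n, us, txt) in totals.items()),
        key=lambda r: r[2])        # order by total elapsed, ascending
```

A statement executed 1000 times for 236 microsec each will then show up as one line with its accumulated total, which is exactly the "aggregated" view I wanted.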

After listing the history, this program also calls the "ora time spent" function to report how much time it has spent in the various components: in App, in DB, and (a guesstimate) in Network doing RoundTrips. You can combine this time-reporting with the linux time command to double-check (see previous blogs as well).

Notice that the DB-time, listed as 6.612 sec, corresponds roughly with the sum of the elapsed time values (which are in microseconds!). Consider this a useful double-check.

This example of course contains just some bogus SQL for demo-reasons. The self-test of ora_login.py will yield similar limited results.
But please feel free to test this yourself and let me know. 


What we don't quite know yet...

The main uncertainty is the behaviour of sessions with the new parameters. This is probably why the feature is not enabled by default (yet), and why the value of "buffers" is (IMHO) rather low at 50 or 60.

On our current dev-container-version, v23.9.0.25.07, the default value for sql_history_enabled is False, and when enabled the default number of buffers seems to be 50. If this parameter ever hits "Production" with high values, what will be the effect on PGA and behaviour? Where are the practical limits, and what do they depend on...?

There are a few other anomalies to note, possibly to investigate and keep an eye on in future releases:

Buffers? 
The actual nr of records in the view was mostly just a few more than the setting of "buffers", say 8-10 more lines. Not sure why/how.
(future test: various buffer-settings, and the actual impact on PGA with/out history)

Session-history private?
Sessions by the same user could see each other's records. This may be why the SID is part of the view. Not sure if this is intentional or not. I'm curious to see how this feature will end up in Prod.
(future check: can system or sysdba also see all history from every session ?  )

Double counting?
Rows 2 and 6 from sql_hist in the screen-print above: the pl/sql block from line 6 contains the create-stmnt from line 2. I suspect there is a double count in there. Not a big problem, but I'd like to verify that some time. There is probably a way to avoid that using top_level_sql_id... (Later!)


Summary: Got it Working.

This now works for Dev, and can work in production if/when we ever get this feature in a Prod database. We can call a python function to list the v$sql_history, or at least the last 10k statements from that program.
With some checks, that function-code can now also be run without causing errors in Prod, on v19 or other database, or on a system with the history parameters "disabled", as they are by default.

I would still be careful to use this feature "in anger" as the effect on PGA-memory is not really known or tested yet.

And if needed, we can always revert to searching statements from v$sqlarea, as demonstrated in previous blog (link).

In the future, I might even enhance the python-function to save the history to a table before exiting the program. Since history is currently only kept in DEV, on machines and databases that are very volatile (many re-starts, re-creates...), saving that data is not practical or useful, yet.

Once history is also available in pre-prod and Production systems, I might want to persist the history before exiting the program. Until then, Prod will have to rely on AWR and ASH to find problematic SQL.

And will this feature one day replace sql_trace?


Wishlist... Yes!
(Someone at Oracle is probably also thinking about this ...) 

Session: I would like to be able to set the parameters on a Session basis, and only consume resources for the specific sessions that Need/Want to keep their History. A bit like sql-trace, but without the file-interface.

AWR/ASH: I expect v$sql_history to become available in combination with AWR or ASH in the near future.


-- -- -- -- -- End of this blogpost, for now -- -- -- -- --

Appendix: links to source-files.

mysqlhist.sql :  for use in SQL*Plus or SQLcl

tst_hist.py : short demo program.

ora_login.py : contains the actual function ora_sess_hist ( conn )

.env : used by dotenv, edit to add your own credentials.

Also needed are these python components:

pip install oracledb : the oracle driver

pip install python-dotenv : dotenv (used to store credentials)

Additionally: the grants...

To get to your v$ views... if you can get it past your DBA and your security-folks... I can recommend these two:

grant ADVISOR to scott ; 

grant SELECT ANY DICTIONARY to scott ; 

Your account still won't be able to see data in other schemas, but you will see the v$stuff, all the relevant views...

-- -- -- -- -- End of this blogpost, For Real -- -- -- -- --

Sunday, 21 September 2025

Oracle23ai, sql_history, and some options, including for v19

TL;DR: The V$SQL_HISTORY is a Very Cool Idea, but it still has some limitations. Also, it is not available in older versions... so I examined a few primitive workarounds and re-took an old trick of setting "module".

(And I would say: Don't Try this in Production, yet. But you might be able to use some of it in testing...)

Image: How to prevent my sql-history, my footsteps, from disappearing...


Background: I had a suspicion, and I have older versions.

In a previous blog (link) I discovered V$SQL_HISTORY in Oracle23ai, and how it proved its worth in reporting the "Heaviest" SQL from a (short) program or session. On first attempts, primitive unit-tests against 23ai(free), it worked Nicely. It was like checking v$sqlarea, but for just-my-session.

Even with rather primitive code quickly inserted in some python-programs, I got SQL_IDs from slow-ish programs, and discovered some SQL that had remained "below the radar".

One "adjustment" was necessary: Avoid-Errors when running against version 19.x. But that was fixable with a check on v$version and on v$parameter (I'll put links to the code in appendix below).

More Tricky was: when I ran some non-trivial tests, some SQL that I had expected, that I _knew_ was in there... didn't show up. So I had a suspicion that all was not well in sql_history-land.


Problem statement: I want the SQL-History... ;-) 

For my current problem-case, I would like to focus on "What did This Session Do". I would prefer each (python) program to report on Itself, at the end of its own run, using just SQL and stdout.

I'm deliberately excluding SQL-Trace, and I'm also excluding the running of AWR and ASH reports against the database. Those are possible, but they represent too much work for programs that run (ir)regularly and frequently, during day and night.

And of course, in future, I will Recommend saving some of the logging-data to tables... But that has to wait for now.


But Some SQL was missing...

After the first Halleluja-feeling with v$sql_history, some things didn't add up. Here is a typical run of a test-program:

At this point, the program has run 114 loops of something, and it will normally have done 114 inserts of the type /* t3/4 indv */, and many other SQLs. But the report only shows 59 executes? And I happen to know that before these 114-loops, there were a few other, probably heavier statements.

With some old scripts to check on v$sqlarea, I was able to verify: yep, the other, heavier statements were in the shared-pool. And the nr of executions of the statements was higher than reported by v$sql_history...

After a bit of poking around, this was the best summary of my finding:

The Feature of SQL_HISTORY, at this point (version 23.9....), will only show 60 items per SID. And that explained the missing SQL.

Only Sixty Records... A typical program will run 1000s, sometimes millions of executes. But this feature will only "keep" the last 60 ??? 

A very quick python-hack had proven useful in Unit Testing, but with a 60-record limit, this would not help a lot in "Production". And I had wanted to Proudly-Present the "heavy-hitter-SQL" at the end of each run... 


So... Can we think of something ? 

The aim is to have a (python-) program report on the SQL-statements done by that program, and tell the Dev or Ops or whoever is looking which SQL-stmnts consumed the most time.

Finding the missing SQLs in V$SQLAREA or V$SQL was the first clue: can we link the stmnts in those v$views to the running program or an individual session?

Spoiler: No, we can not (yet)... But we can try to get close.

Let's examine a few (old...) ideas.

Option1: List All SQL. At the end of a program, just report on the heaviest SQL in the shared-pool. A known trick. And rather similar to running an AWR-report or any SQL*Plus script after the program to see what was happening in the database-instance.

On a test-system with only one or few programs running, this might provide relevant information. On a rather busy production system, the data from "just my run" will be lost in the total.

Option2: Use SET_MODULE (always a good practice). A program can identify itself by setting the MODULE property of the session to the program-name. This module-name will then appear in the V$ views, and in the AWR and ASH reports. And it will identify the statements that were used by that program, provided that program was the first to load them into the Shared_pool.

Option3: SET_MODULE to a Unique Name for the SESSION (typically the pair of SID, SERIAL#). As a result, all statements that got loaded-first by the session got tagged. But statements that were already executed by earlier programs, e.g. statements that were already in the shared_pool, did Not get Identified. The result was that any 2nd run or subsequent run of the same program missed most of the statements in the list-of-my-statements.

These Options can be improved by doing flush shared_pool; at the start of a run. But in serious Production, flushing the sh-pool is Not recommended. Not At All.

You can think of several more options. For example, I am a big fan of watermarking queries with a /* wm42 comment */ in the statement-code. However, marking statements for individual sessions might introduce a huge overhead in parsing. Not Recommended either (but nothing will stop you from trying...). And another problem with /* watermarking */ is that you can not watermark the SQL in code that you don't control.

Others will point out v$active_session_history and ASH-reports. The ASH (link) is a good tool to have! But for this case, the problem is the 1sec sample-interval. Picking SQL from the ASH-view will only find the sampled-ones, and in our case it tends to miss or to under-represent the high-frequency SQL.

.... and I have to stop discussing options at some point, 

I'll just kick the rest of the ideas into an appendix below...


In Concept (this works with 19c as well)... 

After pondering over the options, and revisiting some old ideas, I decided to try one python function to cater for the three options above.

Key tricks would be:

1. Watermark the session by module name.

2. Query v$sqlarea for all SQL with that module_name.

The python driver (doc here - link ) allows for easy setting of the module. And we can provide several levels of detail.

In this case, using python, we can set it in two ways. The simple way:

connection.module = 'some_program_name'

or, we can provide more detail in the module:

connection.module = 'some_program_name' + ':(sid,serial#)'

In the second, more detailed case, the module-value will represent the Unique Session in the instance, i.e. "this very run of the program".

Once the session has the module set, every new SQL that gets hard-parsed by the session will have the module-property set to the value of this session.

Following this, we can query the v$sqlarea. The simplest query, corresponding to Option1 above, would be:

select * from v$sqlarea 

where module like '%' 

order by elapsed_time ;

This will return all SQL-statements in the shared_pool. 
This is what we have been doing for years to find the heavy SQL. But the resulting SQL is instance-wide, it can be a very, very long list, and it doesn't focus on our program or session at all. In a busy Prod system this will generally tell you only what you already know from the AWR reports.

To find more specific SQL for the current program, we can query for the program_name:

where s.module like 'some_program_name%'

This will give us the equivalent of Option2 above: all SQL that was ever initiated by some run of this program. This trick goes back to the old versions of oracle, where we could, and did, set the module-value to the name of the program.

This option would be my recommended usage.

But to go very specific, we can ask for the SQL that was first-parsed by this very session:

where s.module like 'some_program_name:(SID,SERIAL#)%'

This will only list the SQL that was issued by the current session, and was issued for the first time (i.e. was hard-parsed by this session).

There are pros and cons to each of the three options. But an advantage of this concept is: a Single SQL, with a like-condition, can cover each of the options, depending on what we put in the LIKE-clause.
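That single-like-condition idea can be sketched as a tiny helper (the function name is mine, not from the blog's code):

```python
def module_like_pattern(option: int, program: str = '',
                        sid: int = 0, serial: int = 0) -> str:
    """Build the LIKE pattern for the three options discussed above."""
    if option == 1:
        return '%'                                # Option1: everything in the pool
    if option == 2:
        return f'{program}%'                      # Option2: all runs of this program
    if option == 3:
        return f'{program}:({sid},{serial})%'     # Option3: this very session
    raise ValueError(f'unknown option: {option}')
```

One query, where module like :pat, then serves all three levels of detail; only the bind value changes.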


Let's recap the options...

Option1: By asking for Everything: 

Where s.module like '%'

we will see a long list of SQL. On unit-test runs, or on systems where the shared-pool was flushed, the SQL may be relevant to the running program. But on a true Production-system, it is unlikely that the heaviest-statements happen to come exactly from the running-program.

Option3: When we ask for SQL from precisely this program-session:

where s.module like 'program:(sid,serial)%'

we will get only the statements from this program that were parsed, or issued first, by this very run of the program. This is mainly useful when the program is running for the 1st time against this instance of the database. One effect is that on 2nd and subsequent runs, we miss a lot of statements because they were parsed by previous runs and carry the module-name from those previous runs.

I came to the conclusion that I prefer Option2: list All SQL that was ever initiated by this program.

But you have options to choose from, even though I consider none of them to be "Ideal"... 


In python code (link to code in appendix)...

(note: you can also use this concept in SQLcl, Java(Script) or any other SQL-client, but I'm currently working with python...)

The function that picks up the statements starts like this:

If there is no module specified, the default of the_module='%' will fetch all of v$sqlarea.

If the module-name is specified, including the (sid,serial#), it picks the very precise statements from This Session.

But in most cases, I would specify just the program name, and take into account that the resultset also contains the statements from previous runs by the same module.
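A minimal sketch of such a pick-up function, assuming an oracledb cursor is passed in (the name list_sql_by_module and the column list are my own guesses, not the actual ora_login.py code):

```python
def list_sql_by_module(cursor, module_pattern='%', max_rows=100):
    """Return the heaviest SQL from v$sqlarea matching a module LIKE pattern.

    cursor         : an open (oracledb-style) cursor
    module_pattern : '%', 'prog%' or 'prog:(sid,serial)%' -- the three options
    max_rows       : cap the listing, to prevent excessive scrolling
    """
    cursor.execute(
        """select sql_id, executions, elapsed_time,
                  substr(sql_text, 1, 60)
             from v$sqlarea
            where module like :pat
            order by elapsed_time desc
            fetch first :maxn rows only""",
        pat=module_pattern, maxn=max_rows)
    return cursor.fetchall()
```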

The self-test includes all three calls to the function like this:

The first call, INSTANCE, will list the Heaviest Statements from the sh-pool, regardless of which module they belong to. I had to include a limit of 100 stmnts to prevent excessive scrolling...

The second call, MODULE, gives the function the module-name. This will list the stmnts that belong to the module designated as g_module_name, with no (sid,serial#) specified. I recommend this type of usage as probably the most useful, but your situation may differ.

The third call, SESSION, we add (sid,serial#) to the module-name, and it will only list the statements parsed by this particular session.

On any initial run of a program, the 2nd and 3rd call will yield more or less identical lists. But on subsequent runs of the same program, the 3rd call, the precise one, will skip the statements that were already parsed in an earlier run.

Hence the 2nd option, with just the module name and no (sid,serial#), is my recommended way to use this.

But you can Choose for yourself how you want to use this, if at all...


The Demo (actually a self-test)...

For practical reasons, I limit the demo to Select-only, so you can run it from any user, provided you have access to the dictionary (grant ADVISOR, grant SELECT ANY DICTIONARY, see below).

The easiest demo/test you can run is the self-test of the source-file, ora_login.py. Try it:

$ python3 ora_login.py

If you run it more than once, then at the end of the program you should see... 

Notice in this (partial) output: the first list contains all SQLs ever done (hard-parsed) by this program. In my case, a total of 7 statements, and your result should be similar.

But the second list, the search for SQL stmnts "watermarked" by this very session, is empty. That is because all the stmnts had already been parsed in previous runs, and were thus marked with module-names from those previous runs. The currently-running module, with name 'ora_login_selftest:(248,64897)', has not added any new stmnts to the shared-pool, and thus that query returned... 0 stmnts.

If you run the self-test for the first time, or on a "fresh" instance, or after a flush of the shared_pool, you will see that the 2nd list actually does contain records. And on DEV/TEST systems, which often re-start, this can bring in some usable results...

Now I think you can see why I would use this with just the module, and not use the additional (sid,serial#) to watermark the module name.

But even then, it is quite possible that my program has called on queries that were parsed earlier by other modules. Hence there is no certainty that I catch every SQL done by this program.

It works, sort of, but The Ideal solution is not yet found... 


The Downsides... 

The Concept of having every program "Report on its own SQL" is not quite feasible, or at least, Not Yet. 

The buffer for v$sql_history is too small to be of use, hence I came up with this old-fashioned and somewhat twisted-looking alternative using V$SQLAREA.

If you decide to try some of this, Notably Beware of the following:

D1. Creating many module names can get in the way of other systems that use v$sqlarea; notably, your AWR reports will look more chaotic. Your DBA may get confused.

D2. Excessive queries on v$sqlarea can cost you in CPU-load and latch-contention. I am not sure how bad this still is in the latest versions, but beware. Especially on Large (prod) systems with a huge Shared_pool.

D3. You may still not catch Exactly the queries you want. Only a SQL-Trace (event 10046) can catch all, and exactly, what a session does.

... And I'm sure the more knowledgeable oracle-gurus can point out other, more detailed problems of tweaking items in the shared-pool.

Note that some of this trickery feels like "back to square one". Again I revert to the advice of "flush shared_pool", or even a "startup force" of the Instance, or a restart of the container, to have a clean Instance for relevant testing. One of the reasons these rather primitive tricks work so well in DEV/OPS environments is that a lot of the DEV databases run in containers and are re-started for nearly every test-cycle. Hence the sh-pool is always "clean" and the top-SQL is often relevant.


Summary: Pick a SQL-History solution for your use-case

If you want a program to report on its SQL, and V$SQL_HISTORY does not (yet) work for your situation (because you are on v19, or because you cannot set the parameter, or because you do more than 60 SQL-statements...), you have some Alternatives.

However, short of doing a sql-trace (event 10046), there is no easy, Ideal Solution. None of the options described is able to pick out Exactly the statements done by an individual session (yet).

A good compromise is to have each program set the MODULE to the name of the program. This will allow you to identify statements in the shared-pool and in AWR reports, and attribute them to specific programs.

Going one step further and setting the module to a session-specific name is probably overkill, but may be useful in some situations.

You can choose from the above, or devise a variation on these options to suit your situation. And there are some more ideas in the notes in the appendix below.

But you will have to realise the shortcomings, and the risks, of each method.

I will end with a shoutout to Big Red: the V$SQL_HISTORY feature of Oracle23ai has Great Potential. But it still needs a bit of improvement.


-- -- -- -- -- End of this blogpost, for now -- -- -- -- --

Appendix: links to scripts and components...


ora_login.py :  contains all relevant functions.

duration.py : Time-measurement, needed as "stopwatch" for some functions.

.env : edit to include your own  scott / tiger @ host : port / service


Also needed, these python components:

pip install oracledb : the oracle driver

pip install python-dotenv : dotenv (used to store credentials)


Additionally: the grants...

To get to your v$ views... if you can get it past your DBA and your security-folks, I can recommend these two:

grant ADVISOR to scott ; 

grant SELECT ANY DICTIONARY to scott ; 

Your account still won't be able to see data in other schemas, but you will see the v$stuff, all the relevant views...

note: I am close to creating a separate github-repo for each blog... 


-- -- -- -- -- End of this blogpost, keep some notes below -- -- -- -- --

More Ideas (just a collection of notes...): 

i_1. Sql-trace to trace-file, expose via view, examine details... Overhead, extra-work.

i_2: Watermark SQL-stmnts: not practical, no total-control, excessive parsing...

i_3: ASH-run right-after. Not practical in our case. And ash-interval too long.

i_4: More v$sql_history, 60+, backport etc.. ; need to ask Oracle...

i_5: Tweak python driver, produce tracefile  on app-server? complicated ? (too lazy)

i_6: Use set_action as well as module.. put (sid,serial) in set_action?

i_7: Flush shared-pool (Not Recommended on busy systems), keep as option ?

i_8: Insert/merge the contents of v$sql_history to a table. This would require frequent saves as SQL can sometimes be fired at rates of multiple per ms.

i_9: detect the DEV / TEST / PROD databases from the service_name, and act accordingly: on DEV always list the whole sql-area?


pffwt.. You Guessed it: You can experiment and play with this forever...

-- -- -- -- -- End of this blogpost, for Real -- -- -- -- --

Wednesday, 17 September 2025

Oracle23ai, and a new kind of history.

TL;DR: I discovered the view that holds the sql_history of a session. I can use that to report "what a session has done". Much Simpler than via other means.

This feature is currently only available in the latest version, 23ai... but I really like the concept. Check for example this blog by Ulrike Schwinn.
This Feature has Potential!


Image: Developers love to hand over a problem. Especially if they can say "This SQL is now Your problem..."


Background: What did that program do... ?

In earlier (python) work, I've demonstrated how to pinpoint the "latency problem". And just from reporting the "time lost" and the (variable, erratic) latency measured from ping-times, we were able to find and then eliminate a lot of "lost time" (one of several examples: eliminating unnecessary roundtrips).

But latency isn't the only time-sink problem. The RDBMS takes up time, DB-time. And the App itself needs time to do work...

With a little more effort, I was able to, primitively, report the total time spent in a python program, and split it into three components (documented here):

Total = App-time + DB-time + Network-Latency.
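As a toy illustration of that split (my own naming; the network component is a guesstimate of roundtrips times the average ping-time, as in the earlier latency posts):

```python
def split_total_time(total_s, db_time_s, roundtrips, avg_rtt_s):
    """Split wall-clock seconds into App, DB and Network components.

    total_s    : total elapsed wall-clock time of the program
    db_time_s  : time spent inside the database (e.g. from the time model)
    roundtrips : nr of client-server roundtrips made
    avg_rtt_s  : average ping/roundtrip time in seconds
    """
    network_s = roundtrips * avg_rtt_s            # guesstimate: trips x ping
    app_s = total_s - db_time_s - network_s       # the remainder is App-time
    return app_s, db_time_s, network_s
```

For example, a 10-second run with 6 seconds of DB-time and 1000 roundtrips at 2 ms each leaves roughly 2 seconds attributed to the App itself.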

One by-product of this measurement was that at least some of the slowness could be attributed to the python-program itself, and the need for "Faster CPUs" to run our python-programs (they run in processing-containers, deployed in various environments that are often beyond the comprehension of my boomer-legacy knowledge).

But then there was the DB-time, which needed attention...


Old (and good) solutions: ASH, AWR, Trace, profiling...

In traditional DBA-work, the Oracle-gurus will come up with AWR and ASH reports, sometimes even with SQL-trace files (tkprof, anyone?) to find slow queries and then "tune" those.

AWR will tell us what the top-queries are. But not where they come from, nor which specific programs or runs caused the lost time. It is the old problem of "an aggregate does not give you details".

ASH can tell you more details, but has the drawback of the sample-interval, and you end up spending a lot of time correlating SID/Serial# with ASH-records. Again, not all of the precise info for a single session that I want.

SQL-Trace output files will give you Relevant Info on just the session you need. But trace and tkprof also come with challenges. Sql-trace involves notable overhead, and the trace-files reside on the Sacred-Database-Server. You can, with some effort, download the trc-file, but then you don't have tkprof to "profile" it. And you will find that the profile-information is overkill for a DEV-person who wants you to "just fix some query" anyway.

Note: as a DBA, I Appreciate all those tools, and I could not do without them. And if you have time + access available, I can Recommend them for use in Troubleshooting!

But for my current DEV/OPS-environment, I need something more Simple...

The average DEV/OPS-person would like a very simple script or report to look at. In many cases, a select from V$SQLAREA order-by-elapsed is all they can handle. And there lies some of this solution...


New in Oracle23ai: V$SQL_HISTORY

My Austrian friend Martin Berger pointed this out: there is a new view available, V$SQL_HISTORY (link).

This view currently still has to be enabled by a parameter:

SQL > alter system set sql_history_enabled=true scope=both;

And beware: this parameter is SYSTEM-wide. You need to set it at CDB or PDB level, and you can Not set it at Session-level. Martin Bach pointed that out in this forum post (link)

With this view enabled, I was able to create a script to list the SQL done by a session. Here is the SQL*Plus variant:


Using the SID, we can limit the records to "just from this session". 
We see that the history for this short demo-session contains 6 statements. The longest SQL was an INSERT that was executed 7 times and consumed a total of 3.9 seconds Elapsed. This was also close to what the total of "set timing on" reported for this query. First impression: Looks Good.

For calibration, I included a call to spinf_n ( 0.5 ), which is supposed to keep the CPU busy for 0.5 sec. And line 5 of the query-result reports something very close to 0.5 sec. That also looks Good, QED.

Now note: there seems to be some anomaly in reporting of sleep-time. On the second line we have a sleep of supposedly 2.1 seconds, but it is listed as having only 417 microseconds "elapsed".  The result from "set timing on" was close to 2.1 seconds though. Something for later research.

But the concept is demonstrated: 
We can List the Queries from our session, and pinpoint the ones that take up most of the time.



Built it into a python-program...

We have something here: every (batch, container, utility) python program can now report its top-SQL to stdout. Devs can read that output and copy/paste the offending SQL_ID for further examination...

(On 2nd thought, my first attempts to include something in a python-program got rather messy and error-prone. Not least because a lot of my instances are 19c, causing python to crash with ORA-00942. I'll do a screenshot, and then alter and beautify my code + output for a future blog...)


The screenshot shows output from a (very short demo-) program that did a few queries. The idea is to list the SQL (part of each SQL), with an indication of how much time was spent on that SQL. It will serve two purposes: 1) give the DEV an idea which queries used up the time, and 2) supply the SQL_ID that can be used for further investigation.
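Condensed, the reporting idea might look like this. This is a hedged sketch, not the final script: the column names (sid, sql_id, sql_text, elapsed_time) are assumptions based on the 23ai documentation, so verify them with a describe of v$sql_history on your own instance first.

```python
# Hedged sketch: report a session's heaviest SQL from V$SQL_HISTORY
# (23ai, sql_history_enabled=true). Column names are assumptions:
# verify with DESC v$sql_history on your own instance.
SQL_TOP_HISTORY = """
select sql_id,
       count(*)                      as nr_execs,
       sum(elapsed_time) / 1e6       as elapsed_sec,
       min(substr(sql_text, 1, 60))  as sql_text
  from v$sql_history
 where sid = sys_context('userenv', 'sid')
 group by sql_id
 order by elapsed_sec desc
"""

def f_report_top_sql(conn, max_rows=10):
    """Return formatted report-lines: sql_id, nr of execs, elapsed sec, text."""
    with conn.cursor() as cur:
        cur.execute(SQL_TOP_HISTORY)
        return [f"{sql_id}  x{nr:>4}  {sec:8.3f}s  {txt}"
                for sql_id, nr, sec, txt in cur.fetchmany(max_rows)]
```

Called just before conn.close(), the returned lines can go straight to stdout, so the DEV can copy/paste the SQL_ID for further digging.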

Anyway... This seems to work! 
And the DEVs like the idea: they can kick the problem back to the DBA.
It just needs a bit more "effort". Some Quality-Python time.

But it is working already in the DEV-environments, where most developers use the 23ai container-images and thus have this feature available (Thx to Gerald Venzl, link). 

Already, runs in Dev/Test, if done against 23ai, can now produce the "list of Heavy SQL".  

Now we just need to ask for a backport, or create a similar report-thingy for 19c... 


Nice, but More Work to do...

Developers, not really ora-minded, already like and Want the feature. The system spits out the "worst queries in the program", so they can copy-paste that to the DBA: Your Problem now, Go fix this!

But this new trick works only under two conditions: you need 1) version 23ai, and 2) the parameter set. My first suggestion to Big Red would be to make this the default in 23ai: sql_history_enabled=true;

But even then: most of the world is still on 19c or even older versions.
If given Time... I may be able to rig something up to let older versions behave similarly. 


Even more: Observability, Open Telemetry, OTel

Another interesting item I came across while researching this topic: Martin Berger from Oracle in Vienna pointed me to OTel: OpenTelemetry (link). And there was a good presentation on this topic by Jasmin Fluri at POUG, which will probably also appear at DOAG and UKOUG later this year. Something to keep an eye on.

-- -- -- -- -- End of this Blogpost, for now -- -- -- -- -- 

Links to scripts..

mysqls.sql : script for use in SQLcl and SQL*Plus.

Python script: later (it needs some work and some beautifying...)

-- -- -- -- -- End of this Blogpost, for Real -- -- -- -- -- 

Thursday, 14 August 2025

Oracle23ai and python - Measuring (Network-)Time.

TL;DR: I wanted a utility-function in python to report the "time spent", notably looking for "Network-Time". It evolved during coding into a time-reporting-function. It Worked. But I learned something new again.

Image: waiting for a bus, you never know when...


Background: (I'm not obsessed with - ) Roundtrips and Latency.

From earlier work, I have a number of exploratory python code-snippets to study and measure "network time" for (python-)programs that talk to the database. It so happened I came across some "Chatty" programs that were running in network-configurations with high Latency. That sparked a lot of tuning or "optimisation" (simple: removing roundtrips). But there is still a lot to be gained, notably by deploying code closer to the RDBMS: By Reducing Latency.

At first, I set out to write two functions:
 - f_rt_time_lost() : to report "time lost" on RoundTrips in the network. 
 - f_run_pings() : to imitate the ping-behaviour: report pingtimes to stdout.

And if I had those two, I might as well create a test-program: db_ping.py..?

In the process of building and testing those, I also found the need for something akin to linux time, to report how much time a program has spent on CPU (on the client side). 
And as a good DBA,  of course I wanted to report on the DB-time from my RDBMS.
So this "project" got a little out of hand, code-wise. I typed up a lot more than I had originally expected.

But to stay focussed: I wanted to "Report" on the time lost (in network...) by various programs.. so most of the effort went into collecting and reporting the basics:

The formula for time "lost in the Network":

    Network-time = 
        RoundTripTime (RTT)  *  number of RoundTrips  (nrRTs)

In between I started thinking of "total time", and would it be possible to put time in perspective: How much is spent "in the App", and how much is spent "in the RDBMS".

The next equation I can think of is this one: The Equation of Time...

    Total time = 
        App-time (spent running code from client-program) 
    + DB-time (spent running SQL or PL/SQL in RDBMS-session) 
    + Network-time (nrRTs * RTT).
[    + Idle- or user-input-time (sleep, keyboard-wait, other?... ). ] 
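As a sanity-check, the equation above is easy to put into a few lines of python. A sketch (the function name is my own); the idle-term is simply whatever remains after the measured components:

```python
def f_time_equation(total_s, app_s, db_s, rtt_s, nr_rts):
    """Split total elapsed time into components; idle is whatever remains."""
    network_s = rtt_s * nr_rts                     # Network-time = RTT * nrRTs
    idle_s = total_s - app_s - db_s - network_s    # sleep, keyboard-wait, ...
    return {"app": app_s, "db": db_s,
            "network": network_s, "idle": idle_s, "total": total_s}

# example: 10 sec elapsed, 2 sec app, 3 sec db, 100 RTs at 40ms each
eq = f_time_equation(10.0, 2.0, 3.0, 0.040, 100)
print(eq)   # network comes out at 4.0 sec, idle at 1.0 sec
```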

If I can report on those items I can make fairly clear-cut cases for troubleshooting: Try reducing the biggest time-consumer. And if that happens to be "the network" then we know where to try and improve. 
If it is Not the Network, then we probably still know where to look. And we can better predict what kind of impact any improvement can have.

Note1: At this point, real-techies start to suggest "profilers", or even a fancy "flame graph". But in my environment I dont have the luxury nor the knowledge-base to go and do extensive profiling. I will have to help my colleagues with the tools we have, and with tricks they can understand and implement.

Note2: Anything that runs multi-threaded or uses PX (parallel processing of SQL), will pose additional challenges. I'm deliberately keeping those out of scope.

Note3: Equation of Time.. EoT.. I'm deliberately not going to use that abbreviation.

For now..... 
Most programs already pop out a "summary" of their processing to stdout, and I wanted to add my own little item to that.

What if Every Program could produce that Equation of Time at the end of its run... ?


Some musings on measuring time.

In IT it is always very much "in vogue" to blame network or latency, or "our WAN connection" for any slow response-times or long-running-programs. 
Which is Why I want to Know.. I want to be able to say with more confidence: "this is a network issue". 
And since I am not a "network engineer", not even a proper sysadmin, I need Data, and I need data that I can "Understand" myself and that I can Explain Clearly to Managers and other non-tellectuals.

With the Oracle Database, we are used to AWR (or STATSPACK), and ASH, to report on activity and we know how to use that information for "tuning". But those tools have difficulty covering network activity. 
At session-level I can get the nr of RoundTrips (RTs) from a session, And I can have a session report how much DB-time it spent. 

The two items missing from AWR are, IMHO, App-time and Network-time. 
App time would be the time spent "in the App", e.g. on CPU or other activities at the client-side (some of it maybe even waiting for a keyboard-enter).
Network-time would be the time spent on "transport" to and from the RDBMS. And that Network-time is possibly the least-productive of them all. It is the "waste" that we are trying to eliminate...

I realise full well that AWR, based on the RDBMS, will not be able to measure App-time or Network-time, and I am not blaming Oracle for not providing this data. But I can look for that data myself...


Let's put it in code..

From earlier tweaking, I had some code more or less ready to copy-in. 
For python, I decided on the time module, and using time.perf_counter_ns() as probably the most precise way to measure time (see also older blog on duration.py)

Last year, I already had thrown together some "stopwatch" ideas, so there was some re-use of a module called duration.py (it now also has a function inspired by the linux time command - maybe a good topic for later...).

In short: I can cover the App-time, both active-time and elapsed clocktime from the python time module.

The functions I used come from the time module.

time.perf_counter_ns () : a clock counting nanoseconds. The reference point is undefined, hence you need to save the value when your program starts and then check the difference to know how many nanoseconds have elapsed.

time.process_time_ns () : the nr of nano-seconds spent on user- or system-time. See this as "busy-time". When compared to linux time, this should be close to the sum of user+sys time (check this...?)
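A minimal sketch of how the two counters work together (standard library only; the work in the middle is just a placeholder):

```python
import time

t0_wall = time.perf_counter_ns()   # wall-clock reference, epoch is undefined
t0_cpu = time.process_time_ns()    # user+sys CPU time of this process

# ... the actual work of the program goes here (placeholder busy-work) ...
_ = sum(i * i for i in range(100_000))

wall_ms = (time.perf_counter_ns() - t0_wall) / 1e6
cpu_ms = (time.process_time_ns() - t0_cpu) / 1e6
print(f"elapsed {wall_ms:8.3f} ms, on-cpu {cpu_ms:8.3f} ms")
```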

Then I needed DB-time, and number of RoundTrips: those could be gotten from v$mystat and v$sess_time_model. I used this query:


This gives me the nr of RoundTrips for the session, and the DB-time from both v$mystat and from v$sess_time_model. I used the time-model as it returns microseconds and should be more precise, but the old-fashioned DB-time from v$mystat, in centiseconds, achieves the same result once the session has consumed a few seconds of time.
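The query was along these lines. A sketch for python-oracledb: the statistic names 'SQL*Net roundtrips to/from client' and 'DB time' are the documented ones, but do verify on your version, and the session needs SELECT privileges on the v$ views:

```python
# Sketch: fetch roundtrips and DB-time for the current session.
# Assumes an open python-oracledb connection and grants on the v$ views.
SQL_SESSION_STATS = """
select 'roundtrips' as item, st.value
  from v$mystat st
  join v$statname sn on sn.statistic# = st.statistic#
 where sn.name = 'SQL*Net roundtrips to/from client'
union all
select 'db_time_usec', value
  from v$sess_time_model
 where sid = sys_context('userenv', 'sid')
   and stat_name = 'DB time'
"""

def f_sess_stats(conn):
    """Return {'roundtrips': n, 'db_time_usec': n} for this session."""
    with conn.cursor() as cur:
        cur.execute(SQL_SESSION_STATS)
        return {item: int(value) for item, value in cur}
```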

To find the RoundTripTime (RTT), I need a ping, or a number of pings. And then multiply ping-time by the total nr of RTs that the session has done.

I sampled RTT using 5 pings, with 1 second in between. More on that later...  

And as the code approached completeness, I had:
App-time: from time.process_time_ns()
DB-time: from the v$sess_time_model
Network-time: from multiplying RoundTrips x Ping-time
And Total-time: from the wall-clock of time.perf_counter ().

That looked quite complete and realistic. 


The Test-setup and first run...

While coding, I had the function included in ora_login.py , my module with oracle-utilities for use in our python programs (link below).

And for testing, I had the function called at the end of do_sql_times.py, as described in an earlier blog.

This program allows me to run an ad-hoc query against the RDBMS. It uses the dotenv utility to get credentials from the file .env. And it reports on how many microseconds each record takes to "retrieve". I've used that program to measure timings before, notably to show the effects of arraysize and prefetchrows.

A test-run with a query on dual looked like this:


After the Connect, this program notifies that "array" and "prefetch" are set to non-defaults of 1 (to maximise RoundTrips). 

Every record will be a RoundTrip, to maximise the number of round trips and thus maximise Network-time.

The very simple query is just 1 record, and that seems to take 49 ms (I happen to know that could be the approximate RoundTrip time for this system). 

The 2nd part, the "reporting of the time" is interesting: We collect the ping-time, we retrieve DB-time and RTs from the RDBMS, and we can calculate "all the values":


Note that the ping-times to this system are fairly high, and variable. This is the kind of system we have trouble with... 

And then we see the data we want from the programs:
App-time: measured by time.process_time()
DB-time: from the DB
Network-time: pingtime x nr RoundTrips
Idle time: calculated.
Total time: measured from time.perf_counter()

The first time I tested this, I was surprised by the large, 5sec, idle time in a program that I thought did very little. Then I realised: The loop for the 5 pings has a 1-sec sleep in it, and that generates the (approx) 5 sec of Idle-time.

As a separate check, I've used the linux time command to time the run of the program, and the values concur, even for a very trivial, short-running program:

Real: the elapsed time, close to the 8sec reported by python.
User and sys : Together they are +/- close to the App-time reported by python.

And the data from time correspond, with a margin of error, to the data reported by my python contraption.

So I now have a function I can call at the finish of a program that  will report how time was spent. And a verification via linux time looks acceptable. 

So far it looks promising, usable (but you can feel a problem coming...)


Test some Longer running queries...

Then I did a very simple test that turned out funny. Query 100 records (with one RoundTrip per record), and I found this:


Let me point out: 
Each record seems to take between 30 and 60 ms to retrieve (forced to do a RoundTrip every record)
The ping-samples are much longer than that, with an average of 175ms ?
And the Idle time is reported Negative ??

On first impression: the individual records come back (from a RoundTrip) within some 40ms, which is expected on this system. And if it takes 40ms to fetch a record, the RTT should be lower than that.
But the ping-samples are higher and show more irregularity than expected. 

The same test, against a database running nearby (ping-time around 1ms), shows this:


On this easy-to-reach RDBMS, ping-time shows less variance, and the Idle-time is as expected: 5 seconds from 5 calls to ping plus sleep-1-sec to sample RT-time.

It seems that on the "irregular system" the sampling of the ping-times (5 pings in 5 seconds) comes back with values that are too high, and possibly more irregular than they are in reality.
And from over-estimating the RTT, the Idle-time is then calculated as ... Negative?

Something is Not Right.


Partial solution: Do the ping "faster".

One apparent fix was to modify the ping-sampling, and remove the sleep-time between pings. This gave more regular and lower RTT times:


But mainly shows that pings (and RoundTrips, RTT) were faster when done at high-frequency. And we dont know at what frequency the regular python-programs do their round-trips, especially while we are trying to Eliminate the RoundTrips in the first place.


Wrap Up: This helps, but there is More... 

This measurement is only Partly Helpful.  First the list of positives...

DB-time: We get a view of how much time is spend in the database. If this time is significant, we can go in and optimize some of the queries. Classic DBA-tuning stuff... (but not our biggest challenge on this system).

App-time: We get an idea of how much time is spent by the python-code and the measure from time.process_time () seems to concur, but not quite, with what the linux time command provides as user + sys time.

RoundTrips: We can record the exact number of RTs per program, and thus have a better measure for its chattiness (but we already knew that from AWR-data...). In our test runs, the program did 110 RTs in about 9 seconds, say +/- 12 per second. No idea if that is good or bad, and we also dont quite know in what "pattern" they were done. As it turns out, this program did most of the roundtrips in a quick sequence while fetching 100 rows. Other programs will display different patterns...

Network time: Not Quite. This "measure" is still unreliable, mainly due to the uncertainty in measuring ping-time or RoundTripTime (RTT). 
Ongoing experiments will probably be with some "ping" utility.

Current measure: For the moment, I'm reducing (removing) the sleep-time between pings when sampling. But I do not know the behaviour of the python-programs with regards to their frequency of RTs to the RDBMS, or the pauses between SQL-Net calls.

Root Causes: We suspect this  behaviour is due to the nature of our systems. Most of the Databases run on VMs. But a number of (test)databases are running on containers. The latency (and ping times) vary depending on the locations in the networks and the time of day, and possibly the phase of the moon...  In general, the VMs seem to have more problems than the containers with test-databases...

Dont tell the manager, but .. we dont really "Know Everything" yet.

Main goal for the moment remains: Find and Remove RoundTrips....

-- -- -- -- -- End of this blog post, for now -- -- -- -- -- 

Appendix : links to code and scripts

do_sql_times.py : the program used in testing. 

You can use this program to run "any query" and get timing data out of it. The actual function to report the timing is ora_time_spent ( the_conn) and is defined in the file ora_login.py.

The utilities used:

ora_login.py : Functions to logon, some utilities, and now also time-reporting.

duration.py : Time-measurement (and some reporting).

prefix.py : Enable a prefixed print to stdout, add timestamp in microsec.

.env : The config file for dotenv.

Beware: the .env now contains non-defaults for arraysize and prefetchrows. If you run with those set to 1 (one), some of your code may cause excessive RoundTrips! 

-- -- -- -- -- End of this blogpost, for real -- -- -- -- --




Tuesday, 12 August 2025

Oracle23ai and python - Pingtime and RoundTrips

TL;DR: from my obsession with finding and measuring RoundTrips, I got to tweak with pings and RTs. Some findings, some code. And maybe more to investigate if and when that is deemed relevant... 

image: queues, roundtrips, time-spent... 


Background: RoundTrips, Latency.

When moving our python-code around to find the best (safest, and cheapest) place to run, I got obsessed with network-delays. Notably RoundTrips and ping-times. You can find Proof of my obsession in some of the previous blogs...  

From using ping (on linux and mac-OS) and tnsping, and my earlier doodling with the ping-function from the python-oracledb driver, I wanted to incorporate a "latency measurement" into the python (problem-)code to collect data, maybe help advise on where to deploy the programs.
And another logical follow-up would be to build a db_ping(.py) utility for when tnsping was not available (thin driver...). Later.

Mind you: a lot of detailed "network stuff" is way out of my "knowledge zone", but there are Two Things I know: 
1) Fast Connections, low latency, makes our programs "run better" and 
2) Keep the nr of RoundTrips Down, no matter what. 
So I should be measuring and counting those items.

Here are some notes, some warnings, and some code to re-use if you want.


The DBA view: AWR (STATSPACK) does not know...

As a DBA I have a tendency to look for "data" in AWR (Automatic Workload Repository), ASH (Active Session History) and similar information provided by the RDBMS. It will give me things like user-calls and SQL*Net Round Trips, and information on the SQL-statements done by the application. Those tools are Excellent for troubleshooting Database-behaviour, and to advise app-coders on where+how to improve their SQL. But the data from AWR has two flaws for my purpose:
1) For the RoundTrips, AWR does not produce any "ping-times", no Latency is measured.
2) It can only report on things from "inside" the RDBMS. Any "time used" or "time lost" on the client-side is not registered.

To measure Latency, you need something similar to ping. And to measure time consumed by the application, you need to "profile" its behaviour or measure consumed-time and consumed-CPU somehow. Not something that the RDBMS can "see" without additional help. (multi-component-view needed, topic for future blog..?)


Find the ping-time.

Firstly, I felt the need to report the RoundTrip-times before and after various (python)programs. 
(Note that good old tnsping was also not available because of the thin driver...)

I had already, innocently, incorporated something like :


os.system('ping -c10 www.oracle.com ' ) 


But I quickly found out that ping (and my other old-time favorite: traceroute) were not always present in the (production-, container-) locations where the python code was running. And getting tools "included in the build" was complicated. (tip: nobody seems to worry about the pip3-stuff because  "it does not run under root". No further comment, and I am not hacking our own systems... Never).

So I built a few (amateurish-) functions for checking connectivity and had them included in some of the python-rollouts.

Link to complete source below, but here is my quick code to measure a single ping and a series of 100 pings:
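Condensed, the idea is the following sketch (the full source, linked below, does more). It uses the python-oracledb Connection.ping(), which does one round trip to the server without executing SQL; function names are mine:

```python
import time

def f_rt_1ping(conn):
    """One RoundTrip, measured in milliseconds, via the driver's ping()."""
    t0 = time.perf_counter_ns()
    conn.ping()                  # python-oracledb: a single round trip, no SQL
    return (time.perf_counter_ns() - t0) / 1e6

def f_avg_ping(conn, nr_pings=100):
    """Average RTT (ms) over a series of back-to-back pings."""
    return sum(f_rt_1ping(conn) for _ in range(nr_pings)) / nr_pings
```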


In this code you can see there is a function, f_rt_1ping(), that is called once at the very start of the "main" program. And then it is called again, in a loop of 100. The result looked like this:


That first ping seemed to have taken 28ms, and that is what I more or less expected from the linux-pings to this system.
The loop of 100 pings yielded an average of 25ms. Also in expected range.

Note: Please disregard the burps about starting MAIN, and the connection-info: I Need to Be Sure of the address, instance, user etc.. to avoid reporting data from the wrong database (oh yes, that still happens a lot...not kidding - link to older blog..).


Calibrate: make sure there is no overhead.

Because I am slightly paranoid and wanted to make sure there was no hidden "overhead" in my ping process, I also built a calibrate-function. I would measure the same activity with and without the actual ping. The result was reassuring: 

The overhead of code looping around a number of pings is negligible, see program run below. And I +/- knew this from an earlier experiment as well..
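The calibrate-idea, in sketch form: time an empty loop and a ping-loop over the same count, and treat the difference as the netto ping-time (names are mine, not from any library):

```python
import time

def f_calibrate(conn, nr_loops=100):
    """Return (loop_overhead_ms, netto_ping_ms) for nr_loops iterations."""
    t0 = time.perf_counter_ns()
    for _ in range(nr_loops):
        pass                                   # empty loop: pure overhead
    empty_ms = (time.perf_counter_ns() - t0) / 1e6

    t0 = time.perf_counter_ns()
    for _ in range(nr_loops):
        conn.ping()                            # same loop, now with the ping
    ping_ms = (time.perf_counter_ns() - t0) / 1e6
    return empty_ms, ping_ms - empty_ms
```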


Something to Include into python programs...

To collect some data, I experimented with code to do "a sample of pings" inside the regular python programs. It would ping for a pre-determined interval (e.g. say 2 seconds?), or a pre-determined number of pings (say 100?). The output could go to stdout and get collected into the log. In future, I could also envisage storing the data in a table (it is an RDBMS after all).

For now: I was happy to have an approximate RoundTrip time (in ms) in stdout.


Even Better: I also know the RoundTrips and the DB-time..

From earlier work, I already had a way to interrogate V$MYSTAT. This would tell me what the RDBMS thought that my session had done. I could re-use that sampling of V$ to report on the session activity, as seen from the RDBMS. It would tell me how many RoundTrips a program had done.

So near the end of each program (more accurate: before closing the DB-Connection), I could report the number of RoundTrips to stdout.


Let's run the test-program to completion:

With all this data "collected", the end-result, to be put at the bottom of a python-program, will look like this:


With just a little extra code, any (relevant) python-program could, once it has done its job, report this data into stdout for examination by the sysadmins. And to myself, the DBA...
Notice that this program (just a test-thing), has done... 
 - a test of 1 ping, which yielded 27.266 ms RoundTrip time.
 - a test of 100 pings, which yielded avg 27.373 ms.
 - a test of 2sec calibrate-pinging, with an avg (netto) pingtime of 27.0266.. ms.

And from the session-statistics, we detect:
 - 177 RoundTrips.

Also notice: I can use the linux time command to report the real+user time consumed by the program. I would like to use that somehow. Maybe Later.

Note: I'll ignore the difference between db-time (microseconds from the latest time_model and centiseconds from the old mystats) for the moment. I might ask some ppl who know more about oracle when we meet. And I'll keep an eye on that when running more relevant programs once the code is included...

(And Yes, I'll think about popping this data into some log-table - that is definitely an option! Later)


Making Management happy, with numbers...

So by collecting RT-times and number of RTs, I have "measures" for each program or program-run.

High RT times, high Latency, will cause time lost in "the network".

High nr of RTs, Chattiness, will cause time lost in "the network".

The product of the two should be approx the total time lost on network-traffic, call that Network-time.

So by collecting these numbers, I can "guesstimate" the time lost in the network. And I can indicate the possible gains from either bringing down the RTs or placing the program closer to the RDBMS.

From the example above: with 177 Roundtrips, at an average of 27ms, those round-trips may have consumed about 177 x 27 = 4779 ms, or 4.8 sec of "network time".
This we can report to our Manager...
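That back-of-the-envelope calculation, in code (a trivial helper, name is mine):

```python
def f_network_time_ms(nr_rts, avg_rtt_ms):
    """Guesstimate: total time lost in the network, in milliseconds."""
    return nr_rts * avg_rtt_ms

# the example from above: 177 RoundTrips at an average of 27ms each
print(f_network_time_ms(177, 27))   # 4779 ms, about 4.8 sec
```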

I have a slumbering idea to take it even further: 
I can try to measure time spent in python on the app, on the container, App-time.
And I can try to measure time spent in the RDBMS (DB-time, from V$MYSTAT or V$SESS_TIME_MODEL). 
And for non-threaded, non-parallel processes, the total of time: App-time + DB-time + Net-time => Should add up to running time of the program.

Maybe later. I have some more ideas to tweak code and experiment. 
Focus for now is on reducing Chattiness and Hopefully reducing Latency.


Wrap up: It helps to collect Data.

At this point in time, I am just happy to collect "data" and use it as a broad-indicator. Finding the programs with most RTs. And relating the RTs to "sensible values".

For example, if a certain program does Millions of RTs, but only processes a few thousand records (and if we somehow know that with certainty...): something is suspicious and it may need examination of code...

And in our case, any component(-location) with a high latency will also get some attention.

At least by collecting the data, I have arguments to discuss with various colleagues.

More to come. Always more Ideas. Maybe if time.

-- -- -- -- -- End of this blogpost, for now -- -- -- -- --

Appendix : links to code.. 

tst_netw.py : the test-program used for the demos and screenshots above.

Plus these additional files, needed to import helper functions into test-program:

ora_login.py :  login and some other oracle-helper functions.

duration.py : time-keeping and reporting functions - instrumentation

prefix.py : print the filename + timestamp to stdout - instrumentation.

.env : to replace credentials with your own scott/tiger@host:port:service

that code: Take it! 

Feel free to copy or plagiarize (do you really need a large disclaimer..?). Just Copy What You think is Useful. 

If you have suggestions for improvements: Let me know. I'm always interested in discussing these things if time+place allow. Preferably in the margin of a Good Conference, or on one of the social channels (bsky, slack, email...).

Oh, and you may need to get (pip3) the following:
pip3 install oracledb   
pip3 install python-dotenv
(datetime and array come with the python standard library, no install needed)

-- -- -- -- -- End of this blogpost, for Real -- -- -- -- --

Tuesday, 5 August 2025

Oracle23ai and python - too many RoundTrips, Fixed.

 TL;DR: Time got lost due to leftover code. In our case, some well-intended "defensive coding" caused additional RoundTrips and some commit-wait time.

Found it. Fixed it.
And at the same time, we (re)learned a little about coding, and on how to measure, and how to "instrument" our code.


Image: waiting for a train. I've spent a lot of time waiting for trains, waiting for planes, and waiting for traffic jams..


Background: Suspiciously slow python code.

After some careful testing we stumbled across parts of the program that did more RoundTrips than the data justified. And RoundTrips were still the item that seemed to cause us most "wait-time".


Testing: Run the suspicious code in "isolated" test.

To better observe, we re-worked the UnitTest to run just the parts of the code that we suspected to cause "slowness". After much simplification, we ended up with this result:


We see the result of 1000 loops of "insert item". That took 14.17 seconds, corresponding to a "speed" of 70.58 records/sec.

From the session info (V$MYSTAT) we see that the session did about 1000 executes, which makes sense for 1000 inserted records.
But the surprising numbers are the RoundTrips and the user-calls: 3000 each ??
It seems this code does 3 Roundtrips for every record inserted....

We also know that this particular setup at the time had a latency of around 3-4ms per RT. Meaning that 3000 RTs would consume close to 12sec of the 14sec elapsed.

Furthermore, from V$MYSTAT, we notice that the RDBMS consumed 160 Centiseconds, or 1.6 sec of time.

Note1: The db-connect and the measurement-queries represent the additional overhead of 14 executes, 4 roundtrips and 5 user-calls. We know this from earlier testing of similar code, see previous blogs (link). 

Note2: I also realise the reading of our stdout takes some getting used to, we sometimes appear a little "over instrumented". My bad. But it Helped!

Note3: the DB-time for the session is reported at 160 CentiSeconds, i.e. this session only consumed 1.6 seconds of time at the RDBMS. The rest of the elapsed 14 sec is spent "elsewhere", and we suspect 12sec of that were spent in RTs ... Interesting? Something for later research?

Anyway, from this test, it seems the insert of these particular item-records contains some overhead. Time to read the code...



Code Inspection: Aha.

Once we had the suspect isolated as "inserting of items", we examined the code involved. And after some searching, the simplified version of what we found was this:


There are two things "wrong" with this code.

Firstly, the Try-Except block. It will catch parse-errors. But after catching the exception, it does not really know "what to do with the error". It simply re-raises it and then quits. You can smile at "return 0" as well.. 
But what mattered in our case: the parse-attempt causes a RoundTrip to the RDBMS (and RTs are considered Evil, in our case).

Secondly, there is a Commit after the insert. Oops. Transaction-Logic was not supposed to be part of the "insert function". That commit was a leftover from the "development time", when commits were useful to preserve the state (the data) that would otherwise get lost if the program did an exit-on-error.
But guess what: a commit causes a RoundTrip (and RTs are considered Evil, in our case).

Note: The developer was Innocent: he had created this code with the Best Intentions. There was a reason for putting the parse and print-of-stmnt there.  On errors, I will _Always_ ask for the offending SQL. And over time, as the datamodel evolved, columns were added/modified, causing statements to fail. Hence the need to "print the SQL".
Please understand that I don't want to unfairly blame the DEV...


The improved version.

It was not difficult to remove the offending code. And after some edit, we ended up with this (simplified!) version:


Now the function only does the insert, and returns the newly-found ID. Parse and Commit have been removed. This should avoid some round trips. 

Will this speed up our (test)code ? Let's run it... 


And Voila, the same 1000 inserts now done in 4.46 seconds, and with now only 1000 Round-Trips. Since on this setup we know that an RT is generally about 4ms, the timing-info makes sense: We eliminated 2000 RTs, and at 4ms per RT that would make the program run about 8sec faster from eliminating RTs alone... 
And especially from a "far-away" application with very high numbers of inserts, this is going to make a difference.

But there is more: The reported DB-time from V$MYSTAT is now down to just 14 CentiSeconds. That might be "explained" from the 1000 commits of the previous version of the code. A commit would realistically (on this old test-system) take about 1ms. Interesting.

The parse-time we save by not parsing is likely to be minimal: re-parsing the same statement is a Very Efficient operation thanks to the Oracle Shared_Pool.

The total gains from the new code seem to be about 10sec: 8sec gained from eliminating RTs, and 1.4 sec gained at the DB-time, probably from eliminating Commits. Total gain close to 10 sec. And that concurs with the measurement.

In summary: The new reported "speed" is now 224 records/sec.
About 3x faster.


Wrap-up, on the (Human-) Coding-side.. 

Defensive Coding is generally not a bad thing, and the "problem-code" was built with the Best Intentions.

The parse-call was intended to check the connection, check the privs, and check the syntax of the statement. Except that sometimes, the addition of "being careful" results in time spent on un-necessary activity. These "checks" were better done in the Testing-phase, or ultimately by single calls at the start of the program. Not inside a high-frequency call.

Key points: 
Spend more time informing DEVs.
Pay attention to (Unit-)Testing, maybe include reporting of statistics and time ? 
Do occasional code-reviews ?


Wrap up, on the (geek-tech) Measurement-side:

We have the start of instrumentation: python does some time-keeping and prints timestamps.  And the captured data from V$MYSTAT helped to identify the additional RoundTrips and user-calls. We also noted the DB-time: after removing parse- and commit- overhead, there is Notably less time reported as DB-time.
It would be nice to know a little more on the "time spent" by the various components in more detail... 

Future improvements come to mind: better use of the timing-information available in python (time.perf_counter_ns ?). And what about the data available in Oracle (the new sess_time_model?). Maybe also use the linux time command?
Would it be possible to "measure" time on both sides: application + RDBMS, and then attribute the "missing time" to RoundTrips ?  Future Blog(s).. ? 

Key point: 
Improve on Instrumentation ?

(spoiler: possibly more ideas on instrumentation coming... )


-- -- -- -- -- End of this Blogpost, for now -- -- -- -- --

Appendix: links to code..

tst_merge_src.sql : set up the tables for this test (re-used from previous test)

tst_ins_item.py : the test-code used in the demo



Plus these additional files, needed to import helper functions into test-program:

ora_login.py :  login and some other oracle-helper functions.

duration.py : time-keeping and reporting functions - instrumentation

prefix.py : print the filename + timestamp to stdout - instrumentation.

.env : to replace credentials with your own scott/tiger@host:port:service

And you may need to get (pip3) the following:
pip3 install oracledb   
pip3 install python-dotenv
(datetime and array come with the python standard library, no install needed)

-- -- -- -- -- End of this Blogpost, for Real -- -- -- -- --