Saturday, 8 November 2025

Oracle 26ai and UUIDs - Playing with uuid-v7

TL;DR: The UUID capability of Oracle 26ai prompted me to play with some UUID data. It also gave me ideas for some PL/SQL code. 

But first: Three Caveats:

1) Do Not use this your Production systems (unless you Really Know what you are doing, and even then...).

2) A lot of this blogpost is trickery, or plain Bad-Practice: You Should Not put relevant Information in a technical Key. And do not depend on info that might be available in the key.

3) The Jury is still Out whether UUIDs, in an Oracle RDBMS, really are better than plain old, smaller, readable, Sequences or (Integer-) ID-colunms.

The reason I still coded this, and wrote this text, is that I got inspired (got jealous?) from the python-module for UUID (link) and from some other systems. I tried to copy some of its capabilities. Additionally, Maybe some ppl out there can benefit from some of this code. And lastly, I wouldnt mind to have some discussion on the use of UUIDs.


Image: UUID, clock and universe, generated by some AI tool (it is called Oracle 26ai, Twenty-Six-eeh-aai, after all...)



Background: UUIDs are coming, like it or not..

The use of UUIDs ... Will Just Happen. 
In practice, I expect a lot of new (distributed) system will be designed to generate and process UUIDs. For "distributed systems" those keys are a necessity. Hence your data will contain UUIDs soon.

The jury is out whether UUIDs (stored as RAW, Varchar, or otherwise) are All that useful or efficient in an Oracle RDBMS.  But Given that their use is growing, Expect them to invade your RDBMS soon (if not already).

Allow me a few comments, personal opinions...

Opinion nr 1: In a lot of cases, plain-old Integer-IDs, when used sensibly, may still be usable, and perform even better... YMMV.

Opinion nr 2: For UUID, V4 and V7 probably equally useful, and equally performant, unless you have some Extreme Use Case (or when you plan to use the hidden time-info in a UUID-V7...).

Since Oracle introduced the SYS_GUID (somewhere around 2013?), I have not seen a lot of usage of that function. Possibly because most Oracle systems are much older ad will stick to Sequences. 

Add to that: Experienced Oracle folks (designers, dbas, gurus..) have learned to work around the (perceived!) downsides of sequences. 
Notably:
1. The correct Caching of Sequences alleviates a lot of contention.
2. Partitioning (for time) is Conceptually Better on a true Date- or Timestamp- column than on (tweaked) UUID-V7 values.

But enough Caveats and Opinions now.



Let's Tweak Some... Start with Version.

So, after all these Cautious words, I still could not get the documentation  of the python module (link) out of my head, and decided to do some coding anyway. From the functionality included in the python module, I started with the Easy Ones...

The easiest thing to extract from a UUID is the visible-properties: Version. A function to do that would look like this:


Easy to define, but there are some items to "catch" around the values for MIN/MAX, and how to trap invalid input in general. After a bit of coding and testing, output was generally OK. Result is intended to always be an Integer (number), or NULL, if null is given as input. 


Notice the results for min, max and the negative values that signal "invalid". A  SYS_GUID() will result in a return of -2. A lot more sophistication is possible here, but since I dont have a pressing need or urgent use-case, I didnt go deeper, yet.

Note: a few days after writing, I considered the NIL-UUID should have version 0 rather than the negative -255 ??

One of the Use-Cases for UUID_VERSION could be to check or to report on "incoming data" when ingesting from other systems or sources.

And speaking of Ingesting data....



Conversions from/to Varchar (lowercase and hyphens...)

For use with ingest- and display software, it will be good to have conversions to/from Varchar2. Trivial, but useful. Hence..
 
UUID_TO_VC2 ( RAW) -=> lowercase 8-4-4-4-12. This is covered mostly in the function FMT_UUID ( RAW) already described in the previous blog-post (link). Hence a "wrapper" will do for now. Others (or myself) can improve on it later if needed.

VC2_TO_UUID ( vc2, upper/lower, hyphens, dots... ) -> RAW(16). This will come in useful when ingesting data from text-files or other sources that deliver the UUID as a string.

Those two "converters" are fairly trivial functions, but these will notably allow easy conversion of ingested data. And especially for the "to_uuid", I want an error Raised when input is dubious or invalid. I might put a little more Effort into that later. For now using the same dataset as above:


Currently, the only check is on "length" and valid-hex. Hyphens, if any, will simply be removed from the string, and then it will go HEXTORAW. Any non-HEX will result in Error-Raised, any wrong length will also result in Raise of error...

At this moment, the errors are rather un-elegant. But I would prefer to throw an error than to ingest "corrupt" data. There is room for improvement here, such as removing "spaces" or other separators like dots, underscores etc. Maybe Later.



Time from a UUID7 (which you Should Never Use...!)

Given a RAW UUID-V7, I wanted to extract the Timestamp, either as epoch-value, as oracle-DATE or TIMESTAMP. That should not be too difficult.

These time-extractions can be used to sneakily select ranges with given date-time boundaries. They can also be used to determine the "create-time" of a given UUID-V7 (which may give an indication of the origin, or the time when a certain record was created, should you meed that ...)

It starts with a function to extract the epoch:


Once we can extract the Epoch from a UUID-V7, we can build on that. From the epoch (with approx 3 decimals, hence milliseconds) we can convert to a Timestamp or to a Date. Hence two more functions:

UUID7_TO_TIMESTAMP  ( UUID7 RAW ) => return a Timestamp (without tz)
UUID7_TO_DATE ( UUID7 RAW ) => return a Date (precision on whole seconds)

There is a bonus function included, one that I needed anyway:
EPOCH_TO_TIMESTAMP ( TS NUMBER) => return a Timestamp (without tz).

Demo (link to file at bottom): Select some values of UUID7, including the the NIL and the the max-Epoch, and show the resulting Timestamps and Date-values:


There we are. And if anyone needs it: extracting the time-values from UUID-V1 and -V6 should not be too difficult. But I didnt have the need yet, and I was too lazy...

Also notice the max-epoch value resulting from UUID-MAX. Remember that number and remember that Date...



And to Abuse the time-info....

Now that we have this information, we can do something naughty, like counting records per Hour, based on just the UUID...


 We see an example of a count of record per day-hr, based on the information hidden in an ID column with keys that conform UUID-V7.
And as you can verify, no DATE or Timestamp column was used in this demo...

I can not repeat enough how this is ABUSE of the UUID(V7)... But it might happen.

Another way you can use those implicit time-values for is to check on UUID7 keys: if the resulting timestamp is before, say, 2020, the data is probably "generated" and not "genuine" because the Standard for UUID-V7 was only designed after 2020. Just saying.



Summary and closing remarks...

Regardless of the fact that UUIDs were Firstly meant to be Unique + Random, there can be some "information" hidden in them. Python (and other systems) are able to expose that information. Other systems, notably the Python module for UUID, can extracat that information. So I also built a few primitive PL/SQL functions, just to play.

We can those functions to extract version and date/time values from UUIDs as s "forensics tool". And we can do some conversion (and formatting) between Varchar2 and Raw for use in interfaces.

I had also thought about constructing UUID-NIL and UUID-MAX, but those are so trivial that I will leave that to the next hobby-programmer.

For Practical use? ...  I would recommend to be Very Careful to Use those in Prod. A true Key-field should Never be abused for "information". Period.

Question for Oracle: Will they succumb to temptation to include extractor-functions|? Or will they stand on the opinion that Key Data should be Meaningless ? 
Note: The links to the UUID-documentation is Evolving as we Write. For example, this page on function IS_UUID appeared recently in my searches... Good Sign.

Oh, and the elephant in the room: Should I demo the use of UUID-V7 for use in partitioning or other constructs with time-restricted data ?  We all know it can be done. But doing it will set a Bad Example. 

My aversion to use Key-values for anything other than Key-Reference is still strong...

-- -- -- -- -- -- End of this blogpost, for now -- -- -- -- -- --

Appendix A: links to scripts

mk_uuid7.sql :  Original from previous blogpost, with some corrections.

mk_uuid7_p2.sql : The additional functions.

-- -- -- -- -- -- End of this blogpost, For Real -- -- -- -- -- --

Saturday, 1 November 2025

Oracle 26ai - UUID v4 available, and looking at v7, Interesting!

TL;DR: in Oracle 26ai, there is the new function UUID(), which implements the UUID-V4. Someone also mentioned UUID-V7. I started experimenting with the Possibilities.

And of course I expect (hope) Oracle implements other versions of UUID quite soon...

(For the purists on Standards and Database-IDs, keys: Several Caveats included. Dont do this in Production unless You Know What You are Doing...)


Image: some Universe with the UUID.



Background: UUID are everywhere (but I avoided them)

So far, I have not encountered UUIDs much in Oracle. Some other databases use UUID, and to me it was mostly a funny-looking random-string. I was aware of the Benefits of UUIDs: Uniqueness + Randomness, even when generated at different place and time. And in Oracle there was the SYS_GUID() function, should we need it. But I dont think I ever used it.

That May Change: Because in Oracle 26ai, there is a function to generate UUID-V4 keys. 

Here is the link to the Standard Paper (which I didnt Study in Detail myself...)

Jonathan Lewis experimented some, he also dug into the error-messages, and pointed out that Oracle will probably also implements the versions 1, 2, 3, 7, and 8 (link to forum) 

In the Database-world, the growing consensus seems to be that UUID-V7 is Better. Notably in this article is an interesting example of someone who tried to benchmark, and concluded that "in his case" the UUID-V7 was better in a number of scenarios. 

We also had at least 1 Oracle-DBA who was Actively Asking for UUID-V7. And as Jonathan Lewis showed, the indications are that this will be implemented at some point in the (near) future.



Let's write some code...

DBA's being Data-Geeks. Shortly after the intro of UUID-V4 in Oracle came the discussion whether UUID-V4 was good, was usable, or was just plain bad for use (as a key/reference/pointer) in an Oracle RDBMS.

Geeks have to Geek, hence... Jasmin Fluri wrote a PL/SQL function to generate UUID-V7 keys and blogged her findings here (link). I could not resist, and took that code, tweaked it a bit, and started experimenting as well. As a result, I ended up with a few functions that I can share, and that I might use in further experiments. 

Link(s) to script(s) are at bottom of blog, in case someone wants to mess with it.



Raw, but (not very) Readable...

The Basis if the UUIDs is a 128-bit number, and this wiki-page on UUIDs is quite extensive.

Given the definition of the standard, and the nature of the documentation found so far, the consensus between myself and a small group was: Store this as RAW(16) for the moment (we might re-consider that later, and of course ymmv...).

With a copy of the code from Jasmin, I was able quicly able to create UUIDs of V7.

For the complete source: link at bottom.

I was now able to create UUID-V7 and experiment with them.

From working with "the other database", I was used to see UUIDs in a certain format. And I sometimes had data-sets delivered to me with (quoted) GUID-keys in them. They are generally displayed as 8-4-4-4-12, as per Standard. Both the Standard and the Wiki-page also states that the Lowercase is preferred. 

At this point I regretted that Oracle did not have (yet) a data-type for UUID. In the other database, I can define a column as a UUID, and get the display-formatting automatically on selecting the column.

Oracle provides the function RAW_TO_UUID, but at this moment, Oct 2025, with v23.26ai, that only works on UUID-V4 keys.

So out of habit, I wrote a formatting-function to facilitate display of generic UUIDs. The result was :

As we can see, the function is able too format most forms of UUID, including the old SYS-GUID() from Oracle v12.x. Can you spot the V4 and V7 UUIDs?



Time-ordered keys, hence I want Boundaries, Ranges.

The UUID-V7 is specifically time-ordered. That means that if you use them in an ORDER-BY, that the values are sorted by "created_date".

Thus, if I wanted to select all the records with UUID-V7 keys generated between T1 and T2, I needed to be able to specify T1 and T2, Precisely, and in a RAW, UUID usable type.

This called for a function that would generate the UUID7 for a Given Timestamp. The result was :

By specifying a timestamp, I obtain the corresponding UUID. And I placed some irrelevant (?), but informative data in the "random bytes" after the last two groups. 

Caveat - Note: I ignored the "variant" information from the Standard-spec. Something to think about later.

The results, output, can look like this :


The first two lines in the result are the UUID generated with explicitly-specified SYSDATE and SYSTIMESTAMP. And the time is "readably" stored in the last two groups of 4 + 12 bytes. You can recognise the ID generated from SYSDATE in the first record from the last two bytes: Because the DATE is only accurate to the Seconds, the Centi-Seconds, are "00".

You can experiment for yourself with the peculiar difference between :

Select uuid7_ts ( systimestamp) from dual connect by level < 3;

Select uuid7_ts ( )             from dual connect by level < 3;

There might be discussion on the implementation-possibilties there... Later.

Main point is: I can now filter the UUIDs generated before, after or between timestamps. 

Caveat! ... This is Not Best Practice (abuse of "information" in key-fields).. But it can bed Very Useful when analyzing data per hour / day /  week.


Closing Summary (hmmm, not yet closing...).

It looks like we can now generate UUID-V7 values. And we can even generate them with "given timestamps". 

We have the following functions:

- FMT_UUID ( UUID ) : return a varchar2 with uuid-format: 8-4-4-4-12 for "display" usage. For formatting the other UUIDs, since Oracle choose to accept only UUID-V4 for their RAW_TO_UUID (), at this moment (Oct 2025).

- UUID7 () : return a RAW(16) with the UUID of V7

- UUID7_TS ( Some_TS ) : return a RAW(16) with a UUID-V7, but with the time-part set to the given parameter. 

I look forward to playing with those ...


Because there is More...

Now, Once we have the ability to generate "precisely timed values", we can use those to allow partitioning on Time-ranges. Something for another session. 

Then there is the question: Do I want to Know the Time ? The UUIDs of type V1, V6 and V7 contain timestamp-bits. I can potentially extract the time-of-generation from those. Would this be useful? 

For a future versions, I would also consider creating a package to bundle all my UUID-related functions. A package will also allow "overloading" the two functions UUID7() and UUID7_TS() to a single "name". Something to think about.

Give me some time to think and experiment...


Or.. You can experiment Yourself. 

The best learning is from the things You Do Yourself - so Go tweak this code.

Enjoy !

-- -- -- -- -- -- End of blog text, for now -- -- -- -- -- --

Appendix A : links to Scripts..

mk_uuid7.sql : create the functions, and some test code to verify.

demo_uuid_fmt : demo the formatting.

-- -- -- -- -- -- end of this blog-episode, for real -- -- -- -- -- -- 

 

Sunday, 28 September 2025

Oracle23ai, python, and v$history for your developers...

TL;DR: We got the v$sql_history feature configured for use in development. Python programs can now report on the SQL statements done Exactly by Them.
Will this replace sql_trace one day ? 

Image: (copied from some "History" page...) Keeping History is Important. 


Background: What SQL you did spent your time on...?

After tackling most of the latency-problems, the "suspicion of slow SQL" came back. Not in the last place, because most python programs started reporting their "App-time", "Network-time" and "DB-time" components. So now we need to find the slow-SQL again...

From AWR and ASH, we have a fair idea what the most over-all time-consuming SQL stmnts are. And they are "known good": they need to be running, and they have little room left for improvement.

But what else is there? Which stmnts remain "under the radar". Which New stmnts needed attention, and which ones might be performing erratic.

Hence the wish: Let Each program report on "What it Did in the Database", and notably, what were the most time-consuming SQL-stmts. 

This is what v$sql_history looked Ideal for...


Earlier attempts...soso.

In a previous blog (link), I described how I hit the limits of sql_history: only on 23ai, and only when the parameter was set. I made a feeble attempt (used an old trick), to report on SQL done by specific programs or modules, based on "module" in the session-properties. This worked to a point, but it wasnt what I wanted. Not after I had seen the possibilities of v$sql_history.

additional note + disclaimer: I've tweaked a lot of Oracle databases and I have also recently experimented with the  active_session_history that was built into Yugabyte,  and did some experimenting with a the feature. I guess I am in hobby-obsession-territory here (link to sibling blog).

Oracle wise, I would expect a (hidden) parameter to overcome that limit of 60? But searching the view x$ksppi didnt seem to give me any candidates, so I started asking around...  


Friends (from oracle) to the rescue...

Martin Berger, from Oracle RWP-Group, pointed me to his blog (link) : Yes, there was a hidden parameter to help increase the nr of stmnts kept in v$sql_history (and it turned out I had queried for hidden parameters against a v19 database... sigh)

And a week later, Connor also "answered the question" in his office-hours session of 24 Sept 2025. Thx Connor! (link).

Of course you need the, non-default, enabler:

Alter system set sql_history_enabled=True;

And then this (hidden)parameter to allow more history than the dflt 50:

Alter system set "_sql_history_buffers"=10000 ;

Setting those parameters had my problem fixed, to some point. Development happens mostly on laptops, the Dev's very own machines, using the 23ai containers from Gerald Venzl (link). Hence, I could have the feature enabled and buffers set for developers fairly easily (those containers have a very easy provision to allow scripts at db-create-time and/or at db-startup time. Great tools for simple-devs!).


The (python) code.

Firstly, I experimented with SQL*plus, and the result was the script mysqlhist.sql  (link below). I'll probably improve that over time as it gets used. Notice that I report "aggregated" values per SQL. In some cases you might want to report the complete history, all statements in order of execution. That list can get Very Long Very Quicly... I was more interested in which of the statements took most time overall, e.g. accumulated.

But my original search into history was for the python programs...
 
A function got includee into my generic py-file with oracle-stuff,  ora_login.py. This import-able file begun life with the login-function, and gradually grew to contain my oracle-toolkit.  Link below.

I included the checks on version and parameters, just in case, so when someone accidentally runs this code against the wrong database, nothing bad happens:


Notice: this test was against v19, and it only produces the friendly 3-line message. The same message will appear if history-enabled is not set.

But when a call to the function ora_sess_hist ( ora_conn)  is done at the end of a python program that is connected to an "enabled" database, it will use the ora_conn to query v$sql_history. And output will looks like this: 


Notice how, at the end of the run, this program called two functions: first to report history and do the listing x lines of SQL (limited to 100, to keep it readable). We notice 8 different SQLs, one of which was executed 1000 times totalling some 236495 microsec. But the "biggest SQL" in this run was the one that was executed 5 times, totalling 5.6 seconds.

After listing the history, this program also calls the "ora time spent" function to report how much time it has spent in the various components: in App, in DB, and (a guesstimate) in Network doing RoundTrips. You can combine this time-reporting with the linux time command to double-check (see previous blogs as well).

Notice that the DB-time, listed as 6.612 sec, and it corresponds roughly with the total of the sum of the elapsed time values (which are in microseconds!). Consider this a useful double-check.

This example of course contains just some bogus SQL for demo-reasons. The self-test of ora_login.py will yield similar limited results.
But please feel free to test this yourself and let me know. 


What we dont quite know yet... 

The main uncertainty is the behaviour of sessions with the new parameters. This is probably why the feature is default not enabled (yet), and why the value of "buffers" is (IMHO) rather low at 50 or 60.

On our current dev-container-version, v23.9.0.25.07, the default value for sql_history_enabled is False, and when enabled the dflt buffers seems set to 50. If this parameter ever hits "Production" with high values, what will be the effect on PGA and behaviour ? Where are the practical limits and what do they depend on....?

There are a few other anomalies to note, possibly to investigate and keep an eye on in future releases:

Buffers? 
The actual nr of records in the view that came out was mostly just a few more than the setting of "buffers", say 8-10 more lines more. Not sure why/how.
(future test: various buffer-settings, and the actual impact on PGA with/out history)

Session-history private?
Sessions by the same user could see each others records. This may be why the SID is part of the view. Not sure if this is intentional or not. I'm curious to see how this feature will end up in Prod.
(future check: can system or sysdba also see all history from every session ?  )

Double counting?
Rows 2 and 6 from sql_hist in the screen-print above: The pl/sql block from line 6 contains the create-stmnt from line2. I suspect there is a double count in there. Not a big problem, but I'd like to verify that some time. There is probably some way to avoid that using top_level_sql_id... (Later!)


Summary: Got it Working.

This now works for Dev, and can work in production if/when we every get this feature in a Prod database. We can call a python function to list the v$sql_history, or at least the last 10k statements from that program.
With some checks, that function-code can now also be run without causing errors in Prod, on v19 or other database, or on a system with the history parameters "disabled", as they are by default.

I would still be careful to use this feature "in anger" as the effect on PGA-memory is not really known or tested yet.

And if needed, we can always revert to searching statements from v$sqlarea, as demonstrated in previous blog (link).

In future, I might even enhance the python-function to save the history to a table before exiting the program. Since history is currently only taken in DEV, on machines and databases that are very volatile (many re-starts, re-creates...), saving that data is not practical or useful, yet. 

Once history is also available in pre-prod and Production systems, I might want to persist history before exiting program. Until then, Prod will have to rely on AWR and ASH to find problematic SQL.

And I can see how this feature may replace sql_trace ?


Wishlist... Yes!
(Someone at Oracle is probably also thinking about this ...) 

Session: I would like to be able to set the parameters on a Session basis and only consume resource for specific sessions that Need/Want to keep their History. A bit like sql-trace, but without the file-interface.

AWR/ASH: I expect the v$sql_history to be available in combination with AWR or ASH in the near future.


-- -- -- -- -- End of this blogpost, for now -- -- -- -- --

Appendix: links to source-files.

mysqlhist.sql :  for use in SQL*Plus or SQLcl

tst_hist.py : short demo program.

ora_login.py : contains the actual funtion ora_sess_hist ( conn )

.env : used by dotenv, edit to add your own credentials.

Also needed are these python components:

pip oracledb : install the oracle driver

pip dotenv : install dotenv (used to store credentials)

Additionally: the grants...

To get to your v$ views, ... If you can get it past your DBA and your security-folks,... I can recommend these two:

grant ADVISOR to scott ; 

grant SELECT ANY DICTIONARY to scott ; 

Your account still wont be able to see data in other schemas, but you will see the v$stuff, all the relevant views.. 

-- -- -- -- -- End of this blogpost, For Real -- -- -- -- --

Sunday, 21 September 2025

Oracle23ai, sql_history, and some options, including for v19

TL;DR: The V$SQL_HISTORY is Very Cool Idea, but it still has some limitations. Also, it is not available in older versions... so I examined a few primitive workarounds and re-took an old trick setting "module".

(and I would say: Dont Try this in Production, yet. But you might be able to use some of it in testing...)

Image: How to prevent my sql-history, my footsteps, from disappearing...


Background: I had a suspicion, and I have older versions.

In a previous blog (link) I discovered V$HISTORY in Oracle23ai, and how it proved its worth in reporting the "Heaviest" SQL from a (short) program or session. On first attempts, primitive unit-tests against 23ai(free), that worked Nicely. It was like checking v$sqlarea, but for just-my-session.

Even with rather primitive code quickly inserted in some python-programs, I got SQL_IDs from slow-ish programs, and discovered some SQL that had remained "below the radar".

One "adjustment" was necessary: Avoid-Errors when running against version 19.x. But that was fixable with a check on v$version and on v$parameter (I'll put links to the code in appendix below).

More Tricky was: When I ran some non-trivial tests, some SQL that I had expected, that I _knew_ was in there,... Didnt show up. So I had a suspicion that all was not well in sql_history-land.


Problem statement: I want the SQL-History... ;-) 

For my current problem-case, I would like to focus on "What did This Session Do". I would prefer each (python)programs to report on Itself, at the end of its own run, using just SQL and stdout.

I'm deliberately excluding SQL-Trace, and I'm also excluding the running of reports for AWR and ASH against the database". Those are possible, but they represent too much work for programs that run (ir)regularly and frequently, during day and night.

And of course, in future, I will Recommend to save some of the logging-data to tables... But that has to wait for now.


But Some SQL was missing...

After the first Haleluja-feeling with v$sql_history, some things didnt add up.  Here is a typical run of a test-program:

At this point, the program has run 114 loops of something, and it will normally have done 114 inserts of the type /* t3/4 indv */, and many other SQLs. But the report only shows 59 executes? And I happen to know that before these 114-loops, there were a few other, probably heavier statements.

With some old scripts to check on v$sqlarea, I was able to verify: yep, the other, heavier statements were in the shared-pool. And the nr executions of the statements were higher than reported by v$sql_history... 

After a bit of poking around, this was the best summary of my finding:

The Feature of SQL_HISTORY, at this point (version 23.9....), will only show 60 items per SID. And that explained the missing SQL.

Only Sixty Records... A typical program will run 1000s, sometimes millions of executes. But this feature will only "keep" the last 60 ??? 

A very quick python-hack had proven useful in Unit Testing, but with a 60-record limit, this would not help a lot in "Production". And I had wanted to Proudly-Present the "heavy-hitter-SQL" at the end of each run... 


So... Can we think of something ? 

Aim is to have a (python-) program report on the SQL-statements done by that program, and tell the Dev or Ops or whoever is looking, which SQL-stmnts consumed most time.

Finding the missing SQLs in V$SQLAREA or V$SQL was the first clue: Can we link the stmnts in those v$views to the running program or individual session ? 

Spoiler: No, we can not (yet)... But we can try to get close.

Let's examine a few (old...) ideas.

Option1: List All SQL. At the end of a program, just report on the heaviest SQL in the shared-pool. A known trick. And rather similar to running an AWR-report or any SQL*Plus script after the program to see what was happening in the database-instance.

On a test-system with only one or few programs running, this might provide relevant information. On a rather busy production system, the data from "just my run" will be lost in the total.

Option2: Use SET_MODULE (always a good practice). A program can identify itself by setting the MODULE property of the session to the program-name. This module-name will then appear in the V$ views, and in the AWR and ASH reports. And it will identify the statements that were used by that program, provided that program was the first to load them into the Shared_pool.

Option3: SET_MODULE to a Unique Name for the SESSION (typically the pair of SID, SERIAL#). As a result, all statements that got loaded-first by the session got tagged. But statements that were already executed by earlier programs, e.g. statements that were already in the shared_pool, did Not get Identified. The result was that any 2nd run or subsequent run of the same program missed most of the statements in the list-of-my-statements.

These Options can be improved by doing flush shared_pool; at the start of a run. But but in serious Production, the flushing of sh-pool is Not recommended. Not At All.

You can think of several more options. For example, I am a big fan of watermarking queries with a  /* wm42 comment */ in the statement-code. However, marking statements for individual sessions might provide a huge overhead in parsing. Not Recommended either (but nothing will stop you from trying... ). And another problem with /* watermarking */ is that you can not watermark the SQL in code that you dont control.

Others will point out v$active_session_history and ASH-reports. The ASH (link) is a good tool to have! But for this case, the problem is the 1sec sample-interval. Picking SQL from the ASH-view will only find the sampled-ones, and in our case it tends to miss or to under-represent the high-frequency SQL.

.... and I have to stop discussing options at some point, 

I'll just kick the rest of the ideas into an appendix below...


In Concept (this works with 19c as well)... 

After pondering over the options, and revisiting some old ideas, I decided to try one python function to cater for the three options above.

Key tricks would be:

1. Watermark the session by module name.

2. Query v$sqlarea for all SQL with that module_name.

The python driver (doc here - link ) allows for easy setting of the module. And we can provide several levels of detail.

In this case, using python, we can set it in two ways. The simple way:

connection.module = 'some_program_name'

or, we can provide more detail in the module:

connection.module = 'some_program_name' + ':(sid,serial#)'

In the second, more detailed case, the module-value will represent the Unique Session in the instance, e.g. "this very run of the program".

Once the session has the module set, every new SQL that gets hard-parsed by the session will have the module-property set to the value of this session.

Following this, we can query the v$sqlarea. The simplest query, corresponding to Option1 above, would be:

select * from v$sqlarea 

where module like '%' 

order by elapsed_time ;

This will return all SQL-statements in the shared_pool. 
This is what we have been doing for years to find the heavy SQL. But the resulting SQL is instance-wide, it can be a very, very long list, and doesnt focus on our program or session at all. In a busy Prod system this will generally tell you only what you already know from the AWR reports.

To find more specific SQL for the current program, we can query for the program_name:

where s.module like 'some_program_name%'

This will give us the equivalent of Option2 above: all SQL that was ever initiated by some run of this program. This trick goes back to any old versio of oracle where we could and did used to set the module-value to the name of the program.

This option would be my recommended usage.

But to go very specific, we can ask for the SQL that was first-parsed by this very session:

where s.module like 'some_program_name:(SID,SERIAL#)%'

This will only list the SQL that was issued by the current session, and was issues for the first time (e.g. was parsed by this session).

There are pros and cons to each of the three options. But an advantage of this concept is: A Single SQL, with a like-condition, can cover each of the options, depending on what we but in the LIKE-clause.


Let's recap the options...

Option1: By asking for Everything: 

Where s.module like '%'

we will see a long list of SQL. On unit-test runs, or on systems where the shared-pool was flushed, the SQL may be relevant to the running program. But on a true Production-system, it is unlikely that the heaviest-statements happen to come exactly from the running-program.

Option3: When we ask for SQL from precisely this program-session:

where s.module like 'program:(sid,serial)%'

we will get only the statements from this program that were parsed, or issued-first, by this very run of the program. This is mainly useful when the program is running for the 1st-time against this instance of the database. One effect is that on 2nd and subsequent runs, we miss a lot of statements bcse they were parsed by previous runs and carry the module-name from those previous runs.

I came to the conclusion that I prefer Option2,  To list All SQL that was ever initiated by this program.

But you have options to choose from, even though I consider none of them to be "Ideal"... 


In python code (link to code in appedix)...

(note: you also use this concept in SQLcl, Java(Script) or in any other SQL-client, but I'm currently working with python....).

The function that picks up the statements starts like this:

If there is no module specified: the default for the_module='%' and will fetch all of v$sqlarea.

If the module-name is specified, including the (sid,serial#), it picks the very precise statements from This Session.

But in most cases, I would specify just the program name, and take into account that the resultset also contains the statements from previous run by the same module.

The self-test includes all three calls to the function like this:

The first call, INSTANCE, will list the Heaviest Statements from the sh-pool, regardless of which module they belong to. I had to include a limit of 100 stmnts to prevent excessive scrolling...

The second call, MODULE, we give it the module-name, and this will list the stmnts that belong to the module designated as g_module_name with no (sid,session#) specified. I recommend this type of usage as probably the most useful, but your situation may differ.

The third call, SESSION, we add (sid,serial#) to the module-name, and it will only list the statements parsed by this particular session.

On any initial-run of a program, the 2nd and 3rd call will yield more or less identical lists. But on subsequent runs of the same program, the 3rd call, the precise-one, will skip the statements that have already been parsed on earlier run.

Hence the 2nd option, with just the module name, no (sid,seria#), is my recommended way to use this.

But You can Choose which for yourself how you want to use this, if at all...


The Demo (actually a self-test)...

For practical reasons, I limit the demo to Select-only, so you can run it from any user, provided you have access to the dictionary (grant advisor, grant any select any dictionary, see below).

The easiest demo/test you can run is the self-test of the source-file, ora_login.py. Try it:

$ python3 ora_login.py

If you run it more than once, then at the end of the program you should see... 

Notice in this (partial) output: The first list are all SQLs ever done (hard-parsed) by this program. In my case, a total of 7 statements, and your result should be similar. 

But the second list, the search for SQL stmnts "watermarked" by this very session, is empty. That is because all the stmnts had already been parsed in previous runs, and were thus marked by module-names from those previous runs. The current-running module, with name 'ora_login_selftest:(248,64897)' has not added any new stmnts to the shared-pool, and thus that query returned ... 0 stmnts. 

If you run the self-test for the first time, or on a "fresh" instance, or after flush of shared_pool, you will see that the 2nd list actually does contain records. And on DEV/TEST systems, which often re-start, this can bring in some usable results.... 

Now I think you can see why I would use this with just the module, and not use the additional (sid,serial#) to watermark the module name.

But even then, it is quite possible that my program has called on queries that were parsed earlier by other modules. Hence there is no certainty that I catch every SQL done by this program. 

It works, sort of, but The Ideal solution is not yet found... 


The Downsides... 

The Concept of having every program "Report on its own SQL" is not quite feasible, or at least, Not Yet. 

The buffer for v$sql_history is too small to be of use, hence I came up with this old-fashioned and somewhat twisted-looking alternative using V$SQLAREA.

If you decide to try some if this, Notably Beware of the following:

D1. Creating many module names can get in the way of other systems that use the v$sqlares, notably your AWR reports will look more chaotic. Your DBA may get confused.

D2. Excessive queries on v$sqlarea can cost you in CPU-load and latch-contention. I am not sure how bad this still is in latest versions, but beware. Especially on Large (prod) systems with huge Shared_pool.

D3. You may still not catch Exactly the queries you want. Only an SQL-Trace (the 10048) can catch all and exactly what a session does.

... And I'm sure the more knowledgable oracle-gurus can point out other, more detailed problems of tweaking items in the shared-pool.

Note that some of this trickery feels like "back to square one". Again I revert back to the advice of "flush shared_pool", or even  "startup-force the Instance" or restart of the container to have a clean Instance on relevant testing. One of the reason these rather primitive tricks work so well in DEV/OPS environments is because a lot of the DEV databases run in containers and are re-started for nearly every test-cycle. Hence the sh-pool is always "clean" and the top-SQL is often relevant.


Summary: Pick a SQL-History solution for your use-case

If you want a program to report on its SQL, and V$HISTORY does not (yet) work for your situation (bcse you are on v19, or bcse you cannot set the parameter, or bcse you do more than 60 SQL-statements...), You have some Alternatives.

However, short from doing an sql-10048-trace, there is no easy Ideal Solution. None of the options described is able to pick out Exactly the statements done by an individual session (yet). 

A good compromise is to have each program set the MODULE to the name of the program. This will allow to identify statements in the shared-pool, and in AWR reports, and attribute them to specific programs. 

Going one step further, and setting module to a name that is session specific is probably overkill but may be useful in some situations.

You can choose from the above, or devise a variety on the options to suit your situation. And there are some more ideas in the notes in appendix below.

But you will have to realise the shortcomings, and the risks, of each method.

I will end with a shoutout to Big Red: The V$HISTORY feature of Oracle23ai has Great Potential. But it needs a bit of improvement still.


-- -- -- -- -- End of this blogpost, for now -- -- -- -- --

Appendix: links to scripts and components...


ora_login.py :  contains all relevant functions.

duration.py : Time-measurement, needed as "stopwatch" for some functions.

.env : edit to include your own  scott / tiger @ host : port / service


Also needed, these python components:

pip oracledb : install the oracle driver

pip dotenv : install dotenv (used to store credentials)


Additionally: the grants...

To get to your v$ views, ... If you can get it past your DBA and your security-folks, I can recommend these two:

grant ADVISOR to scott ; 

grant SELECT ANY DICTIONARY to scott ; 

Your account still wont be able to see data in other schemas, but you will see the v$stuff, all the relevant views.. 

note: I am close to creating a separate github-repo for each blog... 


-- -- -- -- -- End of this blogpost, keep some notes below -- -- -- -- --

More Ideas (just a collection of notes...): 

i_1. Sql-trace to trace-file, expose via view, examine details... Overhead, extra-work.

i_2: Watermark SQL-stmnts: not practical, no total-control, excessive parsing...

i_3: ASH-run right-after. Not practical in our case. And ash-interval too long.

i_4: More v$sql_history, 60+, backport etc.. ; need to ask Oracle...

i_5: Tweak python driver, produce tracefile  on app-server? complicated ? (too lazy)

i_6: Use set_action as well as module.. put (sid,serial) in set_action?

i_7: Flush shared-pool (Not Recommended on busy systems), keep as option ?

i_8: Insert/merge the contents of v$sql_history to a table. This would require frequent saves as SQL can sometimes be fired at rates of multiple per ms.

i_9: detect the DEV / TEST / PROD databases from the service_name, and act accordingly: on DEV always list the whole sql-area?


pffwt.. You Guessed it: You can experiment and play with this forever...

-- -- -- -- -- End of this blogpost, for Real -- -- -- -- --

Wednesday, 17 September 2025

Oracle23ai, and a new kind of history.

TL;DR: I discovered the view to find sql_history of a session. I can use that to report "what a session has done". Much more Simple than via other means.

This feature is Currently only available in latest version, 23ai...but I really like the concept. Check for example this blog by Ulrike Schwinn.
This Feature has Potential!


Image: Developers love to hand over a problem. Especially if they can say "This SQL is now Your problem..."


Background: What did that program do... ?

In earlier (python)work, I've demonstrated how to pinpoint the "latency problem". And just from reporting the "time lost" and the (variable, erratic) latency measured from ping-times, We were able to find and then eliminate a lot of "lost time" (one of several examples: eliminate unnecessary roundtrips)

But latency isnt the only time-sink problem. The RDBMS takes up time, DB-time. And the App itself needs time to do work...

With a little more effort, I was able to, primitively, report the total time spent in a python program, and split it out into three components (documented here)

Total = App-time + DB-time + Network-Latency.

One by-product of this measurement was that at least some of the slowness could be pointed towards the python-program, and the need for "Faster CPUs" to run our python-programs (they run in processing-containers, deployed in various environments that are often beyond the comprehension of my boomer-legacy knowledge).

But then there was the DB-time, which needed attention...


Old (and good) solutions: ASH, AWR, Trace, profiling...

In traditional DBA-work, the Oracle-gurus will come up with AWR and ASH reports, sometimes even with SQL-trace files (tkprof, anyone?) to find slow queries and then "tune" those.

AWR will tell us what the top-queries are. But not where they come from, nor which specific programs or runs caused the lost time. It is the old problem of "an aggregate does not give you details".

ASH can tell you more details but has the drawback of sample-time and you end up spending a lot of time correlating SID/Serial with ASH-records. Again, not all of the precise info for a single session that I want.

SQL-Trace outputfiles will give you Relevant Info on just the session you need. But trace and tkprof also come with challenges. Sql-trace involves notable overhead, and the trace-files reside on the Scacred-Database-Server. You can, with some effort, download the trc-file, but then you dont have tkprof to "profile" it. And you will find that the profile-information is overkill to a DEV-person who wants you to "just fix some query" anyway.

Note: as a DBA, I Appreciate all those tools, and I could not do without them. And if you have time + access available, I can Recommend them for use in Troubleshooting!

But for my current DEV/OPS-environment, I need something more Simple...

The average DEV/OPS-person would like a very simple script or report to look at. In many cases, a select from V$SQLAREA order-by-elapsed is all they can handle. And there lies some of this solution...


New in Oracle23ai: V$SQL_HISTORY

My Austrian friend Martin Berger, pointed this out: There is a new view available: V$SQL_HISTORY (link). 

This view currently still has to be enabled by a parameter:

SQL > alter system set sql_history_enabled=true scope=both;

And beware: this parameter is SYSTEM-wide. You need to set it at CDB or PDB level, and you can Not set it at Session-level. Martin Bach pointed that out in this forum post (link)

With this view enabled, I was able to create a script to list the SQL done by the a session. Here is the SQL*Plus variant:


Using the SID, we can limit the records to "just from this session". 
We see that the history for this short demo-session contains 6 statements. The longest SQL was an INSERT that was executed 7 times and consumed a total of 3.9 seconds Elapsed. This was also close to what the total of "set timing on" reported for this query. First impression: Looks Good.

For calibration, I included a call to spinf_n ( 0.5 ), which is supposed to keep the CPU busy for 0.5 sec. And line 5 of the query-result reports something very close to 0.5 sec. That also looks Good, QED.

Now note: there seems to be some anomaly in reporting of sleep-time. On the second line we have a sleep of supposedly 2.1 seconds, but it is listed as having only 417 microseconds "elapsed".  The result from "set timing on" was close to 2.1 seconds though. Something for later research.

But the concept is demonstrated: 
We can List the Queries from our session, and pinpoint the ones that take up most of the time.



Built it into a python-program...

We have something here: Every (batch, container, utility) python program can now report its top-SQL to stdout. Devs van read that output and cp/paste the offending SQL_ID for further examination...

(On 2nd thought, my first-attempts to include something in a python-program got rather messy and error-prone. Not in the last place, bcse a lot of my instances are 19c, causing pythong to crash with. ORA-00942. I'll do a screenshot, and then alter and beautify my code + output for future blog...)


The screenshot shows output from a (very short demo-) program that did a few queries. The idea is to list the SQL (part of each SQL), with an indication of how much time was spent on that SQL. It will serve two purposes: 1) give the DEV an idea which queries used up the time, and 2) supply the SQL_ID that can be used for further investigation.

Anyway... This seems to work! 
And the DEVs like the idea: they can kick the problem back to the DBA.
It just needs a bit more "effort". Some Quality-Python time.

But it is working already in the DEV-environments, where most developers use the 23ai container-images and thus have this feature available (Thx to Gerald Venzl, link). 

Already, run in Dev/Test, if done against 23ai, can now produces the "list of Heavy SQL".  

Now we just need to ask for a backport, or create a similar report-thingy for 19c... 


Nice, but More Work to do...

Developers, not really ora-minded, already like and Want the feature. The system spits out the "worst queries in the program", so they can copy-paste that to the DBA: Your Problem now, Go fix this!

But this new trick works only under two conditions, you need.. 1) Version 23ai, and 2) Have a parameter-set. My first suggestion to big-read would be to make this default in 23ai: sql_history_enabled=true;

But even then, as most of the world is still on 19c or even older version.
If given Time.. I may be able to rig something up to let older version behave similarly. 


Even more: Observability, Open Telemetry, OTel

Another interesting item I came across while researching this topic, Martin Berger from Vienna from Oracle pointed me to OTEL: Open Telemetry (link). And there was a good presentation on this topic by Jasmin Fluri at POUG which will probably also appear at DOAG and UKOUG later this year. Something to keep an eye on.

-- -- -- -- -- End of this Blogpost, for now -- -- -- -- -- 

Links to scripts..

mysqls.sql :  script for use in SQLcl and SQL*plus.

Python script: later (it needs some work and some beautifying...)

-- -- -- -- -- End of this Blogpost, for Real -- -- -- -- -- 

Thursday, 14 August 2025

Oracle23ai and python - Measuring (Network-)Time.

TL;DR: I wanted a utility-function in python to report the "time spent", notably looking for "Network-Time". It evolved during coding into a time-reporting-function. It Worked. But I learned something new again.

Image: waiting for a bus, you never know when...


Background: (I'm not obsessed with - ) Roundtrips and Latency.

From earlier work, I have a number of exploratory python code-snippets to study and measure "network time" for (python-)programs that talk to the database. It so happened I came across some "Chatty" programs that were running in network-configurations with high Latency. That sparked a lot of tuning or "optimisation" (simple: removing roundtrips). But there is still a lot to be gained, notably by deploying code closer to the RDBMS: By Reducing Latency.

At first, I set out to write two functions:
 - f_rt_time_lost() : to report "time lost" on RoundTrips in the network. 
 - f_run_pings() : to imitate the ping-behaviour: report pingtimes to stdout.

And if I had those two, I might as well create a test-program: db_ping.py..?

In the process of building and testing those, I also found the need for something akin to linux time, to report how much time a program has spent on CPU (on the client side). 
And as a good DBA,  of course I wanted to report on the DB-time from my RDBMS.
So this "project" got a little out of hand, code-wise. I typed up a lot more than I had originally expected.

But to stay focussed: I wanted to "Report" on the time lost (in network...) by various programs.. so most of the effort went into collecting and reporting the basics:

The formula for time "lost in the Network":

    Network-time = 
        RoundTripTime (RTT)  *  number of RoundTrips  (nrRTs)

In between I started thinking of "total time", and would it be possible to put time in perspective: How much is spend "in the App", and how much is spend "in the RDBMS".

The next equation I can think of is this one: The Equation of Time...

    Total time = 
        App-time (spent running code from client-program) 
    + DB-time (spent running SQL or PL/SQL in RDBMS-session) 
    + Network-time (nrRTs * RTT).
[    + Idle- or user-input-time (sleep, keyboard-wait, other?... ). ] 

If I can report on those items I can make fairly clear-cut cases for troubleshooting: Try reducing the biggest time-consumer. And if that happens to be "the network" than we know where to try and improve. 
If it is Not the Network, than we probably still know where to look. And we can better predict what kind of impact any improvement can have.

Note1: At this point, real-techies start to suggest "profilers", or even a fancy "flame graph". But in my environment I dont have the luxury nor the knowledge-base to go and do extensive profiling. I will have to help my colleagues with the tools we have, and with tricks they can understand and implement.

Note2: Anything that runs multi-threaded or uses PX (parallel processing of SQL), will pose additional challenges. I'm deliberately keeping those out of scope.

Note3: Equation of Time.. EoT.. I'm deliberately not going to use that abbreviation.

For now..... 
Most programs already pop out a "summary" of their processing to stdout, and I wanted to add my own little item to that.

What if Every Program could produce that Equation of Time at the end of its run... ?


Some musings on measuring time.

In IT it is always very much "in vogue" to blame network or latency, or "our WAN connection" for any slow response-times or long-running-programs. 
Which is Why I want to Know.. I want to be able to say with more confidence: "this is a network issue". 
And since I am not a "network engineer", not even a proper sysadmin, I need Data, and I need data that I can "Understand" myself and that I can Explain Clearly to Managers and other non-tellectuals.

With the Oracle Database, we are used to AWR (or STATSPACK), and ASH, to report on activity and we know how to use that information for "tuning". But those tools have difficulty covering network activity. 
At session-level I can get the nr of RoundTrips (RTs) from a session, And I can have a session report how much DB-time it spent. 

The two items missing from AWR are, IMHO, App-time and Network-time. 
App time would be the time spend "in the App", e.g. on CPU or other activities at the client-side (some of it maybe even waiting for a keyboard-enter).
Network-time would be the time spent on "transport" to and from the RDBMS. And that Network-time is possibly the least-productive if them all. It is the "waste" that we are trying to eliminate...

I realise full well that AWR, based on the RDBMS, will not be able to measure App-time or Network-time, and I am not blaming Oracle or not providing this data. But I can look for that data myself...


Let's put it in code..

From earlier tweaking, I had some code more or less ready to copy-in. 
For python, I decided on the time module, and using time.perf_counter_ns() as probably the most precise way to measure time (see also older blog on duration.py)

Last year, I already had thrown together some "stopwatch" ideas so there was some re-use of a module called duration.py. (it now also has a function inspired by linux  time command- maybe good topic for later..

In short: I can cover the App-time, both active-time and elapsed clocktime from the python time module.

The functions I used comes from the time module.

time.perf_counter_ns () : a clock counting nanoseconds. The reference point is undefined, hence you need to save the value when your program starts and than check the difference to know how many nanoseconds have elapsed.

time.process_time_ns () : the nr of nano-seconds spent on user- or system-time. See this as "busy-time". When compared to linux time, this should be close to the sum of user+sys time (check this...?)

Then I needed DB-time, and number of RoundTrips: those could be gotten from v$mystat and v$sess_time_model. I used this query:


This gives me the nr of RoundTrips for the session, and the DB-time from both v$mystat and from v$session_time_model. I used the time-model as it returns microseconds and implies to be more precise, but the old-fashioned DB-time from v$mystat, in centiseconds, achieves the same result once the session has consumed a few seconds of time.

To find the RoundTripTime (RTT), I need a ping, or a number of pings. And then multiply ping-time by the total nr of RTs that the session has done.

I sampled RTT using 5 pings, with 1 second in between. More on that later...  

And with the code approached completeness, I had:
App-time: from time.process_time_ns()
DB-time: from the v$sess_time_model
Network-time: from the multiply of RoundTrips x Ping-time
And Total-time: from the wall-clock of time.perf_counter ().

That looked quite complete and realistic. 


The Test-setup and first run...

While coding, I had the function included in ora_login.py , my module with oracle-utilities for use in our python programs (link below).

And for testing, I had the function called at the end of do_sql_times.py, as described in an earlier blog.

This program allows me to run an ad-hoc query against the RDBMS. It uses the dotenv utility to get credentials from the file .env. And it reports on how many microseconds each record takes to "retrieve". I've used that program to measure timings before, notably to show the effects of arraysize and prefetchrows.

A test-run with a query on dual looked like this:


After the Connect, this program notifies that "array" and "prefetch" are set to non-defaults of 1 (to maximise RoundTrips). 

Every record will be a RoundTrip, to maximise the number of round trips and thus maximise Network-time.

The very simple query is just 1 record, and that seems to take 49 ms (I happen to know that could be the approximate RoundTrip time for this system). 

The 2nd part, the "reporting of the time" is interesting: We collect the ping-time, we retrieve DB-time and RTs from the RDBMS, and we can calculate "all the values":


Note that the ping-times to this system are fairly high, and variable. This is the kind of system we have trouble with... 

And then we see the data we want from the programs:
App-time: measured by time.process_time()
DB-time: from the DB
Network-time: pingtime x nr RoundTrips
Idle time: calculated.
Total time: measured from time.perf_counter()

The first time I tested this, I was surprised by the large, 5sec, idle time in a program that I thought did very little. Then I realised: The loop for the 5 pings has a 1-sec sleep in it, and that generates the (approx) 5 sec of Idle-time.

As a separate check, I've used the linux time command to time the run of the program, and the values concur, even for a very trivial, short-running program:

Real: the elapsed time, close to the 8sec reported by python.
User and sys : Together they are +/- close to the App-time repoorted by python.

And the data from time correspond, with a margin of error, to the data reported by my python contraption.

So I now have a function I can call at the finish of a program that  will report how time was spent. And a verification via linux time looks acceptable. 

So far it looks promising, usable (but you can feel a problem coming...)


Test some Longer running queries...

Then I did a very simple test that turnd out funny. Query 100 records (with one RoundTrip per record), and I found this:


Let me point out: 
Each record seems to take between 30 and 60 ms to retrieve (forced to do a RoundTrip every record)
The ping-samples are much longer than that, with an average of 175ms ?
And the Idle time is reported Negative ??

On first impression: the individual records come back (from a RoundTrip) with some 40ms, which is expected on this system. And if it takes 40ms to fetch a records, the RTT should be lower than that.
But the ping-samples are higher and show more irregularity than expected. 

The same test, against a database running nearby (ping-time around 1ms), shows this:


On this easy-to-reach RDBMS, ping-time shows less variance, and the Idle-time is as expected: 5 seconds from 5 call to ping plus sleep-1-sec to sample RT-time.

It seems that on the "irregular system" the sampling of the ping-times (5 pings in 5 seconds) comes back with values that are too high, and possibly more irregular than they are in reality.
And from over-estimating the RTT, the Idle-time is then calculated as ... Negative?

Something is Not Right.


Partial solution: Do the ping "faster".

One apparent fix was to modify the ping-sampling, and remove the sleep-time between pings. This gave more regular and lower RTT times:


But mainly shows that pings (and RoundTrips, RTT) were faster when done at high-frequency. And we dont know at what frequency the regular python-programs do their round-trips, especially while we are trying to Eliminate the RoundTrips in the first place.


Wrap Up: This helps, but there is More... 

This measurement is only Partly Helpful.  First the list of positives...

DB-time: We get a view of how much time is spend in the database. If this time is significant, we can go in and optimize some of the queries. Classic DBA-tuning stuff... (but not our biggest challenge on this system).

App-time: We get an idea of how much time is spent by the python-code and the measure from time.process_time () seems to concur, but not quite, with what the linux time command provides as user + sys time.

RoundTrips: We can record the exact number of RTs per program, and thus have a better measure for its chattiness (but we already knew that and from AWR-data...). In our test runs, the program did 110 RTs in about 9 seconds,  say +/- 12 per second. No idea if that is good or bad, and we also dont quite know in what "pattern" they were done. As it turns out this program did most of the roundtrips in a quick sequence while fetching 100 rows. Other programs will display different patterns...

Network time: Not Quite. This "measure" is still unreliable, mainly due the the uncertainty in measuring ping-time or RoundTripTime (RTT). 
Ongoing experiments will probably be with some "ping" utility.

Current measure: For the moment, I'm reducing (removing) the sleep-time between pings when sampling. But I do not know the behaviour of the pyhthon-programs with regards to their frequency of RTs to the RDBMS, or the pauses between SQL-Net calls.

Root Causes: We suspect this  behaviour is due to the nature of our systems. Most of the Databases run on VMs. But a number of (test)databases are running on containers. The latency (and ping times) vary depending on the locations in the networks and the time of day, and possibly the phase of the moon...  In general, the VMs seem to have more problems than the containers with test-databases...

Dont tell the manager, but .. we dont really "Know Everything" yet.

Main goal for the moment remains: Find and Remove RoundTrips....

-- -- -- -- -- End of this blog post, for now -- -- -- -- -- 

Appendix : links to code and scripts

do_sql_times.py : the program used in testing. 

You can use this program to run "any query" and get timing data out of it. The actual function to report the timing is ora_time_spent ( the_conn) and is defined in the file ora_logon.py.

The utilities used:

ora_login.py : Functions to logon, some utilities, and now also time-reporting.

duration.py : Time-measurement (and some reporting).

prefix.py : Enable a prefixed print to stdout, add timestamp in microsec.

.env : The config file for dotenv.

Beware: the .env now contains non-defaults for arraysize and prefetchrows. If you run with those set to 1 (one), some of your code may cause excessive RoundTrips! 

-- -- -- -- -- End of this blogpost, for real -- -- -- -- --