TL;DR: I use Oracle 23ai and Python. Measuring the time it takes to call cursor-functions reveals when RoundTrips take place. It also shows how prefetchrows and arraysize affect the round-trips.
Background : Investigating RoundTrips.
When using Python and Oracle, we ended up with some Python code that did too many round-trips, so we investigated the behaviour. I'll try to demonstrate some of the findings.
Some of what I demonstrate is also nicely pointed out in The Documentation... I just confirmed it.
For the demo, I use an instrumented copy of "do_sql.py", which allows me to easily send all kinds of queries to the RDBMS and observe the behaviour, notably the timings.
Earlier work on this problem can be found here (link to Latency-blog). And one of the outcomes was that on that system, the average RoundTrip time was about 1.5ms.
Rolling balls - waiting for all of them to roll down...
A program that measures the running of Queries.
To experiment with queries and timing, I constructed do_sql_times.py (source code in the link below). It prints timing-data (in microseconds, µs) to stdout. And when retrieving rows from a cursor, it reports the number of microseconds it takes to get/print every row.
The essential Python code, the part that gets the cursor-data with minimal extras and overhead, looks like this (apologies for the comments, I do tend to leave a lot of notes inside code...):
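What follows is a simplified sketch rather than the literal do_sql_times.py: the connect-call is a placeholder (the real program gets its credentials via ora_login(), which uses dotenv and .env), and it prints with plain print() instead of pp() from prefix.py.

import sys
import time
import oracledb

sql = sys.argv[1]                 # arg1: the SQL to execute

# placeholder connect; the real program uses ora_login() + .env for credentials
conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/freepdb1")

cur = conn.cursor()
t_prev = time.perf_counter()
cur.execute(sql)                  # parse + execute + initial (pre)fetch: one round-trip

for row in cur:                   # every extra fetch from the server may add a round-trip
    t_now = time.perf_counter()
    print(f"{(t_now - t_prev) * 1_000_000:10,.0f} µs : {row}")
    t_prev = t_now

conn.close()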
The program (link to previous blog) can take arg1 as the SQL to execute, hence you can do...
python3 do_sql_times.py "select empno, ename, sal from emp "
And here is a demo-run retrieving the well-known EMP-data:
Notice the following on the first 4 records:
The 5,517 MicroSec (5.5ms): time to retrieve and display the 1st row.
The 70 MicroSec: time to get/display the 2nd row.
The 1,713 MicroSec (1.7ms): time to get/display the 3rd row.
The 70 MicroSec: time to get/display the 4th row...
The display of all subsequent rows then takes less than 300 MicroSec, with most in the 2-digit range (i.e. significantly less than 1 ms).
If you re-run the program with the Exact same SQL-stmnt, the time for the first row tends to go down from 5.5 to about 2.0 ms. This is most likely the effect of the Parse (on first run) and the re-use of the SQL-stmnt from the Shared_pool. You can confirm that by flushing the shared pool or by running slightly different SQL-stmnts, but I won't go into that detail now, maybe for later.
In most of the following tests, I display "repeated runs" of the program with the "parsing effect" removed.
In Detail: First check for the Overhead...
To start exploring, let's first show the program running with 1 row from dual, like this:
_> python3 do_sql_times.py "select 1, 'one row' from dual "
The program reports on various stages, like doing the imports and defining the helper functions, and then starts the Main code.
The first part of stdout looks like this:
After the imports and definitions, the ora_login shows where the connection goes (yes, I still use scott/tiger...).
Ora_sess_info then reports the session statistics right after login, so we can later discount those as "overhead" or "initialization work". In most cases, the logon (Thin Mode here...) costs 1-3 CentiSec of DB-time, 3 RoundTrips, 10 executes and 4 usercalls.
The actual use of ora_sess_info adds 1 RT, 1 execute, and 1 usercall...
(Yes, you can use snapper-by-Tanel (link) or similar tools to observe a session, but I want my Python code to do its own reporting, independent of other tools. And I might add some of this stuff to In-Production-Running-Code later.)
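For the curious, such a reporting function can be quite small. Below is a minimal sketch of the idea, not the literal ora_sess_info() from ora_login.py linked below:

def ora_sess_info(conn):
    # conn: an existing oracledb connection
    cur = conn.cursor()
    cur.prefetchrows = 20        # enough to fetch this small result in the execute round-trip
    cur.arraysize = 20
    cur.execute("""
        select sn.name, st.value
        from   v$mystat   st
        join   v$statname sn on sn.statistic# = st.statistic#
        where  sn.name in ('DB time',
                           'SQL*Net roundtrips to/from client',
                           'execute count',
                           'user calls',
                           'parse count (hard)')
        order by sn.name""")
    for name, value in cur:
        print(f"  {name:40s} : {value}")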
Hence, at the point of creating the first cursor, we are at 2-3 CentiSec, 11 executes and 5 usercalls (also: see earlier blog, link).
And most important (the topic of this blog): We now Know this program does 4 (Four) RoundTrips before arriving at the cursor. Keep this in mind for the next test-runs.
In Detail: Test a Very Simple Query...
The retrieval of that one row took 2ms.
And only 1 RoundTrip (a total of 5 RoundTrips, minus the known overhead of 4).
Just to demonstrate the effect of "parsing": when I change the text of the query (add two Capital letters), that provokes a new parse, and the retrieval time suddenly becomes...
A whopping 4.8ms now.
But still the same number of RoundTrips.
I also added the "parse count (hard)" to the reporting of the stats. I cannot quite prove it beyond doubt, but given the reported parse, I am fairly sure this extra time is due to the parse-time of the 'new' query. I notice the same effect on other queries: the 1st run generally takes about 2ms more than subsequent runs of the Identical query.
And indeed, after re-running the same query again, the time for the first row goes back to about 2ms and no more hard-parses are reported... I am now 99% sure this is parsing:
QED.
You can check for yourself: links to the programs are below. Feel free to try it out.
And... the "benefit of the shared_pool" is demonstrated again...
Also notice: the need for a parse does Not cause additional RoundTrips. A cursor.execute() of a query does the necessary parsing and the initial fetching all in 1 call, one RoundTrip. The separate options to parse statements are something to verify later, but for now out of scope.
(Note to self: double-check this with the docu, and with Christopher Jones at Oracle - there could also be some Python-optimising going on behind the scenes.)
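One quick way to check this (my own improvised probe, not something from the docs): read the session statistic 'SQL*Net roundtrips to/from client' before and after an execute-plus-fetch. A rough sketch:

def get_roundtrips(conn):
    # conn: an existing oracledb connection; note: this probe itself costs a round-trip
    cur = conn.cursor()
    cur.execute("""select st.value
                   from   v$mystat   st
                   join   v$statname sn on sn.statistic# = st.statistic#
                   where  sn.name = 'SQL*Net roundtrips to/from client'""")
    return cur.fetchone()[0]

rt_before = get_roundtrips(conn)
cur = conn.cursor()
cur.execute("select 1, 'one row' from dual")      # parse + execute + (pre)fetch
rows = cur.fetchall()
rt_after = get_roundtrips(conn)
# the delta includes the probes' own round-trips; compare deltas between runs
# rather than trusting the absolute number
print("round-trip delta around execute + fetch:", rt_after - rt_before)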
Now let's retrieve more data...
Next is to try and retrieve more rows, and see the effect on the number of RoundTrips and the time consumed. We already spotted a hint of this in the EMP-run above (the slow 3rd row).
The standard trick to generate a series of rows in Oracle is connect-by, and we start again with 1 single row:
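The call looks like this (the literal text of my test-query may differ a bit, but this is the pattern):

python3 do_sql_times.py "select level, 'a row' from dual connect by level <= 1 "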
We notice the same 5 RoundTrips as on the query to the DUAL table. And we know the rest of the program accounts for 4 RoundTrips. Hence, the execute of the cursor seems to have added 1 RT.
Now try 2 rows:
Suddenly we see 6 RTs. Minus the known overhead of 4 RTs, this query now seems to have caused 2 RTs.
The extra time needed to get the 2nd row was (only) 78 microsec.
But notice the timing info on the prefix-lines (highlighted): after the 2nd row from the cursor it took 460,735 - 459,081 = 1,654 MicroSec (1.7ms) before the next empty line was printed out.
In the earlier run with 1 single row, there were only 57 MicroSec for that interval.
So, on asking for 2 rows, we already got an extra RoundTrip, and something consumed 1.6ms at the end of the cursor-for-loop. That was probably that RoundTrip. Remember from the previous blog: on this system the average RoundTrip was about 1.5ms...
Let's try 3 rows...
Still only 6 RTs (i.e. the query caused 2 RTs). And the 3rd row reportedly took 1,857 MicroSec. But the detection of the End-of-Set took only (553 - 472 =) 81 MicroSec.
Now 4 rows, check the output:
Still only 2 RoundTrips (i.e. 6 minus the known overhead of 4). And the time between the last row and the empty line is only 72 MicroSec.
When retrieving tens of rows: no more "slow rows" and no more RoundTrips get reported. You can now maybe guess where the next additional RoundTrip comes in...
Yep, at row 103 (and another one at row 203):
Wrap up : Prefetchrows and Arraysize.
These test-runs show the effect of the cursor property prefetchrows, which defaults to 2. But maybe not quite as I had expected: fetching 2 rows still took 2 RoundTrips, probably because the session has to confirm There Is No Third Row (check?).
Then there is another RoundTrip every 100 records, caused by arraysize (default 100). Putting the two together: the execute round-trip brings the 2 prefetched rows, the next fetch round-trip brings the following 100, and so on, which would explain the slow rows at 3, 103 and 203.
I did confirm this by playing with both prefetchrows and arraysize a while ago and watching the effects. For example, my tooling function ora_sess_info() has prefetchrows set to 20, to allow the anticipated number of records to be fetched in a single execute and for-loop, and thus incur only One Single RoundTrip.
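To illustrate the kind of tweak I mean, a minimal sketch (the values are just examples, and conn is an existing connection; python-oracledb also has library-wide defaults in oracledb.defaults.prefetchrows and oracledb.defaults.arraysize):

cur = conn.cursor()            # conn: an existing oracledb connection
cur.prefetchrows = 21          # expected rows + 1: data plus end-of-fetch in the execute round-trip
cur.arraysize = 100            # rows per subsequent fetch round-trip (matters for bigger sets)
cur.execute("select level from dual connect by level <= 20")
for row in cur:                # with these settings, no extra round-trips after the execute
    print(row)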
I suggest you read up on the relevant pages (link) and try it out for yourself.
From reading The Documentation on these topics, I Learned a few things, and I got a few ideas for "tuning" our applications.
But one of the main problems (for me) is the processing of single records or small sets, either retrieving or inserting/merging.
If time + energy permit, I have a few more ideas to test out. Later.
-- -- -- -- -- End of this blogpost, for now -- -- -- -- --
Appendix 1: links to Scripts.
do_sql_times.py : the program with timing-info added
ora_login.py : the login utility (uses dotenv) and the session-info.
prefix.py : contains pp(*argv) to print to stdout with timing-info.
.env : used by dotenv to store + get credentials and other info.
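For illustration, the dotenv pattern works roughly like this (the variable names below are just examples, not necessarily the ones used in my .env):

# example .env contents (names are illustrative):
#   ORA_USER=scott
#   ORA_PASSWORD=tiger
#   ORA_DSN=localhost/freepdb1

import os
import oracledb
from dotenv import load_dotenv

load_dotenv()                                     # read .env into the process environment
conn = oracledb.connect(user=os.getenv("ORA_USER"),
                        password=os.getenv("ORA_PASSWORD"),
                        dsn=os.getenv("ORA_DSN"))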
Appendix 2: notes...
Check that the extra time on 1st row indeed is the parse-time (99% proven?).
Check that the extra RT at 2 rows is indeed to confirm End-of-Fetch.
Check for effect of size of cursor-results, notably large sets.
Check for vector-results: extra fetches? extra RTs? extra time? We rarely retrieve vectors. We store them 1-by-1, often with multiple seconds in between. The overhead of RTs on vectors is not a big factor (yet??). Later!
-- -- -- -- -- End of this blogpost, for now -- -- -- -- --