TL;DR: Time got lost due to leftover code. In our case, some well-intended "defensive coding" caused additional RoundTrips and some commit-wait time.
Found it. Fixed it.
And along the way, we (re)learned a little about coding, about how to measure, and about how to "instrument" our code.
Image: waiting for a train. I've spent a lot of time waiting for trains, waiting for planes, and waiting in traffic jams...
Background: Suspiciously slow Python code.
After some careful testing, we stumbled across parts of the program that did more RoundTrips than the data justified. And RoundTrips were still the item that seemed to cause us the most "wait-time".
Testing: Run the suspicious code in an "isolated" test.
To observe better, we re-worked the UnitTest to run just the parts of the code that we suspected of causing "slowness".
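The heart of that test is a loop around the suspect insert-function. A minimal sketch, with illustrative names rather than the actual code of tst_ins_item.py:

```python
# Minimal sketch of the isolated test-loop; names are illustrative,
# not the actual code of tst_ins_item.py.
import time

def run_insert_test(conn, n=1000):
    t0 = time.perf_counter()
    for i in range(n):
        insert_item(conn, f"test item nr {i}")   # the suspect function
    elapsed = time.perf_counter() - t0
    print(f"{n} inserts in {elapsed:.2f} sec = {n / elapsed:.2f} records/sec")
```

After much simplification of the output, we ended up with this result: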
We see the result of 1000 loops of "insert item". That took 14.17 seconds, corresponding to a "speed" of 70.58 records/sec.
From the session info (V$MYSTAT) we see that the session did about 1000 executes, which makes sense for 1000 inserted records.
But the surprising numbers are the RoundTrips and the user-calls: 3000 each ??
It seems this code does 3 RoundTrips for every record inserted...
We also know that this particular setup, at the time, had a latency of around 3-4 ms per RT, meaning that 3000 RTs would consume close to 12 sec of the 14 sec elapsed.
Furthermore, from V$MYSTAT, we notice that the RDBMS consumed 160 Centiseconds, or 1.6 sec of time.
Note1: The db-connect and the measurement-queries represent the additional overhead of 14 executes, 4 roundtrips and 5 user-calls. We know this from earlier testing of similar code, see previous blogs (link).
Note2: I also realise that reading our stdout takes some getting used to; we sometimes appear a little "over instrumented". My bad. But it Helped!
Note3: the DB-time for the session is reported at 160 CentiSeconds, i.e. this session only consumed 1.6 sec of time at the RDBMS. The rest of the elapsed 14 sec is spent "elsewhere", and we suspect 12 sec of that was spent in RTs... Interesting? Something for later research?
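For completeness: the session statistics quoted above can be captured with a query along these lines; a minimal sketch, assuming SELECT privilege on the relevant V$ views:

```python
# Sketch: capture the relevant session statistics from V$MYSTAT.
STATS_SQL = """
    select sn.name, st.value
      from v$mystat   st
      join v$statname sn on sn.statistic# = st.statistic#
     where sn.name in ('SQL*Net roundtrips to/from client',
                       'user calls',
                       'execute count',
                       'DB time')"""

def show_session_stats(conn):
    # iterate the cursor directly; each row is (statistic-name, value)
    for name, value in conn.cursor().execute(STATS_SQL):
        print(f"{name:35}: {value}")
```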
Anyway, from this test, it seems the insert of these particular item-records contains some overhead. Time to read the code...
Code Inspection: Aha.
Once we had the suspect isolated as "inserting of items", we examined the code involved. And after some searching, the simplified version of what we found was this:
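Below is a reconstructed sketch; the table- and column-names are hypothetical (an items table with an identity column id), but the pattern is the one we found:

```python
import oracledb

def insert_item(conn, item_txt):
    """Insert one item, return the new id. (Reconstructed sketch.)"""
    stmt = """insert into items (item_txt)
              values (:txt)
              returning id into :new_id"""
    cur = conn.cursor()
    new_id = cur.var(oracledb.DB_TYPE_NUMBER)
    try:
        cur.parse(stmt)        # "defensive" check: costs one extra RoundTrip
        cur.execute(stmt, {"txt": item_txt, "new_id": new_id})
    except oracledb.Error:
        print("offending SQL:", stmt)   # always show the SQL on error
        raise
        return 0                        # unreachable, but it was there...
    conn.commit()                       # leftover: yet another RoundTrip
    return new_id.getvalue()[0]
```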
There are two things "wrong" with this code.
Firstly, the Try-Except block. It will catch parse-errors, but after catching the exception, it does not really know "what to do with the error": it simply re-raises it and then quits. You can smile at the unreachable "return 0" as well...
But what mattered most: the parse-attempt causes a RoundTrip to the RDBMS (and RTs are considered Evil in our case).
Secondly, there is a Commit after the insert. Oops. Transaction-logic was not supposed to be part of the "insert function". That commit was a leftover from development time, when commits were useful to preserve the state (the data) that would otherwise get lost if the program did an exit-on-error.
But guess what: a commit causes a RoundTrip (and RTs are considered Evil, in our case).
Note: The developer was Innocent: he had created this code with the Best Intentions. There was a reason for putting the parse and the print-of-statement there: on errors, I will _Always_ ask for the offending SQL. And over time, as the datamodel evolved, columns were added/modified, causing statements to fail. Hence the need to "print the SQL".
Please understand that I don't want to unfairly blame the DEV...
The improved version.
It was not difficult to remove the offending code, and after some editing we ended up with this (simplified!) version:
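Again a reconstructed sketch, using the same hypothetical table- and column-names:

```python
import oracledb

def insert_item(conn, item_txt):
    """Insert one item and return the newly generated id. (Reconstructed sketch.)"""
    stmt = """insert into items (item_txt)
              values (:txt)
              returning id into :new_id"""
    cur = conn.cursor()
    new_id = cur.var(oracledb.DB_TYPE_NUMBER)
    cur.execute(stmt, {"txt": item_txt, "new_id": new_id})   # one RoundTrip
    return new_id.getvalue()[0]          # commit is left to the caller
```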
Now the function only does the insert and returns the newly-generated ID. Parse and Commit have been removed. This should avoid some RoundTrips.
Will this speed up our (test)code ? Let's run it...
And Voila: the same 1000 inserts are now done in 4.46 seconds, with only 1000 RoundTrips. Since on this setup we know that an RT generally takes about 4 ms, the timing-info makes sense: we eliminated 2000 RTs, and at 4 ms per RT that would make the program run about 8 sec faster from eliminating RTs alone...
And especially for a "far-away" application with very high numbers of inserts, this is going to make a difference.
But there is more: the reported DB-time from V$MYSTAT is now down to just 14 CentiSeconds. The difference might be "explained" by the 1000 commits of the previous version of the code: a commit would realistically (on this old test-system) take about 1 ms. Interesting.
The parse-time we save by not parsing is likely to be minimal: re-parsing the same statement is a Very Efficient operation, thanks to the Oracle Shared_Pool.
The total gain from the new code seems to be about 10 sec: 8 sec from eliminating RTs, and 1.4 sec of DB-time, probably from eliminating Commits. And that concurs with the measurement.
In summary: The new reported "speed" is now 224 records/sec.
About 3x faster.
Wrap-up, on the (Human-) Coding-side..
Defensive Coding is generally not a bad thing, and the "problem-code" was built with the Best Intentions.
The parse-call was intended to check the connection, check the privs, and check the syntax of the statement. Except that sometimes, the addition of "being careful" results in time spent on unnecessary activity. These "checks" were better done in the Testing-phase, or ultimately by single calls at the start of the program (see the sketch below), not inside a high-frequency call.
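A minimal sketch of such a one-time startup-check, using a hypothetical helper:

```python
def verify_statements(conn, statements):
    """Hypothetical helper: parse-check all SQL once, at program start.
    A syntax- or privilege-error fails fast, and the RoundTrip cost is
    paid once per statement instead of once per high-frequency call."""
    cur = conn.cursor()
    for stmt in statements:
        cur.parse(stmt)   # one RoundTrip per statement, at startup only
```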
Key points:
Spend more time informing DEVs.
Pay attention to (Unit-)Testing; maybe include reporting of statistics and time?
Do occasional code-reviews?
Wrap up, on the (geek-tech) Measurement-side:
We have the start of instrumentation: Python does some time-keeping and prints timestamps. And the captured data from V$MYSTAT helped to identify the additional RoundTrips and user-calls. We also noted the DB-time: after removing parse- and commit-overhead, there is Notably less time reported as DB-time.
It would be nice to know the "time spent" by the various components in more detail...
Future improvements come to mind: better use of the timing-information available in Python (time.perf_counter_ns?). And what about the data available in Oracle (the time-model views, V$SESS_TIME_MODEL?). Maybe also use the Linux time command?
Would it be possible to "measure" time on both sides, application + RDBMS, and then attribute the "missing time" to RoundTrips? Future Blog(s)...?
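As a first idea on the application-side, a minimal sketch of a timing-wrapper built on time.perf_counter_ns (the decorator is hypothetical, not part of duration.py):

```python
import time
from functools import wraps

def timed(fn):
    """Report the elapsed wall-clock time of every call to fn."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter_ns()
        result = fn(*args, **kwargs)
        dt_ms = (time.perf_counter_ns() - t0) / 1_000_000
        print(f"{fn.__name__}: {dt_ms:.3f} ms elapsed")
        return result
    return wrapper
```

Applying @timed to a function like insert_item would report per-call application-time, which could then be set against the DB-time from V$MYSTAT to estimate the "missing" RoundTrip-time.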
Key point:
Improve on Instrumentation?
(spoiler: possibly more ideas on instrumentation coming... )
-- -- -- -- -- End of this Blogpost, for now -- -- -- -- --
Appendix: links to code..
tst_merge_src.sql : set up the tables for this test (re-used from previous test)
tst_ins_item.py : the test-code used in the demo
Plus these additional files, needed to import helper functions into test-program:
ora_login.py : login and some other oracle-helper functions.
duration.py : time-keeping and reporting functions - instrumentation
prefix.py : print the filename + timestamp to stdout - instrumentation.
.env : to replace credentials with your own scott/tiger@host:port:service
And you may need to install the following (note: datetime and array are part of the Python standard library and need no separate install):
pip3 install oracledb
pip3 install python-dotenv
-- -- -- -- -- End of this Blogpost, for Real -- -- -- -- --