This feature is currently only available in the latest version, 23ai... but I really like the concept. Check for example this blog by Ulrike Schwinn.
This Feature has Potential!
Image: Developers love to hand over a problem. Especially if they can say "This SQL is now Your problem..."
Background: What did that program do... ?
In earlier (python) work, I've demonstrated how to pinpoint the "latency problem". And just from reporting the "time lost" and the (variable, erratic) latency measured from ping-times, we were able to find and then eliminate a lot of "lost time" (one of several examples: eliminate unnecessary roundtrips).
But latency isn't the only time-sink problem. The RDBMS takes up time, DB-time. And the App itself needs time to do work...
With a little more effort, I was able to, primitively, report the total time spent in a python program, and split it out into three components (documented here)
Total = App-time + DB-time + Network-Latency.
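In python terms, that breakdown can be sketched like this (a minimal sketch; the helper names are mine, not from the original program, and the real measurement code is more elaborate):

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run one call (e.g. a cursor.execute) and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

def split_total(total, db_time, net_latency):
    """Total = App-time + DB-time + Network-Latency,
    so App-time is whatever remains after the measured parts."""
    return {"app": total - db_time - net_latency,
            "db": db_time,
            "net": net_latency}
```

Sum the `timed_call` elapsed-values over all DB calls to get DB-time (plus latency), and the remainder of the wall-clock total is the app's own time.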
One by-product of this measurement was that at least some of the slowness could be pointed towards the python-program, and the need for "Faster CPUs" to run our python-programs (they run in processing-containers, deployed in various environments that are often beyond the comprehension of my boomer-legacy knowledge).
But then there was the DB-time, which needed attention...
Old (and good) solutions: ASH, AWR, Trace, profiling...
In traditional DBA-work, the Oracle-gurus will come up with AWR and ASH reports, sometimes even with SQL-trace files (tkprof, anyone?) to find slow queries and then "tune" those.
AWR will tell us what the top-queries are. But not where they come from, nor which specific programs or runs caused the lost time. It is the old problem of "an aggregate does not give you details".
ASH can tell you more details but has the drawback of sampled data, and you end up spending a lot of time correlating SID/Serial# with ASH-records. Again, not all of the precise info for a single session that I want.
SQL-Trace output files will give you Relevant Info on just the session you need. But trace and tkprof also come with challenges. SQL-trace involves notable overhead, and the trace-files reside on the Sacred-Database-Server. You can, with some effort, download the trc-file, but then you don't have tkprof to "profile" it. And you will find that the profile-information is overkill to a DEV-person who wants you to "just fix some query" anyway.
Note: as a DBA, I Appreciate all those tools, and I could not do without them. And if you have time + access available, I can Recommend them for use in Troubleshooting!
But for my current DEV/OPS-environment, I need something Simpler...
The average DEV/OPS-person would like a very simple script or report to look at. In many cases, a select from V$SQLAREA order-by-elapsed is all they can handle. And there lies some of this solution...
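For reference, that simple select might look roughly like this (a sketch, embedded python-style as a query string; note that elapsed_time in v$sqlarea is cumulative microseconds since the cursor was loaded, hence the division):

```python
# The kind of one-liner most DEV/OPS folks can digest: instance-wide
# top-SQL by cumulative elapsed time (v$sqlarea aggregates over all
# sessions since cursor-load, so this is NOT per-program).
TOP_SQL = """
select sql_id,
       executions,
       elapsed_time / 1e6       as elapsed_sec,
       substr(sql_text, 1, 60)  as sql_text
from   v$sqlarea
order by elapsed_time desc
fetch first 10 rows only
"""
```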
New in Oracle23ai: V$SQL_HISTORY
My Austrian friend Martin Berger pointed this out: There is a new view available: V$SQL_HISTORY (link).
This view currently still has to be enabled by a parameter:
SQL > alter system set sql_history_enabled=true scope=both;
And beware: this parameter is SYSTEM-wide. You need to set it at CDB or PDB level, and you cannot set it at Session-level. Martin Bach pointed that out in this forum post (link).
With this view enabled, I was able to create a script to list the SQL done by a session. Here is the SQL*Plus variant:
Using the SID, we can limit the records to "just from this session".
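The query at the heart of the script is roughly this (a sketch; the column names are my reading of the 23ai docs, and elapsed_time in V$ views is traditionally in microseconds -- verify both against your version; the full script is linked at the end):

```python
# Sketch of the query behind mysqls.sql: aggregate this session's
# history per sql_id, worst offenders first. Bind :sid to the session
# you want to inspect.
SQL_HISTORY_BY_SID = """
select sql_id,
       count(*)                      as executions,
       sum(elapsed_time) / 1e6       as elapsed_sec,
       substr(max(sql_text), 1, 60)  as sql_text
from   v$sql_history
where  sid = :sid
group by sql_id
order by elapsed_sec desc
"""
```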
We see that the history for this short demo-session contains 6 statements. The longest SQL was an INSERT that was executed 7 times and consumed a total of 3.9 seconds Elapsed. This was also close to what the total of "set timing on" reported for this query. First impression: Looks Good.
For calibration, I included a call to spinf_n ( 0.5 ), which is supposed to keep the CPU busy for 0.5 sec. And line 5 of the query-result reports something very close to 0.5 sec. That also looks Good, QED.
Now note: there seems to be some anomaly in reporting of sleep-time. On the second line we have a sleep of supposedly 2.1 seconds, but it is listed as having only 417 microseconds "elapsed". The result from "set timing on" was close to 2.1 seconds though. Something for later research.
But the concept is demonstrated:
We can List the Queries from our session, and pinpoint the ones that take up most of the time.
Built it into a python-program...
We have something here: Every (batch, container, utility) python program can now report its top-SQL to stdout. Devs can read that output and copy-paste the offending SQL_ID for further examination...
(On 2nd thought, my first attempts to include something in a python-program got rather messy and error-prone. Not least because a lot of my instances are 19c, causing python to crash with ORA-00942. I'll do a screenshot, and then alter and beautify my code + output for a future blog...)
The screenshot shows output from a (very short demo-) program that did a few queries. The idea is to list the SQL (part of each SQL), with an indication of how much time was spent on that SQL. It will serve two purposes: 1) give the DEV an idea which queries used up the time, and 2) supply the SQL_ID that can be used for further investigation.
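For the curious, the reporting helper might look roughly like this (a sketch with names of my own making, not the final code; it swallows the ORA-00942 that 19c instances throw, as mentioned above, and the column names are my assumption from the 23ai docs):

```python
def report_top_sql(connection, top_n=5):
    """Print this session's heaviest SQL from v$sql_history (23ai only).

    On 19c and earlier the view does not exist, so we report that
    instead of crashing the whole program with ORA-00942."""
    query = """
        select * from (
            select sql_id,
                   count(*)                      as execs,
                   sum(elapsed_time) / 1e6       as elapsed_sec,
                   substr(max(sql_text), 1, 60)  as sql_text
            from   v$sql_history
            where  sid = sys_context('userenv', 'sid')
            group by sql_id
            order by elapsed_sec desc )
        where rownum <= :n
    """
    try:
        with connection.cursor() as cur:
            rows = cur.execute(query, n=top_n).fetchall()
    except Exception as exc:
        if "ORA-00942" in str(exc):  # view absent: pre-23ai or no privilege
            print("v$sql_history not available on this instance")
            return
        raise
    for sql_id, execs, elapsed, text in rows:
        print(f"{sql_id}  {execs:4d}x  {elapsed:8.3f}s  {text}")
```

Call it just before the program exits, passing the (python-oracledb) connection; the two lines of output per statement give the DEV both the time spent and the SQL_ID to hand over.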
Anyway... This seems to work!
And the DEVs like the idea: they can kick the problem back to the DBA.
It just needs a bit more "effort". Some Quality-Python time.
But it is working already in the DEV-environments, where most developers use the 23ai container-images and thus have this feature available (Thx to Gerald Venzl, link).
Already, runs in Dev/Test, if done against 23ai, can produce the "list of Heavy SQL".
Now we just need to ask for a backport, or create a similar report-thingy for 19c...
Nice, but More Work to do...
Developers, not really ora-minded, already like and Want the feature. The system spits out the "worst queries in the program", so they can copy-paste that to the DBA: Your Problem now, Go fix this!
But this new trick works only under two conditions: you need 1) version 23ai, and 2) the parameter set. My first suggestion to Big Red would be to make this the default in 23ai: sql_history_enabled=true;
But even then: most of the world is still on 19c or even older versions.
If given Time.. I may be able to rig something up to let older versions behave similarly.
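One possible approach for those older versions (a crude sketch of my own, not tested in anger): snapshot the cumulative elapsed_time per sql_id from v$sqlarea at program start and at program end, and report the delta. This is instance-wide, so concurrent sessions will pollute the numbers -- an approximation only:

```python
def diff_snapshots(before, after, top_n=5):
    """Crude pre-23ai fallback: diff two {sql_id: elapsed_sec} snapshots
    taken from v$sqlarea at program start and end.

    Returns the top_n (delta_seconds, sql_id) pairs, worst first.
    Instance-wide, so other sessions' work leaks into the numbers."""
    delta = {sql_id: after[sql_id] - before.get(sql_id, 0.0)
             for sql_id in after}
    worst = sorted(((sec, sql_id) for sql_id, sec in delta.items() if sec > 0),
                   reverse=True)
    return worst[:top_n]
```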
Even more: Observability, Open Telemetry, OTel
Another interesting item I came across while researching this topic: Martin Berger from Oracle in Vienna pointed me to OTel: Open Telemetry (link). And there was a good presentation on this topic by Jasmin Fluri at POUG, which will probably also appear at DOAG and UKOUG later this year. Something to keep an eye on.
-- -- -- -- -- End of this Blogpost, for now -- -- -- -- --
Links to scripts..
mysqls.sql : script for use in SQLcl and SQL*Plus.
Python script: later (it needs some work and some beautifying...)
-- -- -- -- -- End of this Blogpost, for Real -- -- -- -- --