TL;DR: While working with Oracle 23ai and python, I developed some utilities. Here are a few for your inspiration. Links to runnable (copy-able) code are in the text and at the bottom.
Background: Running python to process data.
Because I have a customer who wants to do ingest and repeated processing of data with python tools, I have written some fairly basic python programs. In doing so, I could not resist creating some utilities (functions, modules) for various "system-administrative tasks". It started simple, with recording of time and monitoring of heavy programs, but it developed into something more. Below I will examine some ideas and link to my resulting code. When it all got too long, I split off some items for later blogs.
It is nothing very fancy, just little tools. Use them for ideas. Copy and modify as much as you want.
Illustration of Measuring tools...
Timers: how long does my loop take, and where does the time go?
When you do a long, repeating, looping task, you want to measure the time. A lot of my functions already have local variables like start_dt and end_dt. But I needed something more global to span multiple functions. The result was two (global) variables to keep time and some set/get timer-functions.
One variable (g_tmr_total) is meant to measure "total time of an item", and can be set at "start" and queried by "tmr_total()". The other variable (g_tmr_durat) is meant to measure and re-set for shorter intervals. The functions are more or less self-explanatory:
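In essence it is something like this (a sketch; the exact function names in the real duration.py may differ a little):

import time

g_tmr_total = time.time()   # set once, at the "start" of a run
g_tmr_durat = time.time()   # re-set for every shorter interval you want to measure

def tmr_total_set():
    """(Re)set the total timer, typically at the start of a run."""
    global g_tmr_total
    g_tmr_total = time.time()

def tmr_total():
    """Return seconds elapsed since the total timer was set."""
    return time.time() - g_tmr_total

def tmr_durat_set():
    """(Re)set the interval timer."""
    global g_tmr_durat
    g_tmr_durat = time.time()

def tmr_durat():
    """Return seconds elapsed since the interval timer was last set."""
    return time.time() - g_tmr_durat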
And here is the link to the source, duration.py. You can import it into your program like this:
from duration import *
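Inside a program, the timers are then used roughly like this (process_batch() is just a placeholder for your own work):

tmr_total_set()
for batch in range(3):
    tmr_durat_set()
    process_batch(batch)                      # placeholder: your own processing here
    print(f"batch {batch} took {tmr_durat():8.2f} sec")
print(f"total run took {tmr_total():8.2f} sec")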
When you run the file stand-alone, it will do a self-test and demonstrate some timings; the output goes to stdout.
This should all be self-explanatory, but let me know if you have any questions or problems running it on your system...
Notice there is one other function in there that allows you to "spin" a single CPU in your system for a given number of seconds, in my case PI seconds (3.14... sec). That function is out of scope for this blog, but it was useful for the test+demo. Maybe more later.
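For the curious, that spin-function boils down to something like this (again a sketch; the real one in duration.py may differ):

import time

def spin_cpu(n_sec=3.141593):
    """Keep one CPU busy for roughly n_sec seconds by doing pointless arithmetic."""
    t_end = time.time() + n_sec
    x = 0
    while time.time() < t_end:
        x += 1                                # burn cycles on a single core
    return x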
Notice also that I used the linux time command to double-check the total timing. Always a useful command.
If you are into precise timing or optimising of python programs, you should also study the time module. For example, time.process_time() gives you the CPU time spent "on your program". And you should check modules like timeit (link) for more advanced timing ideas.
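A quick illustration of both, straight from the standard library:

import time, timeit

cpu_start = time.process_time()               # CPU time used by this process so far
total = sum(i * i for i in range(1_000_000))  # some work
print("cpu seconds used:", time.process_time() - cpu_start)

# timeit runs a small snippet many times and returns the total elapsed seconds
print(timeit.timeit("sum(i * i for i in range(1000))", number=10_000))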
In the future, I may consider adding push and pop of timers onto a LIFO stack. That would allow more timers, but it would add the book-keeping burden of knowing which timer is on top. Alternatively, there are already packages like timeit that do similar things.
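The push/pop idea would only be a few lines, something like this (not written yet, just the idea):

import time

g_tmr_stack = []

def tmr_push():
    """Start a new (possibly nested) timer."""
    g_tmr_stack.append(time.time())

def tmr_pop():
    """Stop the most recently started timer and return its elapsed seconds."""
    return time.time() - g_tmr_stack.pop()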
Database work: do you have an autotrace?
Timing of activity is one thing, but as a DBA-geek, I also want to know what my RDBMS was doing. Inspired by sql-autotrace, I built a function to report on session-statistics. This allows me to report the "work done by the session" at any given point. From earlier use, I already had a query very similar to the one in the sketch below.
The SQL is defined as a constant inside the function ora_sess_info(), which displays the relevant stats. Completing that into a python program was easy; I only had to add a little code around it.
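Roughly like this (a sketch, assuming the python-oracledb driver; the constant name and the exact selection of stats here are my placeholders, the real version differs a bit):

def ora_sess_info(con):
    """Print the session statistics ("work done since connect") for the given connection."""
    SESS_STATS_SQL = """
        select sn.name, st.value
          from v$mystat   st
          join v$statname sn on sn.statistic# = st.statistic#
         where st.value > 0
         order by sn.statistic#"""
    cur = con.cursor()
    cur.execute(SESS_STATS_SQL)
    for name, value in cur:
        print(f"{name:<55}{value:>15}")      # statistic name, value since connect
    cur.close()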
It is about the simplest way of running and displaying a query from python. When the function is called, it prints the list of session statistics and their values.
Notice a few things:
You need access to V$MYSTAT and V$STATNAME (still, sometimes this is an item to discuss...)
The actual function needs a connection-object. Of course, I never know in advance which connections are available or how the connection-objects are named. They may not even be global. Hence the connection needs to be passed as an argument to the function.
Output currently goes as plain text to stdout. In the future, more sophisticated output might be enabled (e.g. json format, or insert into some log table).
The function measures statistics "since connect". To reset the values, you can close and re-connect, but I would not do that at high frequency, as creating a connection takes time and resources.
For future additions, I still have several ideas. I would consider including the SQL statement in .env and picking it up with dotenv; that would allow me to customize the query (a sketch of that idea is below). And I would also consider storing some metrics in a table, to create a log of runs for particular programs and allow monitoring over time.
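The dotenv part of that idea would look roughly like this (SESS_STATS_SQL as the key name is just an example):

import os
from dotenv import load_dotenv

load_dotenv()                                  # reads key=value pairs from .env
SESS_STATS_SQL = os.getenv("SESS_STATS_SQL", "select sysdate from dual")   # fallback if the key is missing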
I was also considering crafting a function to report on the last-used plan of a given query, just to keep an eye on sudden plan changes (another one for the todo.txt).
But from actual usage, some other "requirements" came to light. Possible material for the next instalment of the blog (insert link when written?)
Summary: Feel free to copy and modify, YMMV. And there is more...
Like I said before, the most re-used code is the code that you create yourself (also called the Not Invented Here syndrome). So feel free to just copy and create your own varieties.
If I can give a few people some ideas: good!
As hinted earlier, once these snippets of code were in use, more ideas and unexpected requirements popped up. Let me think about those and I'll pop out another blog-post (future link here...)
And maybe, hopefully, we meet at some event or conference: feel free to tell me what I did wrong, and how you totally improved my code...
-- -- -- -- -- -- -- -- end of this blog, for now -- -- -- -- -- -- --
Links to code and constants:
ora_login.py (requires the dot-env file for credentials and constants)
.env (you need to edit to replace scott/tiger@localhost...)
Both modules are "self-testing": if you run them, they will produce some output to stdout. If you have all the requirements, they should ideally run without errors ;-)
-- -- -- -- -- -- -- -- end of this blog, really -- -- -- -- -- -- --