TL;DR: Some Very Basic Tips to write better programs and tools. In my recent case: using python and Oracle. But in the handover to colleagues, I came to realise there are some tips+tricks that need re-iterating every few years. So Here Goes (again).
My opinion is of course strongly "biased" towards an Oracle Database, but that is not always a bad thing.
The actual blog on my Oracle-python-tricks will have to wait a bit... but it is in the pipeline (# mkfifo pythonstuff ; link to future blog...)
The things that tend to wash up on the desk of a DBA...
Background: Been There Done That (have T-shirts).
Originally, I set out to describe some of the python programs (functions, really) that I (re)used to make my life and that of my customer easier. As it turned out, we ended up discussing some "habits" and practices rather than the actual code. So just a few generic tips, which I also used to explain things to customers. Hence I wrote these up before dipping into my python code.
YMMV and "Never say Never": These tips applied to myself and my customer's situation, but I came across enough situations to say they apply in many cases. However, Your Mileage May Vary. So if you think they do not apply to you, I am fine with that. I'm not going to go religious on it.
My content here applies to the type of (simple) jobs that tend to wash up on the desk of a DBA: interfaces, ad-hoc reports, occasional maintenance jobs, loading "metadata", that sort of thing.
Also, do not expect High-Tech. In my case, with the combination of python and Oracle, I have only built relatively simple programs. And I do nearly all my work just from a "terminal" on either linux, mac or sometimes even windows. I use mainly vi and SQL Developer (and occasionally some IDE or even notepad) as editors, and often still use printf or dbms_output statements for debugging (showing my age eh...).
Non-functional "Requirements": try to include them in the Specification.
Requirements, Specifications, or whatever name you have, often state: "ingest these datasets", "produce a list of gizmos+items in that format", or "return a set of records with this data...". And that is expected.
The unspoken or so-called "Non-Functional Requirements" are often along the lines of:
1) Make sure it runs in future infrastructure (if we move to "the other cloud"...)
2) Allow running and maintenance by our outsourced vendor resources.
2a) Include some (unit-, integration-) test capability, a test-set...
3) Make sure it runs inside 1 min (One Minute, even for 10GB of XML data...)
4) Produce some output or report to allow control or verification (and keep that)
5) Allow only "authorised" people/tools to run it, and/or prevent DoS situations.
etc...
Those requirements are often unspecified but implied when you are asked to do "just this simple program". But it helps to write them down (briefly!) even if nobody wants to read them. You may also need some trace of those Non-Functionals to justify the choices that you make (e.g. why didn't you build this program in Go, in C++, or as a bash-script using just grep and awk?).
Note: The checklist of 1 to ... etc. above would be Tip-nr-0, if I were so pedantic as to start numbering at Zero...
So here come the "Generic Tips" (after weeding out half)...
Tip nr 1: Write it down, Briefly but Clearly
Why? Well, See above...
But also: the reader of your writings is probably yourself or your replacement in 3 years, or even in 20 years (Yes... I've seen them). Hence, it helps if you can introduce your program with a few lines of both the "functional" and "non-functional" requirements (what does it need to do?).
By writing (or just summarising) what your program needs to do, you can also "verify" it with the user, the architect or whoever is asking for it to be built.
You may also want to include some "justification", e.g. why you chose PL/SQL over Python or FORTRAN (hehehe).
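For example, a few lines at the top of the program can carry both the requirements and the justification. A minimal sketch in python, where all names and numbers are made up just to show the idea:

```python
"""
load_gizmos.py -- load the weekly gizmo-file into the GIZMOS table.
(all names and numbers below are made up, just to show the idea)

Functional:
 - read gizmos.csv from the incoming directory, insert/update GIZMOS.

Non-Functional:
 - must finish within 1 minute, even for a ~10GB file.
 - must be runnable (and maintainable) by the outsourced operators.
 - writes a one-line summary to a log-table, for later verification.

Justification:
 - python + SQL: the source is a flat file, the target is the Oracle
   database (see also Tip nr 3, #SmartDB).
"""
```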
Tip nr 2: Draw one or more ERDs. For better Understanding!
To understand a program, it rarely helps to read the source code. But a look at the tables, or the ERD of the underlying data, can do miracles for understanding. I tend to create my ERDs from the tables in the database and rarely work with other models than the Table-Relationship-Diagram.
Even if you have very simple data, it helps to have an ERD. Not only will an ERD help you define tables (if you even use tables at all...) and the way those tables relate, but a brief documentation of each data-element will also allow you to communicate the purpose and usage of the data-elements.
In creating the ERD you will most likely "discover" a few things about the data you process. I notably walk down the lines (relations) and verify things like "This order has zero, one, or more details...", and then ask "Is that Correct...?"
When you have to explain your program to colleagues, or have to ask "more questions" of the users: an ERD is a great tool to communicate with, and much more readable than "open this source code" (the user may not have the dev-tools installed...).
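If the tables already exist, a quick way to get a first overview is to list the foreign-key (parent-child) relations from the data dictionary. A minimal sketch, assuming the python-oracledb driver and placeholder connection details (most ERD tools, SQL Developer included, can generate the diagram for you as well):

```python
# sketch: list parent-child (FK) relations, as a starting point for an ERD
import oracledb

SQL = """
select p.table_name  as parent_table,
       c.table_name  as child_table,
       c.constraint_name
from   user_constraints c
join   user_constraints p on p.constraint_name = c.r_constraint_name
where  c.constraint_type = 'R'
order  by p.table_name, c.table_name"""

# placeholder credentials, replace with your own
with oracledb.connect(user="scott", password="tiger", dsn="localhost/freepdb1") as con:
    with con.cursor() as cur:
        for parent, child, fk_name in cur.execute(SQL):
            print(f"{parent} --< {child}   ({fk_name})")
```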
Tip nr 3: SmartDB: Use. Your. Database (e.g. use SQL and PL/SQL)
If the data is moving into or out of an RDBMS: consider using SQL and PL/SQL as much as possible.
Even if your organisation prefers using VMware virtual machines, or your architect prefers the use of "spot-instances", or k8s, or some serverless FaaS solution: if your data ends up in a database, that database is the Safest, the Longest-Lasting, and probably the Easiest-to-Maintain place to "store + run your program".
I recall my managers in the 90s and 00s demanding things like "use FORTRAN to convert incoming data", or "only C will do this fast enough". In the end most of that code ended up being converted (reduced!) into PL/SQL, with good results.
The excuses to not use a database are plentiful:
- Database CPU is too expensive (maybe, but the database is often the most efficient place to process and keep Data)
- PL/SQL requires a DBA... (well, a PL/SQL person, actually, so..?)
- This is so simple, you should use grep/awk for this (ok, try that, and then introduce a small change in requirements... or start processing GBs of data...)
I must say: I have broken this rule of #SmartDB many times, often at the expense of myself and my customer. Nothing lives as long as the database...
Of course, YMMV. In some cases FORTRAN, C++ or python is advised because of available tools or libraries (as was sometimes my case). The main valid reason to do processing outside of the RDBMS is, in my opinion, the use of certain libraries for specific functions (math, geo, or some specific engineering- or scientific-purpose library).
But in general, by solving the problems "inside the database", the code was the most efficient and the easiest to maintain (some links to the praise of #SmartDB...?).
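To illustrate the point, a sketch only, with a hypothetical ORDERS table and placeholder connection details: the difference between dragging every row into python and letting the database do the set-based work is often dramatic.

```python
# sketch: hypothetical ORDERS table, placeholder connection details
import datetime
import oracledb

cutoff = datetime.date(2024, 1, 1)   # made-up cut-off date

with oracledb.connect(user="scott", password="tiger", dsn="localhost/freepdb1") as con:
    cur = con.cursor()

    # row-by-row in python: every single row travels over the network...
    total = 0
    for (amount,) in cur.execute(
            "select amount from orders where order_date >= :d", d=cutoff):
        total += amount

    # ...versus one set-based statement: the database does the aggregation.
    cur.execute("select sum(amount) from orders where order_date >= :d", d=cutoff)
    total, = cur.fetchone()
```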
Tip nr 4: Provide some Test Cases.
This may sound trivial, but it can really help. I'm not specifically advocating TDD (Test Driven Development), or some other "framework" that may require a lot of methodology (and work). But you need some test-cases, if only to prove that the original code you delivered still works.
Ideally, you have scripts (e.g. test_setup.sh and test_run.sh).
And those tests should not "break" or otherwise affect your production data. Hence this needs some thought, and probably needs to be embedded into the processes of the customer.
All customers have Test-platforms. Some customers even have separate platforms where they run Production systems.
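A test does not need a framework to be useful. A minimal sketch below: the load_gizmos function, the GIZMOS table and the connection details are hypothetical (carried over from the earlier example), and this should only ever be pointed at a TEST schema:

```python
# test_run.py -- minimal test case, to be run against a TEST schema only
import oracledb
from load_gizmos import load_gizmos   # hypothetical function from the earlier sketch

def test_load_small_file():
    # placeholder credentials for the TEST database, never production
    with oracledb.connect(user="test_user", password="secret", dsn="testdb") as con:
        load_gizmos(con, "testdata/gizmos_3_rows.csv")   # small, known input
        cur = con.cursor()
        cur.execute("select count(*) from gizmos")
        n_rows, = cur.fetchone()
        assert n_rows == 3, f"expected 3 gizmos, got {n_rows}"

if __name__ == "__main__":
    test_load_small_file()
    print("test_load_small_file: OK")
```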
Tip nr 5: Plan Liberally, every IT tweak takes more time than expected.
The best known book on this is probably "The Mythical Man Month" by F.P. Brooks. One of my early managers made me read it. Recommended!
My rule of thumb is to multiply the original time-estimate: for simple systems, multiply by 3.14 (approximately pi, Archimedes...); for more complicated systems, multiply by 9.8 (approximately pi-squared, or g).
Most people can remember pi and g (earth gravity), and simply stating "I multiplied by 10" would sound rather un-scientific.
Tip x: Stick with the tool(s), unless...
I also tend to include a warning: stick with the tool(s) (bash, python, sqlcl, pl/sql...). If every little maintenance job completely re-writes the program (in Java, because that is the future standard...), you end up with a hodgepodge of code. If some future coder Insists on re-writing (in .cpp using ChatGPT...): think hard, and possibly find a coder that Does understand the original tools/languages it was written in.
When I was an engineering student we were taught some rules around robustness and maintainability, such as how to choose components or parts:
- Fewer Components make a system more Robust.
- Fewer Different Components make a system easier to Maintain.
I never forgot those... and those simple rules also apply to IT-systems.
Oops... Writing Too Much?
This text is getting way too long; nobody reads that much anymore (so I went back and weeded out half of it).
I could add several more rules, notably around the fact that the Oracle RDBMS now includes several neat features that make your life easier while still staying at the #SmartDB level, e.g. "With the Database". There are now options to process JSON, to build REST-services (recommended), and to use APEX for a simple window-on-data (recommended). And you can now include ONNX models (new in Oracle 23ai!). But all those are above + beyond the scope of what I wanted to write.
So there.
Summary:
Write Some Specs, for yourself and your colleagues.
Plan carefully (and maybe read The Mythical Man Month...)
Make an ERD, it helps the understanding and facilitates discussions.
Use #SmartDB whenever possible, it is more efficient.
Provide some Test-cases.
Limit your number of tools/languages, to prevent code-chaos.
That is about it... Next instalment should be: python Tips + python Code...