TL;DR: In a few old scripts, it was time to replace SYSDATE with SYSTIMESTAMP, for very geeky reasons. The reason to revisit these old scripts was to increase precision, and to adapt them for use together with some Python code. More on that later.
Oh, and I'll show how to check/verify the nr of CPUs in a system (with a very old but simple trick).
Background: some old scripts, time to upgrade.
Since way back, Oracle 7, I have had two scripts, and two similar functions: spin.sql and sleep.sql. The purpose of sleep.sql is evident: put a session to sleep for N seconds. The spin.sql does the opposite: it makes the connected session run full-speed, thus generating N seconds of load on the database server. Why these scripts... you'll see.
Note that I first created those as runnable scripts, but very soon also as select-able functions for easy use in statements. I'll demo some of it, and I'll put the source code in links below.
For "sleep", having a script proved very flexible in cases where the execute-grant on DBMS_LOCK was not given immediately. It may even have started with just a host-call to the OS sleep program, hehehe. (Dear DBA, I would like to have the following GRANTs..., approval could take a few weeks...).
But in the back of my head was always the nagging knowledge: those are not very exact. We Can Do Better...
Running the "old versions" of the scripts...
The lack of precision was notable on very short calls, and was often evident when I used set-timing-on:
You can see me running the script-versions of each. The sleep 1 and sleep 1.1 provide an elapsed time nearly exact to the specified nr of seconds. DBMS_LOCK.SLEEP is documented as having 1/100 sec precision, and at the time that was fine.
But spin.sql, when called with various values, provides elapsed times that are often less, but sometimes more (?) than the specified value.
And this is probably because start and end are measured in whole seconds taken from SYSDATE. So if you started a script or function at the tail-end of a second, you would lose the starting part of that second. This was Oracle 7, remember.
Hence for spin.sql, or the call to the spinf() function, the actual "elapsed" time was never exactly the asked-for time, due to rounding errors on the seconds of SYSDATE.
Also note: for the script-version of sleep, I would insert a host-sleep call if it was too complicated to get the grant on DBMS_LOCK from the DBA (sigh..). The actual precision of a Linux sleep call would be around 1/100 sec as well, often less precise due to various overheads. Still, at the time, that was good enough for me.
And normally precision was not a problem, as typical usage was for intervals of many seconds or minutes. Still, we could do better. And with the use of Python came the time module (link). Here, I suddenly had a system pretending to be nanosecond-precise, with calls like time.time() and time.perf_counter_ns(): Nanoseconds...!
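As a quick sanity check of those Python calls, here is a minimal sketch (assuming Python 3.7 or later for perf_counter_ns):

```python
import time

# time.time(): wall-clock epoch seconds, as a float.
wall = time.time()

# time.perf_counter_ns(): a monotonic counter in integer nanoseconds,
# meant for measuring intervals, not for wall-clock time.
t0 = time.perf_counter_ns()
time.sleep(0.01)
t1 = time.perf_counter_ns()

print(f"epoch seconds  : {wall:.6f}")
print(f"measured sleep : {(t1 - t0) / 1e9:.6f} sec")
```

Whether all nine digits are actually meaningful depends on the OS clock, of course.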
Systimestamp: FF9 measures down to nanoseconds (??).
Using SYSDATE sort of shows the age of those scripts. And I didn't mind, I never had much need for better accuracy than 1 second anyway, sleep or spin. But still.... Since Oracle 9, there has been SYSTIMESTAMP, which can be reported with up to 9 decimals (i.e. nanoseconds...). Plus, in Oracle 18c, there came the more accessible sleep function in SYS.DBMS_SESSION, instead of the old DBMS_LOCK (soon to be deprecated!). It was time to upgrade my scripts.
For starters, here is how I got the "epoch" out of my RDBMS:
(nb: geeks can now see or find-out when I wrote/tested this blogpost...)
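For comparison, here is the same epoch idea in plain Python (a sketch of the principle, not the SQL from epoch.sql): subtract 1970-01-01 UTC, much like the date-arithmetic against SYSDATE or SYSTIMESTAMP.

```python
import time
from datetime import datetime, timezone

# Epoch seconds the "hard" way: date-arithmetic against 1970-01-01 UTC.
delta = datetime.now(timezone.utc) - datetime(1970, 1, 1, tzinfo=timezone.utc)
epoch_manual = delta.total_seconds()

# And the short way, straight from the time module.
epoch_direct = time.time()

print(epoch_manual, epoch_direct)  # the two should agree to well under a second
```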
So with an accuracy of microseconds, could I improve on the exact timings of those scripts?
And were these scripts of any use at all?
Use-cases: CPU speed and nr of CPUs available.
Use-case 1: compare CPU-Speed.
Way back, the use of spin.sql was mainly play-time: see how I could "load the server". But very soon, the first Two serious Use-Cases popped up: 1- get a feeling for the CPU speed by counting the nr of loops/sec, and 2- verify the nr of CPUs in a system.
For a newly arrived Exadata, meant to replace our VMware serverpark, we were asked: how much faster would this make "our application"? The app in question was 92% or more just DB_CPU:
This application was Not doing typical database work and it was Not IO-bound. It was purely CPU-bound. But a jump to the next screen would take just over 30 sec, all of it CPU-work on 1000s of function calls (don't ask..., I presented on this "performance problem" in 2015 or so...).
So on the old system, we ran spin.sql for 30 or 60 sec, and let it report the nr of (empty) loops, just to get a feeling for the speed. Then we did the same on the Exadata, and we got about 3x the nr of loops per sec. Hence our prediction was: this will _only_ give you about a 3x increase, but not more than that. And we were surprisingly close to the mark.
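The principle of that comparison fits in a few lines of Python (my sketch, not the original spin.sql): count empty loop iterations for a fixed number of seconds, and compare the rates of the two machines.

```python
import time

def loops_per_sec(n_sec: float) -> float:
    """Spin for roughly n_sec seconds; return empty-loop iterations per sec."""
    end = time.perf_counter() + n_sec
    loops = 0
    while time.perf_counter() < end:
        loops += 1
    return loops / n_sec

# Run the same call on the old and the new system; the ratio of the two
# rates is a rough upper bound for the speedup of purely CPU-bound work.
rate = loops_per_sec(1.0)
print(f"{rate:,.0f} loops/sec")
```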
Use-case 2: check nr of CPUs
The other real-world story goes back to +/- 2008: a Customer had a new Sun M-series. The system was given on trial, and supposedly had 24 CPUs. Of course, I was curious how fast it would be, and how it would stand up to real Load. So I ran spin.sql and got back a nice, large number for loops/sec. And I could quickly launch 12 or more (putty + nohup) spin-scripts at the system and try to make it blow steam out of the serverroom.
With a 24-CPU system, I had expected my 12 scripts spinning away to load the machine to about 50% (or more). To my surprise, it only loaded the system to about 40%. Either this system throttled my jobs, or we had more CPUs...
Testing with 4 spinning scripts: about 13% load
Testing with 8 spinning scripts: about 27% load... I could see a trend.
When I showed this to the sysadmin, he was surprised, but went off to do his own verifications. About 30 min later he came back: Don't Tell anyone, but we have 32 CPUs in the box. They probably forgot to "disable" 8 of them... He went back to his cubicle a Very Happy Man.
Note that with many systems having cpu_count set to something lower than the actual machine capacity, this trick has become of limited use, and your test will get strangled by "resmgr: cpu quantum" waits (if your DBA was vigilant..).
But at the time, Oracle version 10.2 I think, I was kinda Surprised at what my little playful scrippie had unearthed...
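The probing idea translates to Python as well; this multiprocessing sketch (names and numbers are mine) starts k spinners while you watch the machine load, then you raise k and watch the trend:

```python
import time
from multiprocessing import Pool

def spin(n_sec: float) -> int:
    """Burn CPU for roughly n_sec seconds; return the nr of empty loops."""
    end = time.perf_counter() + n_sec
    loops = 0
    while time.perf_counter() < end:
        loops += 1
    return loops

if __name__ == "__main__":
    k = 4  # nr of concurrent spinners; raise this and watch the load curve
    with Pool(k) as pool:
        counts = pool.map(spin, [1.0] * k)
    # On a true N-CPU box, k spinners should load it to about k/N.
    # Noticeably less than that suggests more CPUs than advertised.
    print(f"{k} spinners, loop counts: {counts}")
```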
Now for Modernising... Functions, and more Precision.
Since Python has a set of time-functions (link) that pretend to do nanosecond precision, I wanted to try and pair my Oracle scripts to that. Although most Oracle implementations seem to stop at microseconds (giving fractions of seconds as .123456000, where the three trailing zeros are the unused nanosecond digits).
So I considered it time to rework my scripts to do similar. I started by creating functions rather than scripts. From the code of the existing scripts, I created the functions spinf( n_sec ) and sleepf( n_sec ). This meant I could also put "spin" and "sleep" into SQL queries. A bit similar to what a competing RDBMS also does (link? nah..).
The sleepf remained at centiseconds for the moment, as both DBMS_LOCK.SLEEP (soon to be deprecated!) and DBMS_SESSION.SLEEP are documented to be precise up to 1/100 of a sec.
The resulting functions are defined in the sourcefile mk_spinf.sql (link below), and they can be used in queries. Here are some examples for spinf:
Notice the results are still "inaccurate" at low values because of the use of whole seconds (SYSDATE accuracy). The request to spin 1.1 seconds resulted in a spin-duration of 1.5 seconds, and the resulting loops/sec is estimated too high.
Of course, we can have the imprecision "averaged out" by using longer intervals, as was the habit with the old code anyway. But still...
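The rounding effect is easy to mimic in Python (an illustration of the principle, not the PL/SQL itself): measure start and end in whole seconds, SYSDATE-style, and the starting fraction of a second gets lost.

```python
import time

def spinf_olde(n_sec: float) -> float:
    """Spin using whole-second timestamps, like the SYSDATE-based original.
    Returns the true elapsed time, measured with a fractional clock."""
    t0 = time.perf_counter()
    start = int(time.time())            # whole seconds, SYSDATE-style
    end = start + max(1, round(n_sec))  # the fraction of n_sec is lost here
    while int(time.time()) < end:
        pass
    return time.perf_counter() - t0

# Ask for 1.1 sec; the actual elapsed time varies with where in
# the second we happened to start.
print(f"asked 1.1, got {spinf_olde(1.1):.2f} sec")
```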
So the next version of the function was spinf_ns, where the suffix _ns suggests nanosecond precision, similar to the same suffix in Python. Let's run it a few times..
Firstly, we did not get down to nanosecond precision, not yet. But I kept the suffix anyway, because it looks good, and it is python-y.
But the actual elapsed times are quite close to 1/100 sec precision now. And that is much better than before.
I can now more or less control, specify, how much time I want a session to consume. I may use that to do some measurements later (future blog-plans...).
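In Python terms, the improvement amounts to using a fractional, monotonic clock for both start and end (this is an analogue of spinf_ns, not its PL/SQL source):

```python
import time

def spinf_ns(n_sec: float) -> tuple[int, float]:
    """Spin for n_sec seconds using a fractional monotonic clock.
    Returns (nr of loops, actual elapsed seconds)."""
    t0 = time.perf_counter()
    loops = 0
    while time.perf_counter() - t0 < n_sec:
        loops += 1
    return loops, time.perf_counter() - t0

loops, elapsed = spinf_ns(0.25)
print(f"asked 0.25, got {elapsed:.4f} sec, {loops / elapsed:,.0f} loops/sec")
```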
Similar with sleepf( n_sec ): allow for short powernaps, and verify.
While at it, I also created the function sleepf( n ), where n gives you a precision of up to 2 decimals:
And voila, now the actual sleep-times are very close to the specified ones. You never know when it might be useful to slow down a query.
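The verify-your-sleep idea also fits in a few lines of Python (time.sleep happily accepts fractions; the function name just mirrors mine):

```python
import time

def sleepf(n_sec: float) -> float:
    """Sleep for n_sec seconds (fractions allowed); return measured elapsed time."""
    t0 = time.perf_counter()
    time.sleep(n_sec)
    return time.perf_counter() - t0

for ask in (0.05, 0.25):
    got = sleepf(ask)
    print(f"asked {ask:.2f}, slept {got:.4f}")
```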
One more tool: Measure what my session did...
Just to top it up, I re-visited another script as well: mytime.sql (link below), to report on the time spent inside the session. The view V$MYSTAT is a very nice way to see what Your Session actually Did. I've used that to see how much DB-time I have consumed (and roundtrips, and executes, and user-calls).
Here is how you can combine spinf_ns () and mytime.sql:
The first run of @mytime.sql tells us the session has consumed 1,101 centiseconds of DB-time (so that would be 11.01 seconds). I then ran spinf_ns to add 8.97 seconds of "activity". Taking into account the expected overhead of the "measurement", I should end up close to 2,000 centiseconds (20 sec).
And as you can verify: That Worked.
QED.
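Outside the database, a rough Python analogue of this before/after check (my sketch, not mytime.sql) is time.process_time(), which counts the CPU seconds consumed by the current process:

```python
import time

def spin(n_sec: float) -> None:
    """Burn CPU for roughly n_sec wall-clock seconds."""
    end = time.perf_counter() + n_sec
    while time.perf_counter() < end:
        pass

cpu_before = time.process_time()  # CPU-seconds consumed so far, mytime-style
spin(0.5)                         # add about half a second of pure CPU work
cpu_after = time.process_time()

print(f"CPU consumed by the spin: {cpu_after - cpu_before:.2f} sec")
```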
Wrapping it all up, future plans...
I have improved on my existing toolset: I now have more precise instrumentation, notably to match the Python tools (hence the ambitious _ns suffix).
And I have a few ideas to put them to work by instrumenting my python- and PL/SQL code.
I plan to use some of this to investigate python-code, notably with regards to time lost in roundtrips. If I can get it all to work, and if it makes sense, I'll write up the results... (link to future blog here...)
-- -- -- -- -- End of this blog, for now -- -- 2 more appendices -- -- -- -- --
Appendix 1 : links to scripts.
sleep.sql : old script for sleeping N sec.
spin.sql : old script for spin, output nr-loops per sec.
mk_spinf.sql: create functions for sleep and spin.
dbtime.sql: report DB time consumed by the instance (since startup).
mytime.sql: report DB time consumed by the current session (since logon).
epoch.sql : my various attempts to extract epoch-seconds...
Feel free to steal and adjust for your own usage (the most re-used code is the code you write/edit/tweak yourself!)
And, Just Curious: how many loops/sec does your system get out of spinf_ns.sql...? Ping me the results if you ever run any of the spin-statements.
Appendix 2: More measurements: check the SLOB tool by Kevin Closson
Over time, I have used spin.sql as the basis for other scripts, notably to count the nr of commits per sec and the nr of Physical IOs. But because others did a much better job of determining these metrics, I've put those out of scope for now. Maybe later. But you could learn a few things from checking the work of Kevin Closson (webpage-link) on the SLOB tool to benchmark Physical IO (github link).
-- -- -- -- -- -- End of this Blog, Really -- -- -- -- -- --