Friday, 30 October 2009

Oracle Performance Tuning with hit ratios - Risky.

The Ratio, as drawn by Leonardo da Vinci.While he was sitting in front of a fireplace somewhere in New England, Martin Widlake prompted this quicky-post.

First my Confession: I still use "Ratios" in assessing and fixing performance.

I know current wisdom is that ratios and aggregated numbers are not good enough and should not be used in tuning. I agree that ratios can be abused, and can be hugely decepive.

But when guestimating the scalability or performance of an OLTP database-application, I use these three numbers:

Buffer-gets per row, 10 or less.
Buffer-gets per execute, 100 or less.
Buffer-gets per transaction, 1000 or less.

Martin will immediatly see why, IMHO, his result of 380 for the 1st or 2nd ratio was too high.

I picked those numbers for several, Simple,, reasons.

Firstly, these numbers are simple to remember.

Secondly, these numbers can simply be determined. You can get them from v$-views, from traces, from AWR (Licensed), or from statspack (free!).

Finally, these numbers (10, 100, 1000) have generally served me well in places where I looked into heavily loaded OLTP systems. I dont recall a single problem with any statement or transaction that managed to stay within those limits.

Now be careful. Your situation will be different. And you should not strive for these numbers per-se, but rather for happy customers. And whether these numbers really are appropriate for you depends on the number of calls you do, the type of work your users/systems do, on the capacity of your hardware, and the demands placed on your system. Some systems operate comfortable with one zero added to each number.

But if you need millisecond response times, or need to serve thousands of requests per minute, then my advice is: Strive for those numbers for every statement and transaction on critical the path.

Any query or transaction that consistently breaks one of those ratios, should be looked into.

And if your system doesnt meet the numbers, there are three fixes:
You either Eliminate the statements (just dont do it),
Optimize the offending statements (improve where-clause or indexes), or
Contain those statements (do it less frequent, and dont do it at peak-load periods).

A further word of warning on the general danger of "Ratios" is appropriate.

Phi, or the Golden ratio of 1.6180... Maybe we should look for some relevant way to use that, just for fun.We have had a clear demonstration, by Connor McDonald I think it was, that Hit-Ratios are not a good indicator for "performance". Connor was able to "set" the hit-ratio to whatever his manager wanted, with very little impact on performance [link?].

Subsequent other performance specialists, notably Cary Millsap, have indeed proven beyond doubt that ratios and other "aggregate" metrics are not good indicators of performance.

Back in the nineties, my colleagues and myself have used scripts based on numerous book (from the nineties...) to determine all sort of (hit)ratios for cache_size, shared-pool, redo en other settings. The ratios were derived by high-tech scripts we crafted, based on the books we bought. And from the nicely, SQL*Plus formatted, output we determined that we needed more memory, larger shared-pool, more redo-buffers or larger redo-files.

In very few cases did those ratios point us to the real problem and in even fewer cases did the advice make any difference.

In almost every case, the real solution came from B/Estat, tkprof, Statspack, AWR or ASH. And most of the time, we were brought straight to the heaviest query in v$sql.

That query was then either made more efficient (less buffer gets per execute) or was eliminated (just dont do it). In 90% of cases, that solved the problem (and sometimes improved a few ratios as a by-product).

My Conclusions:
Most Ratios are indeed of little use - they do not address the real problem.
But Some ratios can help you, if you know what you are doing (and if you know what your customer is trying to do).

With or without Ratios, you still need to find out what your system does, and use common sense to fix it.

I'll save a few other examples of Ratios for later, notably one I use to find locking problems. I think Martin can already guess which one.