Simple Oracle Dba: 2009

Monday, 30 November 2009

Tom Kyte at UKOUG on Complexity

That car may not have been as reliable tho...

You all guessed it: I love simple stuff.

Must date back to the days I watched the Flintstones and Yogi Bear. Great Fun! Even my own kids, highscool-teenagers, already blogged about how their nostalgic primary-school days were refreshingly simple (and they have tons of jpgs to illustrate their memories).

Today, at UKOUG TEBS, Tom Kyte gave us loads of "Simple" and very good advice.

"We still underestimate complexity."
So why are some vendor-supplied products so complex?
so why do we bespoke our own work to un-explainable levels of ~~complexity~~ sophistication?

"Less code is less bugs"
So why do we all increase footprints everywhere?

"Errors will happen, prepare to handle them!"
So why do we still hide, ignore, or report errors erratically ?

His security-message related to the Starship Enterprise is Brilliant!

And his main message remains this very sensible:
"Always Question Everything"
(including this anti-complexity rant)

I know, I know,
we are not wearing bearskins anymore.
we have even moved on from Cobol.

Real world business isnt simple.
And ~~Real Appl.... oops~~ Real World Requirements are driving this unavoidable(?) complexity. Not to mention that increasing complexity is part of my own job-seurity

But still, some of Toms messages were of Refreshing Simplicity.

Thanks Tom.

Saturday, 21 November 2009

simple oracle puzzle

Invalid Objects ?
Dependencies ?

Couldnt resist pointing out this Oracle puzzle by Rafu.

Friday, 30 October 2009

Oracle Performance Tuning with hit ratios - Risky.

The Ratio, as drawn by Leonardo da Vinci.

While he was sitting in front of a fireplace somewhere in New England, Martin Widlake prompted this quicky-post.

First my Confession: I still use "Ratios" in assessing and fixing performance.

I know current wisdom is that ratios and aggregated numbers are not good enough and should not be used in tuning. I agree that ratios can be abused, and can be hugely decepive.

But when guestimating the scalability or performance of an OLTP database-application, I use these three numbers:

Buffer-gets per row, 10 or less.
Buffer-gets per execute, 100 or less.
Buffer-gets per transaction, 1000 or less.

Martin will immediatly see why, IMHO, his result of 380 for the 1st or 2nd ratio was too high.

I picked those numbers for several, Simple,, reasons.

Firstly, these numbers are simple to remember.

Secondly, these numbers can simply be determined. You can get them from v$-views, from traces, from AWR (Licensed), or from statspack (free!).

Finally, these numbers (10, 100, 1000) have generally served me well in places where I looked into heavily loaded OLTP systems. I dont recall a single problem with any statement or transaction that managed to stay within those limits.

Now be careful. Your situation will be different. And you should not strive for these numbers per-se, but rather for happy customers. And whether these numbers really are appropriate for you depends on the number of calls you do, the type of work your users/systems do, on the capacity of your hardware, and the demands placed on your system. Some systems operate comfortable with one zero added to each number.

But if you need millisecond response times, or need to serve thousands of requests per minute, then my advice is: Strive for those numbers for every statement and transaction on critical the path.

Any query or transaction that consistently breaks one of those ratios, should be looked into.

And if your system doesnt meet the numbers, there are three fixes:
You either Eliminate the statements (just dont do it),
Optimize the offending statements (improve where-clause or indexes), or
Contain those statements (do it less frequent, and dont do it at peak-load periods).

A further word of warning on the general danger of "Ratios" is appropriate.

Phi, or the Golden ratio of 1.6180... Maybe we should look for some relevant way to use that, just for fun.

We have had a clear demonstration, by Connor McDonald I think it was, that Hit-Ratios are not a good indicator for "performance". Connor was able to "set" the hit-ratio to whatever his manager wanted, with very little impact on performance [link?].

Subsequent other performance specialists, notably Cary Millsap, have indeed proven beyond doubt that ratios and other "aggregate" metrics are not good indicators of performance.

Back in the nineties, my colleagues and myself have used scripts based on numerous book (from the nineties...) to determine all sort of (hit)ratios for cache_size, shared-pool, redo en other settings. The ratios were derived by high-tech scripts we crafted, based on the books we bought. And from the nicely, SQL*Plus formatted, output we determined that we needed more memory, larger shared-pool, more redo-buffers or larger redo-files.

In very few cases did those ratios point us to the real problem and in even fewer cases did the advice make any difference.

In almost every case, the real solution came from B/Estat, tkprof, Statspack, AWR or ASH. And most of the time, we were brought straight to the heaviest query in v$sql.

That query was then either made more efficient (less buffer gets per execute) or was eliminated (just dont do it). In 90% of cases, that solved the problem (and sometimes improved a few ratios as a by-product).

My Conclusions:
Most Ratios are indeed of little use - they do not address the real problem.
But Some ratios can help you, if you know what you are doing (and if you know what your customer is trying to do).

With or without Ratios, you still need to find out what your system does, and use common sense to fix it.

I'll save a few other examples of Ratios for later, notably one I use to find locking problems. I think Martin can already guess which one.

Friday, 25 September 2009

Happy Birthday OFA

Cary Millsap from Method-R just reminded us that the OFA standard is now 14 years old.

Check out his original post here:
http://carymillsap.blogspot.com/2009/09/happy-birthday-ofa-standard.html

The Good Thing about Optimal Flexible Architecture was that it created a standardized way to install and maintain the Oracle (database) software stack.

And in writing down OFA Cary used his meticulous, thorough approach. Every aspect of OFA was derived from a real "Requirement". And he paid special attention to Robustness and Maintainability. The configuration of Oracle software, and the life of the DBA became a lot simpler.

Many DBAs owe Cary for writing up OFA!

Friday, 26 June 2009

Oracle Security done Right - Simple

In the middle of a bit of a roller coaster week, I attended the workshop from Frits Hoogland about "Oracle Security done Right".

There is of course a lot to be said (and written) about security and every situation will demand a tailored approach.

But Frits laid down a relatively "Simple" approach.
He shows how to cover both accountability and auditability.

A true "Trust but Verify" approacht. And Simple.

The approach merits a good look and during the workshop already there was some good discussion.

It will not be the be-all-end-all of security, but it is a good start.
A bit of criticism wont hurt. A discussion will probably make the approach better.

Just remember: keep it simple. Please.

And that is why I think his approach is worth some attention.

Saturday, 23 May 2009

Single Table Hash Clusters: Size Matters.

OK, so we know that in certain situations, with the right design, the right size-assumption, and the correct usage, a Single Table Hash Cluster seens to be the most efficient way to get to a row, using a single block get.

I'll have to ask Richard Foote for confirmation on "the fastest way" as I still think that Index Organized Tables are very efficient indeed, and have less downsides. But since IOTs are B-tree indexes, they will result in 2 or more block-gets when the blevel increases.
IOTs are an old hobby of me, but they also nearly ended my career once. Complicated story. Some of you know it. Better not divulge.

I now want to know what happens if we get outside the "comfort zone" of an STHC. What if the well-intended design-assumptions about the number and the size of the records grow wrong.
In both cases, I expect the hash-mechanism to become less efficient.

Caveat: I can only test a limited number of cases, and will focus on the overflow-case. The case where too-many records end up in one block is harder to prove, as it will probably involve hash-collisions and extra CPU effort to retrieve a record. I'm lazy, and will focus on stuff I can prove with just simple queries and the odd autotrace.

Recap

Our initial STHC was created as a table with 64K records and an average recordsize of aporox 75 bytes. The original script to create it is still here (demo_sthc.sql). It was developed on 10.2.0.3 (winXP) with an 8k blocksize.

To play with sizing, and get a feeling for the storage-efficiency, I'll first do some sizing-reports to compare STHC to other options. I'm not worried about disk-storage (disk is cheap, innit?). But blocks in memory still take up valuable cache, and the reading, getting and scanning of blocks is still "work". Effort that can be avoided. Hence I always assume small (and Simple) is beautiful.

To view sizes of objects, I use an auxiliary script segsize

In the original demo, the segements for heap, iot and cluster had the following sizes:

TABLE_NAME    NUM_ROWS   BLOCKS empblck avgspc chncnt AVG_ROW_LEN COMPRES
---------- ----------- -------- ------- ------ ------ ----------- -------
HEAP            65,535      720      47    855      0          77 DISABLED
IOTB            65,535                              0          79 DISABLED
CLUT            65,535    1,014       9  2,886      0          78 DISABLED
 
 
SEGMENT_NAME            KBYTES   BLOCKS EXTENTS
-------------------- --------- -------- -------
HEAP                      6144      768      21
HEAP_PK                   4096      512      19
-------              --------- -------- -------
Total for HEAP           10240    1,280      40


IOTB_PK                  11264    1,408      26
--------             --------- -------- -------
Total for IOT            11264    1,408      26


CLU                       8192    1,024      23
CLUT_PK                   6144      768      21
--------             --------- -------- -------
Total for Cluster        14336    1,792      44

The cluster I created for my first demo seems relatively large and inefficient compared to the heap-table with the same data, as it takes up roughly 30% more space. That was probably because I didnt care for the size- and hashkey parameters yet.

All indexes are relatively large compared to their tables, due to the long and rather funny VC2 field that is the PK.

If we replace the key-values with, say, 'abcd'||to_char(rownum) or a sequenced number(n) field instead of the spelled-out numbers, all segments and notably the indexes turn out much smaller. But my original "Business Case" has long-ish strings of format AAAAnnnnnnnnn in a VC2 field as the PK (and search-criteria). GUID-strings of 64 hex-chars stored in a VC2 would be a comparable case.
And now I'm stuck with this demo...

btw: bad practice to store numbers in VC2 if you already _know_ they will be numbers, please dont.

Setting the SIZE

My initial errors were to not specify SIZE and to specify the wrong nr of HASHKEYS, supposedly also leading to hash-collisions. I'm a bit surprised nobody spotted it. But I guess there are not that many readers eh ?

I had already created the cluster with HASHKEYS 100, 200, 500, 1200 and 2000 and I still did not specify size (it is not mandatory, and I'm lazeeee).

Go on, run the script (demo_sthc_hashkeys.sql) for yourself
(At your own risk! Please dont sue me if it ~~F***s~~ Cleans up your data...)

SQL > @http://www.hotlinkfiles.com/files/2562031_70t6l/demo_sthc_hashkeys

(you did not have a table called CLU01 out there did you ?)

From this, I found that Oracle allocated the storage (in extents of 64KB) such that the nr of blocks in the cluster was just higher then the nr of specified hashkeys. It seems to assume that one hashkey per blocks will be required
(Question: Is this default behaviour when you omit the SIZE parameter? Consistent over all versions? Does this hold for all blocksizes? I dont know yet...).

I now decided to do it by-the-book and specified SIZE and HASHKEYS. And my next script (demo_sthc_75x64k.sql) would, I hoped, create a proper cluster ready to hold 64K records of 75 bytes each:

Oh come on, run it! (at your own risk etc...)

SQL > @http://www.hotlinkfiles.com/files/2562028_kvses/demo_sthc_75x64k

For those of you too lazy or too scared, part of it looks like this:

CREATE CLUSTER clu ( key varchar2(78 byte) ) 
  SINGLE TABLE 
    SIZE 75
HASHKEYS 65536;

CREATE TABLE clut (
  key            VARCHAR2(78 BYTE),
  num_key        NUMBER,
  date_modified  DATE,
  padding        VARCHAR2(300 BYTE)
, constraint clut_pk primary key ( key)
)
    CLUSTER clu ( key );

And the allocated storage for the resulting (still-empty) cluster is ....:


SEGMENT_NAME            KBYTES   BLOCKS EXTENTS
-------------------- --------- -------- -------
CLU                       5120      640      20
CLUT_PK                     64        8       1
---------            --------- -------- -------
Total                     5184      648      21

That 5 MB (in my case) is surprisingly close to that of the original heap-table (6 MB in my case).
But also surprisingly close to 65535 x 75 = 4.8 MB.

Will this pre-created cluster-segment contain sufficient free-space?
And will it be able to cater for contingencies ?

Let me insert the data...

(if you had run the script at your own SQL-prompt, you would know all the answers by now...)

SYSTEM @ DB10  > insert into clut select * from heap ;

65535 rows created.

SYSTEM @ DB10  > commit ;

Commit complete.

SYSTEM @ DB10  > 
SYSTEM @ DB10  > analyze table clut compute statistics ;

Table analyzed.

SYSTEM @ DB10  > set echo off

SEGMENT_NAME                 KB   BLOCKS
-------------------- ---------- --------
CLU                        8192    1,024
CLUT_PK                    6144      768

Heeeey.
After I inserted the same 64K rows, the cluster has increased in size to the same 8M size of my previously created cluster.

Next time, I might as well create my STHC with a bit of contingency. And for further experimenting I will aim to create this particular cluster with SIZE 100 and HASHKEYS 80000 to end up with the correct size straight away (demo_sthc_100x80k.sql):

One more time I will try to tempt you to run one of my scritps...
(at your own risk, of course...)

SQL > @http://www.hotlinkfiles.com/files/2562061_v4rto/demo_sthc_100x80k

This cluster is now finally created at the right size (space allocated) and if you ran the script (did you? on 8k blocks?) you can verify that it absorbed all the rows without growing.
Supposedly, every row now really IS accessible by a single block get, and hash-collisions should be avoided.

I did verify a few selects, similar to the ones in the previous episode and I did find the 1(one) consistent get again - At least that still worked ...

Lessons from this so far:
- Pay attention to SIZE and HASHKEYS, and dont run with defaults here.
- Even properly(?) specified parameters dont seem to suffice. Always Verify!
- And (minor issue but...) my STHC still takes up more space then my heap-table.

PS: I will not be surprised or offended if someone comes out and says:
That is all wrong, and it is very much different on this other system... Please do!
Such is life with Oracle.
And we may be able to learn something still.

In the next experiment, I will try to overfill the STCH with data to see if any problems arise from that.

It takes some bloody long train-journeys, and a significant follow-up to investigate all this obscure stuff. And all the searching leaves you with a feeling you know even less.
Arcane stuff that nobody uses anyway.

Oh, btw, For those of you who want to set up a so-called mission-critical system where my demo-scripts would "accidentally" delete the data owned by your boss:
You Do Run this stuff AT YOUR OWN RISK. Period.

Sunday, 10 May 2009

Oracle Single Table Hash Clusters are Faster.

This post is about Single Table Hash Clusters.

It is also about The Fastest way to get to 1 single Record of data.
I will try to prove that with a demo.

This post (a bit of a rant, as usual) is prompted by several things:
Prompted by Richard Foote (Recommended reading), who justly argues that Every Block Get counts and who is on a mission to Educate us on the correct use of Indexes.
Prompted by a few reactions to my Endinburgh Presentation about (cost based-) optimized access to data. It all starts with good access-paths to your data.
Prompted by a client with 24 CPUs in a database-server, trying to squeeze the last milliseconds out of 25 million (business-)transactions per hour. (how many app-servers, JVMs and CPU's does it take to process 25 Million items per hour....?? - I don't want to divulge, but the numbers on the "estate" are multiples of 24...).
Prompted by a good lunch-discussion with same client on IOTs and other nerdy but sometimes crucial optimizations (more material for posts and demos, thx J)

In this particular case, the client had a requirement to (repeatedly??) check existence and/or content of a record based on PK-access. This particular data was relatively static (lookup-type data) and apparently only modified overnight.

There is also the need to check on certain values for de-duplication, and that is a different and more challenging matter. Such de-dup-data is modified (inserted) constantly, and needs to be purged based on time, hence "needs" another index. I might eleborate on that some other time.

For the moment, I will also deliberately discard the possibilities of using SQL-resultset-caching by Oracle, or the option of Hash-map tables or other (indexed-)arrays cached by some Java component, or Oracle Coherence. All of those I consider usable, but much more complex and intrusive then a Simple, well-designed database-solution.
I am a data-beast after all.

Why (not) clusters...

Single Table Hash Clusters can fetch a row with a Single Block Get. That beats any other method of access.

Of course, those who have read and understood the manuals, metalink, and various books on the subject already know all of this, and dont need to waste their time on here. But I see surprisingly few implementations of Clusters and IOTs ...

Normally, I shy away from using clusters because you need to know up front:
1) How many records go into your tables (parents and children for multi-table clusters), and
2) How big those records will be and
3) For Hash-clusters, Critical access is on the clusterkey.
Those three conditions are rarely met, and even more difficult to enforce over the lifetime of a system. But sometimes you get lucky.
And if any of the Oracle Cluster-constructions is useful, my vote goes to Single Table Hash Clusters. Simple, Elegant and ... Fast!

If you already know your theory about Single-Table-Hash-clusters, you dont need to read on. You know we will end up showing 1 single consistent get...

The DIY Demo

The rest of the post will allow you yourself to demonstrate on your database how Fast Single Table Hash Clusters can be.

Copy this into your SQL*Plus:

SQL> @http://www.hotlinkfiles.com/files/2531452_b8v0g/demo_sthc.sql

Press enter a few times, and you can re-read the output in your own spool file...

The script creates 3 tables; a heaptable, an IOT and a Single-Table-Hash-Clustered one. All tables are filled with the same 64K records. The Key is deliberately chosen to be a Varchar2 with funny distribution to resemble client case. And as it uses the Julian date 'JSP' format to generate spelled-out-numbers, your results may vary depending on your NLS settings. Any quirks or issues: send me a comment or a postcard.

If anyone spots flaws in the demo: let us know in a comment or an email. My knowledge is only skindeep and errors are easily made. Nor am I as skilled in education as Jonathan Lewis or Tom Kyte. My original demo was only whipped up to show to an Architect. Getting stuff out ready for the dba-community-at-large is more work then I anticipated, and my knowlege on the subject of Clusters is not as deep as I wished.

And if my demo-script fails you, check out the only other mention of STHC-with-demo that I recall seeing, by Joze Senegacnik here.

The demo uses on only 64k records as that was an amount I could test and re-test on my laptop in reasonable timeframes (10 seconds, I'm an impatient person). But it will work on Any size table, as long as the same conditions are met: knowing your data (numbers and size of records), and accessing on (PK-)equal condtion. Single Table Hash Clusters scale nicely.

The demo script can be found Here: demo_sthc.sql. It can be run with no modification in just about any schema able to create tables, indexes and clusters (of course, I always log on as system, but I have tested on 9.2.0.8, 10.2.0.3 and 11.1.0.6...).

The Single table Has cluster is created like this:


create cluster clu 
( key         varchar2(78 byte) 
)
single table
hashkeys 1000;

CREATE TABLE clut
( key           VARCHAR2(78 BYTE),
  num_key       NUMBER,
  date_modified DATE,
  padding       VARCHAR2(300 BYTE), 
  constraint clut_pk primary key ( key)
)
CLUSTER clu ( key );

Note that the 1000 hashkeys I use is based on the fact that I cold fit approx 64 records in my 8K blocks, your results will vary if you use different blocksize or different size records. All those nitty-gritty details are part of why this is a rather nerdy and underused concept. Material for a follow-up here...

When the table is filled, I then fire some selects with a Key=value condition at the tables, and I use Autotrace to examine the 2nd execute of each query, like this:


select * from heap
where key = 'FIVE HUNDRED'
/
set autotrace on
/
set autotrace off

So, How Fast is this thing ... ?

(Did you run the demo in your own database yet... ?)

For a Single Table Hash Cluster, the result is 1 (one) Consistent get.

The result for a conventional heap table is 4 consistent gets. For an IOT it is 3 consistent gets (same index, but no table-access). And these numbers risk to increase by 1 if the blevel of the indexes goes from 2 to 3.

That may not seem like a big improvement, but it happens to be 3 or 4 times less Logical IO then the alternatives. And if executed at 25-million/hr, that is still a significant saving in LIO.

Of course, there is more to it. Such as the additional CPU-usage, chooseing the hash-key and guesstimating the size-value. And how does it perform when your data-volume unexpectedly explodes and it all goes horribly wrong....
I might re-run the demo a bit and experiment. I see a follow-up coming on the next long train journey.

Wednesday, 29 April 2009

Edinburg and CBO Demos

I'm having some problems uploading my files to a server.
bear with me.

demo-scripts should be here soon:

HERE!

Thursday, 23 April 2009

select count x with nulls

I came across this when trying to find why a query didnt work. And I was repeatedly assured it had always worked correctly. Of course it had always worked correctly, or rather, it had been made to work in a "satisfactory" way to the masters and callers that examined the output.

So how come it suddenly refused to return the expected results?
Surely this was a database bug ?

Well, it wasnt the database at fault, nor was it the code at fault. The code in question was a roundabout way to check for existence or presence of a number of children for a set of parents. Not elegant either, mais soit.

You can imagine that "the problem" did not occur immediately when the query was run in SQL*Plus, but we persisted, looking for the oddities and edge-conditions.

When we findally found it, I whipped up a demo in SQL*plus to show the developers. I had to disguise (rewrite) the query to remove references to original table, and I simplified it. The original was a tad more complex (30+ lines..), with more bind-vars and the issue hidden in the middle of it. But the concept was the same.

You can copy-paste the whole thing into SQL*Plus, provided you have scott/tiger with empt/depth.

Go see for yourself.



/*

select_null.sql: 
  how you can acidentially not select a thing..

You get the Idea: 
If you count empty strings, or NULLs, 
the result will be ... 

*/

-- First we show accepted behaviour,
-- you count records by selecting count('x'):

select count ('x') from dept ;

-- And the following is an accepted (Oracle) quirk,
-- and a known pitfall...

select count ('') from dept ;


-- But if the count is hidden in a subqry, 
-- and if the count is disquised with a bind-var,
-- Suddenly; a lot depends on the bind-var ....


variable x varchar2(10) ;

exec :x := 'a' ;

select :x as bindvar, d.* 
from dept d 
where ( select count (:x) 
        from emp e 
        where e.deptno = d.deptno 
      ) > 0 
/

-- and if the variable is empty...

exec :x := '' ;

select :x as bindvar, d.* 
from dept d 
where ( select count (:x) 
        from emp e 
        where e.deptno = d.deptno 
      ) > 0 
/

-- Be careful what you count for, 
-- you might get it...

Remarks on a postcard in the comment dept.

NB: Does anyone have an established, and format-perserving way to present SQL-code in blogger? My stmnts seem to left-align whatever I do...

Wednesday, 18 March 2009

Simple lessons from the DBMS SIG

Crisis or not, there was a reasonably good turnout on the last DBMS Special Interest group this Tuesday (17th March).

If you need to justify membership cost and time to attend SIGs like this, the one single item is probably Phil Davies from Oracle Support who talks you through the latest scary bugs. Some of those, I can almost recite by now, but I know full well how useful this horror-stories can be for project who are preparing to upgrade or migrate.

The XPLAN presetation by David Kurtz was instructive, as there are still a few options in dbms_xplan that I plan to explore when time or necessity arise.

For monitoring, the rapidly delivered presentation from Jason Lester was refreshing. We all know what Grid Control can do by now, but there is always need for more, for different and for more customizable tools. I do not need to be convinced of the advantages of Open Source Software (I did a similar topic myself in ... eeeh... 2004, for a different forum and I stil stand by that message).

Although I have seen Zabbix (http://www.zabbix.org/) used more often then Nagios (www.nagios.org), I would like to add my own endorsement for any OSS or otherwise "independent" monitoring tool. You need some independent tool, next to GridControl and other vendor supplied solutions (openview still sprouts up everywhere - sure we use it if someone has paid for it, or enforced it onto us.)

But quite often, for the actual, Real Application Monitoring we will also implement our own home-grown or OSS solution next to the GridControl. You generally need a tool where you can add/tweak your own monitoring, and that you can rely on to do (or build) the stuff that the commecial tools dont do, or dont want to do. And despite what the book, the courses, and the management say, any DBA or sysadmin worth his salt will DIY in this area. I am also still guilty of jotting up my own unix and sqlplus scripts in various situations. They will do Exactly what I need when I need it. Some of my unix/linux scripts have dates in them going back to 1994 (sad eh?).

Mental note: do a topic on "simple monitoring" sometime.

Pete Finnigan had his moment with explaining the uses and pitfalls of VPD (or is that RLS or FGAC? ... ). Always nice to hear PXF himself explain (the lack of) security in his down to earth way. He keeps going on about security being a "layered" thing, and how adding more security can also itself against you if you dont do it properly and exactly Right. Well done, Cheers Pete!

The cherry on the cake was Jonathan -scratchpad- Lewis explaining how Simple (and yet Complicated) the analysis of Statspack (aka AWR, if you have the budget) really is.
Jonathan did a Great job of letting statspack-reports explain themselves, and with constant "challenging" and checking of his own assumptions.

Key messages that I retained, somewhat biased by my own experience, were:

1. Read from the top, get a feeling for overall functionality and the load of the system in hand. How much time, how many CPUs, how much memory and IO was used. Quite basic.
2. Relate the work done, as shown in statspack, to the capacity of the underlying
system: was the database challenging the hardware or not ? Is it a capacity problem or a single-user, single-application problem.
3. Dont be afraid to ask questions. always.

Jonathan refuses to "write the book" on statspack, with the excuse that there will be too much material to cover, and he is afraid to leave out items that are deemed critically important. My reply to him is along the lines of: Real Application Statspackreading is about common sense.
And if Jonahtan can convey some good messages in a 1 hour presentation, surely
it must be possible to write a not-too-complex book to help the average
reader out there in 90% of the cases? For the other 10% you can always hire a specialist.

I become more and more tempted to write a few "simple" things about Statspack, the Real Fantastic CBO, and the blessings of proper physical design for Real Appliations. The book by Tom Kyte is all you really need (need link).
Hm, will I too then get sucked into the wonderful CBO? (Real Application CBO; it gets it right most of the time... )
My next presentation maybe.

Friday, 6 March 2009

deadlocks: get rid of the bitmap indexes

This one is for a certain Peter, he is a DBA at one of my customers.

We had a problem: Deadlocks happening. Bitmap index.

Users: Complaining, deadlock errors ... ???
we: That bitmap index is causing the deadlocks.
Duhvelopers: But we NEED that index for job xyz, and others Might use it too.
we: Sure, but that index is causing your deadlocks
Users: still complaining!

Yes, we know there is more to Deadlocks, but this was the Bitmap!
Peter is the DBA who found Bug 6175584.
And in the lenghty process of convincing metalink to see that bug, he learned more about Ora-00060 then you will ever need - until version 11.2 comes along at least.

Anyway, this bitmap index...

SQL> Alter index monitor usage ;
Showed the index was indeed being used. But by whom ?
Only by the one intended job or also by other more user-relevant queries?

Peter, who is the local onsite-DBA where I am only a passing consultant, finally managed to convince the "architecs" and other title-bearers that the index should go.

It went, and with it went the deadlock-error.
General Performance also felt better.

Apparently nobody but the batch job was using that index.
Or if they did, it was not helping them.
Problem Solved.

Then Anjo Kolk, over a nice meal, mentioned he had seen something similar, and had it fixed while on his hand-free carphone.
And I should have realized how simple it is to "prove" this case to the Architects.

A Query like this:
SQL> select sql_id, object_name
from v$sql_plan
where object_name like '%STATUS_BMP';
Or a similar qry against the WRH$ table will show you immediately which statements have used a particular index !

Peter is off to check a few other indexes straight away.
Kuddos to Anjo for pointing it out. Sometimes a Simple solution is right under your nose. But it takes a nice "incident" to learn even simple lessons.

As for the developers and architects who concoct it all: No hard feelings. After all, the Funtastic manual says bitmpas are for low-cardinality colums. And at least their tests had been really fast using "this bitmap thing".
They had no bad intentions, and where would we be without their complex solutions and indexes eh ?

Simple Oracle Dba