
Thursday, 2 October 2014

Riding to SIOUG and HROUG

Usergroups are the lifeblood of a good community. And I like to help a little, by visiting or speaking if I can.

Hence I will be speaking at the SIOUG.si and HROUG.hr usergroup conferences, together with some old friends and lots of interesting new people that I look forward to meeting.

Check the links to the conferences here. SIOUG is in Ljubljana, the lovely capital of Slovenia:


And HROUG is in Rovinj, on the Adriatic coast:



This trip will not be as long as the expedition I did last summer, when I visited 3 conferences and covered 12,000 km, including some classic mountain passes. But I still look forward to riding a lot of interesting roads on the motorcycle.

The previous trip was a Great Success. So let's do it again.

Here is the approximate route-plan for the coming weeks:

the route.. click to enlarge



If anyone who lives along the route wants a quick consulting visit, or simply has good coffee, drop me a note. I always like to stop by hospitable folks (and I may need to dry some clothes if the weather isn't cooperating).

In case of moderate weather, I will have to use the toll-roads and tunnels like everybody else, but if the weather is good, I will search out some real motorcycle roads!

Best thing I ever did: re-start my motorcycle-life after so many years!


Saturday, 6 March 2010

multiple blocksizes in an Oracle Database


In short: Not useful.

More elaborately: Messy, and a waste of memory-space and admin-effort.


Let me explain:

I've come across this discussion again and I considered myself lucky that Charles Hooper has done all the research already.

I'm summarizing his findings here:

Multiple Blocksizes do NOT offer Any Proven Advantage.

The theory about more efficient indexes and a better-managed cache is good, and I don't deny there is sound reasoning behind it. But in practice, having multiple blocksizes and multiple db-nK-caches doesn't make a difference.

The extra (little bit of) work is simply wasted.

And most likely, you end up wasting cache-space, because you hard-divide the cache into chunks and prevent the Oracle LRU-mechanism from utilizing all available cache as it sees fit.
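
For illustration only (and emphatically not as a recommendation), this is the kind of setup the multiple-blocksize theory asks for. The parameter and the BLOCKSIZE clause are standard; the sizes and the file name are made up:

alter system set db_16k_cache_size = 256M;  -- hard-carves 256M out of the buffer cache

create tablespace idx_16k
  datafile '/u02/oradata/db10/idx_16k_01.dbf' size 1G
  blocksize 16K;

Every byte given to the 16k cache is a byte the regular 8k cache can no longer use, whether the 16k blocks need it or not.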

Of course, you may be the exception that proves the rule, and I would like to hear from you in that case, but until further notice, I'll stick with my motto:

Simple = Better.

Saturday, 6 February 2010

sqldeveloper - the swiss knife for Oracle.

SQL Developer is the Swiss Army Knife for Oracle Developers (and DBAs and other interested parties, for that matter).

This post is about SQLDeveloper, and about "The Book" that introduces it.

In short: Recommended. The book and the tool.

Recommended for Newbies and experienced folk alike.

Recommended for Developers, DBA's and other users of the database.

Let me explain why.


Manuals are sooo last century.
Remember the boxes that came with Oracle7 and Oracle8/i?
(looks for picture of pallet)

Then in Oracle 9, there was a CD with .pdfs. I've seen people consume a tree or two on that, and I'm guilty myself of selectively printing chapters to study on long-distance trains and put yellow stickies on. Laptop batteries were short-lived in those days.


I would be interested to know how many people don't carry books at all anymore, just .pdfs on laptops, e-readers or iPads [jokelink]. I have switched to laptop-reading for most manuals, but not for leisure-books. And the person who used to say "From their bookshelf, I recognize the type of DBA/Developer" will have to flip through his customer's laptop or iPad to do an assessment.


Normally, I hesitate to recommend books to developers (or anyone), because there is a lot of rambling rubbish out there. Books tend to follow the hobby-horse of the author (at best), and books may contain outdated information (a lot of it, given the pace of Oracle 7-8-9-10-11; the opt_cost_adj=120 and the go_faster=true|false come to mind). At worst, the information in books can be rampantly wrong. Notably Jonathan Lewis points out that you should be critical of information on the internet; the same caveat applies to the bookshop: don't believe everything that is printed.


Oracle SQL Developer by Sue Harper, another link to the publisher...

But the SQLDeveloper book by Sue Harper is one of the Good Exceptions.

This is how a good introductory book should be.
It simply presents the tool and all its possibilities.
It doesn't impose a methodology.
It doesn't try to convince you of anything.

Like any truly good book, it allows you to do your own thinking.

And it does a very good job of introducing the tool and its many possible uses.
Check it out, and decide for yourself. You might be surprised.

To get an impression: you can read the first chapter here, at the publisher's website.


To the Newbie developer it is a good introduction; to the more seasoned SQL*Plus user (or users of other tools, for that matter) it is a good reference.

As a die-hard command-liner, I will of course stick to my #$% prompts, but for those of you who have to be more productive, or who are forced off other tools for cost-cutting reasons, this book can help you on your road to fortune and glory.

I'd also like to mention that there is a trend: Sue is not the first Oracle Product Manager to write a book, and I had already enjoyed the writing of Larry Carpenter, who runs the team that nurtured DataGuard into bloom. I can heartily recommend his book as well. I have recently used it to convince Customers/managers/leaders/victims/architects of the benefits of DataGuard (and, soon to come: GoldenGate, running on a system near you). Larry's book was reviewed by Harold "Prutser" van Brederode.

A tip of the hat and a big "Thank You" to both authors for going through the process of writing those books!

Monday, 30 November 2009

Tom Kyte at UKOUG on Complexity

That car may not have been as reliable tho...

You all guessed it: I love simple stuff.


Must date back to the days I watched the Flintstones and Yogi Bear. Great Fun! Even my own kids, high-school teenagers, have already blogged about how their nostalgic primary-school days were refreshingly simple (and they have tons of jpgs to illustrate their memories).


Today, at UKOUG TEBS, Tom Kyte gave us loads of "Simple" and very good advice.

"We still underestimate complexity."
So why are some vendor-supplied products so complex?
And why do we bespoke our own work to unexplainable levels of complexity and sophistication?

"Less code is less bugs"
So why do we all increase footprints everywhere?

"Errors will happen, prepare to handle them!"
So why do we still hide, ignore, or report errors erratically?

His security-message related to the Starship Enterprise is Brilliant!

And his main message remains this very sensible:
"Always Question Everything"
(including this anti-complexity rant)


I know, I know,
we are not wearing bearskins anymore.
we have even moved on from Cobol.

Real-world business isn't simple.
And Real Appl... oops, Real World Requirements are driving this unavoidable(?) complexity. Not to mention that increasing complexity is part of my own job-security.

But still, some of Tom's messages were of Refreshing Simplicity.

Thanks Tom.

Friday, 30 October 2009

Oracle Performance Tuning with hit ratios - Risky.

The Ratio, as drawn by Leonardo da Vinci.

While he was sitting in front of a fireplace somewhere in New England, Martin Widlake prompted this quicky-post.


First my Confession: I still use "Ratios" in assessing and fixing performance.

I know current wisdom is that ratios and aggregated numbers are not good enough and should not be used in tuning. I agree that ratios can be abused, and can be hugely deceptive.

But when guestimating the scalability or performance of an OLTP database-application, I use these three numbers:

Buffer-gets per row, 10 or less.
Buffer-gets per execute, 100 or less.
Buffer-gets per transaction, 1000 or less.

Martin will immediately see why, IMHO, his result of 380 for the 1st or 2nd ratio was too high.

I picked those numbers for several, Simple, reasons.

Firstly, these numbers are simple to remember.

Secondly, these numbers can simply be determined. You can get them from v$-views, from traces, from AWR (Licensed), or from statspack (free!).

Finally, these numbers (10, 100, 1000) have generally served me well in places where I looked into heavily loaded OLTP systems. I don't recall a single problem with any statement or transaction that managed to stay within those limits.
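
For the record, here is roughly how I pull the first two ratios from the v$-views. A minimal sketch only: it assumes you can query v$sql, and the threshold is just the per-execute guideline from above.

select sql_id,
       executions,
       buffer_gets,
       round(buffer_gets / executions)                  gets_per_exec,
       round(buffer_gets / greatest(rows_processed, 1)) gets_per_row
  from v$sql
 where executions > 0
   and buffer_gets / executions > 100   -- breaks the per-execute guideline
 order by gets_per_exec desc;

The per-transaction number I simply take from the load-profile section of statspack or AWR.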


Now be careful. Your situation will be different. And you should not strive for these numbers per se, but rather for happy customers. Whether these numbers really are appropriate for you depends on the number of calls you do, the type of work your users/systems do, the capacity of your hardware, and the demands placed on your system. Some systems operate comfortably with one zero added to each number.

But if you need millisecond response times, or need to serve thousands of requests per minute, then my advice is: strive for those numbers for every statement and transaction on the critical path.

Any query or transaction that consistently breaks one of those ratios, should be looked into.

And if your system doesn't meet the numbers, there are three fixes:
You either Eliminate the statements (just don't do it),
Optimize the offending statements (improve the where-clause or the indexes), or
Contain those statements (do them less frequently, and don't do them at peak-load periods).




A further word of warning on the general danger of "Ratios" is appropriate.

Phi, or the Golden ratio of 1.6180... Maybe we should look for some relevant way to use that, just for fun.

We have had a clear demonstration, by Connor McDonald I think it was, that Hit-Ratios are not a good indicator of "performance". Connor was able to "set" the hit-ratio to whatever his manager wanted, with very little impact on performance [link?].

Subsequently, other performance specialists, notably Cary Millsap, have indeed proven beyond doubt that ratios and other "aggregate" metrics are not good indicators of performance.

Back in the nineties, my colleagues and I used scripts based on numerous books (from the nineties...) to determine all sorts of (hit)ratios for cache-size, shared-pool, redo and other settings. The ratios were derived by high-tech scripts we crafted, based on the books we bought. And from the nicely formatted SQL*Plus output we determined that we needed more memory, a larger shared-pool, more redo-buffers or larger redo-files.

In very few cases did those ratios point us to the real problem and in even fewer cases did the advice make any difference.

In almost every case, the real solution came from B/Estat, tkprof, Statspack, AWR or ASH. And most of the time, we were brought straight to the heaviest query in v$sql.

That query was then either made more efficient (fewer buffer gets per execute) or eliminated (just don't do it). In 90% of cases, that solved the problem (and sometimes improved a few ratios as a by-product).


My Conclusions:
Most Ratios are indeed of little use - they do not address the real problem.
But Some ratios can help you, if you know what you are doing (and if you know what your customer is trying to do).

With or without Ratios, you still need to find out what your system does, and use common sense to fix it.


I'll save a few other examples of Ratios for later, notably one I use to find locking problems. I think Martin can already guess which one.

Friday, 25 September 2009

Happy Birthday OFA

Cary Millsap from Method-R just reminded us that the OFA standard is now 14 years old.

Check out his original post here:
http://carymillsap.blogspot.com/2009/09/happy-birthday-ofa-standard.html

The Good Thing about Optimal Flexible Architecture was that it created a standardized way to install and maintain the Oracle (database) software stack.

And in writing down OFA Cary used his meticulous, thorough approach. Every aspect of OFA was derived from a real "Requirement". And he paid special attention to Robustness and Maintainability. The configuration of Oracle software, and the life of the DBA became a lot simpler.

Many DBAs owe Cary for writing up OFA!

Friday, 26 June 2009

Oracle Security done Right - Simple

In the middle of a bit of a roller-coaster week, I attended the workshop by Frits Hoogland about "Oracle Security done Right".

There is of course a lot to be said (and written) about security and every situation will demand a tailored approach.

But Frits laid down a relatively "Simple" approach.
He shows how to cover both accountability and auditability.

A true "Trust but Verify" approach. And Simple.

The approach merits a good look, and there was already some good discussion during the workshop.

It will not be the be-all-end-all of security, but it is a good start.
A bit of criticism won't hurt. A discussion will probably make the approach better.

Just remember: keep it simple. Please.

And that is why I think his approach is worth some attention.

Saturday, 23 May 2009

Single Table Hash Clusters: Size Matters.

OK, so we know that in certain situations, with the right design, the right size-assumption, and the correct usage, a Single Table Hash Cluster seems to be the most efficient way to get to a row, using a single block get.


I'll have to ask Richard Foote for confirmation on "the fastest way", as I still think that Index Organized Tables are very efficient indeed, and have fewer downsides. But since IOTs are B-tree indexes, they will result in 2 or more block-gets when the blevel increases.
IOTs are an old hobby of mine, but they also nearly ended my career once. Complicated story. Some of you know it. Better not divulge.



I now want to know what happens if we get outside the "comfort zone" of an STHC. What if the well-intended design-assumptions about the number and the size of the records turn out wrong?
In both cases, I expect the hash-mechanism to become less efficient.

Caveat: I can only test a limited number of cases, and will focus on the overflow-case. The case where too many records end up in one block is harder to prove, as it will probably involve hash-collisions and extra CPU effort to retrieve a record. I'm lazy, and will focus on stuff I can prove with just simple queries and the odd autotrace.


Recap

Our initial STHC was created as a table with 64K records and an average recordsize of approx 75 bytes. The original script to create it is still here (demo_sthc.sql). It was developed on 10.2.0.3 (WinXP) with an 8k blocksize.

To play with sizing, and to get a feeling for the storage-efficiency, I'll first do some sizing-reports to compare the STHC to other options. I'm not worried about disk-storage (disk is cheap, innit?). But blocks in memory still take up valuable cache, and the reading, getting and scanning of blocks is still "work". Effort that can be avoided. Hence I always assume small (and Simple) is beautiful.


To view the sizes of objects, I use an auxiliary script: segsize.
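
If you don't have such a script handy, a minimal stand-in could look like this (my own quick sketch against user_segments; the &1 substitution variable takes a segment-name pattern):

select segment_name,
       bytes/1024 kbytes,
       blocks,
       extents
  from user_segments
 where segment_name like upper('%&1%')
 order by segment_name;

It lacks the per-object grouping and totals of the real script, but it answers the basic question.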

In the original demo, the segments for heap, IOT and cluster had the following sizes:

TABLE_NAME    NUM_ROWS   BLOCKS  empblck  avgspc  chncnt  AVG_ROW_LEN  COMPRES
----------  ----------  -------  -------  ------  ------  -----------  -------
HEAP            65,535      720       47     855       0           77  DISABLED
IOTB            65,535                                   0           79  DISABLED
CLUT            65,535    1,014        9   2,886       0           78  DISABLED


SEGMENT_NAME            KBYTES   BLOCKS  EXTENTS
--------------------  --------  -------  -------
HEAP                      6144      768       21
HEAP_PK                   4096      512       19
                       --------  -------  -------
Total for HEAP           10240    1,280       40


IOTB_PK                  11264    1,408       26
                       --------  -------  -------
Total for IOT            11264    1,408       26


CLU                       8192    1,024       23
CLUT_PK                   6144      768       21
                       --------  -------  -------
Total for Cluster        14336    1,792       44




The cluster I created for my first demo seems relatively large and inefficient compared to the heap-table with the same data, as it takes up roughly 30% more space. That was probably because I didn't care for the SIZE- and HASHKEYS-parameters yet.


All indexes are relatively large compared to their tables, due to the long and rather funny VC2 field that is the PK.

If we replace the key-values with, say, 'abcd'||to_char(rownum) or a sequenced number(n) field instead of the spelled-out numbers, all segments, and notably the indexes, turn out much smaller. But my original "Business Case" has long-ish strings of format AAAAnnnnnnnnn in a VC2 field as the PK (and search-criteria). GUID-strings of 64 hex-chars stored in a VC2 would be a comparable case.
And now I'm stuck with this demo...

btw: it is bad practice to store numbers in a VC2 if you already _know_ they will be numbers; please don't.




Setting the SIZE

My initial errors were to not specify SIZE, and to specify the wrong nr of HASHKEYS, supposedly also leading to hash-collisions. I'm a bit surprised nobody spotted it. But I guess there are not that many readers, eh?

I had already created the cluster with HASHKEYS 100, 200, 500, 1200 and 2000, and I still did not specify SIZE (it is not mandatory, and I'm lazeeee).

Go on, run the script (demo_sthc_hashkeys.sql) for yourself
(At your own risk! Please dont sue me if it F***s Cleans up your data...)

SQL > @http://www.hotlinkfiles.com/files/2562031_70t6l/demo_sthc_hashkeys

(you did not have a table called CLU01 out there, did you?)

From this, I found that Oracle allocated the storage (in extents of 64KB) such that the nr of blocks in the cluster was just higher than the nr of specified hashkeys. It seems to assume that one hashkey per block will be required.
(Question: Is this the default behaviour when you omit the SIZE parameter? Is it consistent over all versions? Does it hold for all blocksizes? I don't know yet...)
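
By the way, to see what the dictionary actually recorded for the cluster, you can query user_clusters (a quick sketch; note that Oracle is documented to round HASHKEYS up to the nearest prime, so the stored value may differ from what you specified):

select cluster_name,
       key_size,
       hashkeys,
       single_table
  from user_clusters
 where cluster_name = 'CLU';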

I now decided to do it by the book, and specified both SIZE and HASHKEYS. My next script (demo_sthc_75x64k.sql) would, I hoped, create a proper cluster, ready to hold 64K records of 75 bytes each:

Oh come on, run it! (at your own risk etc...)

SQL > @http://www.hotlinkfiles.com/files/2562028_kvses/demo_sthc_75x64k 


For those of you too lazy or too scared, part of it looks like this:
CREATE CLUSTER clu ( key VARCHAR2(78 BYTE) )
SINGLE TABLE
SIZE 75
HASHKEYS 65536;

CREATE TABLE clut (
  key            VARCHAR2(78 BYTE),
  num_key        NUMBER,
  date_modified  DATE,
  padding        VARCHAR2(300 BYTE),
  CONSTRAINT clut_pk PRIMARY KEY ( key )
)
CLUSTER clu ( key );



And the allocated storage for the resulting (still-empty) cluster is ....:



SEGMENT_NAME            KBYTES   BLOCKS  EXTENTS
--------------------  --------  -------  -------
CLU                       5120      640       20
CLUT_PK                     64        8        1
                       --------  -------  -------
Total                     5184      648       21



That 5 MB (in my case) is surprisingly close to the size of the original heap-table (6 MB in my case).
But it is also surprisingly close to 65535 x 75 = 4.8 MB.

Will this pre-created cluster-segment contain sufficient free-space?
And will it be able to cater for contingencies ?

Let me insert the data...

(if you had run the script at your own SQL-prompt, you would know all the answers by now...)
SYSTEM @ DB10  > insert into clut select * from heap ;

65535 rows created.

SYSTEM @ DB10 > commit ;

Commit complete.

SYSTEM @ DB10 >
SYSTEM @ DB10 > analyze table clut compute statistics ;

Table analyzed.

SYSTEM @ DB10 > set echo off

SEGMENT_NAME                  KB   BLOCKS
--------------------  ----------  -------
CLU                         8192    1,024
CLUT_PK                     6144      768




Heeeey.
After I inserted the same 64K rows, the cluster had grown to the same 8 MB size as my previously created cluster.

Next time, I might as well create my STHC with a bit of contingency. For further experimenting, I will aim to create this particular cluster with SIZE 100 and HASHKEYS 80000, to end up with the correct size straight away (demo_sthc_100x80k.sql):

One more time I will try to tempt you to run one of my scripts...
(at your own risk, of course...)

SQL > @http://www.hotlinkfiles.com/files/2562061_v4rto/demo_sthc_100x80k



This cluster is now finally created at the right size (space allocated), and if you ran the script (did you? on 8k blocks?) you can verify that it absorbed all the rows without growing.
Supposedly, every row now really IS accessible by a single block get, and hash-collisions should be avoided.


I did verify a few selects, similar to the ones in the previous episode, and I did find the 1 (one) consistent get again. At least that still worked...
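
If you want to check that for yourself, the verification is a one-liner with autotrace. A sketch; the key value is whatever spelled-out number your copy of the demo generated:

SQL > set autotrace traceonly statistics
SQL > select * from clut where key = '...some spelled-out number...';

Then look for "1  consistent gets" in the statistics listing.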


Lessons from this so far:
- Pay attention to SIZE and HASHKEYS, and don't run with the defaults here.
- Even properly(?) specified parameters don't seem to suffice. Always Verify!
- And (minor issue, but...) my STHC still takes up more space than my heap-table.

PS: I will not be surprised or offended if someone comes out and says:
That is all wrong, and it is very much different on this other system... Please do!
Such is life with Oracle.
And we may be able to learn something still.


In the next experiment, I will try to overfill the STHC with data, to see if any problems arise from that.


It takes some bloody long train-journeys, and a significant follow-up to investigate all this obscure stuff. And all the searching leaves you with a feeling you know even less.
Arcane stuff that nobody uses anyway.







Oh, btw, For those of you who want to set up a so-called mission-critical system where my demo-scripts would "accidentally" delete the data owned by your boss:
You Do Run this stuff AT YOUR OWN RISK. Period.


Wednesday, 18 March 2009

Simple lessons from the DBMS SIG

Crisis or not, there was a reasonably good turnout at the last DBMS Special Interest Group this Tuesday (17th March).

If you need to justify the membership cost and the time to attend SIGs like this, the one single item is probably Phil Davies from Oracle Support, who talks you through the latest scary bugs. Some of those I can almost recite by now, but I know full well how useful these horror-stories can be for projects that are preparing to upgrade or migrate.

The XPLAN presentation by David Kurtz was instructive, as there are still a few options in dbms_xplan that I plan to explore when time or necessity arises.
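
For those who haven't played with it yet: the variant I use most is display_cursor. A minimal sketch (the gather_plan_statistics hint, or statistics_level=all, is needed to get actual row counts; the query itself is just a stand-in):

SQL > select /*+ gather_plan_statistics */ count(*) from emp;
SQL > select * from table(
        dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

With null for sql_id and child number, it reports on the last statement run in your own session.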

For monitoring, the rapidly delivered presentation by Jason Lester was refreshing. We all know what Grid Control can do by now, but there is always a need for more, for different, and for more customizable tools. I do not need to be convinced of the advantages of Open Source Software (I did a similar topic myself in... eeeh... 2004, for a different forum, and I still stand by that message).

Although I have seen Zabbix (http://www.zabbix.org/) used more often than Nagios (www.nagios.org), I would like to add my own endorsement for any OSS or otherwise "independent" monitoring tool. You need some independent tool next to GridControl and other vendor-supplied solutions (OpenView still sprouts up everywhere - sure, we use it if someone has paid for it, or enforced it onto us).

But quite often, for the actual, Real Application Monitoring, we will also implement our own home-grown or OSS solution next to GridControl. You generally need a tool where you can add/tweak your own monitoring, and that you can rely on to do (or build) the stuff that the commercial tools don't do, or don't want to do. And despite what the books, the courses, and the management say, any DBA or sysadmin worth his salt will DIY in this area. I am also still guilty of jotting up my own unix and sqlplus scripts in various situations. They will do Exactly what I need, when I need it. Some of my unix/linux scripts have dates in them going back to 1994 (sad, eh?).


Mental note: do a topic on "simple monitoring" sometime.


Pete Finnigan had his moment, explaining the uses and pitfalls of VPD (or is that RLS or FGAC?...). Always nice to hear PXF himself explain (the lack of) security in his down-to-earth way. He keeps going on about security being a "layered" thing, and how adding more security can also turn against you if you don't do it properly and exactly Right. Well done, cheers Pete!

The cherry on the cake was Jonathan -scratchpad- Lewis, explaining how Simple (and yet Complicated) the analysis of Statspack (aka AWR, if you have the budget) really is.
Jonathan did a Great job of letting statspack-reports explain themselves, with constant "challenging" and checking of his own assumptions.

Key messages that I retained, somewhat biased by my own experience, were:

1. Read from the top; get a feeling for the overall functionality and the load of the system at hand. How much time, how many CPUs, how much memory and IO was used. Quite basic.
2. Relate the work done, as shown in statspack, to the capacity of the underlying system: was the database challenging the hardware or not? Is it a capacity problem, or a single-user, single-application problem?
3. Don't be afraid to ask questions. Always.

Jonathan refuses to "write the book" on statspack, with the excuse that there would be too much material to cover, and he is afraid to leave out items that are deemed critically important. My reply to him is along the lines of: Real Application Statspack-reading is about common sense.
And if Jonathan can convey some good messages in a 1-hour presentation, surely it must be possible to write a not-too-complex book that helps the average reader out there in 90% of the cases? For the other 10%, you can always hire a specialist.

I become more and more tempted to write a few "simple" things about Statspack, the Real Fantastic CBO, and the blessings of proper physical design for Real Applications. The book by Tom Kyte is all you really need (need link).
Hm, will I too then get sucked into the wonderful CBO? (Real Application CBO; it gets it right most of the time...)
My next presentation, maybe.

Friday, 6 March 2009

deadlocks: get rid of the bitmap indexes

This one is for a certain Peter, he is a DBA at one of my customers.

We had a problem: Deadlocks happening. Bitmap index.

Users: Complaining, deadlock errors ... ???
we: That bitmap index is causing the deadlocks.
Duhvelopers: But we NEED that index for job xyz, and others Might use it too.
we: Sure, but that index is causing your deadlocks
Users: still complaining!


Yes, we know there is more to Deadlocks, but this was the Bitmap!
Peter is the DBA who found Bug 6175584.
And in the lengthy process of convincing metalink to see that bug, he learned more about ORA-00060 than you will ever need - until version 11.2 comes along, at least.



Anyway, this bitmap index...

SQL> alter index <the_bitmap_index> monitoring usage ;
(with a check of v$object_usage afterwards) showed the index was indeed being used. But by whom?
Only by the one intended job or also by other more user-relevant queries?

Peter, who is the local onsite DBA where I am only a passing consultant, finally managed to convince the "architects" and other title-bearers that the index should go.

It went, and with it went the deadlock-error.
General Performance also felt better.

Apparently nobody but the batch job was using that index.
Or if they did, it was not helping them.
Problem Solved.

Then Anjo Kolk, over a nice meal, mentioned he had seen something similar, and had it fixed while on his hands-free carphone.
And I should have realized how simple it is to "prove" this case to the Architects.

A Query like this:
SQL> select sql_id, object_name
from v$sql_plan
where object_name like '%STATUS_BMP';

Or a similar query against the WRH$ tables will show you immediately which statements have used a particular index!
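
For the AWR variant you don't need to touch the WRH$ tables directly; the dba_hist view on top of them will do. A sketch, assuming you are licensed for AWR:

SQL> select distinct sql_id, object_name
     from dba_hist_sql_plan
     where object_name like '%STATUS_BMP';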

Peter is off to check a few other indexes straight away.
Kudos to Anjo for pointing it out. Sometimes a Simple solution is right under your nose. But it takes a nice "incident" to learn even simple lessons.

As for the developers and architects who concocted it all: No hard feelings. After all, the Funtastic manual says bitmaps are for low-cardinality columns. And at least their tests had been really fast using "this bitmap thing".
They had no bad intentions, and where would we be without their complex solutions and indexes, eh?

Monday, 6 October 2008

Backup and Recovery, at what level

Yes, this is Yet Another Replication Discussion (YARD).

Indeed, this beast seems to pop up all the time. Today’s trigger was a discussion with some architects on Disaster Recovery in a SOA environment. I’ll try to give it a nice twist, and I will also keep my lunch-budget in mind.


To the serious reader, this article is actually a DataGuard plug.
But with all the buzzwords that spin around these days, this article can be seen as Art: It makes you laugh, it makes you cry, it makes you THINK.
And maybe those SOA/BPEL types do have some good ideas sometimes.



Marketing Tagline:
With SOA and ESB, the DR-replication can move upwards in the technology stack (and whatever the flavor of the month is: Grid, anyone? Cloud computing? The sky is the limit!).

On an even lighter note:
You should know the difference between an architect and a terrorist. It is at the bottom of this page.


(Health Warning: many buzzword-abbreviations ahead)


In most cases, I will advise anyone with HA (high availability) or DR (Disaster Recovery) plans to first consider a DG (DataGuard) construction. Replication of your data to a remote site offers many (documented and repeated) advantages:

DG offers you an off-site backup (but you should still consider tape or disk backups!)

DG gives you reporting on read-only-opened database (even better in 11g real time...)

DG allows you to use the 2nd or 3rd site/system to do maintenance or upgrades (but be careful: you may want 3 or even 4 systems in that case; in return, you can tick some very fancy uptimes in your SLA box).
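
To give an idea of how little is needed to get started: the primary-side core of a physical-standby setup is just an extra archive destination. A rough sketch with made-up names, leaving out all the standby-side work (standby controlfile, FAL parameters, listener entries, and so on):

log_archive_dest_2 = 'SERVICE=standby_db LGWR ASYNC
                      VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
                      DB_UNIQUE_NAME=standby_db'
log_archive_config = 'DG_CONFIG=(primary_db,standby_db)'

The point being: the entry-level of DataGuard is low; the fancier options (protection modes, real-time apply, the broker) can come later.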

As an alternative to DG, most organizations will consider using SAN technology for replication. A surprising number of organizations seem to mix the two to complicate life.

Sideline: Some are even dreaming over sales-brochures (and some are entertained at seminars) on "stretched" clusters. When it comes to Oracle databases and RAC, these stretched beasts are worth a separate story, and a very carefully worded one indeed: I can't kick powerful vendors/partners/colleagues, but I can't put customers at risk either (?). Maybe Later. I have learned the hard way never to criticize powerful vendors; it limits your lunch-invites. See the bottom of the page for lunch-items.



First on DataGuard

Despite my own confidence in Physical Standby (and the current limitations of Logical Standby), Oracle seems to be moving towards Logical (SQL-apply) rather than Physical (Redo-apply). Because of the looser ties between primary and replicas, "Logical" will offer more possibilities for maintenance and management. The logical replicas do not need to be physically compatible, and can span different versions. A trend we can see with open-source databases as well: logical replication is conceptually more powerful than the redo-apply mechanism originally used in Oracle Standby.

DataGuard would merit a whole blog all by itself, but I have a different point to make today. I'll repeat what I said before: DG is a very good start to improve the availability of your system.



Now on SAN replication

The other route chosen by many organizations is to "outsource" the replication to the SAN. The SAN system will take care of block-level replication of all storage. Attractive, often simple to set up (just shout at the SAN engineers, they will click and drag until it works), and heavily advertised (and paid for). SAN replication is especially attractive if you have to replicate more than just your Oracle data, as is often the case. The main downsides of SAN replication are the complete dependency on the SAN technology, the license cost, and the higher volume of data to be transferred (compared to Logical or Physical DataGuard). SAN replication works if you have sufficient bandwidth, low latency and capable storage engineers (not to mention the friendly smiling vendor).

SAN replication and DG replication, if applied properly, can both be very powerful mechanisms. I recommend choosing the one you are familiar with (or get the best license-deal on) and sticking with it. I would not recommend using both SAN and DG in an inter-dependent mix (a surprising number of databases are indeed replicated by both concepts, and that tends to add to the confusion in case of recovery).

For Oracle/RAC on stretched clusters: suffice it to say that you shouldn't pay for this yet, but if the (hardware) vendors want to demonstrate it at fancy locations/parties: do encourage them to invite you. The prices of this technology warrant a business-class ticket (with spouse, and partner-program organized).



Coming to the point: The next alternative.

We now have a 3rd alternative: SOA-layer replication. To most of you this may be old news, or indeed an old concept re-worded: you are correct. Read on. And tell me how I got it all wrong when I became entangled in the latest hype without really understanding the concepts (or even knowing the jargon).

Stack-wise, I would situate SAN replication at the bottom of the technology stack (deep down in the infrastructure layer). And I would place DataGuard somewhat higher up in the stack (closer to the app-layer). Hold this layer-idea for a moment.

Enter stage: the Service Oriented Architecture (SOA), with the concept of an Enterprise Service Bus (ESB). Apart from the number of new abbreviations, what really struck me about SOA were the concepts of "above" and "below" the Layer (anyone read Tom Clancy in the nineties?). For some architects, the database is just a "service" that takes care of persistent storage. They are right (they are, after all, architects). Google "Database + ACID", possibly adding "Codd + rules + 13", for some very basic stuff on databases. They are correct; ACID-ity is the essence of a database.

Now think of replication for DR purposes as a means to store things in multiple locations, and possibly as a means to become technology-independent. And imagine a SOA request to store data being sent not to a single service, but to two or more services. Borrowing from the DG concept: as long as 2 or more services have acknowledged storage, we can consider the data safe.

The next level of "Replication", and one that the SOA crowd will understand!

Following this line of thought, the storage layer becomes Yet Another form of the Grid Concept (YAGC?) or Yet Another Computing Cloud (YACC!). And the calling process doesn't care anymore where the data is stored. Just as your GmailFS file system doesn't know or care where the data is stored: it knows the data is stored, and that is sufficient. (Nerds and architects alike, take note of GmailFS; I'm really curious: is Gmail the ultimate ESB?)

Like many architects who refuse to look "below the layer", we can now state that we have a concept (eeh, a resource?) for replication, and that this DR pattern is "Pure SOA" and "fully ESB compliant". It even relates to "GRID", and will thus gain us brownie points in the Grid-Indexes of various partners (more links below).



Myself, I will of course stick with Physical Standby for the moment. But it doesn’t hurt to explore the possibilities. A manager with some spare budget might even turn it into a seminar (with lunch, I would presume).



The Relevant Links are


(SQL> select url from relevant_stuff order by credibility desc;):



Dataguard and Oracle high availability (valuable stuff here):

http://www.oracle.com/technology/deploy/availability/htdocs/DataGuardOverview.html


Gmail fs (think about this one):

http://richard.jones.name/google-hacks/gmail-filesystem/gmail-filesystem.html


SOA and ESB (…):

http://www-128.ibm.com/developerworks/webservices/library/ws-soa-design1

http://www.managementsoftware.hp.com/products/soa/ds/soa_ds.pdf


More on Grid hype (…):

www.hp.com/techservers/grid/index.html

http://www.oracle.com/corporate/press/2005_apr/emeagridindex2.html



Note for all Alliance Managers: take note of my cross-links to our “powerful partners”, and please keep me on the invite-lists, I do appreciate quality food, I’ll even tolerate the mail and spam for it.



In the pipe-cleaning department or “mopping up loose ends”, I owe you these:

Lunch: There is no such thing as a free lunch (unless you are the lunch).

Architects: The difference is that you can (sometimes) negotiate with the terrorists.

Now shoot me (over lunch?).

Friday, 16 November 2007

Databases and Systems, what kind of Relation.

Over the years, I've had several interesting discussions on the relationship between "databases" and "systems". It comes down to this: Should there be many databases on a single system (m:1, the traditional approach), or should there be many systems underneath one database (1:m, grid)? And when is 1:1 appropriate?

Current hardware is powerful enough to allow "supernodes" that can run hundreds of databases. We are also confronted with Virtualization, the "carve-up" of hardware by way of Xen, VMware, or vendor-specific products that create domains or partitions.
With the current push towards "virtualization" of systems, and the (in)capabilities of Oracle, it may just be worth re-starting some of the n:m discussion (did it ever go away?).

It is time to take a position on the many-to-many relationship between databases and systems.

This story is actually the text version of my presentation "Databases Everywhere", and can be found on various UKOUG and other sites. (include a link...)

To summarize it for those architects and (account-)managers who are in a hurry:
I am in favor of 1:1 wherever possible; I support 1:m (RAC, grid) if really, really needed (but please, just think about it one more time). RAC is a wonderful piece of technology that can serve many other vendors as a good example. It can be made to work. I will try to indicate under which conditions I think RAC can or cannot be applied.

Finally, I will only tolerate the old-fashioned m:1 for non-critical situations or on systems where there is some sort of risk-mitigation against interference between the (instances of) multiple databases.

After explaining these positions, I also have a list of recommendations for customers, providers and even for the Big-Oracle itself.

Those of you interested: read on. Possibly prove me wrong.
Others, keep browsing, the truth is out there, whichever version or vendor you want to see.

For the record: the bottom of the text contains links to all our major “partners”. Please keep the invites coming.

Intro, and some definitions.

On conventional systems, we generally find 1 or more databases running on a single system (*nix or even Windows). For example, for concept-testing of a DG setup with cascaded standby, my laptop has run 5 databases simultaneously. Slow, but running. It just takes loads of memory and careful (memory-)parameter settings. It illustrates the capabilities of the Oracle database that the concept, once proven on a laptop, could then be used to clone a 3TB production system (this time not on Windows). But the laptop Proof-of-Concept also illustrated an important issue when running multiple databases on a single system: Contention. The databases were visibly (and audibly, from disk and fan) competing for IO and CPU resources.


NB: Can't resist putting one in for Deb and Nigel: Let the world know that Deb got more memory into her laptop than Nigel, but then both David and I still have more memory in our laptops than the two of you together, so there ;-). Size matters.
Techies rule.
Back to serious business.



With the current possibilities for virtualization, you can take hardware and split it into many "systems" using VMware, OracleVM, other Xen-based or vendor-specific tools for domains, lpars, etc.

Each resulting system can then be used to run the instance(s) of 1 or more databases. When is this useful, and how far should we take it?
With Oracle RAC (GRID, anyone?), you can take a database and distribute its instances over many systems. When does this add benefits?

First, a few quick definitions to delimit the playing field. Please note that the definitions are for the purpose of this argument only. They don't pretend to be scientific or final. Or even to be correct.

System: a running *nix (or win*) instance containing a process list and an amount of addressable memory. In short: a running instance of an "operating system".
A system can be Virtual if the hardware seen by the operating system is not identical to the actual underlying kit. This is the case when a larger system is split into "virtual" units by use of Xen, VMware or some vendor-specific layer of software or firmware. Some of those can also be modified dynamically (e.g. rolling). Some definitions and a good description can be found here:
http://encyclopedia.thefreedictionary.com/virtualization.

Database: an (Oracle) database, containing one system tablespace and one user called SYS (I'm still trying to find the "essence" of an Oracle database; how about the sys.obj$ entries?). Note that the use of DataGuard can mystify the definition of a "database", because each DG-clone can represent "the database". The actual Single Point of Truth (SPoT) is wherever the current "primary instantiation" of the database resides.

Also note that my definitions do not include the binaries, or the ORACLE_HOME, as part of the database or the system. Indeed, systems and databases can be used in situations where the software needed to run them is "shared". Most systems only need a few "system-specific" files in /var and /etc. I will always point out VMS as the ultimate mother of all clustered systems, whereby files are shared between multiple nodes. But that requires a Clustered File System (CFS), and that opens a different discussion altogether. Suffice to say that a CFS is very suitable to ensure that all machines are connected to the same, identical software, and are guaranteed to run the same version of the binaries.

Now let me briefly elaborate on the different options.

Conventional deployment - many:1.

On conventional systems, we often see many databases running on a single (unix) instance.

The DBA can look after the databases, and the unix administrators have only one entity to watch. Any collision between databases will have to be handled by the DBA. Note that in most of these conventional cases all databases share the same software-tree (multiple databases running from the same oracle_home).
These systems tend to have a relaxed SLA. The utilization varies, and sometimes we see high percentages of CPU or IO bandwidth being consumed by a single database.

The main disadvantage of conventional systems is the fact that there is only one system, and all databases meet there. A problem with either a single database or with the system itself can quickly contaminate all databases on the system. And upgrades of system- or oracle-software lead to simultaneous outages of all databases on the system.

In case of system- or hardware-failures, many databases must be recovered simultaneously, or re-started on one or more (other) systems (requiring some prioritization). The simultaneous recovery of many databases may lead to a brief period of overload and/or chaos, and possibly a domino effect on other systems or components.

The advantages of an m:1 configuration are the simplicity of a "single system", which is easy on the system-admin, and the often cheaper license structure when using per-system licenses.

Simple and Robust 1:1

On systems with a high load or a stringent SLA, we tend to see 1:1 relationships: a single database, whereby the sole and single instance of the database runs on a single unix system. This is sometimes referred to as the monogamous configuration.
By having the whole system to itself, the database can benefit from all the resources available. There is no interaction (disturbance) from adjacent instances or other processes. Determining parameters is relatively simple.

However, as Oracle (-sales) will point out, this 1:1 configuration generally means the system is grossly under-utilized. It also means the database can still suffer from unix- or hardware-failures.
But I like the simplicity of this configuration. I think it is the most "robust" solution, and it is applicable to the majority of databases. Whenever a problem occurs on one of the systems, only one database is affected, and recovery-efforts can be concentrated on a single database and application, reducing the risk of a domino-effect.

The 1:1 situation also lends itself very well to hardware-clustering or “cold failover” whereby a database is re-started on another system (node) in case the underlying system or hardware fails. Only one database needs to be re-started or recovered.

Since a 1:1 configuration requires many "systems", it is attractive to use server virtualization. By running multiple "virtual" systems on a single piece of hardware, you can quickly create the required number of "separate" or isolated systems. When doing this, keep in mind that the underlying hardware remains the single point of failure. When one or more virtual systems are meant to replace one another in case of failure, they should preferably run on separated hardware.

Real Application Clusters, RAC - 1 : many

As a techie, I like the technology behind RAC. It is a wonderful thing to play with, and I like the challenge of mastering this thoroughbred in real-life systems. But I have to be careful not to be running a "solution looking for a problem".

We tend to see RAC databases in organizations with formal and very stringent SLA’s and with the budget and the resources to try and meet these requirements.

OPS and RAC eliminate the SPoF of the "system" and deploy the database over multiple systems. Theoretically this works nicely, and it even provides dynamic provisioning of system resources (you can utilize all available kit, and you can add more kit as needed). In practice, many have pointed out the relatively complex setup, the high price, and the other shortcomings of RAC (link to Miraculous, Famous Danish company).

We have indeed seen successful deployments of multi-instance databases on some very large kit. In some cases, the impressive amount of hardware was able to hide badly designed application-code for quite some time. And by constantly distributing the (mainly CPU-)workload over the available unix systems, the coders got away with some appallingly inefficient constructions. Some of these cases have demonstrated the viability (and sometimes the vulnerability) of RAC quite nicely, although a better design or implementation might have been cheaper (I prefer brains over iron, always! In case of doubt: reduce the size of the hardware and tell the IT crowd to JFDI).
Note, however, that it is important to let the hardware boys and vendors have it their way a bit too. Riding in some extra hardware makes them happy, and is good for our relationship with these vendors. They might invite us on future projects.

And to pour further praise on the Oracle techies: TAF has saved our systems several times when a node got into trouble and died. Database-nodes die mostly through software errors, core dumps or memory leaks, and sometimes through human errors. The underlying hardware is rarely a problem, as these high-end hardware systems are built to keep running even if a salty ocean wave runs through the lower floors of the building... unintentionally (Kathryn, are you reading this?).
Please note that even when you have RAC and TAF-capabilities, you still need to code your application to correctly trap and handle the failover-events. Otherwise, it only works for "idle" connections.
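
For reference, the client-side part of TAF is just a connection descriptor. A sketch with made-up names (the host-names and service-name are mine):

RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = racdb)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 3))
    )
  )

The trapping of failover-events (the part developers tend to skip) happens in the application itself, via OCI or JDBC callbacks.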

Note that most databases can (be made to) run on RAC, provided there are no obvious bottlenecks, such as an ordered-nocache sequence or some very hot blocks with running-totals. And those bottlenecks can generally be un-designed.
NB: Oracle currently seems to have the following position on RAC: if it doesn't run (or scale) on RAC, your design is wrong.

Suffice to say that where Very Fast Failover (TAF or FCF/FAN) is needed, RAC has no equal. And for systems that have extreme hardware (CPU) requirements, the RAC scale-out model is also beneficial.

Question: Now what to choose?

The classic answer: It Depends.
However, I will try to provide some guidance and some opinions.

The only factor that really matters is “The Business” (duh).
What does your business need, and what can it afford? For simple or undemanding SLAs, the traditional m:1 configuration is often sufficient and cost-effective. For businesses that have more stringent demands, or for providers that risk being sued by their (business-)customers over a broken SLA, 1:1 is advised, possibly with some cold-failover mechanism. And finally, if you really need those additional 10 minutes, or if you need the scale-out features, AND if you can afford the resources for testing, training and ongoing maintenance, a 1:m configuration (aka RAC or grid) can be your choice.

To figure out what your business needs (and can afford), you can either think for yourself, or you can give yourself and your department more credibility by engaging (principle- or business-)consultants to do cost-benefit analyses, risk-assessments or FMEA (that is: Failure Mode and Effects Analysis, not the other FEMA). They will especially stress the business-cost of downtime and any related loss of data/productivity/customers/orders/MegaWatts. That is enough FUD for the moment. Back to more practical matters.

On the practical level, there are some factors that come into play. There is a) the preference (eeh: dictate) of your system administrators, your SAN engineers, your ASP, or your hosting provider. Then there is b) your commercial relationship with Oracle which will determine how high your licensing cost will turn out. But there may be others, such as c) the capabilities and preferences of your DBA. We will not even go into items like d) the availability of test-systems to prove and maintain your architecture, or e) the available rack-space.

The first important factor that often comes into play is the preference (or the pricing-policy) of the system-admin team or the Hosting Organization (the ASP). Are they capable of handling many systems, or do they prefer a low number of unixes? Can they quickly build and clone systems for provisioning? What price does your ASP charge for additional systems? This may determine your capability to run "multiple" systems.
Some organizations, by choice or by force, still get away with running just 1 large unix box with everything on it: HR, CRM, Logistics, and sometimes they even have their dev/test/uat environments on that very same box. Feasible, but with most of the drawbacks of an m:1 configuration.

The next factor is often License cost. How is your relationship with Oracle, commercially? If you have to pay list-price, you will want to stick with "conventional". Here, Oracle shoots itself in the foot: a lower price on RAC would speed up acceptance of the RAC and GRID model.
Machiavelli did suspect this was done to buy sufficient Beta-time to find all the quirks, possibly to find a solution for instance-affinity, and to give customers the time to come up with an answer to the friendly-delivered "Your Design is Wrong" consultancy-audit outcomes.

And last but not least: what do your DBA's prefer, and how trained and comfortable are they with RAC/grid? The traditional choice, m:1, despite its disadvantages (contention, domino-effects), is still the easiest to maintain for a DBA. Choosing a 1:1 configuration brings a slightly higher workload, but has the advantage of more robustness and easier, isolated troubleshooting, since databases and systems do not affect one another when trouble or maintenance occurs (yes, yes, someone must shout "utilization" now, thank you).

The choice for a RAC or grid configuration tends to create significant overhead. We politely disagree with Oracle on the point that the new grid-control alleviates all problems.
And even if GC and its agents do take away a lot of the routine-tasks, Knowledge and Experience can never be completely replaced by a GUI. This aspect, the Human or Operator dimension, tends to be the most under-estimated factor when (prematurely) implementing RAC/Grid.

Recommendations:

For customers and end-user businesses:
Move carefully from m:1 to 1:1. The 1:1 configuration is at this moment arguably the most robust way to run a database. Consider using virtual systems to support a 1:1 deployment, but beware of the possible contention and SPoF on the underlying physical layers. Move on to 1:m (RAC) for cases with specific needs (failover or scale-out). Only use RAC in cases where you must, but then don't hesitate to use it. It can (be made to) work, it will work (eventually), and you may have to learn these tricks eventually. Start on the first valid occasion.

For ASP and hosting organizations:
Learn how to handle clusters, clones and virtual systems. These tools will give you an edge in flexibility. Then offer your customers the possibility to host many _identical_ systems at a rebate. Your customers will buy more as 1:1 and 1:m systems proliferate, and in the long run you will benefit. Hosting companies and some vendor pricing-policies are the largest obstacle when moving from m:1 to 1:1, or even on to 1:m. Innovative customers will try to move to cheaper and more flexible platforms, and even lagging customers will eventually follow. If the hosting provider can quickly "provision" at acceptable cost, he can be seen as a partner in commoditization, rather than as an obstacle to flexibility.

For System-admins:
Learn how to handle a multitude of systems, learn how to keep them in sync, and how to clone or (re-)build systems quickly.
NB: for the addicts: investigate the use of Clustered File Systems (CFS).

For DBA’s and system-admins:
Aim to deploy databases and systems in a 1:1 fashion. The “isolation” of each database and system greatly facilitates admin- and troubleshooting activities.
Also get used to replicating or sharing software through OUI or other mechanisms.
And if possible, start to work with a CFS.
Sharing storage at the "filesystem" level can greatly facilitate the juggling of multiple systems. Even NFS (supported, but not recommended) is a usable alternative.
A CFS can offer great advantages by sharing files across nodes, and this can simplify software-deployment and distribution. You will always need two copies of the software for redundancy-purposes, but please think before making the 3rd, 4th or 42nd copy. Sharing is better than copying, especially at high numbers. It is easier to manage a small number of shared oracle_home trees than to have 42 or more copies that need to be rsynced or otherwise kept identical.

For Oracle:
It is appalling to find that a grid (aka an OPS database) was easier to build and maintain under VMS or Tru64 with Oracle 8174 than it is with the current 10g versions.
Please pursue the development of OCFS and facilitate shared-binary installs on OCFS and other CFS platforms. This will help the proliferation of your GRID strategy, and will get you more market-share and revenue in the long run. The oracle-inventory mechanism and the configuration of agents tend to make life difficult for the deployment of shared binaries. Any viable grid should IMHO include the shared use of a software tree, and not depend on endless replication of executables.

Shutdown (normal)

By adopting a grid-strategy, Oracle has greatly increased the options for its customers. And for those of you who don't know it yet: find the Oracle-sponsored GRID-INDEX. Various hard- and software vendors have added to the palette of choices by implementing their own versions of Grid, clusters, or virtualization. All this new (eh, apologies: innovative but proven) technology can be put to good use, but only when making the correct choices.

We hope the preceding information can offer some help, and we would like to close by adding one more item of advice: try to base your choices on simplicity.

Relevant links

The usual brownie-point partner-links and marketing buzzwords on grid etc :

http://www.oracle.com/global/nl/corporate/press/200634.html
http://www.oracle.com/global/eu/pressroom/emeagridreport4.pdf
http://www.hp.com/techservers/grid/index.html
http://www.ibm.com
http://www.sun.com

More on server-virtualisation (start here!):
http://encyclopedia.thefreedictionary.com/virtualization

An introduction to Clustered File Systems:
http://www.beedub.com/clusterfs.html
http://www.oracle.com/technology/pub/articles/calish_filesys2.html

Some good, albeit biased, arguments for CFS can also be found here:
http://www.polyserver.com (look for articles by Kevin Closson).

A classic on RAC:
http://www.miracleas.dk/WritingsFromMogens/YouProbablyDontNeedRACUSVersion.pdf

Just for fun:
http://www.userfriendly.org
http://www.dilbert.com

Last bootnote for all Alliance Managers and other people in control of party-invites: take note of my cross-links to our “powerful partners”. And don’t worry, since you didn’t bother to read all of the text, neither did the real decision-makers, hence no damage is done. Oh, and please keep me on the invite-lists, I still appreciate good food and quality entertainment.

Now shoot me (over lunch?).

Monday, 7 May 2007

sitemap of SimpleOracleDBA

Here are the introductions to PdvFirstBlog and SimpleOracleDBA



Various Oracle Tips and Rants and Opinions, all to do with practicalities.

  • Simple Oracle DBA - Availability, Scalability, Manageability... It has to be Simple

  • Bitmap Indexes on an OLTP system - ORA-00060... Classic Deadlock

  • Backup on a different Level - in the SOA layer.

  • How To Pool Connections - One of the challenges

  • Databases Everywhere - Systems and Databases, how many databases.

  • ILM, Information LifeCycle Management - another Three Letter Acronym.

  • Index Organized Tables - Many Benefits





  • Various Rants, all to do with practicalities.

  • In times of Crisis you call the DBA - Who you gonna call...

  • Rationalize TOAD - Toad will find a way...

  • M O O W - Miracle Oracle Open World...

  • My blog-plug for the Conference - UK Oracle Usergroup...




  • Travel related stuff.

  • Angel of the North - My most visited leisure page...




  • The usual links to groups etc..



    My own LinkedIn profile (shameless plug)


    my agenda (normally at the right hand side of the blog)




    UKOUG main url



    Miracle BV



    the home of Oracle-L : Best Mailing list



    link to Oracle Forum.

    Sunday, 6 May 2007

    Index Organized Tables (IOTs), the forgotten benefits

    Check if your database can benefit from IOTs. You could be surprised.

    Index Organized tables are possibly the most underused feature after the clustered table (now there is another underused item …). IOTs also have some additional benefits that the manual seems to forget.

    In brief: If you have tables with parent-child relations, or dependent tables where you often retrieve all related records at once, then you should read up on Index Organized Tables. Notice that this includes nearly every database out there.

    Wake up call and Warning
    This feature is available since 8i, and only now, recently, does it get a number of Bugs reported against it (not least because one of our major projects boldly went out using IOTs). Have those bugs been “dormant” all the time? Check out the list of Expected fixes for 9.2.0.8 and see for yourself (Note:358776.1, when I checked it 28-Mar-06). 3 out of the 7 fixes mentioned under IOT are bugs found by us, and a bug-search for IOT turns up even more issues. The information and the dates on the bugs and fixes suggest that most of them have also been around in 10.x. Be Careful.
    Despite all these issues, IOT remains a highly recommended feature.


    First a recap.

    Briefly described, the IOT stores all data of a table in the leaf-blocks of the PK-index. This means there is no table-segment anymore, only an index segment. You can verify this from user_segments.
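
    A minimal sketch to see this for yourself (table and column names are just illustrative):

      -- Create a small IOT; the PRIMARY KEY is mandatory for ORGANIZATION INDEX.
      CREATE TABLE order_lines
      ( order_id  NUMBER
      , line_no   NUMBER
      , item_id   NUMBER
      , qty       NUMBER
      , CONSTRAINT ol_pk PRIMARY KEY (order_id, line_no)
      ) ORGANIZATION INDEX;

      -- Only the INDEX segment of the PK shows up; there is no TABLE segment.
      SELECT segment_name, segment_type
      FROM   user_segments
      WHERE  segment_name IN ('ORDER_LINES', 'OL_PK');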

    On a well defined IOT, all records with the same parent key are grouped together in one single block or in a low number of adjacent blocks. This is an advantage on retrieval. All related data is generally retrieved in just one IO action. This speeds up retrieval (less IO is needed), but also makes more efficient use of the buffer-cache because the related records do not get dispersed over many blocks. The overhead of unused cache space is reduced. So far the known advantages.


    Now the bonus features.

    Bonus Feature #1: More data in the secondary index. As with normal tables, you can define other indexes on the IOT table. But those indexes now have an additional feature: the secondary index doesn’t contain a physical rowid; instead it contains the actual values of the PK fields. This means the 2ndary index is often ideal for index-only-lookup, or just to join additional tables after retrieving only the index-blocks. There is no need to visit the table to fetch the PK fields. For those who used to inflate the 2ndary indexes to obtain Index-Only-Lookups: use of IOTs can automatically lead to more IOL.

    A good example is an m:n relation where the linking table contains just two id’s. A conventional design would call for a heap table with two indexes. A total of 3 segments. An IOT would result in two segments: the PK and one index. Again, there is no table-segment. The PK-segment contains both columns in the leaf-blocks. And the 2ndary index contains the PK-values as an alternative to the ROWID. Any query always needs to search only one (index-) segment, and in each case, the required data will be grouped together in a single or low number of blocks, resulting in minimal IO and efficient use of buffer cache.
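
    A minimal sketch of such a linking table as an IOT (names are hypothetical):

      -- m:n link between students and courses: just the two id's.
      CREATE TABLE student_course
      ( student_id NUMBER
      , course_id  NUMBER
      , CONSTRAINT sc_pk PRIMARY KEY (student_id, course_id)
      ) ORGANIZATION INDEX;

      -- The 2ndary index covers access from the course-side; it stores
      -- the PK values (student_id) instead of a physical rowid.
      CREATE INDEX sc_course_idx ON student_course (course_id);

      -- Both directions are index-only lookups; no table segment is ever visited:
      SELECT course_id  FROM student_course WHERE student_id = :sid;
      SELECT student_id FROM student_course WHERE course_id  = :cid;

    Two segments instead of three, and each query only ever touches one of them.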

    Bonus Feature #2: Reduced density, avoidance of hot-blocks. Because there is no separate PK-segment, and because data and index are together in one segment, the number of records per block is lower. Lower Density.
    Normally, the PKs are the most dense segments in a system, and therefore the most likely to cause contention on DML transactions, and the most likely to show up in gv$cache_transfer on a RAC system. On an IOT, the PK automatically contains fewer records per block because of the extra space required to store the “data” columns. If paired with a higher than default value of pctfree, an IOT can be used to reduce the impact of a “hot” index. This is especially useful in a RAC environment where hot blocks need to be transferred between instances.


    Some of the downsides.

    Inserts and updates are slower. This is partly a myth. In like-for-like comparisons, you may find the difference is not that big. When people complain about the slow loading of data into IOTs, they often compare loading an IOT against a table with no indexes at all. If you can do without indexes, you can also do without IOTs. A more honest comparison would be to include the additional time required to create the indexes after loading, or to compare against a table with a PK already defined.
    Another way to speed up loading is to present the data in PK order (the “ordered” option), as sketched below.
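
    A sketch of what I mean, assuming the rows come from a (hypothetical) staging table:

      -- Feeding the rows in PK order lets the leaf blocks of the IOT fill
      -- sequentially, instead of splitting on randomly ordered inserts.
      INSERT INTO order_lines (order_id, line_no, item_id, qty)
      SELECT order_id, line_no, item_id, qty
      FROM   stage_order_lines
      ORDER  BY order_id, line_no;   -- the PK columns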

    A quirk: the physical guess.
    The 2ndary index on an IOT contains a so-called row-guess to allow fast access to the leaf block of the PK where the row was originally stored. These row-guesses can grow stale, and the index statistics reflect this in the pct_direct_access value. I have never been able to demonstrate the effect, let alone the benefit, of this “row guess”.
    A challenge to investigative minds out there: can someone demonstrate the usage and benefit of the row-guess? Ideally, a 10049 trace should be able to show the impact of the row-guess, and of “migrated” rows and stale row-guesses. (Update in 2009, with thanks to Barry Jones, who actually demonstrated that the 2ndary index deteriorates over time. Impressive.)
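
    For those who want to poke at it: the staleness is visible in the dictionary, and the guesses can be refreshed without a full rebuild (a sketch, re-using the 2ndary index from the m:n example above):

      -- After gathering statistics, pct_direct_access shows what percentage
      -- of row-guesses still point at the correct PK leaf block:
      SELECT index_name, pct_direct_access
      FROM   user_indexes
      WHERE  index_name = 'SC_COURSE_IDX';

      -- Refresh the stale guesses in place:
      ALTER INDEX sc_course_idx UPDATE BLOCK REFERENCES;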

    Some Best Practices for IOT:

    1) bad news first: check the bugs, even on 10.2.
    Although we have reaped good benefits from IOTs on a multi-Terabyte database, we did stumble across a number of Nasty bugs. We had notable problems on the gathering of statistics (dbms_stats) and with the use of Function Based Indexes (FBIs). On IOTs you can use both dbms_stats and FBIs, but beware of the bugs. Metalink is your friend.

    2) Determine a good pctfree.
    Set this value higher than the default. We tend to use 30 instead of the default 10. This will at least partly avoid the general problem of slower DML on IOTs. Your optimal value will vary, depending on the amount of updates and inserts you do.

    3) create your own overflow segments.
    If your table contains many columns or large records, consider defining an Overflow segment and use it to store the columns that you need less frequently. If you avoid creating overflow segments, you may find that Oracle does this for you. You probably want control over this yourself, to avoid surprises. A combined sketch of items 2 and 3 follows below.
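
    A sketch combining items 2 and 3 (the column-split and the tablespace are illustrative):

      CREATE TABLE customer_docs
      ( doc_id     NUMBER
      , cust_id    NUMBER
      , doc_status VARCHAR2(10)
      , doc_text   VARCHAR2(4000)    -- wide, rarely needed
      , CONSTRAINT cd_pk PRIMARY KEY (doc_id)
      ) ORGANIZATION INDEX
        PCTFREE 30                   -- item 2: leave room, lower the density
        INCLUDING doc_status         -- columns up to here stay in the leaf blocks
        OVERFLOW TABLESPACE users;   -- item 3: the rest goes to the overflow segment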

    4) Keep blevel down, use partitioning.
    As access to the table-data will always go via the PK, the table-access as shown in an explain-plan is actually another PK lookup (note that I’m very skeptical about the row-guess). Therefore, the blevel of the PK has an impact on the number of blocks that need visiting before a data-record is retrieved. By partitioning the IOT you can make sure that the blevel stays sufficiently low. IMHO, a blevel of 2 is generally achievable, 3 is acceptable, and 4 is too high.
    Nb: monitoring blevel seems an often overlooked item, and it still applies to conventional indexes as well. But the impact of a high blevel on the PK of an IOT can be especially devastating.
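
    A sketch of a range-partitioned IOT, plus the blevel check (partition bounds are illustrative):

      -- Partition on the leading PK column to keep each B-tree shallow:
      CREATE TABLE meter_readings
      ( meter_id NUMBER
      , ts       DATE
      , reading  NUMBER
      , CONSTRAINT mr_pk PRIMARY KEY (meter_id, ts)
      ) ORGANIZATION INDEX
      PARTITION BY RANGE (meter_id)
      ( PARTITION p1 VALUES LESS THAN (100000)
      , PARTITION p2 VALUES LESS THAN (MAXVALUE)
      );

      -- After gathering statistics, watch the blevel per partition;
      -- 2 is good, 3 is acceptable, 4 calls for action:
      SELECT partition_name, blevel
      FROM   user_ind_partitions
      WHERE  index_name = 'MR_PK';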


    Final words:

    Start by reading up on the IOT feature, but then: Test, Verify, Tweak, and re-Test for your specific situation. IOTs are useful in a specific set of circumstances: retrieval of related sets, avoidance of hot-spots, RAC scalability.
    As always, Your Mileage May Vary.

    Further reading:
    - RTFM (a bit of a duh, but yep, those pdfs are very useful)
    - metalink (same duh, but there is some really good info out there too, bugs and all)
    - OTN : http://www.oracle.com/technology/products/oracle9i/daily/sept04.html

    Friday, 4 May 2007

    Simple Oracle DBA

    Introduction ? Simple ! 

    This blog will try to keep life simple for Oracle DBA's. 


     Because Complexity sells better (E.W. Dijkstra) 

     Because life is complicated enough. Simple Oracle : Oxymoron ?



    Update : eeeh.. It has been some 18 years since I started this blog. 
    I found out most of the original links below are now Dead.

    Will Fix Soon-ish... 








    note: this is what blogger does when you try to edit a post that is 16 years old...


    This presenter had a Very Drastic way to keep the audience awake! Have a look...

    Presentations

    Been there and done that, and got the presentations and papers to show for it.

    Below are the links to my most Favored Rants material. Most of my public presentations have grown from either client-pitches, project-blunders or "architectural" frustrations. Hence most of them are slightly missionary: I want to convince the audience of something. Something Simple.

      • High Availability: How to keep it simple (ppt) - Talk customers through the options for high-availability. Start with "What do you really need", going through "what can you afford", and then match that with the possibilities. Ever more Complex... When in doubt: Simplify! The link is to my presentation from the Miracle DBF in 2006, but the message still stands.

      • Databases Everywhere (ppt) - Everywhere indeed. What can we do to keep the systems running, and keep life simple for the DBA? DBA Two-dot-Oh even.

      • Upgrade Nightmares (pdf) - My down-to-earth vision on upgrades. And what you can do to keep life simple.

      • Is my Backup Covered (pdf) - It is about Recovery! Can you Recover? Can someone else do your recovery for you? The most boring subject, but surprisingly turned into my most asked-for ppt.

      • Backup, at a different level - A paper/blogpost, aimed at SOA "architects". Rather than cover "backup" at the disk- or database level, you can use messaging systems (SOA, ESB, BPEL) to distribute your data over two or more systems. Redundancy at a higher level in the stack. Use at own Risk!

      • Index Organized Tables, and all the benefits - Blogpost/paper, aimed at developers/implementors and ... DBAs. I have driven many to despair with a strong belief in the efficiency of IOTs and other seemingly exotic table- and index definitions to store data physically.

    And if you feel like presenting at a usergroup, but don't know what about: remember your last big frustration or problem? Try turning that into a presentation. Surprisingly, that worked for me.