Showing posts with label IOT. Show all posts
Showing posts with label IOT. Show all posts

Saturday, 23 May 2009

Single Table Hash Clusters: Size Matters.

OK, so we know that in certain situations, with the right design, the right size-assumption, and the correct usage, a Single Table Hash Cluster seens to be the most efficient way to get to a row, using a single block get.


I'll have to ask Richard Foote for confirmation on "the fastest way" as I still think that Index Organized Tables are very efficient indeed, and have less downsides. But since IOTs are B-tree indexes, they will result in 2 or more block-gets when the blevel increases.
IOTs are an old hobby of me, but they also nearly ended my career once. Complicated story. Some of you know it. Better not divulge.



I now want to know what happens if we get outside the "comfort zone" of an STHC. What if the well-intended design-assumptions about the number and the size of the records grow wrong.
In both cases, I expect the hash-mechanism to become less efficient.

Caveat: I can only test a limited number of cases, and will focus on the overflow-case. The case where too-many records end up in one block is harder to prove, as it will probably involve hash-collisions and extra CPU effort to retrieve a record. I'm lazy, and will focus on stuff I can prove with just simple queries and the odd autotrace.


Recap

Our initial STHC was created as a table with 64K records and an average recordsize of aporox 75 bytes. The original script to create it is still here (demo_sthc.sql). It was developed on 10.2.0.3 (winXP) with an 8k blocksize.

To play with sizing, and get a feeling for the storage-efficiency, I'll first do some sizing-reports to compare STHC to other options. I'm not worried about disk-storage (disk is cheap, innit?). But blocks in memory still take up valuable cache, and the reading, getting and scanning of blocks is still "work". Effort that can be avoided. Hence I always assume small (and Simple) is beautiful.


To view sizes of objects, I use an auxiliary script segsize

In the original demo, the segements for heap, iot and cluster had the following sizes:

TABLE_NAME    NUM_ROWS   BLOCKS empblck avgspc chncnt AVG_ROW_LEN COMPRES
---------- ----------- -------- ------- ------ ------ ----------- -------
HEAP 65,535 720 47 855 0 77 DISABLED
IOTB 65,535 0 79 DISABLED
CLUT 65,535 1,014 9 2,886 0 78 DISABLED


SEGMENT_NAME KBYTES BLOCKS EXTENTS
-------------------- --------- -------- -------
HEAP 6144 768 21
HEAP_PK 4096 512 19
------- --------- -------- -------
Total for HEAP 10240 1,280 40


IOTB_PK 11264 1,408 26
-------- --------- -------- -------
Total for IOT 11264 1,408 26


CLU 8192 1,024 23
CLUT_PK 6144 768 21
-------- --------- -------- -------
Total for Cluster 14336 1,792 44




The cluster I created for my first demo seems relatively large and inefficient compared to the heap-table with the same data, as it takes up roughly 30% more space. That was probably because I didnt care for the size- and hashkey parameters yet.


All indexes are relatively large compared to their tables, due to the long and rather funny VC2 field that is the PK.

If we replace the key-values with, say, 'abcd'||to_char(rownum) or a sequenced number(n) field instead of the spelled-out numbers, all segments and notably the indexes turn out much smaller. But my original "Business Case" has long-ish strings of format AAAAnnnnnnnnn in a VC2 field as the PK (and search-criteria). GUID-strings of 64 hex-chars stored in a VC2 would be a comparable case.
And now I'm stuck with this demo...

btw: bad practice to store numbers in VC2 if you already _know_ they will be numbers, please dont.




Setting the SIZE

My initial errors were to not specify SIZE and to specify the wrong nr of HASHKEYS, supposedly also leading to hash-collisions. I'm a bit surprised nobody spotted it. But I guess there are not that many readers eh ?

I had already created the cluster with HASHKEYS 100, 200, 500, 1200 and 2000 and I still did not specify size (it is not mandatory, and I'm lazeeee).

Go on, run the script (demo_sthc_hashkeys.sql) for yourself
(At your own risk! Please dont sue me if it F***s Cleans up your data...)

SQL > @http://www.hotlinkfiles.com/files/2562031_70t6l/demo_sthc_hashkeys

(you did not have a table called CLU01 out there did you ?)

From this, I found that Oracle allocated the storage (in extents of 64KB) such that the nr of blocks in the cluster was just higher then the nr of specified hashkeys. It seems to assume that one hashkey per blocks will be required
(Question: Is this default behaviour when you omit the SIZE parameter? Consistent over all versions? Does this hold for all blocksizes? I dont know yet...).

I now decided to do it by-the-book and specified SIZE and HASHKEYS. And my next script (demo_sthc_75x64k.sql) would, I hoped, create a proper cluster ready to hold 64K records of 75 bytes each:

Oh come on, run it! (at your own risk etc...)

SQL > @http://www.hotlinkfiles.com/files/2562028_kvses/demo_sthc_75x64k 


For those of you too lazy or too scared, part of it looks like this:
CREATE CLUSTER clu ( key varchar2(78 byte) ) 
SINGLE TABLE
SIZE 75
HASHKEYS 65536;

CREATE TABLE clut (
key VARCHAR2(78 BYTE),
num_key NUMBER,
date_modified DATE,
padding VARCHAR2(300 BYTE)
, constraint clut_pk primary key ( key)
)
CLUSTER clu ( key );



And the allocated storage for the resulting (still-empty) cluster is ....:



SEGMENT_NAME KBYTES BLOCKS EXTENTS
-------------------- --------- -------- -------
CLU 5120 640 20
CLUT_PK 64 8 1
--------- --------- -------- -------
Total 5184 648 21



That 5 MB (in my case) is surprisingly close to that of the original heap-table (6 MB in my case).
But also surprisingly close to 65535 x 75 = 4.8 MB.

Will this pre-created cluster-segment contain sufficient free-space?
And will it be able to cater for contingencies ?

Let me insert the data...

(if you had run the script at your own SQL-prompt, you would know all the answers by now...)
SYSTEM @ DB10  > insert into clut select * from heap ;

65535 rows created.

SYSTEM @ DB10 > commit ;

Commit complete.

SYSTEM @ DB10 >
SYSTEM @ DB10 > analyze table clut compute statistics ;

Table analyzed.

SYSTEM @ DB10 > set echo off

SEGMENT_NAME KB BLOCKS
-------------------- ---------- --------
CLU 8192 1,024
CLUT_PK 6144 768




Heeeey.
After I inserted the same 64K rows, the cluster has increased in size to the same 8M size of my previously created cluster.

Next time, I might as well create my STHC with a bit of contingency. And for further experimenting I will aim to create this particular cluster with SIZE 100 and HASHKEYS 80000 to end up with the correct size straight away (demo_sthc_100x80k.sql):

One more time I will try to tempt you to run one of my scritps...
(at your own risk, of course...)

SQL > @http://www.hotlinkfiles.com/files/2562061_v4rto/demo_sthc_100x80k



This cluster is now finally created at the right size (space allocated) and if you ran the script (did you? on 8k blocks?) you can verify that it absorbed all the rows without growing.
Supposedly, every row now really IS accessible by a single block get, and hash-collisions should be avoided.


I did verify a few selects, similar to the ones in the previous episode and I did find the 1(one) consistent get again - At least that still worked ...


Lessons from this so far:
- Pay attention to SIZE and HASHKEYS, and dont run with defaults here.
- Even properly(?) specified parameters dont seem to suffice. Always Verify!
- And (minor issue but...) my STHC still takes up more space then my heap-table.

PS: I will not be surprised or offended if someone comes out and says:
That is all wrong, and it is very much different on this other system... Please do!
Such is life with Oracle.
And we may be able to learn something still.


In the next experiment, I will try to overfill the STCH with data to see if any problems arise from that.


It takes some bloody long train-journeys, and a significant follow-up to investigate all this obscure stuff. And all the searching leaves you with a feeling you know even less.
Arcane stuff that nobody uses anyway.







Oh, btw, For those of you who want to set up a so-called mission-critical system where my demo-scripts would "accidentally" delete the data owned by your boss:
You Do Run this stuff AT YOUR OWN RISK. Period.


Sunday, 10 May 2009

Oracle Single Table Hash Clusters are Faster.

This post is about Single Table Hash Clusters.

It is also about The Fastest way to get to 1 single Record of data.
I will try to prove that with a demo.

This post (a bit of a rant, as usual) is prompted by several things:
Prompted by Richard Foote (Recommended reading), who justly argues that Every Block Get counts and who is on a mission to Educate us on the correct use of Indexes.
Prompted by a few reactions to my Endinburgh Presentation about (cost based-) optimized access to data. It all starts with good access-paths to your data.
Prompted by a client with 24 CPUs in a database-server, trying to squeeze the last milliseconds out of 25 million (business-)transactions per hour. (how many app-servers, JVMs and CPU's does it take to process 25 Million items per hour....?? - I don't want to divulge, but the numbers on the "estate" are multiples of 24...).
Prompted by a good lunch-discussion with same client on IOTs and other nerdy but sometimes crucial optimizations (more material for posts and demos, thx J)

In this particular case, the client had a requirement to (repeatedly??) check existence and/or content of a record based on PK-access. This particular data was relatively static (lookup-type data) and apparently only modified overnight.


There is also the need to check on certain values for de-duplication, and that is a different and more challenging matter. Such de-dup-data is modified (inserted) constantly, and needs to be purged based on time, hence "needs" another index. I might eleborate on that some other time.

For the moment, I will also deliberately discard the possibilities of using SQL-resultset-caching by Oracle, or the option of Hash-map tables or other (indexed-)arrays cached by some Java component, or Oracle Coherence. All of those I consider usable, but much more complex and intrusive then a Simple, well-designed database-solution.
I am a data-beast after all.



Why (not) clusters...

Single Table Hash Clusters can fetch a row with a Single Block Get. That beats any other method of access.

Of course, those who have read and understood the manuals, metalink, and various books on the subject already know all of this, and dont need to waste their time on here. But I see surprisingly few implementations of Clusters and IOTs ...

Normally, I shy away from using clusters because you need to know up front:
1) How many records go into your tables (parents and children for multi-table clusters), and
2) How big those records will be and
3) For Hash-clusters, Critical access is on the clusterkey.
Those three conditions are rarely met, and even more difficult to enforce over the lifetime of a system. But sometimes you get lucky.
And if any of the Oracle Cluster-constructions is useful, my vote goes to Single Table Hash Clusters. Simple, Elegant and ... Fast!

If you already know your theory about Single-Table-Hash-clusters, you dont need to read on. You know we will end up showing 1 single consistent get...


The DIY Demo

The rest of the post will allow you yourself to demonstrate on your database how Fast Single Table Hash Clusters can be.

Copy this into your SQL*Plus:

SQL> @http://www.hotlinkfiles.com/files/2531452_b8v0g/demo_sthc.sql

Press enter a few times, and you can re-read the output in your own spool file...

The script creates 3 tables; a heaptable, an IOT and a Single-Table-Hash-Clustered one. All tables are filled with the same 64K records. The Key is deliberately chosen to be a Varchar2 with funny distribution to resemble client case. And as it uses the Julian date 'JSP' format to generate spelled-out-numbers, your results may vary depending on your NLS settings. Any quirks or issues: send me a comment or a postcard.


If anyone spots flaws in the demo: let us know in a comment or an email. My knowledge is only skindeep and errors are easily made. Nor am I as skilled in education as Jonathan Lewis or Tom Kyte. My original demo was only whipped up to show to an Architect. Getting stuff out ready for the dba-community-at-large is more work then I anticipated, and my knowlege on the subject of Clusters is not as deep as I wished.

And if my demo-script fails you, check out the only other mention of STHC-with-demo that I recall seeing, by Joze Senegacnik here.



The demo uses on only 64k records as that was an amount I could test and re-test on my laptop in reasonable timeframes (10 seconds, I'm an impatient person). But it will work on Any size table, as long as the same conditions are met: knowing your data (numbers and size of records), and accessing on (PK-)equal condtion. Single Table Hash Clusters scale nicely.

The demo script can be found Here: demo_sthc.sql. It can be run with no modification in just about any schema able to create tables, indexes and clusters (of course, I always log on as system, but I have tested on 9.2.0.8, 10.2.0.3 and 11.1.0.6...).


The Single table Has cluster is created like this:

create cluster clu
( key varchar2(78 byte)
)
single table
hashkeys 1000;

CREATE TABLE clut
( key VARCHAR2(78 BYTE),
num_key NUMBER,
date_modified DATE,
padding VARCHAR2(300 BYTE),
constraint clut_pk primary key ( key)
)
CLUSTER clu ( key );

Note that the 1000 hashkeys I use is based on the fact that I cold fit approx 64 records in my 8K blocks, your results will vary if you use different blocksize or different size records. All those nitty-gritty details are part of why this is a rather nerdy and underused concept. Material for a follow-up here...

When the table is filled, I then fire some selects with a Key=value condition at the tables, and I use Autotrace to examine the 2nd execute of each query, like this:


select * from heap
where key = 'FIVE HUNDRED'
/
set autotrace on
/
set autotrace off



So, How Fast is this thing ... ?

(Did you run the demo in your own database yet... ?)


For a Single Table Hash Cluster, the result is 1 (one) Consistent get.


The result for a conventional heap table is 4 consistent gets. For an IOT it is 3 consistent gets (same index, but no table-access). And these numbers risk to increase by 1 if the blevel of the indexes goes from 2 to 3.

That may not seem like a big improvement, but it happens to be 3 or 4 times less Logical IO then the alternatives. And if executed at 25-million/hr, that is still a significant saving in LIO.


Of course, there is more to it. Such as the additional CPU-usage, chooseing the hash-key and guesstimating the size-value. And how does it perform when your data-volume unexpectedly explodes and it all goes horribly wrong....
I might re-run the demo a bit and experiment. I see a follow-up coming on the next long train journey.

Sunday, 6 May 2007

Index Organized Tables IOTs, the forgotten benefits

Check if your database can benefit from IOTs. You could be surprised.

Index Organized tables are possibly the most under used feature after the clustered table (now there is another underused item …). IOT’s also have some additional benefits that the manual seems to forget.

In brief: If you have tables with parent-child relations, or dependent tables where you often retrieve all related records at once, then you should read up on Index Organized Tables. Notice that this includes nearly every database out there.

Wake up call and Warning
This feature is available since 8i, and only now, recently, does it get a number of Bugs reported against itself (not in the last place because one of our major projects boldly went out using IOTs). Have those bugs been “dormant” all the time ? Check out the list of Expected fixes for 9.2.0.8 and see for yourself (Note:358776.1, when I checked it 28-Mar-06). 3 out of the 7 fixes mentioned under IOT are bugs found by us, and a bug-search for IOT turns up even more issues. The information and the dates on the bugs and fixes suggest that most of them have also been around in 10.x. Be Careful.
Despite all these issues, IOT remains a highly recommended feature.


First a recap.

Briefly described, the IOT stores all data of a table in the leaf-blocks of the PK-index. This means there is no table-segment anymore, only an index segment. You can verify this from user_segments.

On a well defined IOT, all records with the same parent key are grouped together in one single block or in a low number of adjacent blocks. This is an advantage on retrieval. All related data is generally retrieved in just one IO action. This speeds up retrieval (less IO is needed), but also makes more efficient use of the buffer-cache because the related records do not get dispersed over many blocks. The overhead of unused cache space is reduced. So far the known advantages.


Now the bonus features.

Bonus Feature #1: More data in the secondary index. As with normal tables, You can define other indexes on the IOT table. But those indexes now have an additional feature: The secondary index doesn’t contain a physical rowid, instead it contains the actual values of the PK fields. This means the 2ndary index is often ideal for index-only-lookup or just to join additional tables after retrieving only the index-blocks. There is no need to visit the table to fetch the PK fields. For those who used to inflate the 2ndary indexes to obtain
Index-Only-Lookups: Use of IOTs can automatically lead to more IOL.

A good example is an m:n relation where the linking table contains just two id’s. A conventional design would call for a heap table with two indexes. A total of 3 segments. An IOT would result in two segments: the PK and one index. Again, there is no table-segment. The PK-segment contains both columns in the leaf-blocks. And the 2ndary index contains the PK-values as an alternative to the ROWID. Any query always needs to search only one (index-) segment, and in each case, the required data will be grouped together in a single or low number of blocks, resulting in minimal IO and efficient use of buffer cache.

Bonus Feature #2: Reduced density, avoiding of hot-blocks. Because there is no separate PK-segment, and because data and index are together in one segment, the nr of records per block is lower. Lower Density.
Normally, the PKs are the most dense segments in a system, and therefore most likely to cause contention on DML transactions most likely to show up in gv$cache_transfer on a RAC system. On an IOT, the PK automatically contains fewer records per block because of the extra space required to store the “data” columns. If paired with a higher then default value of pctfree, an IOT can be used to reduce the impact of a “hot” index. This is especially useful in a RAC environment where hot blocks need to be transferred between instances.


Some of the downsides.

Inserts and updates are slower. This is partly a myth. In like for like comparisons, you may find the difference is not that big. When people complain about the slow loading of data into IOTs they often compare loading an IOT against a table with no indexes at all. If you can do without indexes, you can also do without IOTs. A more honest comparison would be to include the additional time required to create the indexes after loading, or to compare against a table with a PK already defined.
Another way to speed up loading is to use the “ordered” option.

A quirk: the physical guess.
The 2ndary index on a IOT contains a so called row-guess to allow fast access to the leaf block of the PK were the row was originally stored. These row-guesses can grow stale, and the index statistics reflect this in the pct_direct_access value. I have never been able to demonstrate the effect, let alone the benefit of this “row guess”.
A Challenge to investigative minds out there: Can someone prove me the usage and benefit of the row-guess. Ideally, a 10049 trace should be able to show the impact of the row-guess, and of “migrated” rows and stale row-guesses. (Update in 2009, With thanks to Barry Jones, who actually demonstrated 2ndary index deteriorated over time. Impressive).

Some Best Practices for IOT:

1) bad news first: check the bugs, even on 10.2.
Although we have reaped good benefits from IOTs on a multi-Terabyte database, we did stumble across a number of Nasty bugs. We had notable problems on the gathering of statistics (dbms_stats) and with the use of Function Based Indexes (FBIs). On IOTs you can use both dbms_stats and FBIs, but beware of the bugs. Metalink is your friend.

2) Determine a good pctfree.
Set this value higher then default. We tend to use 30 instead of the default 10. This will at least partly avoid the general problem of slower DML on IOTs. Your optimal value will vary, depending in the amount of updates and inserts you do.

3) create your own overflow segments.
If your table contains many columns or large records, consider defining an Overflow segment and use it to store the columns that you will need less-frequently. If you avoid creating overflow segments, you may find that oracle does this for you. You probably want control over this yourself to avoid surprises.

4) Keep blevel down, use partitioning.
As access to the table-data will always go via the PK, the table-access as shown in an explain-plan is actually another PK lookup (note that I’m very skeptical about the row-guess). Therefore, the blevel of the PK has an impact on the number of blocks that need visiting before a data-record is retrieved. By partitioning the IOT you can make sure that the blevel stays sufficiently low. IMHO, a blevel of 2 is generally achievable, 3 is acceptable, and 4 is too high.
Nb: monitoring blevel seems an often overlooked item, and it still applies to conventional indexes as well. But the impact of a high blevel on the PK of an IOT can be especially devastating.


Final words:

Start by reading up on the IOT feature, but then.Test, Verify, Tweak, and re-Test for your specific situation. IOTs are useful in a specific set of circumstances: Retrieval of related sets, avoidance of hot-spots, RAC scalability.
As always, Your Mileage May Vary.

Further reading:
- RTFM (a bit of a duh, but yep, those pdfs are very useful)
- metalink (same duh, but there is some really good info out there too, bugs and all)
- OTN : http://www.oracle.com/technology/products/oracle9i/daily/sept04.html