Monday, 16 December 2024

Oracle 23ai - enums, what is behind the scenes ?

 TL;DR: The DOMAINs in Oracle 23ai are an additional way to add constraints to your Datamodel and you should investigate this to make your database "Smarter". When testing the simplest of domains, the ENUM, they appear Quite Efficient. And as I can never resist trying some extreme cases, I did hit some limit as well.


Background 

Domains are introduced as a way to define complex and sophisticated (check-)Constraints, and to (re)use those in multiple places in a database. Enforcement of the constraint is by the RDBMS, hence this keeps your Logic as close to the Data as possible and avoid re-coding things in various application-components. Domains can be Very Useful in this respect. You can start your own investigation here (link).

This blog is about the simplest possible case: a List of Values (LoV) constraint. For a relatively simple LoV we have the choice between using a Domain of type ENUM or an FK-constraint to limit the values for a column. The nature of an FK to a lookup- table is very much "Relational", as you can define the valid options in a separate table and let the RDBMS enforce the constraint. The nature of a DOMAIN is more declarative, and resembles a sophisticated CHECK-Constraint.

Note: A word of Thanks to Chris Saxon, who did the presentation that triggered me to doodle the scripts that ultimately lead to this set of tests. It was Fun to do.

A reference to Asterix, because of the Roman Numerals in this blog...


Testing, a small and a large Enum.

I could not resist so I created two domain-ENUMs and did some SQL on them. Here is what I did (I'll put links to full scripts at the bottom..).

My first Enum, a list of colors:


That was Easy. And now for a really long one: the list of valid Roman Numbers in Oracle...



This shows I can potentially turn a LoV table of 4000 into a "DOMAIN", and maybe gain some performance by avoiding a join to lookup-table ?

And after creations, you can query them as normal tables. Let's see how that goes: The list of values can be SELECTed and sorted at will:



Because the queries on the rather large "roman_enum" were a bit sluggish, and as I am Always curious, I checked on the effort behind the scenes. The autoexplain looked ok: no buffers-gets or disk-io:



This query is causing a lot of round-trips because the result-set of 3999 records has to be pushed to the client, but Still no buffer-gets or disk-io visible. Hence operations on DOMAINS should be buffer- and IO-Efficient. Promising. 

But queries on the large domain did use up a lot of cpu-time, which could be a warning. I ignored the warning at that time, because a full-query on such a large domain is probably a weird case anyway.

Here are the statistics from v$sqlarea from the shared pool, they showed me similar good information:



The Good Part: No logical- or physical-IO. That two of the queries showed buffer-gets was, apparently due to them being the first-time-query to the DOMAINS after the purging of the sh-pool to allow clean measurement of the statistics.

But the fact that the "composition" of the relatively large set for roman-numbers had taken a lot of CPU- and Elapsed-time was "interesting".


Now use DOMAIN on Columns in a Table:

To get some real usage, I needed tables with columns that use the (enum) domains in their type-definition. So I created two very simple tables:



Notice how the 2nd column of each table only has a definition that refers to an enum-domain. There is no other infomation to say that the values in that column are acutally numeric (in my cases). If you RTFM deeper, you will find that the actual enum_values that go in are not limited to numbers. But I have not tested that (yet).

But  let's insert some data and see what it looks like. Fist the Colors...



As you can see, you can insert data using either a number (e.g. the enum_Value) or the "constant" from the DOMAIN (e.g. the enum_Name). And on query, the value is reported as the enum_value (e.g. similar to reporting the key of an FK table). If you want to display the enum_value, you need to use a function: 

Select DOMAIN_DATA ( enum_value_column ) from  Table;

The table-definition will then use the domain defined for that column to "look up" and display the description. The lookup or display does not seem to incur additional IO, it will probably need a little extra CPU ticks, I might test that later.


A Quirk to display the large domain?

Here is what happend is I query the table with the roman_enum, first using select-star:



No surprise (yet). 

You can see I used numbers that could represent years, and because of the domain-definition, each enum_value or "row-number" corresponds to the enum_name with the equivalent roman-number. I know some ppl like to use Roman-Numerals for years, hence my use of this domain.

But now let me try to display the "roman numeral value" using the DOMAIN_DATA function:



Resulting in an Error ?? 

Also, the error was "instant", there was no noticable (cpu-)time consumed before throwing the error. And the error was always like a syntax- or typo-style error. Well, as far as I could see, there were no syntax-errors in my queries... ? (I am notirious for typos, so I did check several times, and re-typed the statement in a few ways, no relief, just errors while trying to retrieve my "roman number"...). 

Almost as if the RDBMS told me I should just revert to TO_CHAR( rn, 'RN'), which is a lot simpler anyway.

I defined a few more, different, smaller, enums, which all Worked Fine... ???

Then did a little searching (in my own database, the datamodel, data-dictionary), and found that the data-dictionary had a view "USER_DOMAINS". It contains a CLOB-column called DATA_DISPLAY, in which there is a CASE statement that seems to compose my enum:



I then got suspicious and checked the size of that column, using dbms_clob.getlength() for my domains:



Hmmm. I went back and cut my roman-numeral-domain in half, and in half again, and then did some testing to find the max-size where display_data still worked... 

And the workaround was more or less as I suspected: Smaller Domain. In my case, if I got the size of the clob below 32K, I could display the data. Here is a small demo, contained in a separate file: demo_enum_size.sql:


Voila. That works. But in this case I seem to be limited to about 1400 values.


Summary: Domains are Useful. but Test !

In practice I would probably never use a DOMAIN that large. And there are other practicalities, like you can not (yet) alter a domain without (probably) causing a window-of-non-constraint, or some outage. This needs more Testing (...). In general for such long lists, I (still) think a real LoV TABLE with FK is more appropriate. And I know an LoV table can be "edited" in a LIVE system, BTDT ;-).

But the fact is that DOMAINS have advantages too. DOMANs are declarative, that domains do not need access to "data" (buffers or disk), and  that domains can implement constraints that are much more sophisticated than old fashioned "check", that makes them potentially Very Interesting. I should test more...

So maybe TBC.


-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

Appendix: Scripts used to compose this blog..

demo_enum.sql : self running, just hit enter some 15 times..

demo_enum_2size.sql: self running, shorter, just to work around the error. 


Edit: Almost fogot:

demo_enum_io.sql : to display statistics from v$sqlarea, show the CPU + IO

No comments: