Monday, 11 November 2024

Oracle 23ai: a Vector is not a CLOB, Versions are Important.

TL;DR: When experimenting with new (brand)features in Oracle 23ai, in my case VECTOR-datatype, I ran into some small anomalies. I'll present some tips, tricks, and workarounds in case someone needs them.

In short: Always Use Latest versions of client-tools like SQL*Plus and SQLcl if possible.

Note: As I write + experiment, I find more+more, this can turn into a small series...


Background: 

Discovering the VECTOR type and the promise of "AI-applications", I did the RTFM (link). And also dug up some (python) code to generate and experiment with vectors from commonly-used AI models. More on that in later blogs when can get it to Do Exactly What I Want...

My confusion started when I toyed with scripts to create tables with vector-type columns, and put data in them. I wanted to try-out operations, queries, indexes, and whatever else there is to discover on this new datatype (hint: There is a Lot! The future is going to be Interesting).

Further background: My Test-Platform is often a trusty old MB-Pro, which is a "vintage" Intel-based model from 2013. On that laptop, the client-stack is version 19.8 and the included binary sqlplus executable seems Identical (size and checksum) to the one in the latest download, which is labelled 19.16 (as we write: Nov 2024...).

For "server", I sometimes use VBoxes but mostly I use containers generated from the images provided by Oracle. For this test I was using : 

    podman pull container-registry.oracle.com/database/free:latest

This one contained Oracle database Version 23.5.0.24.07, the Free version because GA is still pushed into the future (I'll let Tim comment on that, I Agree).

And when I experiment, I want everything to be "scripted" and repeatable. Ideally, the scripts are also self-explanatory. That way, in future, I can pick up the script, and run it to see, to repeat/review, or to demo what I was trying to achieve.


First Anomaly, VECOR in SQL*Plus shows up as ... CLOB ?

The script I had in mind to generate my tables with VECTOR columns worked fine (... Table Created, no errors). But when I did a DESC on the table it showed... CLOB ? 

Check this small demo-table with some vectors and a json column:

Notice how the table is created, but the VECTOR columns are reported as CLOB.

Also, as aside, the JSON type is reported as UNDEFINED ?

The DESC is relevant here, because the AI models are quite Specific in the Length and Type of vectors they generate and use. The Length has to match Precisely with the vectors form the model otherwise they won't insert (link). And the type of the vectors ( FLOAT32 or FLOAT64) has to match that of the model. Furthermore, vectors of non-matching length+type can not be used in vector-to-vector arithmetic operations. Hence my frequent use of DESC to verify my exact vector definitions.

Note: I can elaborate on type-conversions and the tricky-ness of using python-models with numpy and other libraries as described here (link), but that is not the topic of this post.

Just to do the modern-thing, I also checked this using SQLcl, only to find that the DESC of the table still didnt report all the details. Hmmmm... But when I downloaded (Aha!) the Latest version of SQLcl, "Release 24.3.1.0 Production Build: 24.3.1.311.1631", downloaded on 11Nov2024, finally the DESC came out Correct:

This was the first hint that You Always Need Latest Versions. But what if your corporate-lockdown-PC doesnt allow you to...? I wanted to look a little further. Not in the least because I am kind of addicted to running all demos as scripts in SQL*Plus from my terminal, or Any 80x25 Terminal for that matter. Simple is Always Best...


Reason: VECTORS (and JSON) are stored similarly to LOBs.

Using SQL*Plus, checking the view USER_TAB_COLUMNS: this showed me the definition, type and dimensions reported by the catalog were "correct". 

Furthermore, the USER_INDEXES told me there were 4 indexes of type LOB on the table. This leads me to suppose that the data in the VECTOR columns (and the JSON column) is stored very similar to LOB-data (link to lob-storage-docu).

There is your first "workaround": query the data-dictionary from the catalog.



Workarounds: Use Latest Client-tools, or ... use the dictionary.

At this point, I went "into the server" by attaching to the container where my database runs. 

    podman exec -it ora23 bash

I correctly assumed that a container-image that provides a certain version, also holds the SQL*Plus and other client tools for Exactly That version.

And Yes, running the CREATE and DESCribe in SQL*Plus from the container (from "the server" if you like), the columns show up as with correct datatype and dimensions, exactly as per the create-statement:


From the server-side: Notice the versions of sqlplus and of the database. And the correct DESC results after the table-create.

Main cause of confusion is the fact that the client-tools for my antique "Vintage" intel-macbook are stuck on v19. I just have to work around that.

All Good.


Lesson: Always aim for the Latest Versions, even your client-tools.

Use the Latest Versions of client-stack. Always. And Verify. Don't hesitate to be suspicious and Re-Verify. In my, rather suspicious mind, I recommend to run your latest scripts with the SQL*Plus version that comes with the database-release. But you may need to upgrade on you PC or obtain server-access for that. Safest is to use the client-tools that come with the version itself, e.g. from the same install(er) or the same container. That is probably the executable that has the latest enhancements and improvements built inside. 

And, if you can, always getting the latest version of SQLcl is a Good Practice. 

 

No comments: