Friday, 6 December 2024

Oracle 23ai - unloading of vector data.

 TL;DR: For demo and repeatable-test cases, I like to use scripts to set/reset my test-cases. Vectors can pose a problem here bcse they are not always easily "scriptable". Not Yet anyway. SQLcl can help with the unload commands.



Pop-in an image of "unloading", because I am in the end just a farm-boy who likes tractors and combine-harversters and stuff...


Background: Scripts, Reproducible cases.

Always looking for ways to quickly test or demo something, I am a Big Fan of script-tools like SQLcl and SQL*Plus (and bash, and sometimes python). Good Demo- and Test-cases should be easy to reproduce, on any platform, even 10yrs later. The oldest "test" I got is a few scripts and one bench.dmp file from 2001, created with Oracle v8.1.7, and it is still "usable" (not very relevant anymore, but still usable, that is Backward Compatibility for you...). 

The creation and manipulation of test-data for VECTOR columns sometimes poses a problem. See also earlier blog (link) where I pointed out that the (ancient) exp/imp tools do not handle VECTOR data, but datapump does.

This blog is partly a call for help to Oracle (Hi Jeff!) to help facilitate the simple spool+reload of Vector-Data. I'm fairly sure there are several solutions, but not sure which ones to explore or use in my specific cases.


This Worked: SQLcl unload (as insert-into)

SQL*Plus and earlier versions of SQLcl had given me several errors, seemingly treating the VECTOR as a CLOB or LOB. But hey: Use Latest Version! Right?

Using SQLcl, version 24.2.0 I found the following to work, using the unload :


This generated a runnable sql-file that would correctly re-insert my data, even when the vector itself was (much) larger than 4K. My Tested case used vectors that were still under 32K. Might try bigger ones later.

The secret was, apparently, in the trick to split the vector into smaller chunks, which each occupied a line of just under 512 characters (linesize 512?). Each line was a set of numbers converted TO_CLOB ( ... ), and concatenated with its peers. The resulting CLOB was inserted into the VECTOR column, and this gave the desired result: my vectors were re-inserted. This is a screenshot of the unloaded-file:

Notice the repeated TO_CLOB(...) statements that are concatenated together to form the complete vector (which is apparently the equivalent of a clob...). 

Note: If you look inside the user_indexes and user_segments,  you can see that vector-columns have corresponding ilob and lob indexes: Storage-wise they are apparently treated similar to LOB/BLOB/CLOB columns.

I was relatively happy with this "concatenate-clobs trick" for several reasons:

. The unload generated correct, runnable scripts.

. No intermediate (staging) tables or additional pl/sql (helper-) procedures needed.

. Even SQL*Plus was able to insert this data. I had a Very Portable Demo!

But there were some clouds on the horizon...


Possible Problem with new version SQLcl ?

Then, I ran into an error after downloading sqlcl v24.3. The unload-command had changed slightly: it generated an insert of a single VECTOR. 

Hence, unloading with SQLcl version 24.3.2.0, build ..3301718, then trying to reload resulted in this error:

ORA-01704: string literal too long...

When I opened the unloaded file, the 3rd line in the sql-file was now 30.621 characters long (hm.. just under 32K, that seemed a coincidence - I should try with longer vector some time... ). There were no more TO_CLOB (...) concatenations, and the vector was just a quoted string of values, looking like this:

This simple quoting of the vector will work for small vectors (below 4K or even 32K), but not for seriously large vectors where the string-representation will surpass these values. Also, I am not sure what would happen to binary vectors: how would they be represented as quoted string ? (More stuff to test!.. I know, I know, but Time... Maybe later).


Final notes: Multiple Solutions.

From working with LOBS and CLOBS myself long time ago, I know how Tricky these things can be: They mostly require use of some other language than just SQL(*Plus) to manipulate them. They need to be split into chunks that are manageable by whichever language/system/host is doing the processing. But as a simple SQL user who just wants to run Easy Test- and Demo-scripts, I am hoping the solution will remain Simple.

By documenting this "case" I hope Oracle can either fix this (by reverting back to using multiple TO_CLOB?), or work around this problem in some other way.

One suggestion I got in private-message was to try using JSON: get the VECTOR in some JSON format and use that to unload/load the data, possibly in smaller chunks. And of course you can also revert to additional PL/SQL, Java(script) or Python to manipulate vectors. But all those solutions involve some additional (bespoke) coding on my part. And I just want Simple-Demo cases.

Suggestions Always Welcome ! 


----- End of Blogpost ------ 


Attachements, Addendums (just some files)


Below are links several links to files to test/demo for yourself:

ini_unload.sql:  Set up the test-case, create table and load nine records with VECTOR(2048, FLOAT64), The data is generated from some of my experiments with vectors.

tst_unload.sql:  Demonstrate the working (or error) of unload-as-inserts using SQLcl. This script will run the actual demo and possibly hit the "error" when it runs the script generated from the unload-command.

If you want see the two unloaded files: 

The one with "working" inserts and 

The one that generated the error..


Happy Scripting.



No comments: