Sunday, 17 November 2024

Vector functions in 23ai - and unexpected insights.

TL;DR: I loaded and VECTORized my 19,000+ photos into my Oracle 23ai database. Converged Database, yes (link). Some Surprising findings came out, about Vectors, python, ai-models, and SQL-Operators for Vectors.


Background: I want Text-Searching through my photos.

After 25 years of taking digital pictures, I have way too many photos. I need some clever mechanism to search them...

So when Oracle 23ai came along with some "ai" built in, I started tweaking around with VECTORs and Vector-Search.

End-Goal is to be able to enter a text-question, and find photos related to that question. (this may take some time / tweaks / blogs...)

I am Not There Yet, not by far. And after months of tweaking, I'm not sure if I will ever reach my goal. But I did Learn some interesting things, tech and non-tech, in the process.


Loading the image: The Easy Part.

Oracle 23ai is a "Converged Database" and should be able to handle any Data. So naturally, I want to load my data into a Table. The Table:
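Something along these lines (a minimal sketch via python-oracledb; table- and column-names here are simplified stand-ins, not necessarily what I used):

# Minimal sketch of the image-table, created via python-oracledb.
# (Column names simplified for this blog.)
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/freepdb1")
cur = conn.cursor()

cur.execute("""
    create table images (
        img_id     number generated always as identity primary key,
        file_name  varchar2(1000),
        img_data   blob )""")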

Next, the example python-code, to open and insert the image from a file:
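Condensed to the essentials, and with names matching the sketch above, it looks roughly like this:

# Sketch: read an image-file and insert the bytes into the BLOB column.
# (Names as per the table-sketch above; simplified, no error-handling.)
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/freepdb1")
cur = conn.cursor()

def insert_image(file_name: str) -> None:
    with open(file_name, "rb") as f:
        img_bytes = f.read()
    # hint the driver that the 2nd bind is a BLOB (images can be large)
    cur.setinputsizes(None, oracledb.DB_TYPE_BLOB)
    cur.execute(
        "insert into images ( file_name, img_data ) values ( :1, :2 )",
        [file_name, img_bytes])
    conn.commit()

insert_image("/path/to/some_photo.jpg")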

This is of course very much a standard operation. Easy. If you set out to build something like this, check the documentation on the oracle-python driver, and pay attention to the input/output handlers. Maybe more on those later.

In the subsequent version of my experiments, I decided to not load the binary-images anymore, as loading took time and bloated my (containerised) database.  There are arguments pro- and contra-, but it is good to know you Can Effortlessly store the images in your Oracle Database.

For your own real-world applications, you can make your own decision: keep images on separate storage or store them in your Converged Database (often Recommended!). 


Loading a Vector: some caveat around datatype. RTFM...

The Table to hold vectors, in this case for a model that generates 2048 elements of Float32, like mine did (I used ResNet50 obtained via Hugging Face):
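Again a minimal sketch (same simplified names; the VECTOR column declares both the dimension-count and the element-format):

# Sketch: a table for the 2048-element FLOAT32 vectors from ResNet50,
# linked to the image-table via img_id. (Names simplified.)
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/freepdb1")
cur = conn.cursor()

cur.execute("""
    create table img_vector (
        img_id  number references images ( img_id ),
        model   varchar2(128),
        vec     vector ( 2048, FLOAT32 ) )""")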

In this case I store the vectors separate from the image-table, because I expect to generate multiple vectors, from different models, for every image. Hence a separate table, related to the image-table via img_id (an FK, of course).

The example python-code (you can see I did some debugging/checking):
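Stripped of the debugging, the insert-part is roughly this (see the notes below on the "d" array-constructor):

# Sketch: convert the model-output to a plain list, then to an array,
# and bind that into the VECTOR column. The "d" typecode (doubles) plus
# a FLOAT32 column is the combination that ended up working for me; see
# the notes below. (Debug/print code to dbgdata left out.)
import array
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/freepdb1")
cur = conn.cursor()

def insert_vector(img_id: int, model_name: str, embedding) -> None:
    vec_list = list(embedding)              # numpy-array or list -> list
    vec = array.array("d", vec_list)        # list -> array of doubles
    cur.execute(
        "insert into img_vector ( img_id, model, vec ) values ( :1, :2, :3 )",
        [img_id, model_name, vec])
    conn.commit()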

Notice a few things:

Python Datatype: Some models generate numpy-arrays, and others produce (standard?) python lists of <class 'list'>. I'm converting everything to list, and sometimes I print the contents to a file, dbgdata.

Oracle VECTOR datatype: in my case I ended up with "garbage" data in my vectors a few times. That was mostly fixed by trial+error: I tried FLOAT32 and FLOAT64 in the vector-definitions, and I tried "d" and "f" in the array-constructors. The resulting combi of "d" and FLOAT32 finally worked (d-for-Double was counterintuitive for a float-32, but that would be a different blog, and not relevant here...)

If you go about inserting VECTORs from python, I strongly Recommend to read The Fantastic Manual (RTFM link). Also, because this version is not yet GA at time of Writing (Nov 2024), things may still change slightly. My type-conversion trouble might have been caused by old versions or an outdated release.

Notably the input-handlers and output-handlers: you will need them to convert between python-classes and oracle-column-types. RTFM and Test! I had some trouble + confusion there. Hence I sometimes ended up printing vectors out from python, and even built a small utility to inspect vectors inside my database (link to victor.py).
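To give an idea of what such an input-handler looks like, a sketch (the exact behaviour depends on your python-oracledb version, so RTFM for yours; DB_TYPE_VECTOR needs a recent driver):

# Sketch: an input-type-handler that converts plain python lists into
# arrays, so they bind as VECTOR rather than whatever the driver guesses.
# (Requires a python-oracledb version with DB_TYPE_VECTOR support.)
import array
import oracledb

def vector_in_handler(cursor, value, arraysize):
    if isinstance(value, list):
        return cursor.var(oracledb.DB_TYPE_VECTOR, arraysize=arraysize,
                          inconverter=lambda v: array.array("d", v))

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/freepdb1")
conn.inputtypehandler = vector_in_handler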

Also, the python-array Must be of the same length as the defined VECTOR column. You may find out the hard way, especially if you start using various models to generate vectors, and their lengths differ. This is probably a reason to create a separate table to hold the vectors from each model.

After some trial+err (okok, some of the dbg-code is still there), my vectors got correctly inserted, All Good.


This Works: Vector-similarity search.

When doing Cosine-Similarity comparisons on the image-vectors, I get the "expected" results: Similar images yield Cos-Similar vectors. 

To simplify, I created a view to encapsulate the Cosine-Similarity:
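Using the simplified names from the sketches above, the idea of the view is roughly this:

# Sketch of the view-idea: pair up every two images (from the same model,
# like-for-like!) and expose the cosine-similarity, which is 1 minus the
# cosine-distance operator <=>.
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/freepdb1")
cur = conn.cursor()

cur.execute("""
    create or replace view vw_img_simil as
    select a.img_id                   id_a,
           b.img_id                   id_b,
           1 - ( a.vec <=> b.vec )    cos_sim
    from   img_vector a,
           img_vector b
    where  a.model = b.model """)

# quick test on a small set: the most-similar pairs first
cur.execute("""
    select id_a, id_b, round ( cos_sim, 3 )
    from   vw_img_simil
    where  id_a < id_b
    order  by cos_sim desc
    fetch  first 10 rows only """)
for row in cur:
    print(row)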

Here the calculation is embedded in the view, which makes the typing of SQL a lot easier. The test-code immediately following the create-view sort-of proves that it works. In the limited test-set of my first 200 images it found two that it thinks, according to this "model", are very similar and the filenames are probably no coincidence.

Notice the use of <=> as an operator, rather than calling:

vector_distance ( vec1, vec2)

I prefer an Operator, it reads more like a Query. The available operators are: 

vector1 <=> vector2  : Cosine Distance

vector1 <-> vector2 : Euclidean Distance

vector1 <#> vector2 : Negative Dot Product.

Those operators are identical to the ones used in some other SQL implementations (pgvector, for example). Not sure if they are part of any Standard yet.

And this View worked like a Charm. The two files, 1-view and 2-view in my test-set were indeed Very Similar. I opened them to double-check:


More testing: I also happen to know that image nr 7 is a powerline-pylon (dutch site on pylons). Let's use the view to check if I have more photos of pylons:
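In terms of the sketch-view above, that check is a one-liner of a query:

# Sketch: which images are cosine-similar (> 0.9) to image nr 7?
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/freepdb1")
cur = conn.cursor()
cur.execute("""
    select id_b, round ( cos_sim, 3 )
    from   vw_img_simil
    where  id_a = 7
    and    cos_sim > 0.9
    order  by cos_sim desc """)
for img_id, cos_sim in cur:
    print(img_id, cos_sim)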

After a testrun, I included the "easy open" statements for the files with a high Cosine-Similarity, all above 0.9. And Bingo:

It found 4 other files, and ... the file itself, which of course has a Cos-Similarity of 1.000 with itself. Yes, all Pylons.

All this looks hopeful: I can find images, when I can provide an "Example". The Cosine-Similarity will find images that resemble the example. The higher the cos-sim, the more alike the images should look.

The view works, I can encapsulate the vector-operators inside views, which is a Great Convenience for writing complex queries.

And more importantly: Similar Images have a high Cosine-Similarity. This is as close to QED as I get at this point.

Of course, I'm not the first to try this. Notably That JeffSmith (Hi Jeff!) did something very similar, using a REST-API looking for a cute dog. As did lots of others on t'interweb.


The Limitation: no Text-Search yet. 

OK, now I can "embed" or generate vectors on the images and use them for "similarity" search. But searching the (embeddings of the) images by "typing text to search" has not worked yet: None of the models I explored could encode my Text well enough so that the text-vector and the image-vector have a satisfactorily high cosine-similarity.

This means I was not able to use "question-text" yet to search my images, despite the "promise" of several models that it would be possible. 

The details would be boring, and not very useful, as I don't understand all of them well enough myself yet.

But in the process I stumbled across several other interesting "findings". More in subsequent blogs.

The search continues. 


Some Learnings:

The "models" used in embedding are a whole separate topic. Let me just Recommend the Hugging Face site (link). But there are many more sources, and this field will probably evolve rapidly.

Models and Vectors: They only work if you compare like-for-like, e.g. vectors from the same model (duh). To this end, it is probably better to have separate tables for each model. I'm experimenting. Hence ...

Datamodel: must be flexible. I will experiment (already have) with various models for "embedding" and I will have to adjust my datamodel accordingly. For example, it may or may not be smart to store vectors of variable lengths (different models) in the same table. And in several models there are "captions" and "keywords" generated that need to be stored somewhere, preferably in a way that is easily searchable by SQL and by some vector-embedding-mechanism. This datamodel will Evolve.

Indexing: No need yet, to my surprise. So far, all my experiments with vector-search have been (sub)second queries, even on 100K or larger tables. But in future, I think we should explore vector-indexing. The manual on those is worth a study (link) already. Also, I'm sure other DBA-geeks will soon dig into the indexing and explain-plans for vector-search queries. Either from necessity or CTD. Or simply because that is what DBAs do, right?

SQL Rules: Once the VECTOR-data is inside the Database, you can use the VECTOR-column and vector-operations in SQL-statements like any other (Thankfully!). I used a VIEW to simplify things, encapsulate the joins, calculate the Cosine-Similarity, and facilitate searching.  Neat, Simple, Convenient!


Conclusion: It is a Journey...

[small]

Caveat: This is not one of those smooth (self-marketing) blogs where the author has prepared 7 blogposts, neatly leading up to a "desired outcome" to demonstrate how Wonderful their huggingface-model is, and how clever the products they use are... This is just a DBA-grunt searching his way around new-ish technology.

[/small]

I've done some interesting coding and running with various models. I need (want) to do More on this topic to Explore.

For example: the generation of Captions and Keywords. This may provide the key to Text-Searching of images. And I found some surprisingly Accurate captions on my own collection.

More Later.

But I'm Enjoying the Journey.



Friday, 15 November 2024

Look Inside my Vector, Victor. 23ai vector inspection.

TL;DR: I want to see the element-values of my vectors. For larger vectors, this poses a problem, as SQL*Plus does not easily display the values in the far-end of a sizeable VECTOR. I'll put some SQL and Python code at the end of the blog.


Background: Inspecting VECTORs

Since Oracle 23ai can store and process VECTOR columns, I am experimenting with some of that. Notably "models" that turn text and images into vectors so they can be compared (searched) using cosine-similarity and other hip methods. The searching sometimes yields remarkable results (surprisingly Clever stuff!), and sometimes just ...  Nothing (???). This vector-search to detect similarity, the heart of the Artificial Intelligence in this gizmo, is notably dependent on the models that generate the VECTORs and on the vector-functions in the database.

Those very clever models are to me "just python code" that I call to generate vector-arrays. However, the type of the elements inside the arrays (or lists / matrices / tensors) depends on their origin-packages: Float, Binary, Integer, even String-values are possible as vector-elements.

From the conversions between one datatype and another, and from the plumbing of data between python and oracle, I have sometimes ended up with errors or just "wrong data" in my vector-columns.

And, understandably, if you try to compare Pears and Oranges, you end up with unreliable results, or even straight-up errors.

Hence the need to sometimes verify, to "inspect" my VECTORs in detail.


The Problem: Trust No 1.

Some of the models I used generate normal python lists, others use numpy (link) and generate numpy-arrays. Then there is Oracle, which wants to know if a VECTOR will contain FLOAT32, or FLOAT64, or BINARY values. And if the values from python do not match the values defined for the column, I seem to end up with either errors, or with scrambled (wrong!) values in the vector-column.

Hence the Need for input-handlers and output-handlers. You have to know this when processing data between python and oracle (link to docu)

Combine that complexity with the fact that sometimes my cosine-values were suspicious (i.e. probably wrong).

And all this led me to want to "inspect" my vector-values using both python and SQL*Plus or SQLcl, just to see if the "generated values" had also ended up correctly inside the VECTOR columns in my database.


Attempt nr 1: SQL*Plus to view a vector.

For small vectors, there is no problem. Here is my super-simple demo-set (also used to calculate known cos-distance and euclidean-dist, Highschool-Math, but different story): 

As long as I can sensibly fit the values of the vector-elements on a screen, I can display them. With some effort, I can even read the floating-point notation of more complicated vectors:

Now, at some point, the values will become too many and too large to display. Others have suggested to increase buffers, linesize, pagesize etc... But There is a Limit to what you can sensibly Display and Decipher.

If this works for you: You are Good. No need to read further. But in my case: Not Suitable for vectors of 512, 784, or 2048 elements... 


Attempt nr 2: regex to un-pivot a vector.

When looking at the values from SQL*Plus, SQLcl and SQLDeveloper, the data is presented as CSV. And I happen to know a trick to un-pivot CSV data:

That works for the small vectors (I should also remove the '[' but I'm Lazy...).

It also works with larger vectors, with more elements. But with a vector of 512 elements:

The fact that there is an implicit use of VARCHAR with maxsize (32K?) will limit my regex options.

So, at this point, if You are able to display Your Vectors: You are Good. 

But in my case, I had to keep looking...


My Solution: a Python program to "print out" the values.

Since most (all) of my models are currently in python, I looked at something python-y to read+display my vectors. The Requirements that came to mind were:

1) Query any vector, in an ad-hoc way, as I would normally do with SQL.

2) Display All Elements in a readable way to a terminal or write to file, whereby a tee or redirect would also work. 

3) Some easy way to check and compare VECTOR contents (a hashtotal came to mind as a first, crude, way to check a vector).

After some experimenting, this resulted in the program called victor.py. It does approximately the following:

a) Connect to the database (I'll work on command-line or connection-name later) and report about the connect: username, global_name, so that the user can confirm "where the program is".

b) Determine which SQL to use. The Query must have a VECTOR (or list) as first column in the return-cursor. It will not complain if there are more columns, but they will be ignored (for the moment).

b1) IF there is an arg[1], THEN assume that is the SQL to use to get the vector(s). Currently, there is a need to "quote" the SQL statement to make it just arg[1].

b2) ELSE ask for an SQL statement. 

c) Open a cursor with the statement, and start returning values. If the SQL contains a syntax error, that will result in a nasty error-stack, but the actual error will be at the bottom of the stack, Readable. 

c1) IF the first column is not a list (vector), THEN report a warning.

c2) Else start processing the VECTOR, report its length, the type of elements (float or...), and print the elements 1-by-1. At the end of the element-list: print some summary (vectorlength, sumtotal, rownum in cursor..).

c3) Prompt for Next-record or Control-C to stop

d) Report some closing message: nr of records, and the SQL-used in case the user forgot... Then Exit the program, which closes cursors and disconnects the session (maybe check this!).

I'll try to include the py-source and a demo-sql at the bottom of this blog.
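To give an idea of the shape of the thing, here is a heavily condensed sketch of the approach (not the actual victor.py, which has more checks and comments):

#!/usr/bin/env python3
# victor.py -- condensed sketch: fetch a query whose first column is a
# VECTOR (returned as a list or array) and print every element.
import array
import sys

import oracledb

def main():
    conn = oracledb.connect(user="demo", password="demo",
                            dsn="localhost/freepdb1")
    cur = conn.cursor()

    # a) report where we are connected
    cur.execute("select user, global_name from global_name")
    usr, gname = cur.fetchone()
    print("connected as", usr, "at", gname)

    # b) the SQL: either argv[1] (quoted!), else prompt for it
    sql = sys.argv[1] if len(sys.argv) > 1 else input("sql> ")
    sql = sql.strip().rstrip(";")          # leftover semicolons: ignore them

    cur.execute(sql)
    nr_rows = 0
    try:
        for row in cur:                    # c) fetch and process row by row
            vec = row[0]
            if isinstance(vec, array.array):
                vec = list(vec)            # some driver-versions return array
            if not isinstance(vec, list):
                print("warning: first column is not a vector/list but", type(vec))
                continue
            nr_rows += 1
            print("Vector Found, element-type:", type(vec[0]), " dim:", len(vec))
            for i, elem in enumerate(vec):
                print(i, elem)
            print("-- dim:", len(vec), " sum:", sum(vec), " rownr:", nr_rows)
            input("Enter for next record, Ctrl-C to stop... ")
    except KeyboardInterrupt:
        print("\nstopped by user.")

    # d) closing report, in case the user forgot what was asked
    print("rows processed:", nr_rows, " sql used:", sql)

if __name__ == "__main__":
    main()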

So far, the resulting code, victor.py, worked for me. Let me show some runs. Or as they say at #POUG: 


This is Blog, Let's Demo:

Calling the program with no arguments: It will prompt for a query. And we'll give it a very simple query: select * from dual ;

Notice the gentle warning. The program needs a Query that returns a VECTOR as the first column of the query. 

So, let's give it a vector. We'll use the new-and-improved feature of a select without a from, and give it just a vector, a vector of Two Elements, two numeric elements with values 1 and 2:

Result! 

The program detects a list (vector) and reports "success": Vector Found. It then reports the python-type of the objects (class 'float'), and the detected dimension, the length of the vector. This vector seems to have Two Elements.

Next it prints out the elements each on a separate line. This list can be quite long as typical vectors are 100s of elements or more, hence this will typically scroll off screen.

And it closes with a "summary": it found a vector of 2 elements, of which the sum-total was 3, and this was the 1st record of the resultset. 

I'll let you test for yourself what happens with : 

select to_vector ( '[1,2]' ) 

union all 

select to_vector ( '[3,4,5]' )

And then re-try that but leave out the "all" keyword, hehehe.


Now for Real Serious Data:

Next it was time for some real Tables. Start small, using the example from the start of the blog. The data in table DV1 consists of 3 records, inserted as: ...

For the query, we answer the prompt with 

select vct, id from dv1 ; 

and the result is:

And Voila: The first of the vectors pops out from a table, I can "read" every element of it, and it waits for me to press "enter" before fetching the next record. 

Also note that the ID field in the select is ignored: it is returned from the cursor, but not used in any of the code. The advantage is that this allows you to copy/paste queries with some leftover columns, in case you are in a hurry and don't want to do too much editing. In later versions, I might try to use those additional columns for more information.

If we hit "enter" three times, all three records will be fetched, displayed, and when the cursor is finished, Victor.py will tell you and present a summary:

Three records processed, and just to remind you it re-displays the SQL you made it process to get those results. 

At this point, notice that the use of a semicolon to terminate the query is not needed (python doesn't want it), but if you happen to type or copy a ";" at the end the program will ignore it (I got too many errors from leftover semicolons while testing with copy-paste of queries..). In future, I may want to use multi-line input and then the ";" becomes an important End-Signal. Later.

But Now I want to see a Real Vector: the table IMG_VECTOR contains a VECTOR column with data from the CLIP-model, vectors of dimension 512. The table typically holds 100K records so I may not want to see all of the result-set. At this point I really needed a "Control-C" whenever I forgot to limit the results via a Where-Clause. 

Let me try and inspect the first few records of this table, and I will use the "command line" to pass the SQL. 

Notice the double-quotes around the SQL, something I might remove in future releases, but for the moment I need it to treat the SQL as a single (argv[1]) item.

When the cmd-line contains the SQL, Victor doesn't need to prompt for an SQL, and can start straight away to fetch and display the first vector. The 512 lines will scrolllll all the way, but I'll show the top:

And at the bottom, I can either hit "Enter" after each record, or Control-C to stop fetching. I've hit Control-C after the 2nd record:

Stopping the fetch means Victor will once more show the SQL, will report how many rows were processed, and will then gracefully Exit. 

Job done.


Closing Remarks: It worked for me.

In your case,  you may be happy with just SQLcl, SQLDeveloper, or SQL*Plus, or you may already have your own utility. 

I do expect Oracle to come up with some "vector inspection" option in its Tools (SQLDeveloper, Apex, SQLcl, PL/SQL-arrays...). There may already be a few when you read this.

I'll try to include or link the python code below, and may even put it out on github when it is ready for that. 

The code contains a number of comments from ideas that I think _could_ be done to improve it, but for the moment it works for me. I'm happy at this point, not sure if I need more.

Feel free to send me suggestions with improvements. But chances are you can improve it yourself, or some other tool(s) out there is already doing a better job. 

----------------------- notes and source code below ------------

Note1: "Trust No 1" is a famous quote from Agent Mulder, which he apparently also used as his computer-password. Those Were The Days, before Tokens and 2FA and Biometrics.

Note2: "What is our Vector Victor" is a quote from the 1980 movie Airplane (link). You will hear a lot of greybeards using that phrase when talking VECTOR. I could not resist...

Below: the SQL to set up the demo, and the py-code from my python-thingy.. Feel free to plagiarise.

------------------------------- victor.sql ---------------------------

Link to the file...

------------------------------- victor.py ---------------------------

Link to the file...




Tuesday, 12 November 2024

Oracle 23ai: Backup of Vector Data.

TL;DR: If you have datatype VECTOR in your tables, you can no longer use exp/imp for quick-backup of (small-ish test)sets. Switch to datapump, or use other ways to save your testsets.


Background:

Experimenting with the new VECTOR datatype in Oracle 23ai, I did a lot of testing and tweaking. And sometimes, I want to save a (test)set: tables with data in them, often with related pieces of PL/SQL that act on the data. The old-fashioned exp tool was ideal for that: it backs up a schema-owner with all objects and data in just one file, and restoring is dead-simple (if you are a legacy-dba like me).

Some of my testing still uses exp. And That became a problem when I got columns of type VECTOR.


Oops: exp throws errors.

My trusted go-to solution for quickly saving datasets is good-old exp. It did not work in this case:


That table did not get exported...

Notice that I am using the latest version of both the database and the exp-tool. I'm running this from the server, but I got the same results at my client-side with a slightly older exp version.

This error is more or less expected, as the exp/imp toolset is no longer supported "for general use" (link). The tool is not aware of the new VECTOR type, and apparently it refuses to treat it as "just a BLOB". I can ask Oracle if they would consider upgrading exp, but I suspect the answer is "No".

NB: One of the reasons exp was deprecated is notably the security-risk: anyone with SQL*Net access could use it to Very Easily extract data from a database. And we all know "security is king".

I needed to look for Alternatives...


SQLdeveloper: Always Helpful.

My first thought was: Scripts. I like scripts, so I would want a file with INSERT statements. SQLDeveloper is Good for That, it is free to use, and you can find the latest version here (link). It can generate insert-scripts to recreate and populate a table. And that worked for some of my vector-tables but not for all. 

Of course, the "insert of a vector" can be bulky: some of mine are 512 or even 2048 floating-point values of type FLOAT64, and that takes a lot of text when written out as an insert-statement.

So, for some of my vector-columns I would Not get the actual "data", but rather an insert of an EMPTY_CLOB():

My guess is those vectors (these are size 2048) were too large for a 32K varchar2, and the tool just decided it couldn't handle it. I'm also guessing there is a limit to the size of any individual quoted vector that SQL*Plus can re-insert. After all, you can not spool BLOB columns for insertion via SQL either.

So this was only a solution for Some cases, but not for All of my testsets. 

But if this scripting works in your case: You Are already Good. SQLDeveloper can do the job for you.


Re-generate the embeddings from source:

For small datasets, I can re-generate the vectors from the original image or text. It may take 2min, 10min, or even 60min, depending on the model and the dataset. But for small and moderate test-sets, I can always re-run the embedding. Just be patient. Major downside is: I now need to save the source-data of my test-sets. In my case, a lot of images and textfiles. This increases the size of my "tests" by a few orders of magnitude, and it makes it much more tedious to (wait for the) restore of testsets when I mess them up or want to re-run tests frequently.

Generally Possible, but not Practical.


Use Datapump: This worked in All Cases - Problem Solved.

The replacement of exp and imp is of course the DataPump variety: expdp and impdp. And they are included in the Client-Stack if you download that, as every DEV and DBA should from here (link).

Here is my expdp, started from my local laptop:

No issues.

The biggest Caveat of datapump is that the files will be "server side". In the example, it is located in the dpdump directory of the pdb where my user-data is located.

Your dmp-file ends up in a Directory, usually the dpdump directory, on the Server. You can run datapump itself from your client-machine (I do that a lot), but to access the dmp-files beyond just expdp/impdp you need access to the location on the server where datapump stores them. I for one like to kick them into the git-repo with my testcases.

Your DBA will know where the files go by default, but you might want to create a separate directory for them.

My preferred way of working would be to "map" the dpdump-directory or the volume where expdp and impdp have their files. Check if that is possible (and allowed) in your situation. Alternatively, you can resort to some copy- or transfer-scheme to get/put your files in a place where you want to keep them.

For assistance with those "files on the server side", you may need to contact your DBA or Sysadmin. 


Lesson: Use SQL-Scripts or Datapump (and say Hi to your DBA)

The VECTOR datatype may be challenging when you need to backup/restore testsets quickly and flexibly. My preferred way is to have "sql-scripts", with insert-statements. When such scripts can not do the job, because your vectors are too large, or for any other reason, I would recommend using expdp and impdp.

In using those tools you may need some help from your local, friendly DBA. Always a good idea to be on Friendly Terms with these ppl.


Monday, 11 November 2024

Oracle 23ai: a Vector is not a CLOB, Versions are Important.

TL;DR: When experimenting with new (brand)features in Oracle 23ai, in my case VECTOR-datatype, I ran into some small anomalies. I'll present some tips, tricks, and workarounds in case someone needs them.

In short: Always Use Latest versions of client-tools like SQL*Plus and SQLcl if possible.

Note: As I write + experiment, I find more+more, this can turn into a small series...


Background: 

Discovering the VECTOR type and the promise of "AI-applications", I did the RTFM (link). And I also dug up some (python) code to generate and experiment with vectors from commonly-used AI models. More on that in later blogs, when I can get it to Do Exactly What I Want...

My confusion started when I toyed with scripts to create tables with vector-type columns, and put data in them. I wanted to try-out operations, queries, indexes, and whatever else there is to discover on this new datatype (hint: There is a Lot! The future is going to be Interesting).

Further background: My Test-Platform is often a trusty old MB-Pro, which is a "vintage" Intel-based model from 2013. On that laptop, the client-stack is version 19.8 and the included binary sqlplus executable seems Identical (size and checksum) to the one in the latest download, which is labelled 19.16 (as we write: Nov 2024...).

For "server", I sometimes use VBoxes but mostly I use containers generated from the images provided by Oracle. For this test I was using : 

    podman pull container-registry.oracle.com/database/free:latest

This one contained Oracle database Version 23.5.0.24.07, the Free version because GA is still pushed into the future (I'll let Tim comment on that, I Agree).

And when I experiment, I want everything to be "scripted" and repeatable. Ideally, the scripts are also self-explanatory. That way, in future, I can pick up the script, and run it to see, to repeat/review, or to demo what I was trying to achieve.


First Anomaly: VECTOR in SQL*Plus shows up as ... CLOB ?

The script I had in mind to generate my tables with VECTOR columns worked fine (... Table Created, no errors). But when I did a DESC on the table it showed... CLOB ? 

Check this small demo-table with some vectors and a json column:

Notice how the table is created, but the VECTOR columns are reported as CLOB.

Also, as an aside, the JSON type is reported as UNDEFINED ?

The DESC is relevant here, because the AI models are quite Specific in the Length and Type of vectors they generate and use. The Length has to match Precisely with the vectors from the model, otherwise they won't insert (link). And the type of the vectors (FLOAT32 or FLOAT64) has to match that of the model. Furthermore, vectors of non-matching length+type can not be used in vector-to-vector arithmetic operations. Hence my frequent use of DESC to verify my exact vector definitions.

Note: I can elaborate on type-conversions and the tricky-ness of using python-models with numpy and other libraries as described here (link), but that is not the topic of this post.

Just to do the modern-thing, I also checked this using SQLcl, only to find that the DESC of the table still didn't report all the details. Hmmmm... But when I downloaded (Aha!) the Latest version of SQLcl, "Release 24.3.1.0 Production Build: 24.3.1.311.1631", downloaded on 11Nov2024, the DESC finally came out Correct:

This was the first hint that You Always Need Latest Versions. But what if your corporate-lockdown-PC doesn't allow you to...? I wanted to look a little further. Not least because I am kind of addicted to running all demos as scripts in SQL*Plus from my terminal, or Any 80x25 Terminal for that matter. Simple is Always Best...


Reason: VECTORS (and JSON) are stored similarly to LOBs.

Using SQL*Plus, checking the view USER_TAB_COLUMNS: this showed me the definition, type and dimensions reported by the catalog were "correct". 

Furthermore, USER_INDEXES told me there were 4 indexes of type LOB on the table. This leads me to suppose that the data in the VECTOR columns (and the JSON column) is stored very similarly to LOB-data (link to lob-storage-docu).

There is your first "workaround": query the data-dictionary from the catalog.
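That check works from SQL*Plus, from SQLcl, or, for that matter, from python; roughly (the table-name here is just a placeholder):

# Sketch: ask the data-dictionary what the columns really are,
# regardless of what an old DESC reports. (Demo table-name is made up.)
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/freepdb1")
cur = conn.cursor()
cur.execute("""
    select column_name, data_type
    from   user_tab_columns
    where  table_name = :tab """,
    {"tab": "VCT_DEMO"})
for col_name, data_type in cur:
    print(col_name, data_type)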



Workarounds: Use Latest Client-tools, or ... use the dictionary.

At this point, I went "into the server" by attaching to the container where my database runs. 

    podman exec -it ora23 bash

I correctly assumed that a container-image that provides a certain version, also holds the SQL*Plus and other client tools for Exactly That version.

And Yes, running the CREATE and DESCribe in SQL*Plus from the container (from "the server" if you like), the columns show up with the correct datatype and dimensions, exactly as per the create-statement:


From the server-side: Notice the versions of sqlplus and of the database. And the correct DESC results after the table-create.

Main cause of confusion is the fact that the client-tools for my antique "Vintage" intel-macbook are stuck on v19. I just have to work around that.

All Good.


Lesson: Always aim for the Latest Versions, even your client-tools.

Use the Latest Versions of the client-stack. Always. And Verify. Don't hesitate to be suspicious and Re-Verify. In my rather suspicious mind, I recommend running your latest scripts with the SQL*Plus version that comes with the database-release. But you may need to upgrade your PC or obtain server-access for that. Safest is to use the client-tools that come with the version itself, e.g. from the same install(er) or the same container. That is probably the executable that has the latest enhancements and improvements built in.

And, if you can, always getting the latest version of SQLcl is a Good Practice. 

 

Tuesday, 15 October 2024

Serverless, Docker, Podman, k8s #JoelKallmanDay

 TL;DR: Soon you will run Production databases from a "serverless" platform. 


For several years now, I run most of my tests and demos from docker or podman, and it is just Convenient. It is even easier than using VBox, although I still keep my VBoxes around to give me more of a "sysadmin" experience when I want that.


But should you run "Production" from containers?

Usual answer: It Depends.

Practical Answer: Soon you will. 

So Get Ready to run your (Mission Critical) Databases from K8s or similar container environment.




What I didn't like about Containers, at first:

  •  Layers: Another abstraction layer to "cloud my view", things get hidden deeper and deeper.
  • Ephemeral: Data wants to be "kept", not lost on the first restart-glitch. Mapped volumes can fix that, but they are just another item to worry about.


What there is to like about containers:

  • Abstraction: Easier to run your software/system/database on many platforms.
  • Flexibility: Easier to move around and Scale.
I'm sure you can see the humour of the arguments.  And the lists can be even longer, much longer. I might dedicate a blog to "scale out" of Databases, and the problems that arise, especially on shared-nothing clusters. Some Other Time.


Why I think you will (soon) be running on containers: Because the rest of the IT world, e.g. most developers and administrators, will be running "Containerised".

Some (hip, trendy) organisations already dictate the use of some K8s or other container-platform, and their sys-admins are not able or willing to facilitate anything else anymore.


Also, the next generation of databases is under development, and most of them are compatible with mainstream Open Source systems (either forks or wire-compatible, which also merits a whole separate blog-discussion). The deployment, their use-case, is more and more Native-Containerised. Their whole setup is geared towards container-deployment. 

Of course, performance-geeks (me too..) will state that more layers and more components will Complicate things, and that nothing beats deployment on Iron. This is True. But I suspect that only the extreme-cases (high-volume OLTP) will stick with "raw iron". All other usages might well end up on containers by approx 2030 (in 6yrs from now?)



Also allow me a look-back: In around 2004, we didn't think VMs (virtual machines) were solid enough to run "real databases". By 2014, everything was on VMs, simply because "that is what we standardized on". And we take the advantages of VMs (isolation, layering, flexibility) for granted; we could not live without them anymore.

And by now, Oct 2024, I feel confident enough to run an Oracle Database via Podman, Docker or K8s. As long as I can "control" the volume mapping and locate my files in a safe storage area somewhere. I suspect this technology is now "Database-Ready".

Anecdote: I recall a customer where the statement "we dont want servers anymore" was in vogue; this was 2018, some 6yrs ago. They went to Large-Vendor-DBaaS for databases, and some of their "architects" went to great pains to eliminate the last few servers (batch-processing systems). They even bullied a few of their grey-bearded developers out of the company to forcibly "kill those batches". It wasn't an efficient move, and it wasn't pretty to see.

Luckily, you can start a container (I think we even used an oracle-container image) to act as a linux-server. And the command:

docker exec -it node42 bash -c "yourcommand --options and args"

works Fine in 99% of cases. This is one way to hide your server and keep your legacy (unix, linux, even solaris) software running. It may not be "The Container Way", and you might get some critical remarks from the zealots. But it Works Fine.

Using these workarounds, you won't need a server anymore. To which some will say: You just run on someone else's server (oh, Hi Kamil!)

But There you have it: You can Run Serverless or Containerized.



Sunday, 13 August 2023

Adding PDBs is Easy also in v23c.

TL;DR: Success! I've got several PDBs plugged into a CDB, all created "just from scripts", no "template-recovery". From here on it is all Dead-Simple. Or is it.


Spoiler: Finished. Success. But there is Always More ToDo.

And... Need I state it: Do Not Try This at Work. (but if you Do, let me know how it went. I'm sure you can improve on my attempts...).

Update: Oracle has, on 02-May-2024, announced the renaming of version 23c to 23ai. This series of blogs was researched and written with the early releases of v23, when the suffix was still "c" (for cloud), but I am too lazy to go and modify all the references to the new 23ai (artificial intelligence).
As soon as time permits, I will repeat my "testing" on the new versions, but on first impression, it all works identically on 23ai.



Background:

In the first three "episodes", I created the CDB + Seed, added the "catalog", and added the more or less mandatory components. Now it is time to add (and maybe delete) some PDBs and see what is happening.

To avoid the use of "templates-to-recover" (which, btw, is Highly Recommended, and much more efficient), I've been creating a database manually, from scratch, just using scripts. In all cases, I have tried to choose the "Minimal Effort" and "Minimal Configuration Work". Minimal RTFM even (which is kinda risky... You Should Always RTFM, Always).

My list of parameters is kept minimal, thus using defaults wherever possible, my typing effort was minimal, and the management of files (creation, placement, naming, deletion) is totally left to Oracle (using defaults and OMF wherever possible). 

As software I'm using the "Free Developer Release v23c". But since that is an early (preview) release, not all components are "in", and the option to "generate scripts" wasn't included in the DBCA from v23c yet. Hence I used scripts generated from a v21c as examples and edited them where needed.

Because I'm still working with the list of generated scripts from DBCA-v21c (not the same version! hence use at own risk) I had to edit. In part-3, I eliminated a few obsolete items (intermedia was deprecated in 12c, and "removed" in 19c, how many years ago...). Then I consolidated everything up-to-but-not-including any PDB-creation in one Simplified script, script nr3. That worked well, resulting in 0 (zero) invalid objects. Check previous blogs for the details.

Note: I Hate long command-lines and complicated scripts, and I especially Hate hardcoded things in scripts (like 10s of repetitions of a file-path with a version-number in it). We have $ORACLE_HOME and $ORACLE_BASE and $ORACLE_SID, and we have them For a Reason.
On top of that, editing is Risky Business: Typos! - Apologies if I repeat that too often. 



Finally, Creating the PDBs:

Most demos and examples show how to create the CDB and then add 1, single, PDB to it. I'm going to be a little more ambitious: I want 3 PDBs, and by using OMF (Oracle Managed Files) and by avoiding FILE_NAME_CONVERT parameters, I don't want to worry about Anything on the file-system.

Now from the "generated" create-scripts from DBCA v21c, there are three scripts remaining. Time for another Eyeball Mark-1 Inspection, and let's see what we can do and maybe improve.

The v21c generated PDBCreation: 


Strangely, this one contains few relevant items. Fine, let's move on.

Next would be the generated plug_freepdb1: 


This actually creates a PDB (yesss). 
With the name "hardcoded" in the script, on no less than 6 lines, including in the file-name-convert. 
The spool-command seems a duplicate (fixable?). 
The select shows data- and temp-files (can go?).

And then, Voila, we get to the Create, and Open, and Register of the first real Pluggable Database. That looks deliciously Simple, and it even announced itself to the listener straight away. That is the good part.

Hmm, yeah. I can work with that. Will still try to improve on the script though.


Last script from the generated set was postPDBCreation_freepdb1: 


This one is funny. It tries to add a USERS-tablespace that already exists in the PDB$SEED (hence error!), with a hardcoded datafile-path *), hm.  
That USERS-tablespace is set as default for users (Good Idea). 
It then does a call to OPatch/datapatch (hardcoded 21c path and hardcoded names! what about using $ORACLE_HOME and $ORACLE_SID? ). 
And it does a number of reconnects and selects, checks to report things to screen or spoolfile.
It has _only_ 11 lines with the PDB-name hardcoded in them.
OK. All Seems "Fixable" to me. 

Some "good news" as well: apart from the OPatch call, these scripts are "self contained". There is no call to perl-catcon or other callout. 
All I would do is simplify a little, for Readability + Portability. Maybe use a DEFINE or some form of arg1 (or $1 or &1) to minimise the use of hardcoded names and make the scripts more portable (I'm not obsessed, not obsessed, ...)

After some more looking (on a larger screen), and some coffee, I concatenated the first two scripts and edited that into crdb4_pdb.sql:

/* -------------------------
 crdb4_pdb.sql: create the first PDB(s), with minimal code + config..

notes:
 - in future: define the pdb-name(s) once or pass as args. 
  --------------------------- 
*/

-- get the passwords 
@accpws

SET VERIFY OFF
connect "SYS"/"&&sysPassword" as SYSDBA
set echo on

spool crdb4_pdb.log append

-- found out this "bootstrap" was needed. MOS note: 2538591.1
host $ORACLE_HOME/OPatch/datapatch --verbose
 
-- let's do multiple plugs straight away.
-- future: create 42 plugs in a pl/sql loop? would that work?

CREATE PLUGGABLE DATABASE FREEPDB1 ADMIN USER PDBADMIN IDENTIFIED BY "&&pdbAdminPassword" ;
CREATE PLUGGABLE DATABASE FREEPDB2 ADMIN USER PDBADMIN IDENTIFIED BY "&&pdbAdminPassword" ;
CREATE PLUGGABLE DATABASE ORCL     ADMIN USER PDBADMIN IDENTIFIED BY "&&pdbAdminPassword" ;

-- only see 2 of the 3 new ones, wonder why ...
select name from v$containers where upper(name) like 'FREEPDB%';

alter pluggable database FREEPDB1 open;
alter pluggable database FREEPDB2 open;
alter pluggable database ORCL     open;

show pdbs                

alter system register;   

prompt ---- End of crdb4_pdbs. Please check, properties, files, listener etc... -----

spool off

Here is what I hope to achieve:
There are no file-paths at all, I'm not going to mess with files if I can help it.
The remaining commands are readable (even on 80x25 terminal) and straightforward understandable, I hope.
On testing, I (re-)found that OPatch/datapatch needs a "bootstrap" (MOS note: 2538591.1, known issue...). I'm doing that before any further PDB-creations.
I then create and open the PDBs.. no big deal actually, and I register them with the listener. The select-checks and the user-tablespace can wait (for the next script).

In my current script I am left with just 3 PDB names, which are "hardcoded" 2x each. That still makes 6 items to correct later. ;-(.

That user-tablespace, and the selects and checks from the postPDBCreation, I'll isolate in a separate check-script, to be run per PDB. One day, I'm hoping for a generic, portable script to "add 1 PDB" to a CDB in the simplest fashion; for now, I'm in a hurry: I want my PDBs, to observe and play with...

Bonus "feature": I _could_ try to run this last script even from just-a-client when connected to the CDB$ROOT, but it would mean ignoring the datapatch and editing every "connect". Too Lazy at the moment. But good to know it can be done (yep, I checked later, works fine, but ignoring datapatch, so maybe not recommended).


I'm impatient, Let's Do This:

If you are like me, you might check disk-usage on "the server" and check the listener status, before and after. But I was in a hurry...

The datapatch part went down fine:



And the creation + opening:


You can see the creation of 3 PDBs, with no drama at all. One of them has the "classic" name ORCL, just because I can. I do now wonder if I can't find a way to leave out the admin-user part as well. I also left in the original "select" statement from the generated script (hint: something about a name "hardcoded" maybe).

We then witness the OPEN-ing of each PDB and if you run the script for yourself, the command "show pdbs" will display them, all of them this time. In the end there is also a "register" to let the listener know what was added.

If you go back and check your filesystem and listener: Creating will have taken up about 3G of diskspace, and the listener will show the new PDBs as services. 

Checking from a "client machine" to connect over the listener:


Of course, from a client machine, I can connect to any of the open PDBs.  Just to show how Easy.
 

Almost Done... 

We now have 3 PDBs and I can play with them. Nice. I got more or less what I wanted. We are Nearly There.

Just a little bit of housekeeping to do: remember that postPDBCreation.sql? It set the default USERS-tablespace, did OPatch, and did a number of selects. I am going to respect the same commands and the same order of events, more or less. But I have edited that to become a script to be run per-PDB, with the PDB-name as argument (the &1 in the oracle sql-script world):

The script crdb4_postcheck.sql:

/* ---------------  

  crdb4_postcheck.sql pdb_name: set user-ts, do OPatch and run some selects

    arg1 : PDBNAME to check, e.g. FREEPDB1 or ORCLPDB2
   
notes: 
  - adapted from postPDBCreate_freepdb1.sql, as generated by dbca-v21c.

   -----------------
*/

-- pick arg from cmdline.
define targetPDB=&1;

-- get the pwds, we need 'em
@accpws

-- stmnt and their sequence-order copied from generated postPDBCreate

SET VERIFY OFF
connect "SYS"/"&&sysPassword" as SYSDBA

spool crdb4_postchecks.log append

alter session set container="&&targetPDB";

prompt Adding + setting dflt tablespace USERS, If needed...

set echo on

CREATE SMALLFILE TABLESPACE USERS LOGGING  
DATAFILE  '/opt/oracle/oradata/FREE/freepdb1/users01.dbf' 
SIZE 25M REUSE AUTOEXTEND ON NEXT  20M MAXSIZE 1G   ;

ALTER DATABASE DEFAULT TABLESPACE USERS;

-- next, the call to OPatch, added OH and SID

host $ORACLE_HOME/OPatch/datapatch -skip_upgrade_check -db $ORACLE_SID -pdbs &&targetPDB 

-- queries, copied straight from the generated v21c original. 
-- not sure if essential

connect "SYS"/"&&sysPassword" as SYSDBA

-- expect TRUE
select property_value 
from database_properties 
where property_name='LOCAL_UNDO_ENABLED';

connect "SYS"/"&&sysPassword" as SYSDBA

alter session set container="&&targetPDB";

select TABLESPACE_NAME 
from cdb_tablespaces a,dba_pdbs b 
where a.con_id=b.con_id and UPPER(b.pdb_name)=UPPER('&&targetPDB');

connect "SYS"/"&&sysPassword" as SYSDBA

alter session set container="&&targetPDB";

set echo on

Select count(*)  nr_valid_DV
from dba_registry 
where comp_id = 'DV' 
and status='VALID';

-- go back to root.. why?, does exit straight after.. 
alter session set container=CDB$ROOT;

exit
And ... running that script, 3 times, Went Fine: the USERS-tablespace came from the PDB$SEED, it was already there. If it is ever missing, the error will notify us. Setting the default tablespace is no problem, and running OPatch/datapatch works Fine. The last select checks for DV (DataVault, installable via catols.sql + catmac.sql); it isn't present here because DV wasn't included in the "generated scripts" for v21c. Something I could correct, but I was too Lazy...

Plus, I can now Rapidly provide dev/test systems (mostly containers) and customise them to my heart's content... 

Also: (in scope, but this blog is too long already): Dropping a PDB including datafiles, gives you back the disk-space immediately, which I think is quite nice.

I've run most of my ppt-demo SQL scripts, and they work fine against 23c! (I had not expected anything else). I can use this container in my next presentations and be one of the Uber-Hip crowd using v23c. How Cool is That?


I would say: This is Success !

So in summary, I was able to create several PDBs, with minimal configuration and minimal typing-effort. Mostly due to the OMF parameters in my spfile and using as many defaults as I can.

My main "obstacle" was actually the "typing effort" in crdb3_com.sql: installing all the various components. And the testing and removal of typos to make the whole exercise look even more Simple.

I will now sagely, as a well-behaved Oracle-DBA, wait for the actual GA (General Availability) of v23c. A GA which hopefully still comes with a DBCA that can generate scripts that I can read and alter.



Warning: writing this up took me much more time than anticipated. Actually creating the system was easy. But tweaking the scripts to be 0-error, and Portable, and chasing down some of the nitty-gritty details took waay too much time... Anyway, it was interesting enough to do. Maybe it will be useful knowledge one day.

The List of "Always More ToDo"...

I wanted a quick-create, .. reading + investigating turned into 4 waaay too-long blogposts... And ... There is always more. The list got so long it would merit a separate blog+discussion.. Here are some of the items you can think about, or maybe I'll do some writeups later.

Experiment with Read-Only Oracle-Home. Ensure maximum integrity of the software-tree. Separate Software from Data was always a good concept.

Experiment with mounted volume and then store the databases, and the admin- and the admin-output (diag!) on a mounted-volume. That could make the container much smaller still. (This would be my Prio if I was deploying via Containers...?)

Completeness.... I'd really like to see the scripts generated by v23c to make sure I got all the components and activities Correct: complete and in the right order etc... Up to now, this is just a hobby-project with a lot of Guesstimation involved.

Directory-names: I don't like the overly-long generated names for PDB-subdirs. I'll have to investigate the "simplest possible" way to use FILE_NAME_CONVERT...

The UNDO=Local setting merits some investigation. Can that save me more effort+space? What is best/easiest, and when? For the moment, I'll just point you to Tim "Oracle-base" Hall for reading up on the topic.

Housekeeping, files: My "disk system" got full a few times during these experiments. There are MAX_SIZE clauses for data, audit and diag, and those Will be Useful. Alternatively: Check your subdirectories under the "diag" tree, and the output-log-dir for catcon. Especially when providing "dev-/test-containers" to developers, you don't want your system to run into "mysterious errors" due to disk-full problems.

Housekeeping, drop PDB: Demonstrate how dropping a PDB frees up disk-space, Automatically. Lazy-me is Really Happy with this (OMF-)feature. But dropping a PDB leaves an empty subdirectory. Not a  big problem, but I'd rather have that cleaned anyway. Requires another trick (script).

Script-beautifying: Evidently. Also, the original (generated) scripts do a lot of (re)connects and spools.. I can still make my scripts cleaner by removing seemingly useless "connect"  and housekeeping stmnts. (I like small, clean scripts...).

LockAccount: The crdb3_com.sql now contains a copy of the generated code. On close inspection: this is identical, repeating code. Make it a call to a script.

Make a single, better, crdb4-script: a Simple script, taking the PDB-Name as input, should be able to add (or later remove..) a PDB, including the OPatch and some of the checks.

The OPatch needed a bootstrap. Doesn't seem like a critical problem to me, but I would not ignore it. Easily Fixable, MOS note: 2538591.1. Best add that to crdb3_com at some point (after CDB, but before 1st PDB).

Test a patch or an upgrade: This will Inevitably need to be done sometime...

On perl: I'm still not very happy with the perl-stuff as it is yet another component, and yet another dependency. The scripts have a hardcoded path to the oracle-provided perl-executable in them.  And another, different-version of perl is found in /usr/bin, which apparently the scripts don't trust or don't want you to use. It makes sense, but it is a bit cumbersome.


But in the end: Happy! 

At least I got my kick of creating a Database from Scratch using scripts. I got the learnings, now documented in a set of scripts. And my scripts seem Portable enough to be able to create a v23c DB on just about any (linux)platform (VMs, containers) even without the templates.

If anyone wants a copy: copy/paste from the blog seems to work, or drop me a message for the repository.

------------ end of part 4 -------------

Disclaimer etc. ... nah. 
OK. Find disclaimer at the bottom of the starting-post.

And remember what I wrote earlier: Dont Do This At Work (or... not yet).


*) Footnote: 
On the use of hardcoded file-path-names for database-files (do I have to say this..?): How often did you end up with one or two files (crucial, needed datafiles) in a rogue location because of forgotten or badly-replaced filenames in re-used scripts? Folks have overwritten files because of this, folks have ended up with "windows file-path-names" in the DBS-directory of a linux-system because of this. (Do I have to do a blog on this with bad-examples...)
Minimise hardcoded paths, please.


-------- the Real End of Part 4 ------- Maybe parts 5, 6 etc., but Later --------

Thursday, 10 August 2023

Create a Database v23c - Installing components Manually

TL;DR: I simplified the assistant-generated scripts from v21c, I learned a few things, and it all still worked. We are slowly Getting There in v23c.

Spoiler: still a long-ish script, but it worked Fine.

Note: Aug 2023, still using the Free Developer Release of 23c.

Update: Oracle has, on 02-May-2024, announced the renaming of version 23c to 23ai. This series of blogs was researched and written with the early releases of v23, when the suffix was still "c" (for cloud), but I am too lazy to go and modify all the references to the new 23ai (artificial intelligence).
As soon as time permits, I will repeat my "testing" on the new versions, but on first impression, it all works identically on 23ai.

Background:

Still on my quest of trying to create a database, manually-scripted, without using (recovering) "the template", but with minimal effort, minimal scripting, minimal typing (even minimal RTFM, which is never a safe idea...). In parts 1 and 2, I explained how+why I am using the set of scripts generated by the previous version, v21c.

It was more work than I expected. In the next stage I am going to get angry with "too many, too complicated and too-hardcoded scripts". But it seems the old-ish code Works Fine against the New Version. Legacy ;-)

Again: Keep in Mind that I am using the "Free Developer Release v23c". No guarantee this will work correctly, and no guarantee this will still work when v23c comes out for General Availability.

I can't wait to get a G.A version and check if the new DBCA still agrees with my approach.

But... So far it is going Well.. I got to the stage where the "Catalog" is created in the CDB and in the SEED with few problems. Here is how:


The components (list taken from v21c)

When I check the generated script for creation of a v21c database, it shows the following:


Mind the long-ish list of items after "CreateDBCatalog" (covered in part-2).  Those scripts install the "additional components" that the DBCA wanted me to install. That list looks familiar: that same list has been there since waay back. I guess I'd better run (most of) those scripts on v23c as well...

The first script, JServer.sql, looks like:


As you can see, similarly long commands as we have seen in the "CreateDBCatalog" in part-2, except that I already put in $ORACLE_HOME to simplify. And by opening the scripts one after another, they all look more or less similar (and equally hard to read on first attempt). Note: I Hate long command-lines and complicated scripts - apologies if I repeat that too often.

On first attempt, they ran fine after just editing the file-paths to adjust to 23c. Good sign. But I wanted to do a little more Inspection using Eyeball Mark-I on a large screen (i.e. have a good look at the subject...). So I concatenated them all into one big script and viewed them in another editor, to better compare the various lines horizontally.

The following of the generated scripts were very similar: 
JServer.sql 
context.sql 
ordinst.sql and interMedia.sql (huh?)
cwmlite.sql (olap), 
spatial.sql 
... are all very much alike (but not quite). They contain a lot of repeating, identical-looking calls with generally only two understandable differences: the -b <logname>, and at the end of the line the "script + options".

That makes sense, in multi-tenant, those components are installed via catcon.pl in both CDB$ROOT and PDB$SEED.


Some remarks on the scripts:

ordinst and interMedia: Deprecated, checked them anyway. Those are now just stubs, and can probably be removed from the list of scripts to run. So I deleted them from the calling-script.

CreateClusterDBViews: Doesn't use the -c (or -C) option, but is otherwise identical. Apparently, leaving out -c and -C runs the scripts against both CDB$ROOT and PDB$SEED. It runs both catclust and catfinal (which is effectively empty), thereby sort-of signalling the "End of Catalog stuff".

LockAccount: Does a stop/start of PDB$SEED and then runs a block of PL/SQL in both CDB$ROOT and PDB$SEED to presumably lock-down default accounts. OK. We can keep that code as is for the moment (would v23c contain more dflt-accounts, more items to lock or protect...?).

PostDBCreation: Generates the spfile and does the classic utlrp. OK, that should fix  the "invalid objects" we had after doing the catalog in part-2.


Then a bit of a surprise, good and bad: The last three scripts are: 

PDBCreation: Only contains a spool-append command to a logfile.

plug_freepdb1: Creates an Actual PDB (yesss, finally...), with the name "hardcoded" in the script, and file-name-convert directives included. I'd like to tweak this statement to at least get rid of the hardcoded names + paths (see the sketch just below this list).

postPDBCreation_freepdb1: Adds a user-tablespace with a hardcoded datafile-path (hm), runs datapatch (OPatch), and does a number of selects to report things to the spoolfile.
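To give an idea of the kind of statement plug_freepdb1 contains (my reconstruction; the PDB-name, admin-user and paths are illustrative, not copied verbatim from the generated file):

create pluggable database FREEPDB1
  admin user pdbadmin identified by "&&pdbAdminPassword"
  file_name_convert = ( '/opt/oracle/oradata/FREE/pdbseed/',
                        '/opt/oracle/oradata/FREE/FREEPDB1/' );

alter pluggable database FREEPDB1 open;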

I am going to keep the PDB-related scripts (the last 3) for later, and first run the scripts to add all the (mandatory or recommended?) components to CDB and SEED. 


To Simplify...

I already had them all concatenated into one, so I decided to consolidate the scripts up to, but not including, any Create-PDB work. I'll keep everything in one single script, call it crdb3_comp(onents).sql. And I cannot resist simplifying it just a little.

I am also going to keep the spfile-creation under my own (manual, but scripted) control, putting a "comment" in front of that command for the moment (yes, I understand the risk.. some script may want to set parameters to spfile / scope=both).

The idea for crdb3_comp.sql is born... now I just had to do a bit of editing + testing.

The main changes I made are a reduction of hardcoded file-paths and smaller commands overall, for readability and editability. I then removed a lot of lines that did just spool-append: I'm keeping everything in one main logfile. There will also be the logfiles from catcon.pl, which I directed to /tmp.

The resulting script, crdb3_comp.sql:

 /* -----------------
 
crdb3_comp.sql : install basic components before creating more pdbs

notes:
 - wasn't happy with the long command-lines, still not happy
 - after utlrp: 0 invalids. Good Sign.
 - why repeated connect? to verify un/pw ?, or Need for fresh session??
 - option -a is undocumented (Martin berx Berger traced: probably GUI/windows related)

  --------------- */ 

-- get passwords
@accpws

-- one main logfile for this script...
spool crdb3_comp 

-- Wanted shorter commands to Run CatCon.. Hence define rcc
DEFINE rcc="$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -n 1 -l /tmp -v "

prompt first JServer

SET VERIFY OFF
connect "SYS"/"&&sysPassword" as SYSDBA
set echo on

host &&rcc -b initjvm  -c 'PDB$SEED CDB$ROOT' -U "SYS"/"&&sysPassword" $ORACLE_HOME/javavm/install/initjvm.sql;
host &&rcc -b initxml  -c 'PDB$SEED CDB$ROOT' -U "SYS"/"&&sysPassword" $ORACLE_HOME/xdk/admin/initxml.sql;
host &&rcc -b xmlja    -c 'PDB$SEED CDB$ROOT' -U "SYS"/"&&sysPassword" $ORACLE_HOME/xdk/admin/xmlja.sql;
host &&rcc -b catjava  -c 'PDB$SEED CDB$ROOT' -U "SYS"/"&&sysPassword" $ORACLE_HOME/rdbms/admin/catjava.sql;

prompt next is context

-- keeping the connect, as each "script" originally did a fresh connection
connect "SYS"/"&&sysPassword" as SYSDBA

set echo on

host &&rcc -b catctx   -c 'PDB$SEED CDB$ROOT' -U "SYS"/"&&sysPassword" -a 1  $ORACLE_HOME/ctx/admin/catctx.sql 1Xbkfsdcdf1ggh_123 1SYSAUX 1TEMP 1LOCK;
host &&rcc -b dr0defin -c 'PDB$SEED CDB$ROOT' -U "SYS"/"&&sysPassword" -a 1  $ORACLE_HOME/ctx/admin/defaults/dr0defin.sql 1\"AMERICAN\";
host &&rcc -b dbmsxdbt -c 'PDB$SEED CDB$ROOT' -U "SYS"/"&&sysPassword" $ORACLE_HOME/rdbms/admin/dbmsxdbt.sql;

prompt ordinst and interMedia: stubs removed, product no longer available.
prompt next is cwmlite.sql

SET VERIFY OFF
connect "SYS"/"&&sysPassword" as SYSDBA
set echo on

host &&rcc   -b olap -c  'PDB$SEED CDB$ROOT'      -U "SYS"/"&&sysPassword" -a 1  $ORACLE_HOME/olap/admin/olap.sql 1SYSAUX 1TEMP;

prompt next is spatial.sql

SET VERIFY OFF
connect "SYS"/"&&sysPassword" as SYSDBA
set echo on

host &&rcc -b mdinst -c  'PDB$SEED CDB$ROOT'   -U "SYS"/"&&sysPassword" $ORACLE_HOME/md/admin/mdinst.sql;

prompt next is CreateClustDBViews.sql

SET VERIFY OFF
connect "SYS"/"&&sysPassword" as SYSDBA
set echo on

host &&rcc -b catclust  -U "SYS"/"&&sysPassword" $ORACLE_HOME/rdbms/admin/catclust.sql;
host &&rcc -b catfinal  -U "SYS"/"&&sysPassword" $ORACLE_HOME/rdbms/admin/catfinal.sql;

connect "SYS"/"&&sysPassword" as SYSDBA
set echo on

prompt next is lockAccount.sql

SET VERIFY OFF
connect "SYS"/"&&sysPassword" as SYSDBA

set echo on

alter session set "_oracle_script"=true;
alter pluggable database pdb$seed close;
alter pluggable database pdb$seed open;

BEGIN 
 FOR item IN ( SELECT USERNAME, AUTHENTICATION_TYPE FROM DBA_USERS WHERE ACCOUNT_STATUS IN ('OPEN', 'LOCKED', 'EXPIRED') AND USERNAME NOT IN ( 
'SYS','SYSTEM','SYSRAC','XS$NULL') ) 
 LOOP 
IF item.AUTHENTICATION_TYPE='PASSWORD' THEN
  dbms_output.put_line('Locking and Expiring: ' || item.USERNAME); 
  execute immediate 'alter user ' ||
 	 sys.dbms_assert.enquote_name(
 	 sys.dbms_assert.schema_name(
 	 item.USERNAME),false) || ' password expire account lock' ;
 ELSE 
  dbms_output.put_line('Locking: ' || item.USERNAME); 
  execute immediate 'alter user ' ||
 	 sys.dbms_assert.enquote_name(
 	 sys.dbms_assert.schema_name(
 	 item.USERNAME),false) || ' account lock' ;
 END IF;
 END LOOP;
END;
/

alter session set container=pdb$seed;

BEGIN 
 FOR item IN ( SELECT USERNAME, AUTHENTICATION_TYPE FROM DBA_USERS WHERE ACCOUNT_STATUS IN ('OPEN', 'LOCKED', 'EXPIRED') AND USERNAME NOT IN ( 
'SYS','SYSTEM','SYSRAC','XS$NULL') ) 
 LOOP 
IF item.AUTHENTICATION_TYPE='PASSWORD' THEN
  dbms_output.put_line('Locking and Expiring: ' || item.USERNAME); 
  execute immediate 'alter user ' ||
 	 sys.dbms_assert.enquote_name(
 	 sys.dbms_assert.schema_name(
 	 item.USERNAME),false) || ' password expire account lock' ;
 ELSE 
  dbms_output.put_line('Locking: ' || item.USERNAME); 
  execute immediate 'alter user ' ||
 	 sys.dbms_assert.enquote_name(
 	 sys.dbms_assert.schema_name(
 	 item.USERNAME),false) || ' account lock' ;
 END IF;
 END LOOP;
END;
/

alter session set container=cdb$root;

prompt next is postDBCreation.sql

SET VERIFY OFF

host $ORACLE_HOME/OPatch/datapatch -skip_upgrade_check -db FREE;

connect "SYS"/"&&sysPassword" as SYSDBA

prompt note on spfile: I prefer to do my spfile manually, 
prompt and at the moment in dflt location
prompt I might  consider moving it out of OH later.
--  create spfile='/opt/oracle/dbs/spfilefree.ora' FROM pfile='/opt/oracle/admin/free/scripts/init.ora';

connect "SYS"/"&&sysPassword" as SYSDBA

host &&rcc -b utlrp  -U "SYS"/"&&sysPassword" $ORACLE_HOME/rdbms/admin/utlrp.sql;

select comp_id, status from dba_registry;

shutdown immediate;

connect "SYS"/"&&sysPassword" as SYSDBA

startup ;

spool off

prompt Done crdb3_comp. Please Verify: errors, invalids objs etc...
  
That is the script I actually did the run with.
(feel free to copy; I checked that Chrome, Firefox and Opera all allow copy-paste. One day I'll get around to using proper code-windows in blogger..)

Now... Guess how many invalid-objects? .... None in the CDB.

And in the PDB$SEED, where we still had 600+ Invalid objects after stage 2... :
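(For completeness: the check itself is nothing fancy, roughly the following, run as SYS. I use the same "_oracle_script" trick the lockAccount-code uses to peek into the seed.)

-- invalids in the root:
select count(*) from dba_objects where status = 'INVALID';

-- and the same check inside the seed:
alter session set "_oracle_script"=true;
alter session set container=pdb$seed;
select count(*) from dba_objects where status = 'INVALID';
alter session set container=cdb$root;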


Really Happy so far. Of course, I have no idea if v23c has introduced, or will introduce, additional components or scripts to run at Create-Time. I haven't even checked the fairly long page of RTFM in all its details on this yet....

First, I want my Databases! 
With Maximum-Laziness-Default, minimal effort and Minimal-scripting (but Readable and Portable scripts, Please...).


So far So Good.

All in all, it looks like the generated scripts from v21c are still working. This combination of CDB + Seed should now be ready for PDB-creation. I will of course try to minimalise and simplify that create-process, just a little.

Next blog will describe the creation + running of crdb4_pdb.sql in which I finally manage to create 2 PDBs. It worked. But I wasn't quite happy with the result, yet.

------- End of Part 3 ------

Disclaimer: Don't Try this at Work.

But if you seriously plan to try this at work: Read The Disclaimer underneath Part 1.

PS: Special Thanks to Martin "berx" Berger for digging down the catcon.pl -a option. 

----- Real End of Part 3 ----