Friday 16 November 2007

Databases and Systems: What Kind of Relation?

Over the years, I’ve had several interesting discussions on the relationship between “databases” and “systems”. It comes down to this: should there be many databases on a single system (m:1, the traditional approach), or should there be many systems underneath one database (1:m, grid)? And when is 1:1 appropriate?

Current hardware is powerful enough to allow “supernodes” that can run hundreds of databases. We are also confronted with virtualization, the “carve-up” of hardware by way of Xen, VMware, or vendor-specific products that create domains or partitions.
With the current push towards “virtualization” of systems and the (in)capabilities of Oracle, it may just be worth re-starting some of the n:m discussion (did it ever go away?).

It is time to take a position on the many-to-many relationship between databases and systems.

This story is actually the text version of my presentation “Databases Everywhere”, and can be found on various UKOUG and other sites. (include a link...)

To summarize it for those architects and (account-)managers who are in a hurry:
I am in favor of 1:1 wherever possible, and I support 1:m (RAC, grid) if it is really, really needed (but please think about it one more time). RAC is a wonderful piece of technology that can serve many other vendors as a good example. It can be made to work. I will try to indicate under which conditions I think RAC can or cannot be applied.

Finally, I will only tolerate the old-fashioned m:1 for non-critical situations or on systems where there is some sort of risk-mitigation against interference between the (instances of) multiple databases.

After explaining these positions, I also have a list of recommendations for customers, providers and even for the Big-Oracle itself.

Those of you interested: read on. Possibly prove me wrong.
Others, keep browsing, the truth is out there, whichever version or vendor you want to see.

For the record: the bottom of the text contains links to all our major “partners”. Please keep the invites coming.

Intro, and some definitions.

On conventional systems, we generally find one or more databases running on a single system (*nix or even Windows). For example, for concept-testing of a Data Guard (DG) setup with cascaded standby, my laptop has run 5 databases simultaneously. Slow, but running. It just takes loads of memory and careful (memory-) parameter setting. It illustrates the capabilities of the Oracle database that the concept, once proven on a laptop, could then be used to clone a 3TB production system (this time not on Windows). But the laptop Proof-of-Concept also illustrated an important issue when running multiple databases on a single system: contention. The databases were visibly (and audibly, from disk and fan) competing for IO and CPU resources.
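
As an aside: the arithmetic behind that “careful (memory-) parameter setting” is simple enough to sketch in a few lines of Python. All numbers below are illustrative assumptions of mine, not the actual laptop settings:

    # Toy sketch: divide a small machine's RAM over N instances.
    # All numbers are illustrative assumptions, not real settings.
    TOTAL_RAM_MB = 4096     # what the laptop has on board
    OS_RESERVE_MB = 1024    # leave room for the OS and other processes
    PGA_PER_DB_MB = 128     # rough per-instance PGA allowance

    def sga_target_mb(n_databases):
        """Whatever is left, divided evenly, becomes each instance's SGA."""
        left = TOTAL_RAM_MB - OS_RESERVE_MB - n_databases * PGA_PER_DB_MB
        if left <= 0:
            raise ValueError("too many instances for this box")
        return left // n_databases

    for n in (1, 3, 5):
        print(n, "databases ->", sga_target_mb(n), "MB of SGA each")

With 5 databases on those assumed numbers, each instance gets a mere 486 MB of SGA: slow, but running, exactly as advertised.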


NB: Can’t resist putting one in for Deb and Nigel: let the world know that Deb got more memory into her laptop than Nigel, but then both David and I still have more memory in our laptops than the two of you together, so there ;-). Size matters.
Techies rule.
Back to serious business.



With the current possibilities for virtualization, you can take hardware and split it into many “systems” using VMware, OracleVM, or other Xen-based or vendor-specific tools for domains, LPARs, etc.

Each resulting system can then be used to run the instance(s) of one or more databases. When is this useful, and how far should we take this?
With Oracle RAC (GRID, anyone?) you can take a database and distribute its instances over many systems. When does this add benefit?

First a few quick definitions to delimit the playing field. Please note that the definitions are for the purpose of this argument only. They don’t pretend to be scientific or final. Or even to be correct.

System: a running *nix (or win*) instance containing a process list and an amount of addressable memory. In short: a running instance of an “operating system”.
A system can be virtual if the hardware seen by the operating system is not identical to the actual underlying kit. This is the case when a larger system is split into “virtual” units by use of Xen, VMware, or some vendor-specific layer of software or firmware. Some of those units can also be modified dynamically (e.g. “rolling” reconfiguration). Some definitions and a good description can be found here:
http://encyclopedia.thefreedictionary.com/virtualization.

Database: an (Oracle) database, containing one SYSTEM tablespace and one user called SYS (I’m still trying to find the “essence” of an Oracle database; how about the sys.obj$ entries?). Note that the use of DataGuard can complicate the definition of a “database”, because each DG-clone can represent “the database”. The actual Single Point of Truth (SPoT) is wherever the current “primary instantiation” of the database resides.
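
To make that SPoT point concrete: a primary and its DG-clones share the same DBID, and only the Data Guard role tells you which one is currently “the database”. A minimal sketch, assuming the cx_Oracle driver; the credentials and TNS aliases are placeholders:

    # Sketch: same DBID on primary and standby, different database_role.
    # Driver, credentials and TNS aliases are assumptions for illustration.
    import cx_Oracle

    def describe(dsn):
        conn = cx_Oracle.connect("system", "secret", dsn)  # placeholder login
        try:
            cur = conn.cursor()
            cur.execute("select dbid, name, database_role from v$database")
            dbid, name, role = cur.fetchone()
            print("%s: dbid=%s name=%s role=%s" % (dsn, dbid, name, role))
        finally:
            conn.close()

    # The primary and its DG-clone report the same dbid; the role differs.
    for dsn in ("prod_primary", "prod_standby"):
        describe(dsn)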

Also note that my definitions do not include the binaries, or the ORACLE_HOME, as part of the database or the system. Indeed, systems and databases can be used in situations where the software needed to run them is “shared”. Most systems only need a few “system specific” files in /var and /etc. I will always point out VMS as the ultimate mother of all clustered systems, whereby files are shared between multiple nodes. But that requires a Clustered File System (CFS), and that opens a different discussion altogether. Suffice it to say that a CFS is very suitable to ensure that all machines are connected to the same, identical software and are guaranteed to run the same version of the binaries.

Now let me briefly elaborate on the different options.

Conventional deployment - many:1.

On conventional systems, we often see many databases running on a single (unix) instance.

The DBA can look after the databases, and the unix administrators have only one entity to watch. Any collision between databases will have to be handled by the DBA. Note that in most of these conventional cases all databases share the same software-tree (multiple databases running from the same ORACLE_HOME).
These systems tend to have a relaxed SLA. The utilization varies and sometimes we see high percentages of CPU or IO bandwidth being consumed by a single database.

The main disadvantage of conventional systems is the fact that there is only one system, and all databases meet there. A problem with either a single database or with the system itself can quickly contaminate all databases on the system. And upgrades of system- or Oracle-software lead to simultaneous outages of all databases on the system.

In case of system- or hardware failures, many databases must be recovered simultaneously or re-started on one or more (other) systems (requiring some prioritization). The simultaneous recovery of many databases may lead to a brief period of overload and/or chaos, and possibly a domino effect on other systems or components.

The advantages of an m:1 configuration are the simplicity of a “single system”, which is easy on the system-admin, and the often cheaper license structure when using per-system licensing.

Simple and Robust 1:1

On systems with a high load or a stringent SLA, we tend to see 1:1 relationships: a single database, whereby the sole and single instance of the database runs on a single unix system. This is sometimes referred to as the monogamous configuration.
By having the whole system to itself, the database can benefit from all the resources available. There is no interaction (disturbance) from adjacent instances or other processes. Determining parameters is relatively simple.

However, as Oracle (-sales) will point out, this 1:1 configuration generally means the system is grossly under-utilized. It also means the database can still suffer from unix- or hardware failures.
But I like the simplicity of this configuration: I think this is the most “robust” solution, and it is applicable to the majority of databases. Whenever a problem occurs on one of the systems, only one database is affected, and recovery-efforts can be concentrated on a single database and application, reducing the risk of a domino-effect.

The 1:1 situation also lends itself very well to hardware-clustering or “cold failover” whereby a database is re-started on another system (node) in case the underlying system or hardware fails. Only one database needs to be re-started or recovered.

Since a 1:1 configuration requires many “systems”, it is attractive to use server virtualization. By running multiple “virtual” systems on a single piece of hardware, you can quickly create the required number of “separate” or isolated systems. When doing this, keep in mind that the underlying hardware remains a single point of failure. When one or more virtual systems are meant to replace one another in case of failure, they should preferably run on separate hardware.
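
That last check is trivial to automate. A toy example (all hostnames below are made up for illustration):

    # Toy anti-affinity check: virtual systems that are supposed to take
    # over from each other must not live on the same physical box.
    vm_to_host = {
        "db-vm-a": "chassis-1",
        "db-vm-b": "chassis-1",   # oops: same kit as its failover partner
        "db-vm-c": "chassis-2",
    }

    failover_pairs = [("db-vm-a", "db-vm-b"), ("db-vm-a", "db-vm-c")]

    for vm1, vm2 in failover_pairs:
        if vm_to_host[vm1] == vm_to_host[vm2]:
            print("WARNING: %s and %s share %s" % (vm1, vm2, vm_to_host[vm1]))
        else:
            print("OK: %s and %s run on separate hardware" % (vm1, vm2))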

Real Application Clusters, RAC - 1:many.

As a techie, I like the technology behind RAC. It is a wonderful thing to play with, and I like the challenge of mastering this thoroughbred in real-life systems. But I have to be careful not to be running a “solution looking for a problem”.

We tend to see RAC databases in organizations with formal and very stringent SLAs, and with the budget and resources to try to meet these requirements.

OPS and RAC eliminate the SPoF of the “system” and deploy the database over multiple systems. Theoretically this works nicely and even provides dynamic provisioning of system resources (you can utilize all available kit, and you can add more kit as needed). In practice, many have pointed out the relatively complex setup, the high price, and the other shortcomings of RAC (see the link below to Miracle, that famous Danish company).

We have indeed seen successful deployments of multi-instance databases on some very large kit. In some cases, the impressive amount of hardware was able to hide badly designed application-code for quite some time. And by constantly distributing the (mainly CPU-) workload over the available unix systems, the coders got away with some appallingly inefficient constructions. Some of these cases have demonstrated the viability (and sometimes the vulnerability) of RAC quite nicely, although a better design or implementation might have been cheaper (I prefer brains over iron, always! In case of doubt: reduce the size of the hardware and tell the IT crowd to JFDI).
Note, however, that it is important to let the hardware boys and vendors have it their way a bit too. Wheeling in some extra hardware makes them happy, and is good for our relationship with these vendors. They might invite us on future projects.

And to pour further praise on the Oracle techies: TAF has saved our systems several times when a node got in trouble and died. Database-nodes die mostly through software errors, core dumps or memory leaks, and sometimes through human errors. The underlying hardware is rarely a problem, as these high-end-hardware systems are built to keep running even if a salty ocean wave runs through the lower floors of the building; unintentionally, that is (Kathryn, are you reading this?).
Please note that even when you have RAC and TAF-capabilities, you still need to code your application to correctly trap and handle the failover-events. Otherwise, failover only works on “idle” connections.
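
Roughly, that application-side handling amounts to trapping the failover-related ORA-errors on an in-flight call and re-trying it. A sketch, again assuming the cx_Oracle driver; the set of retryable error codes is illustrative, not exhaustive:

    # Sketch: retry an in-flight call when a RAC node dies underneath it.
    # cx_Oracle and the ORA-codes below are assumptions for illustration.
    import time
    import cx_Oracle

    RETRYABLE = {3113, 3114, 25408}  # lost connection / cannot safely replay

    def query_with_retry(dsn, sql, attempts=3):
        for attempt in range(1, attempts + 1):
            try:
                conn = cx_Oracle.connect("app_user", "secret", dsn)  # placeholder
                cur = conn.cursor()
                cur.execute(sql)
                rows = cur.fetchall()
                conn.close()
                return rows
            except cx_Oracle.DatabaseError as exc:
                error, = exc.args
                if attempt < attempts and error.code in RETRYABLE:
                    time.sleep(2 * attempt)  # give the surviving node a moment
                    continue                 # reconnect and re-run the call
                raise  # not retryable, or out of attempts

Without a loop like this (or the equivalent OCI callbacks), a mid-call node failure still surfaces as a plain error to the end-user, TAF or no TAF.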

Note that most databases can (be made to) run on RAC, provided there are no obvious bottlenecks such as an ordered-no-cache sequence or some very hot blocks with running-totals. And those bottlenecks can generally be “un-designed”.
NB: Oracle currently seems to have the following position on RAC: if it doesn’t run (or scale) on RAC, your design is wrong.

Suffice it to say that where Very Fast Failover (TAF or FCF/FAN) is needed, RAC has no equal. And for systems that have extreme hardware (CPU) requirements, the RAC scale-out model is also beneficial.

Question: now what to choose?

The classic answer: It Depends.
However, I will try to provide some guidance and some opinions.

The only factor that really matters is “The Business” (duh).
What does your business need, and what can it afford? For simple or undemanding SLAs, the traditional m:1 configuration is often sufficient and cost-effective. For businesses that have more stringent demands, or for providers that risk being sued by their (business-)customers over a broken SLA, a 1:1 is advised, possibly with some cold-failover mechanism. And finally, if you really need those extra 10 minutes of uptime that fast failover buys you over a cold restart, or if you need the scale-out features, AND if you can afford the resources for testing, training and ongoing maintenance, a 1:m configuration (aka RAC or grid) can be your choice.
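
For the impatient, that advice compresses into a deliberately crude rule of thumb; the predicates below are my own illustrative simplification, not a formal decision model:

    # Crude restatement of the advice above; illustrative only.
    def suggest_topology(sla_is_strict, needs_fast_failover,
                         needs_scale_out, can_fund_rac):
        if (needs_fast_failover or needs_scale_out) and can_fund_rac:
            return "1:m (RAC / grid)"
        if sla_is_strict:
            return "1:1, possibly with cold failover"
        return "m:1 (conventional)"

    print(suggest_topology(sla_is_strict=True, needs_fast_failover=False,
                           needs_scale_out=False, can_fund_rac=False))
    # -> "1:1, possibly with cold failover"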

To figure out what your business needs (and can afford), you can either think for yourself, or you can give yourself and your department more credibility by engaging (principal-, business-) consultants to do cost-benefit analyses, risk-assessments or FMEA (that is: Failure Mode and Effects Analysis; not to be confused with FEMA). They will especially stress the business-cost of downtime and any related loss of data/productivity/customers/orders/MegaWatts... That is enough FUD for the moment. Back to more practical matters.

On the practical level, there are some factors that come into play. There is a) the preference (eeh: dictate) of your system administrators, your SAN engineers, your ASP, or your hosting provider. Then there is b) your commercial relationship with Oracle, which will determine how high your licensing cost will turn out. But there may be others, such as c) the capabilities and preferences of your DBAs. We will not even go into items like d) the availability of test-systems to prove and maintain your architecture, or e) the available rack-space.

The first important factor that often comes into play is the preference (or the pricing-policy) of the system-admin team or the hosting organization (the ASP). Are they capable of handling many systems, or do they prefer a low number of unixes? Can they quickly build and clone systems for provisioning? What price do your ASPs charge for additional systems? This may determine your capability to run “multiple” systems.
Some organizations, by choice or by force, still get away with running just one large unix box with everything on it: HR, CRM, Logistics, and sometimes they even have their dev/test/uat environments on the very same box. Feasible, but with most of the drawbacks of an m:1 configuration.

The next factor is often license cost. How is your commercial relationship with Oracle? If you have to pay list-price, you will want to stick with “conventional”. Here, Oracle shoots itself in the foot: a lower price on RAC would speed up acceptance of the RAC and GRID model.
Machiavelli did suspect this was done to buy sufficient beta-time to find all the quirks, possibly to find a solution for instance-affinity, and to give customers the time to come up with a solution for the friendly-delivered “Your Design is Wrong” consultancy-audit outcomes.

And last but not least: what do your DBAs prefer, and how trained and comfortable are they with RAC/grid? The traditional choice, m:1, despite its disadvantages (contention, domino-effects), is still the easiest to maintain for a DBA. Choosing a 1:1 configuration brings a slightly higher workload, but has the advantage of more robustness and easier, isolated troubleshooting, since databases and systems do not affect one another when trouble or maintenance occurs (yes, yes, someone must shout “utilization” now, thank you).

The choice for a RAC or grid configuration tends to create a significant overhead. We politely disagree with Oracle’s position that the new Grid Control alleviates all problems.
And even if GC and its agents do try to take away a lot of the routine-tasks, Knowledge and Experience can never be completely replaced by a GUI. This aspect, the Human or Operator dimension, tends to be the most under-estimated factor when (prematurely) implementing RAC/Grid.

Recommendations:

For customers and end-user businesses:
Move carefully from m:1 to 1:1. The 1:1 configuration is, at this moment, arguably the most robust way to run a database. Consider using virtual systems to support a 1:1 deployment, but beware of the possible contention and SPoF on the underlying physical layers. Move on to 1:m (RAC) for cases with specific needs (failover or scale-out). Only use RAC where you must, but then don’t hesitate to use it. It can (be made to) work, it will work (eventually), and you may have to learn these tricks at some point anyway: start on the first valid occasion.

For ASP and hosting organizations:
Learn how to handle clusters, clones and virtual systems. These tools will give you an edge in flexibility. Then offer your customers the possibility to host many _identical_ systems at a rebate. Your customers will buy more as 1:1 and 1:m systems proliferate, and in the long run you will benefit. Hosting companies and some vendor pricing-policies are the largest obstacle when moving from m:1 to 1:1, or even on to 1:m. Innovative customers will try to move to cheaper and more flexible platforms, and even lagging customers will eventually follow. If the hosting provider can quickly “provision” at acceptable cost, he can be seen as a partner in commoditization, rather than as an obstacle to flexibility.

For System-admins:
Learn how to handle a multitude of systems, learn how to keep them in sync, and how to clone or (re-)build systems quickly.
NB: for the addicts: investigate the use of Clustered File Systems (CFS).

For DBA’s and system-admins:
Aim to deploy databases and systems in a 1:1 fashion. The “isolation” of each database and system greatly facilitates admin- and troubleshooting activities.
Also get used to replicating or sharing software through OUI or other mechanisms.
And if possible, start to work with a CFS.
Sharing storage at the “filesystem” level can greatly facilitate the juggling of multiple systems. Even NFS (supported, but not recommended) is a usable alternative.
A CFS can offer great advantages by sharing files across nodes, and this can simplify software-deployment and distribution. You will always need two copies of the software for redundancy-purposes, but please think before making the 3rd, 4th or 42nd copy. Sharing is better than copying, especially at high numbers. It is easier to manage a small number of shared ORACLE_HOME trees than to have 42 or more copies that need to be rsynced or otherwise kept identical.
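
For those still stuck with copies, the chore that a shared tree makes unnecessary looks roughly like this (a sketch; the paths are examples, not a recommendation):

    # Sketch: detect drift between supposedly identical ORACLE_HOME copies
    # by fingerprinting each tree. Paths below are examples.
    import hashlib
    import os

    def tree_fingerprint(root):
        """One hash over the relative paths and contents of a software tree."""
        digest = hashlib.md5()
        for dirpath, dirnames, filenames in sorted(os.walk(root)):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                digest.update(os.path.relpath(path, root).encode())
                with open(path, "rb") as fh:
                    digest.update(fh.read())
        return digest.hexdigest()

    homes = ["/u01/node1/oracle_home", "/u01/node2/oracle_home"]
    prints = dict((h, tree_fingerprint(h)) for h in homes)
    if len(set(prints.values())) > 1:
        print("drift detected:", prints)

Run something like that against 42 copies and you will quickly see why sharing beats copying.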

For Oracle:
It is appalling to find that a grid (aka an OPS database) was easier to build and maintain under VMS or Tru64 with Oracle 8.1.7.4 than it is with the current 10g versions.
Please pursue the development of OCFS and facilitate shared-binary installs on OCFS and other CFS platforms. This will help the proliferation of your GRID strategy, and will get you more market-share and revenue in the long run. The oracle-inventory mechanism and the configuration of agents tend to make life difficult for the deployment of shared binaries. Any viable grid should IMHO include the shared use of a software tree, and not depend on endless replication of executables.

Shutdown (normal)

By adopting a grid-strategy, Oracle has greatly increased the options for its customers. And for those of you who don’t know it yet: go find the Oracle-sponsored GRID-INDEX. Various hard- and software vendors have added to the palette of choices by implementing their own versions of grid, clusters, or virtualization. All this new (eh, apologies: innovative but proven) technology can be put to good use, but only when making the correct choices.

We hope the preceding information can offer some help, and we would like to close down by adding one more item of advice: try to base your choices on simplicity.

Relevant links

The usual brownie-point partner-links and marketing buzzwords on grid, etc.:

http://www.oracle.com/global/nl/corporate/press/200634.html
http://www.oracle.com/global/eu/pressroom/emeagridreport4.pdf
http://www.hp.com/techservers/grid/index.html
http://www.ibm.com
http://www.sun.com

More on server-virtualisation (start here!):
http://encyclopedia.thefreedictionary.com/virtualization

An introduction to Clustered File Systems:
http://www.beedub.com/clusterfs.html
http://www.oracle.com/technology/pub/articles/calish_filesys2.html

Some good, albeit biased, arguments for CFS can also be found here:
http://www.polyserver.com (look for articles by Kevin Closson).

A classic on RAC:
http://www.miracleas.dk/WritingsFromMogens/YouProbablyDontNeedRACUSVersion.pdf

Just for fun:
http://www.userfriendly.org
http://www.dilbert.com

Last bootnote for all Alliance Managers and other people in control of party-invites: take note of my cross-links to our “powerful partners”. And don’t worry, since you didn’t bother to read all of the text, neither did the real decision-makers, hence no damage is done. Oh, and please keep me on the invite-lists, I still appreciate good food and quality entertainment.

Now shoot me (over lunch?).