Monday, 6 October 2008

Backup and Recovery, at what level

Yes, this is Yet Another Replication Discussion (YARD).

Indeed, this beast seems to pop up all the time. Today’s trigger was a discussion with some architects on Disaster Recovery in a SOA environment. I’ll try to give it a nice twist, and I will also keep my lunch-budget in mind.

To the serious reader, this article is actually a DataGuard plug.
But with all the buzzwords that spin around these days, this article can be seen as Art: It makes you laugh, it makes you cry, it makes you THINK.
And maybe those SOA/BPEL types do have some good ideas sometimes.

Marketing Tagline:
With SOA and ESB, the DR-replication can move upwards in the technology stack (and whatever the flavor of the month is: Grid, anyone? Cloud computing? The sky is the limit!).

On an even lighter note:
You should know the difference between an architect and a terrorist. It is at the bottom of this page.

(Health Warning: many buzzword-abbreviations ahead)

In most cases, I will advise anyone with HA (high availability) or DR (Disaster Recovery) plans to first consider a DG (DataGuard) construction. Replication of your data to a remote site offers many (documented and repeated) advantages:

DG offers you an off-site backup (but you should still consider tape or disk backups!)

DG gives you reporting on a read-only-opened database (even better in 11g real time...)

DG allows you to use the 2nd or 3rd site/system to do maintenance or upgrades (be careful: you may want 3 or even 4 systems in that case, but you can then tick some very fancy uptimes in your SLA box).

As an alternative to DG, most organizations will consider using SAN technology for replication. A surprising number of organizations seem to mix the two to complicate life.

Sideline: Some are even dreaming over sales-brochures (and some are entertained at seminars) on “stretched” clusters. When it comes to Oracle databases and RAC, these stretched-beasts are worth a separate story and a very carefully worded one indeed: I can’t kick powerful vendors/partners/colleagues, but I can’t put customers at risk either (?). Maybe Later. I have learned the hard way to Never criticize powerful vendors; it limits your lunch-invites. See bottom of page for lunch-items.

First on DataGuard

Despite my own confidence in Physical Standby (and the current limitations of Logical Standby), Oracle seems to be moving towards Logical (SQL-apply) rather than Physical (Redo-apply). Because of the looser ties between primary and replicas, “Logical” will offer more possibilities for maintenance and management. The logical replicas do not need to be physically compatible and can span different versions. It is a trend we can see with open-source databases as well: logical replication is conceptually more powerful than the redo-apply mechanism originally used in Oracle Standby.

DataGuard would merit a whole blog all by itself, but I have a different point to make today. I’ll repeat what I said before: DG is a very good start to improve availability of your system.

Now on SAN replication

The other route chosen by many organizations is to “outsource” the replication to the SAN. The SAN system will take care of block level replication of all storage. Attractive, often simple to set up (just shout at the SAN engineers, they will click and drag until it works), and heavily advertised (and paid for). SAN replication is especially attractive if you have to replicate more than just your Oracle data, as is often the case. The main downsides of SAN replication are the complete dependency on the SAN technology, the license cost, and the higher volume of data to be transferred (compared to Logical or Physical DataGuard). SAN replication works if you have sufficient bandwidth, low latency and capable storage engineers (not to mention the friendly smiling vendor).

SAN replication and DG replication, if applied properly, can both be very powerful mechanisms. I recommend choosing the one you are familiar with (or get the best license-deal on) and sticking with it. I would not recommend using both SAN and DG in an inter-dependent mix (a surprising number of databases are indeed replicated by both concepts, and that tends to add to the confusion in case of recovery).

For Oracle/RAC on stretched clusters: suffice it to say that you shouldn’t pay for this yet, but if the (hardware) vendors want to demonstrate this at fancy locations/parties: do encourage them to invite you. The prices of this technology warrant a Business class ticket (with spouse, and partner-program organized).

Coming to the point: The next alternative.

We now have a 3rd alternative: SOA-layer replication. To most of you this may be old news, or indeed an old concept re-worded: you are correct. Read on. And tell me how I got it all wrong when I became entangled in the latest hype without really understanding the concepts (or even knowing the jargon).

Stack-wise, I would situate SAN replication at the bottom of the technology stack (deep down in the infra-structure layer). And I would place DataGuard somewhat higher up in the stack (closer to the app-layer). Hold this layer-idea for a moment.

Enter stage: the Service Oriented Architecture (SOA) with the concept of an Enterprise Service Bus (ESB). Apart from the number of new abbreviations, what really struck me on SOA were the concepts of “above” and “below” the Layer (anyone read Tom Clancy in the nineties?). For some architects, the database is just a “service” that takes care of persistent storage. They are right (they are, after all, architects). Google: “Database + ACID”, possibly add “Codd + rules + 13” for some very basic stuff on databases. They are correct; ACID-ity is the essence of a database.

Now think of replication for DR purposes as a means to store things in multiple locations, and possibly as a means to become technology-independent. And imagine a SOA request to store data to be sent not to a single service, but to two or more services. Borrowing from the DG concept: as long as more than 2 services have acknowledged storage we can consider that the data is safe.
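For the nerds who prefer code over buzzwords, the idea above could be sketched roughly as follows. This is only a toy illustration in Python, not a real ESB: the class InMemoryStore, the function replicated_store and the min_acks parameter are all names I made up for this sketch, and the threshold of acknowledgements is left to the caller rather than fixed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


class InMemoryStore:
    """Hypothetical stand-in for one storage service behind the bus."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.data: Dict[str, str] = {}

    def store(self, key: str, value: str) -> bool:
        # An unreachable site behaves like a failed service call.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value
        return True  # the acknowledgement


def replicated_store(
    services: List[InMemoryStore], key: str, value: str, min_acks: int
) -> Tuple[bool, List[str]]:
    """Send the same store request to every service in parallel.

    The data counts as "safe" only if at least min_acks services
    acknowledged the write (min_acks is an assumption of this sketch).
    """
    if not services:
        raise ValueError("at least one storage service is required")

    acked: List[str] = []
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        futures = {pool.submit(s.store, key, value): s for s in services}
        for future in as_completed(futures):
            service = futures[future]
            try:
                if future.result():
                    acked.append(service.name)
            except Exception:
                # A failed service simply does not count as an acknowledgement.
                pass
    return len(acked) >= min_acks, sorted(acked)


if __name__ == "__main__":
    sites = [
        InMemoryStore("site-a"),
        InMemoryStore("site-b"),
        InMemoryStore("site-c", healthy=False),
    ]
    safe, acked = replicated_store(sites, "order-42", "payload", min_acks=2)
    print("safe:", safe, "acknowledged by:", acked)
```

Note what the sketch leaves out: there is no rollback for the services that did store the data when the threshold is missed, and no ordering between competing writes. Those are exactly the things a database (or DataGuard) handles for you "below the layer".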

The next level of "Replication", and one that the SOA crowd will understand!

Following this line of thought, the storage layer becomes Yet Another form of the Grid Concept (YAGC?) or Yet Another Computing Cloud (YACC!). And the calling process doesn’t care anymore where the data is stored. Just as your GmailFS file system doesn’t know or care where the data is stored. It knows data is stored, and that is sufficient. (nerds and architects alike: take note of GmailFS, I’m really curious: is Gmail the ultimate ESB?).

Like many architects that refuse to look “below the layer”, we can now state that we have a concept (eeh, a resource?) for replication and that this DR pattern is “Pure SOA” and “fully ESB compliant”. It even relates to “GRID” and will thus gain us brownie points in the Grid-Indexes of various partners (more links below).

Myself, I will of course stick with Physical Standby for the moment. But it doesn’t hurt to explore the possibilities. A manager with some spare budget might even turn it into a seminar (with lunch, I would presume).

The Relevant Links are

(SQL> select url from relvant_stuff order by credibility desc;):

Dataguard and Oracle high availability (valuable stuff here):

Gmail fs (think about this one):

SOA and ESB (…):

More on Grid hype (…):

Note for all Alliance Managers: take note of my cross-links to our “powerful partners”, and please keep me on the invite-lists, I do appreciate quality food, I’ll even tolerate the mail and spam for it.

In the pipe-cleaning department or “mopping up loose ends”, I owe you these:

Lunch: There is no such thing as a free lunch (unless you are the lunch).

Architects: The difference is that you can (sometimes) negotiate with the terrorists.

Now shoot me (over lunch?).