Showing posts with label databases. Show all posts

Thursday, 2 October 2014

Riding to SIOUG and HROUG

Usergroups are the lifeblood of a good community. And I like to help a little, by visiting or speaking if I can.

Hence I will be speaking at the SIOUG.si and HROUG.hr usergroup conferences, together with some old friends and lots of interesting new people that I look forward to meeting.

Check the links to the conferences here. SIOUG is in Ljubljana, the lovely capital of Slovenia:


And HROUG is in Rovinj, on the Adriatic coast:



This trip will not be as long as the expedition I did last summer, visiting three conferences and doing 12,000 km including some classic mountain passes. But I still look forward to riding a lot of interesting roads on the motorcycle.

The previous trip was a Great Success.  So let's do it again.

Here is the approximate route-plan for the coming weeks:

The route... click to enlarge.



If anyone who lives along the route wants a quick consulting visit or simply has good coffee, drop me a note. I always like to stop by hospitable folks (and I may need to dry some clothes if the weather isn't cooperating).

In case of moderate weather, I will have to use the toll-roads and tunnels like everybody else, but if the weather is good, I will seek out some real motorcycle roads!

Best thing I ever did: re-start my motorcycle-life after so many years!


Saturday, 6 March 2010

Multiple blocksizes in an Oracle Database


In short: Not useful.

More elaborately: messy, and a waste of memory-space and admin-effort.


Let me explain:

I've come across this discussion again and I considered myself lucky that Charles Hooper has done all the research already.

I'm summarizing his findings here:

Multiple Blocksizes do NOT offer Any Proven Advantage.

The theory about more efficient indexes and better-managed caches is good, and I don't deny there is sound reasoning behind it. But in practice, having multiple blocksizes and multiple db-nK-caches doesn't make a difference.

It is a waste of that extra (little bit of) work.

And most likely, you end up wasting cache-space, because you hard-divide the cache into chunks and prevent the Oracle LRU mechanism from utilizing all available cache as it sees fit.
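To make the cache-partitioning argument concrete, here is a toy LRU simulation in Python (purely my own illustration; it has nothing to do with Oracle's real buffer-cache internals): two hard-divided pools versus one shared pool of the same total size, fed a skewed workload.

```python
import random
from collections import OrderedDict

class LRUCache:
    """Minimal LRU block cache that counts its hits."""
    def __init__(self, slots):
        self.slots, self.blocks, self.hits = slots, OrderedDict(), 0
    def get(self, block):
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)        # mark most-recently-used
        else:
            if len(self.blocks) >= self.slots:
                self.blocks.popitem(last=False)   # evict least-recently-used
            self.blocks[block] = True

random.seed(1)
# A hot 10-block working set and a big 80-block one, accessed 50/50.
workload = [random.randrange(10) if random.random() < 0.5
            else 100 + random.randrange(80) for _ in range(50_000)]

# Hard-divided: two fixed 20-slot pools, like two separate db_nK_caches.
hot_pool, big_pool = LRUCache(20), LRUCache(20)
for b in workload:
    (hot_pool if b < 100 else big_pool).get(b)

# Unified: one 40-slot pool; LRU hands the hot set's spare slots to the big set.
unified = LRUCache(40)
for b in workload:
    unified.get(b)

print(hot_pool.hits + big_pool.hits, unified.hits)
```

On this skewed workload the unified cache scores more hits, simply because LRU can lend the hot set's unused slots to the bigger set; hard-dividing the same memory takes that freedom away.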

Of course, you may be the exception that proves the rule, and I would like to hear from you in that case, but until further notice, I'll stick with my motto:

Simple = Better.

Saturday, 6 February 2010

SQL Developer - the Swiss Army knife for Oracle.

SQL Developer is the Swiss Army knife for Oracle developers (and DBAs and other interested parties, for that matter). This post is about SQL Developer and about "The Book" that introduces it.

In short: Recommended. The book and the tool.

Recommended for Newbies and experienced folk alike.

Recommended for Developers, DBAs and other users of the database.

Let me explain why.


Manuals are sooo last century.
Remember the boxes that came with Oracle7 and Oracle8/i?
(looks for picture of pallet)

Then in Oracle 9, there was a CD with PDFs. I've seen people consume a tree or two on that, and I'm guilty myself of selectively printing chapters to study on long-distance trains and cover with yellow stickies. Laptop batteries were short-lived in those days.


I would be interested to know how many people don't carry books at all anymore, just PDFs on laptops, e-readers or iPads [jokelink]. I have switched to laptop-reading for most manuals, but not for leisure-books. And the person who used to say "From their bookshelf, I recognize the type of DBA/Developer" will have to flip through his customer's laptop or iPad to do an assessment.


Normally, I hesitate to recommend books to developers (or anyone) because there is a lot of rambling rubbish out there. Books tend to follow the hobby-horse of the author (at best), and books may contain outdated information (a lot, given the pace of Oracle 7-8-9-10-11; the opt_cost_adj=120 and the go_faster=true|false come to mind). At worst, the information in books can be rampantly wrong. Notably, Jonathan Lewis points out that you should be critical of information on the internet; the same caveat applies to the bookshop: don't believe everything that is printed.


But the SQL Developer book by Sue Harper is one of the Good Exceptions.

This is how a good introductory book should be.
It simply presents the tool and all its possibilities.
It doesn't impose a methodology.
It doesn't try to convince you of anything.

Like any truly good book, it allows you to do your own thinking.

And it does a very good job of introducing the tool and its many possible uses.
Check it out, and decide for yourself. You might be surprised.

To get an impression: you can read the first chapter here, at the publisher's website.


To the Newbie developer it is a good introduction; to the more seasoned SQL*Plus user (or users of other tools, for that matter) it is a good reference.

As a die-hard command-liner, I will of course stick to my #$% prompts, but for those of you who have to be more productive, or who are forced off other tools for cost-cutting reasons, this book can help you on your road to fortune and glory.

I'd also like to mention that there is a trend: Sue is not the first Oracle Product Manager to write a book, and I had already enjoyed the writing of Larry Carpenter, who runs the team that nurtured DataGuard into bloom. I can heartily recommend his book as well. I have recently used that too, to convince customers/managers/leaders/victims/architects of the benefits of DataGuard (and soon to come: GoldenGate, running on a system near you). Larry's book was reviewed by Harold "Prutser" van Brederode.

A tip of the hat and a big "Thank You" to both authors for going through the process of writing those books!

Friday, 30 October 2009

Oracle Performance Tuning with hit ratios - Risky.

The Ratio, as drawn by Leonardo da Vinci.

While he was sitting in front of a fireplace somewhere in New England, Martin Widlake prompted this quicky-post.


First my Confession: I still use "Ratios" in assessing and fixing performance.

I know current wisdom is that ratios and aggregated numbers are not good enough and should not be used in tuning. I agree that ratios can be abused, and can be hugely deceptive.

But when guestimating the scalability or performance of an OLTP database-application, I use these three numbers:

Buffer-gets per row, 10 or less.
Buffer-gets per execute, 100 or less.
Buffer-gets per transaction, 1000 or less.

Martin will immediately see why, IMHO, his result of 380 for the 1st or 2nd ratio was too high.

I picked those numbers for several, simple, reasons.

Firstly, these numbers are simple to remember.

Secondly, these numbers can simply be determined. You can get them from v$-views, from traces, from AWR (Licensed), or from statspack (free!).

Finally, these numbers (10, 100, 1000) have generally served me well in places where I looked into heavily loaded OLTP systems. I don't recall a single problem with any statement or transaction that managed to stay within those limits.
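As a sketch of how mechanical this check is, here is a small Python helper (the names and the threshold table are mine; in practice you would feed it buffer_gets, executions and rows_processed from v$sql, and a transaction count from statspack):

```python
# The three rule-of-thumb limits: buffer-gets per row / execute / transaction.
LIMITS = {"per_row": 10, "per_execute": 100, "per_transaction": 1000}

def ratio_check(buffer_gets, rows_processed, executions, transactions):
    """Return only the ratios that break the 10/100/1000 rule of thumb."""
    ratios = {
        "per_row": buffer_gets / max(rows_processed, 1),
        "per_execute": buffer_gets / max(executions, 1),
        "per_transaction": buffer_gets / max(transactions, 1),
    }
    return {name: round(value, 1) for name, value in ratios.items()
            if value > LIMITS[name]}

# A statement doing 380 gets per execute (Martin's number) is flagged:
print(ratio_check(buffer_gets=38_000, rows_processed=38_000,
                  executions=100, transactions=100))
# -> {'per_execute': 380.0}
```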


Now be careful. Your situation will be different. And you should not strive for these numbers per se, but rather for happy customers. And whether these numbers really are appropriate for you depends on the number of calls you do, the type of work your users/systems do, the capacity of your hardware, and the demands placed on your system. Some systems operate comfortably with one zero added to each number.

But if you need millisecond response times, or need to serve thousands of requests per minute, then my advice is: strive for those numbers for every statement and transaction on the critical path.

Any query or transaction that consistently breaks one of those ratios should be looked into.

And if your system doesn't meet the numbers, there are three fixes:
You either Eliminate the statements (just don't do it),
Optimize the offending statements (improve where-clauses or indexes), or
Contain those statements (do it less frequently, and don't do it at peak-load periods).




A further word of warning on the general danger of "Ratios" is appropriate.

Phi, or the Golden Ratio of 1.6180... Maybe we should look for some relevant way to use that, just for fun.

We have had a clear demonstration, by Connor McDonald I think it was, that hit-ratios are not a good indicator of "performance". Connor was able to "set" the hit-ratio to whatever his manager wanted, with very little impact on performance [link?].

Subsequently, other performance specialists, notably Cary Millsap, have indeed proven beyond doubt that ratios and other "aggregate" metrics are not good indicators of performance.

Back in the nineties, my colleagues and I used scripts based on numerous books (from the nineties...) to determine all sorts of (hit)ratios for cache-size, shared-pool, redo and other settings. The ratios were derived by high-tech scripts we crafted, based on the books we bought. And from the nicely formatted SQL*Plus output we determined that we needed more memory, a larger shared-pool, more redo-buffers or larger redo-files.

In very few cases did those ratios point us to the real problem and in even fewer cases did the advice make any difference.

In almost every case, the real solution came from B/Estat, tkprof, Statspack, AWR or ASH. And most of the time, we were brought straight to the heaviest query in v$sql.

That query was then either made more efficient (fewer buffer gets per execute) or eliminated (just don't do it). In 90% of cases, that solved the problem (and sometimes improved a few ratios as a by-product).


My Conclusions:
Most Ratios are indeed of little use - they do not address the real problem.
But Some ratios can help you, if you know what you are doing (and if you know what your customer is trying to do).

With or without Ratios, you still need to find out what your system does, and use common sense to fix it.


I'll save a few other examples of Ratios for later, notably one I use to find locking problems. I think Martin can already guess which one.

Friday, 25 September 2009

Happy Birthday OFA

Cary Millsap from Method-R just reminded us that the OFA standard is now 14 years old.

Check out his original post here:
http://carymillsap.blogspot.com/2009/09/happy-birthday-ofa-standard.html

The Good Thing about Optimal Flexible Architecture was that it created a standardized way to install and maintain the Oracle (database) software stack.

And in writing down OFA Cary used his meticulous, thorough approach. Every aspect of OFA was derived from a real "Requirement". And he paid special attention to Robustness and Maintainability. The configuration of Oracle software, and the life of the DBA became a lot simpler.

Many DBAs owe Cary for writing up OFA!

Wednesday, 18 March 2009

Simple lessons from the DBMS SIG

Crisis or not, there was a reasonably good turnout on the last DBMS Special Interest group this Tuesday (17th March).

If you need to justify the membership cost and the time to attend SIGs like this, the one single item is probably Phil Davies from Oracle Support, who talks you through the latest scary bugs. Some of those I can almost recite by now, but I know full well how useful these horror-stories can be for projects that are preparing to upgrade or migrate.

The XPLAN presentation by David Kurtz was instructive, as there are still a few options in dbms_xplan that I plan to explore when time or necessity arises.

For monitoring, the rapidly delivered presentation from Jason Lester was refreshing. We all know what Grid Control can do by now, but there is always need for more, for different and for more customizable tools. I do not need to be convinced of the advantages of Open Source Software (I did a similar topic myself in ... eeeh... 2004, for a different forum, and I still stand by that message).

Although I have seen Zabbix (http://www.zabbix.org/) used more often than Nagios (www.nagios.org), I would like to add my own endorsement for any OSS or otherwise "independent" monitoring tool. You need some independent tool next to Grid Control and other vendor-supplied solutions (OpenView still sprouts up everywhere - sure, we use it if someone has paid for it, or enforced it onto us).

But quite often, for the actual, Real Application Monitoring, we will also implement our own home-grown or OSS solution next to Grid Control. You generally need a tool where you can add/tweak your own monitoring, and that you can rely on to do (or build) the stuff that the commercial tools don't do, or don't want to do. And despite what the books, the courses, and the management say, any DBA or sysadmin worth his salt will DIY in this area. I am also still guilty of jotting up my own unix and sqlplus scripts in various situations. They will do Exactly what I need when I need it. Some of my unix/linux scripts have dates in them going back to 1994 (sad, eh?).


Mental note: do a topic on "simple monitoring" sometime.


Pete Finnigan had his moment, explaining the uses and pitfalls of VPD (or is that RLS or FGAC? ...). Always nice to hear PXF himself explain (the lack of) security in his down-to-earth way. He keeps going on about security being a "layered" thing, and how adding more security can itself turn against you if you don't do it properly and exactly Right. Well done, cheers Pete!

The cherry on the cake was Jonathan -scratchpad- Lewis explaining how Simple (and yet Complicated) the analysis of Statspack (aka AWR, if you have the budget) really is.
Jonathan did a Great job of letting statspack-reports explain themselves, and with constant "challenging" and checking of his own assumptions.

Key messages that I retained, somewhat biased by my own experience, were:

1. Read from the top; get a feeling for the overall functionality and the load of the system at hand. How much time, how many CPUs, how much memory and IO were used. Quite basic.
2. Relate the work done, as shown in statspack, to the capacity of the underlying system: was the database challenging the hardware or not? Is it a capacity problem or a single-user, single-application problem?
3. Don't be afraid to ask questions. Always.

Jonathan refuses to "write the book" on statspack, with the excuse that there would be too much material to cover, and he is afraid to leave out items that are deemed critically important. My reply to him is along the lines of: Real Application Statspack-reading is about common sense. And if Jonathan can convey some good messages in a 1-hour presentation, surely it must be possible to write a not-too-complex book to help the average reader out there in 90% of the cases? For the other 10% you can always hire a specialist.

I become more and more tempted to write a few "simple" things about Statspack, the Real Fantastic CBO, and the blessings of proper physical design for Real Applications. The book by Tom Kyte is all you really need (need link).
Hm, will I too then get sucked into the wonderful CBO? (Real Application CBO; it gets it right most of the time...)
My next presentation, maybe.

Monday, 6 October 2008

Backup and Recovery, at what level

Yes, this is Yet Another Replication Discussion (YARD).

Indeed, this beast seems to pop up all the time. Today’s trigger was a discussion with some architects on Disaster Recovery in a SOA environment. I’ll try to give it a nice twist, and I will also keep my lunch-budget in mind.


To the serious reader, this article is actually a DataGuard plug.
But with all the buzzwords that spin around these days, this article can be seen as Art: It makes you laugh, it makes you cry, it makes you THINK.
And maybe those SOA/BPEL types do have some good ideas sometimes.



Marketing Tagline:
With SOA and ESB, the DR-replication can move upwards in the technology stack (and whatever the flavor of the month is: Grid, anyone? Cloud computing? The sky is the limit!).

On an even lighter note:
You should know the difference between an architect and a terrorist. It is at the bottom of this page.


(Health Warning: many buzzword-abbreviations ahead)


In most cases, I will advise anyone with HA (high availability) or DR (Disaster Recovery) plans to first consider a DG (DataGuard) construction. Replication of your data to a remote site offers many (documented and repeated) advantages:

DG offers you an off-site backup (but you should still consider tape or disk backups!)

DG gives you reporting on a read-only-opened database (even better in 11g: real-time...)

DG allows you to use the 2nd or 3rd site/system to do maintenance or upgrades (but be careful: you may want 3 or even 4 systems in that case, though you can tick some very fancy uptimes in your SLA box).

As an alternative to DG, most organizations will consider using SAN technology for replication. A surprising number of organizations seem to mix the two to complicate life.

Sideline: Some are even dreaming over sales-brochures (and some are entertained at seminars) on “stretched” clusters. When it comes to Oracle databases and RAC, these stretched-beasts are worth a separate story and a very carefully worded one indeed: I can’t kick powerful vendors/partners/colleagues, but I can’t put customers at risk either (?). Maybe Later. I have learned the hard way to Never criticize powerful vendors; it limits your lunch-invites. See bottom of page for lunch-items.



First on DataGuard

Despite my own confidence in Physical Standby (and the current limitations of Logical Standby), Oracle seems to be moving towards Logical (SQL-apply) rather than Physical (redo-apply). Because of the looser ties between primary and replicas, "Logical" will offer more possibilities for maintenance and management. The logical replicas do not need to be physically compatible and can span different versions. A trend we can see with open-source databases as well: logical replication is conceptually more powerful than the redo-apply mechanism originally used in Oracle Standby.

DataGuard would merit a whole blog all by itself, but I have a different point to make today. I'll repeat what I said before: DG is a very good start to improve the availability of your system.



Now on SAN replication

The other route chosen by many organizations is to "outsource" the replication to the SAN. The SAN system will take care of block-level replication of all storage. Attractive, often simple to set up (just shout at the SAN engineers; they will click and drag until it works), and heavily advertised (and paid for). SAN replication is especially attractive if you have to replicate more than just your Oracle data, as is often the case. The main downsides of SAN replication are the complete dependency on the SAN technology, the license cost, and the higher volume of data to be transferred (compared to Logical or Physical DataGuard). SAN replication works if you have sufficient bandwidth, low latency and capable storage engineers (not to mention the friendly smiling vendor).

SAN replication and DG replication, if applied properly, can both be very powerful mechanisms. I recommend choosing the one you are familiar with (or get the best license-deal on) and sticking with it. I would not recommend using both SAN and DG in an inter-dependent mix (a surprising number of databases are indeed replicated by both concepts, and that tends to add to the confusion in case of recovery).

For Oracle/RAC on stretched clusters: suffice it to say that you shouldn't pay for this yet, but if the (hardware) vendors want to demonstrate it at fancy locations/parties: do encourage them to invite you. The prices of this technology warrant a business-class ticket (with spouse, and partner-program organized).



Coming to the point: The next alternative.

We now have a 3rd alternative: SOA-layer replication. To most of you this may be old news, or indeed an old concept re-worded: you are correct. Read on. And tell me how I got it all wrong when I became entangled in the latest hype without really understanding the concepts (or even knowing the jargon).

Stack-wise, I would situate SAN replication at the bottom of the technology stack (deep down in the infra-structure layer). And I would place DataGuard somewhat higher up in the stack (closer to the app-layer). Hold this layer-idea for a moment.

Enter stage: the Service Oriented Architecture (SOA), with the concept of an Enterprise Service Bus (ESB). Apart from the number of new abbreviations, what really struck me about SOA were the concepts of "above" and "below" the Layer (anyone read Tom Clancy in the nineties?). For some architects, the database is just a "service" that takes care of persistent storage. They are right (they are, after all, architects). Google "Database + ACID", possibly adding "Codd + rules + 13" for some very basic stuff on databases. They are correct; ACID-ity is the essence of a database.

Now think of replication for DR purposes as a means to store things in multiple locations, and possibly as a means to become technology-independent. And imagine a SOA request to store data being sent not to a single service, but to two or more services. Borrowing from the DG concept: as long as 2 or more services have acknowledged storage, we can consider the data safe.

The next level of "Replication", and one that the SOA crowd will understand!
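A minimal sketch of that idea in Python (entirely hypothetical; no real ESB product or API is implied): the "bus" fires the same store-request at several storage services and declares the data safe once a quorum has acknowledged.

```python
def quorum_store(payload, services, quorum=2):
    """Send the same store-request to every service; succeed on enough acks."""
    acks = 0
    for service in services:
        try:
            if service(payload):          # each service returns True when stored
                acks += 1
        except IOError:
            pass                          # one dead replica is not fatal...
        if acks >= quorum:
            return True                   # ...as long as the quorum is reached
    return False

stored = []
def ok_service(payload):
    stored.append(payload)                # pretend this is durable storage
    return True
def dead_service(payload):
    raise IOError("replica down")

print(quorum_store("row-42", [ok_service, dead_service, ok_service]))  # True
print(quorum_store("row-43", [dead_service, ok_service]))              # False
```

The calling process only learns "stored"; which replicas actually hold the data stays below the layer.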

Following this line of thought, the storage layer becomes Yet Another form of the Grid Concept (YAGC?) or Yet Another Computing Cloud (YACC!). And the calling process doesn't care anymore where the data is stored. Just as your GmailFS file system doesn't know or care where the data is stored. It knows data is stored, and that is sufficient. (Nerds and architects alike, take note of GmailFS; I'm really curious: is Gmail the ultimate ESB?)

Like many architects that refuse to look “below the layer”, we can now state that we have a concept (eeh, a resource?) for replication and that this DR pattern is “Pure SOA” and “fully ESB compliant”. It even relates to “GRID” and will thus gain us brownie points in the Grid-Indexes of various partners (more links below).



Myself, I will of course stick with Physical Standby for the moment. But it doesn’t hurt to explore the possibilities. A manager with some spare budget might even turn it into a seminar (with lunch, I would presume).



The Relevant Links are


(SQL> select url from relevant_stuff order by credibility desc;):



Dataguard and Oracle high availability (valuable stuff here):

http://www.oracle.com/technology/deploy/availability/htdocs/DataGuardOverview.html


Gmail fs (think about this one):

http://richard.jones.name/google-hacks/gmail-filesystem/gmail-filesystem.html


SOA and ESB (…):

http://www-128.ibm.com/developerworks/webservices/library/ws-soa-design1

http://www.managementsoftware.hp.com/products/soa/ds/soa_ds.pdf


More on Grid hype (…):

www.hp.com/techservers/grid/index.html

http://www.oracle.com/corporate/press/2005_apr/emeagridindex2.html



Note for all Alliance Managers: take note of my cross-links to our “powerful partners”, and please keep me on the invite-lists, I do appreciate quality food, I’ll even tolerate the mail and spam for it.



In the pipe-cleaning department or “mopping up loose ends”, I owe you these:

Lunch: There is no such thing as a free lunch (unless you are the lunch).

Architects: The difference is that you can (sometimes) negotiate with the terrorists.

Now shoot me (over lunch?).

Friday, 8 August 2008

Throttling and triage: where do I make my difficult decisions?

Every now and then, the discussion about the "processes" parameter flares up. Setting this parameter too low results in ORA-00020 "maximum number of processes (1024) exceeded", but too high a setting will strangle the database-server with processes.

My favorite application manager always wants me to increase the number of processes. For an app-server and for its end users it is bad publicity if ORA-00020 appears in a java logfile or even in front of a (web-based) customer (now who is to blame for that type of error handling??). Hence the knee-jerk reaction to increase the "processes" parameter in the spfile. This demand is often accompanied by the assurance that the application will never (actively) use the high number of connections, but needs the high number to make very sure the error will never (re)appear.

App-jockeys and managers generally refuse to take responsibility for setting or decreasing the upper limit on the number of connections in their JDBC connection pool. Some of the more exotic app-servers don't even respect their own settings, and happily explode the number of connections to something in the 4-digit range per app-server instance.

Luckily, these app-servers will generally melt down by themselves, and that saves us from a database-brownout with more disastrous consequences. DCD or active killing then has to take care of the remaining connections, preferably before the clusterware (automatically, unstoppably) or the operators (eager to stay within SLA) fire up the next application server, which will also need to initiate its JDBC pool, and hence needs the connections.

However, if we are unlucky, the app-server doesn't melt down, and the database hits max-processes, whereby other app-servers with a genuine need to increase connections will also suffer. Not Good. And one reason why pools should be conservative in changing their number of connections.

For the DBA, it makes sense to set the parameter to a value at which the database can still operate "normally". Allowing too many processes, even inactive or near-dead ones, makes no sense and consumes unnecessary server-memory, sockets and cpu-cycles.

Database and Unix zealots should now pop in and say that the processes-parameter controls many more derived values (transactions, sessions, enqueue_resources) and therefore requires more careful consideration than just the shouting of the deployment team. I will stop there by saying: too high a setting is simply not beneficial. Picture the number of CPUs in your system, and imagine the overhead of keeping a high number of processes alive, whether you use PGA and workload parameters or not. (I can see a whole set of nerdy comments coming. Fine! As long as we agree on this: processes should be set lower, rather than higher; small is beautiful.)

In a legacy client-server environment, it often makes sense to use Shared Server or its predecessor MTS (Multi-Threaded Server). The shared-server construct is ideal to handle a large load of relatively quiet connections. As the "clients" in C/S are often unaware of each other's workload and existence, it is the database that needs to take on the job of sharing (pooling, queuing) the connections. Note that MTS or Shared Server amounts to pooling connections on the database-server. Do we want the database-server to be busy juggling connections? (IMHO: only if we have no choice, but in C/S, the shared server can make sense.)

In a J2EE environment, it makes more sense to use dedicated connections. Each call to the database should be handled fast, and the connection should be made available for the next thread that needs it. The database should ideally focus on doing its ACID task, and not be bothered with load- or connection-sharing; the connection-pool can handle that. The JDBC pool mechanism is the component that should limit the number of connections and take care of the throttling. A JDBC pool should ideally open its maximum (= optimal) number of connections and keep those open, as the creation of a new connection takes time (when in a hurry, you can't afford to wait for a new connection to be opened up). Provided the number of processes (connections) is not allowed to go over a workable limit, there is no reason why each connection-pool should not be allowed to pre-create a fair number of connections. But the upper limit should be firmly set to a value where both the app-server and the database-server can still operate efficiently.
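To show the throttling idea outside any particular app-server (this is a toy sketch in Python, not JDBC): a pool that pre-creates its maximum number of connections and makes extra callers wait instead of opening more.

```python
import queue

class BoundedPool:
    """Toy connection pool: a hard upper limit, enforced by blocking."""
    def __init__(self, make_conn, max_size):
        self.idle = queue.Queue()
        for _ in range(max_size):          # pre-create the maximum up front:
            self.idle.put(make_conn())     # no connect-storm under load later
    def acquire(self, timeout=None):
        return self.idle.get(timeout=timeout)   # blocking here IS the throttle
    def release(self, conn):
        self.idle.put(conn)

pool = BoundedPool(make_conn=object, max_size=2)   # pretend object() connects
a, b = pool.acquire(), pool.acquire()
try:
    pool.acquire(timeout=0.1)              # a third caller must wait...
except queue.Empty:
    print("throttled")                     # ...and times out, instead of
pool.release(a)                            # pushing the database over its limit
```

The design choice is exactly the one argued above: the caller is throttled at the pool, so the database never sees more than max_size connections.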

Allowing too many connections under increasing load will result in degraded performance, and worse, in a meltdown of the instance. If that happens, everybody suffers, and not just the last-clicked-user.

Two common causes for high numbers of connections are generally:

Transactions take too long (performance or locking problems, improper use of XA), or
Connections not released back to the pool (sloppy coding or just plain bugs)
Therefore, it makes good sense to include connection-count and connection monitoring in the test-plans, and to monitor (count, plot) connections during live operations. My strategy is to always set processes to a conservative value. I use it protect my database against brownout and meltdown.It is up to the Java (or other app-level components) to use the number of available connections to the best extend, and to provide an error-handling or throttle-mechanism to handle overload.

The approach in short:

Determine how many connections the DB can serve while maintaining reasonable response-times.
Set that as max-processes.
Tell your app-servers to stay within the limit (and use max-pool-size).
Monitor the session high-water mark (does it ever approach the max?).
Do spot-counts of connections (and plot them over time).
Audit sessions to know where they come from (find the origin of high numbers).

Bottom line: a surge in traffic should not be allowed to cause a melt-down of the database. High volumes should be throttled higher up in the stack (load balancers, web-servers, app-containers).

If the throttling mechanisms are absent, or are not working, then I think the database has a legitimate need to keep processes at a reasonable value. Potentially a difficult decision, but someone has to take it. Triage can be painful, but there is a good reason to do it: the survival of the system can depend on it.

Admittedly, this is a very db-centered approach. Waiting for some debate from the app-guys now.

Tuesday, 11 December 2007

UKOUG 2007 Conference - the Lessons.

Finally, I can offload some of my UKOUG stuff.


The UK Oracle User Group... go say hi to them...

First the budget-justification (management is reading - sometimes). Like every year, I can justify visiting UKOUG as a training cost. I have learned more in four days of UKOUG than I would have on just about any (Oracle) course, and have saved my employer some money at that.


By taking the time to see other “users”, I have picked up some interesting new notions and ideas. Not to mention how a particular colleague has found a way to save a large, high-profile project some significant money. Money that a Large vendOR or some partner-reseller would gladly have pocketed.

In other words: information that would probably not have been spelled out clearly in a “commercial” course.


The hint is in Lesson 1:

Sometimes the best way of doing something is to … Not do it at all.

(going back, I think to Dave Ensor, possibly further).


Enough messages to management now, on with the real stuff.


CERN: Testing the limits.


The CERN guys were impressive. As with the CERN physics, they are going places with the software where some of us may follow in a few years' time. Others may never have to go that far, but we can all benefit from their experience.


What sets them apart is not only their pioneering and their IT-skills. They also have the urge, the liberty (and the duty?) to share their knowledge freely.


They had some interesting experiences with RAC, clustering, scaling, and software distribution. Notably, they had some gentle criticism of the deployment-process of the software. See the presentation on RPM-ing Oracle, which was particularly in my line of reasoning.

An RPM or similar distribution, paired with a re-vamped OFA, would make our favORite software much more manageable (those who saw my rant presentation this year know why).


The input from CERN can be very valuable to many organizations (commercial and otherwise) out there. The fact that they can present this at the conference is another demonstration of the useful role that UKOUG has in knowledge transfer.


CERN employees can be quite open and can share their experience more easily than commercial users with competitors listening. They can also criticize the vendor more easily and openly than commercial consultants are able to. Most “independent partners”, both large and small, will always remain “diplomatic” towards the dominant party, as they need the business and revenue that is benignly handed down to them (and can be taken away on bad behavior).


Meanwhile, CERN is a highly visible, and hopefully “independent”, user of the product. It is possibly one of the very few large customers to which Oracle might sometimes listen.


Lesson 2: Criticism is Good.

Events like this, allow for some constructive criticism towards the vendor.
Criticism that would otherwise mostly get lost in the “commercial” spin processes. It is my own belief that this criticism can make the product more competitive. And ultimately, both the vendor and the user-community will benefit.


Other presentations: Many Good ones…


I cannot resist mentioning how Michael Doherty confirmed my opinion:

Keeping a system “Up” is never Simple (as we would like it to be simple…)

We can keep trying though.


Lesson 3: Simplicity is Always better!


There are many more memorable presentations I want to mention, but they will have to wait until the next installment. The good topics range from “disaster recovery” and the use of Streams (aka logical standby), all the way to various RAC internals and ASM. And of course the inevitable topics on optimization and the CBO (which I keep attending, because of course we have been bitten by the CBO, again).


Hm, tempted to insert a semi-funny CBO lessons here:

4a) CBO is never wrong, but your design/code/sys-statistics/seg-statistics/data/hints/outlines can be very Very VERY wrong…

But there is hope: According to Jonathan Lewis, commenting on the unpredictable behavior of CBO, notably on upgrades, and on the feeble attempts by mere mortals to use hints or otherwise control the beast:

4b) “Sometimes, you Can get lucky.”


Before pressing the publish-button, I specifically wanted to insert two “Thank You” items.


First a “Thank You” to Doug Burns for reviving OFA.


The old, rather boring workhorse of OFA was conceived by Cary Millsap and put to good use in Oracle versions 6, 7 and 8. It even became part of the official Oracle distribution. Doug has now made it his mission to re-educate the world about how sensible good OFA practices are.

It doesn’t seem to matter that Oracle still has OFA in its manuals, and even in its default installations; the standard still gets both ignored and abused by many. Visit Doug for more.


And have a read of the presentation by Robyn Sands. Same problem, different angle. Interesting. As I have hinted in my own presentation, OFA can also do with a bit of good attention. Maybe we should make some suggestions to Oracle.


Finally, a humble “Thank You” to those who sat out my own disastrous presentation. I’m sure they had fun seeing me sweat when the equipment didn’t work as expected. Plan A failed. Plan B failed. Improvise!


As a final lesson, here is a more detailed account of what (nearly) went wrong. But I basically had to do the first 10 minutes without slides.

The ICC technician saved the day. He deserves a medal.


I think everybody picked up the last lessons there:

Do not go head-first with the latest + greatest versions. Or if you do: Test Properly, Test again, and be very, very prepared.

And to survive disasters: Prepare (have backups and standbys), prepare (have more copies), prepare (document), and prepare further (test)…

Friday, 16 November 2007

Databases and Systems, what kind of Relation.

Over the years, I’ve had several interesting discussions on the relationship between “databases” and “systems”. It comes down to this: Should there be many databases on a single system (m:1, the traditional approach), or should there be many systems underneath one database (1:m, grid)? And when is 1:1 appropriate?

Current hardware is powerful enough to allow “supernodes” that can run hundreds of databases. We are also confronted with virtualization, or “carve-up” of hardware, by way of Xen, VMware, or vendor-specific products that create domains or partitions.
With the current push towards “virtualization” of systems and the (in)capabilities of Oracle, it may just be worth re-starting some of the n:m discussion (did it ever go away?).

It is time to take a position on the many-to-many relationship between databases and systems.

This story is actually the text version of my presentation "Databases Everywhere", and can be found on various UKOUG and other sites. (include a link...)

To summarize it for those architects and (account-)managers who are in a hurry:
I am in favor of 1:1 wherever possible. I support 1:m (RAC, grid) if really, really needed (but please think about it one more time). RAC is a wonderful piece of technology that can serve many other vendors as a good example. It can be made to work. I will try to indicate under which conditions I think RAC can or cannot be applied.

Finally, I will only tolerate the old-fashioned m:1 for non-critical situations or on systems where there is some sort of risk-mitigation against interference between the (instances of) multiple databases.

After explaining these positions, I also have a list of recommendations for customers, providers and even for the Big-Oracle itself.

Those of you interested: read on. Possibly prove me wrong.
Others, keep browsing, the truth is out there, whichever version or vendor you want to see.

For the record: the bottom of the text contains links to all our major “partners”. Please keep the invites coming.

Intro, and some definitions.

On conventional systems, we generally find one or more databases running on a single system (*nix or even Windows). For example, for concept-testing of a DataGuard setup with cascaded standby, my laptop has run 5 databases simultaneously. Slow, but running. It just takes loads of memory and careful (memory-)parameter setting. It illustrates the capabilities of the Oracle database that the concept, once proven on a laptop, could then be used to clone a 3TB production system (this time not on Windows). But the laptop proof-of-concept also illustrated an important issue when running multiple databases on a single system: contention. The databases were visibly (and audibly, from disk and fan) competing for IO and CPU resources.
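By way of illustration, the “careful parameter setting” mostly comes down to giving each instance an explicit, small memory budget so the instances fit side by side. A hypothetical init.ora fragment for one of those play-instances (10g-style parameters; the values are made up and far too small for real work):

```
# Hypothetical memory settings for one of several small instances
# sharing a single machine:
sga_target           = 200M    # cap the whole SGA for this instance
pga_aggregate_target = 50M     # budget for sort/hash work-areas
processes            = 50      # fewer processes, less PGA overhead
```

With five such instances, the combined footprint stays predictable, and the fight is then only over IO and CPU.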


NB: Can’t resist putting one in for Deb and Nigel: let the world know that Deb got more memory into her laptop than Nigel, but then both David and I still have more memory in our laptops than the two of you together, so there ;-). Size matters.
Techies rule.
Back to serious business.



With the current possibilities for virtualization, you can take hardware and split it into many “systems” using VMware, OracleVM, any other Xen or vendor-specific tools for domains, lpars, etc..

Each resulting system can then be used to run instance(s) of 1 or more databases. When is this useful and how far should we take this?
With Oracle RAC (GRID, anyone?) you can take a database and distribute its instances over many systems. When does this add benefit?

First, a few quick definitions to delimit the playing field. Please note that the definitions are for the purpose of this argument only. They don’t pretend to be scientific or final. Or even to be correct.

System: a running *nix (or win*) instance containing a process list and an amount of addressable memory. In short: a running instance of an “operating system”.
A system can be Virtual if the hardware seen by the operating system is not identical to the actual underlying kit. This is the case when a larger system is split into “virtual” units by use of Xen, VMware or some vendor-specific layer of software or firmware. Some of those can also be modified dynamically (e.g.: Rolling). Some definitions and a good description can be found here:
http://encyclopedia.thefreedictionary.com/virtualization.

Database: an (Oracle) database, containing one system tablespace, and one user called SYS (I’m still trying to find the “essence” of an Oracle database, how about the sys.obj$ entries?). Note that the use of DataGuard can mystify the definition of a “database”, because each DG-clone can represent “the database”. The actual Single Point of Truth (SPoT) is where the current “primary instantiation” of the database resides.

Also note that my definitions do not include the binaries, or the ORACLE_HOME, as part of the database or the system. Indeed, systems and databases can be used in situations where the software needed to run them is “shared”. Most systems only need a few “system-specific” files in /var and /etc. I will always point out VMS as the ultimate mother of all clustered systems, whereby files are shared between multiple nodes. But that requires a Clustered File System (CFS), and that opens a different discussion altogether. Suffice to say that a CFS is very suitable to ensure that all machines can be connected to the same, identical software and are guaranteed to run the same version of the binaries.

Now let me briefly elaborate on the different options.

Conventional deployment - many:1.

On conventional systems, we often see many databases running on a single (unix) instance.

The DBA can look after the databases, and the unix administrators have only one entity to watch. Any collision between databases will have to be handled by the DBA. Note that in most of these conventional cases all databases share the same software-tree (multiple databases running from the same oracle_home).
These systems tend to have a relaxed SLA. Utilization varies, and sometimes we see high percentages of CPU or IO bandwidth being consumed by a single database.
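As an aside, the m:1 layout is visible straight from /etc/oratab. A small shell sketch that counts how many databases share each oracle_home; it writes a made-up sample oratab first so it runs anywhere (on a real system, point ORATAB at /etc/oratab and drop the sample):

```shell
#!/bin/sh
# Sketch: how many databases share each ORACLE_HOME, per the oratab file.
ORATAB=./oratab.sample

# Hypothetical sample entries, format SID:ORACLE_HOME:autostart-flag
cat > "$ORATAB" <<'EOF'
HR:/u01/app/oracle/product/10.2.0/db_1:Y
CRM:/u01/app/oracle/product/10.2.0/db_1:Y
LOGI:/u01/app/oracle/product/9.2.0/db_1:N
EOF

# Skip comment lines, split on ':', tally the second field (the home).
awk -F: '!/^#/ && NF >= 2 { count[$2]++ }
         END { for (home in count) print count[home], home }' "$ORATAB"
```

Two or more databases on one home means a software upgrade of that home takes all of them down at once, which is exactly the simultaneous-outage drawback described below.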

The main disadvantage of conventional systems is the fact that there is only one system, and all databases meet there. A problem with either a single database or with the system itself can quickly contaminate all databases on the system. And upgrades of system- or oracle-software lead to simultaneous outages of all databases on the system.

In case of system- or hardware failures, many databases must be recovered simultaneously or re-started on one or more (other) systems (requiring some prioritization). The simultaneous recovery of many databases may lead to a brief period of overload and/or chaos, and possibly a domino effect on other systems or components.

The advantages of an m:1 configuration are the simplicity of a “single system”, which is easy on the system-admin, and the often cheaper license structure when using per-system licensing.

Simple and Robust 1:1

On systems with a high load or a stringent SLA, we tend to see 1:1 relationships: a single database, whereby the sole and single instance of the database runs on a single unix system. This is sometimes referred to as the monogamous configuration.
By having the whole system to itself, the database can benefit from all the resources available. There is no interaction (disturbance) from adjacent instances or other processes. Determining parameters is relatively simple.

However, as Oracle (-sales) will point out, this 1:1 configuration generally means the system is grossly under-utilized. It also means the database can still suffer from unix- or hardware failures.
But I like the simplicity of this configuration, I think this is the most “robust” solution and is applicable to the majority of databases. Whenever a problem occurs on one of the systems, only one database is affected, and recovery-efforts can be concentrated on a single database and application, reducing the risk of a domino-effect.

The 1:1 situation also lends itself very well to hardware-clustering or “cold failover” whereby a database is re-started on another system (node) in case the underlying system or hardware fails. Only one database needs to be re-started or recovered.

Since a 1:1 configuration requires many “systems”, it is attractive to use server virtualization. By running multiple “virtual” systems on a single piece of hardware, you can quickly create the required number of “separate” or isolated systems. When doing this, keep in mind that the underlying hardware remains the single point of failure. When one or more virtual systems are meant to replace one another in case of failure, they should preferably run on separate hardware.

Real Application Clusters, RAC - 1 : many

As a techie, I like the technology behind RAC. It is a wonderful thing to play with, and I like the challenge of mastering this thoroughbred in real-life systems. But I have to be careful not to be running a “solution looking for a problem”.

We tend to see RAC databases in organizations with formal and very stringent SLA’s and with the budget and the resources to try and meet these requirements.

OPS and RAC eliminate the SPoF of the “system” and deploy the database over multiple systems. Theoretically this works nicely and even provides dynamic provisioning of system resources (you can utilize all available kit, and you can add more kit as needed). In practice, many have pointed out the relative complex setup, the high price, and the other shortcomings of RAC (link to Miraculous, Famous Danish company).

We have indeed seen successful deployments of multi-instance databases on some very large kit. In some cases, the impressive amount of hardware was able to hide badly designed application-code for quite some time. And by constantly distributing the (mainly CPU-)workload over the available unix systems, the coders got away with some appallingly inefficient constructions. Some of these cases have demonstrated the viability (and sometimes vulnerability) of RAC quite nicely, although a better design or implementation might have been cheaper (I prefer brains over iron, always! In case of doubt: reduce the size of the hardware and tell the IT crowd to JFDI).
Note however that it is important to let the hardware boys and vendors have it their way a bit too. Rolling in some extra hardware makes them happy, and is good for our relationship with these vendors. They might invite us on future projects.

And to pour further praise on the Oracle techies: TAF has saved our systems several times when a node got in trouble and died. Database-nodes die mostly through software errors, core dumps or memory leaks, and sometimes through human errors. The underlying hardware is rarely a problem, as these high-end systems are built to keep running even if a salty ocean wave runs through the lower floors of the building - unintentionally (Kathryn, are you reading this?).
Please note that even when you have RAC and TAF capabilities, you still need to code your application to correctly trap and handle the failover events. Otherwise, it only works on “idle” connections.
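For illustration: basic TAF lives on the client side, in the connect descriptor. A hedged sketch of a tnsnames.ora entry (host, port and service names are made up; and even with TYPE=SELECT, in-flight transactions still roll back on failover, so the application must trap the resulting errors):

```
# Hypothetical tnsnames.ora entry with basic TAF across two RAC nodes.
MYDB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
      (LOAD_BALANCE = yes)
    )
    (CONNECT_DATA =
      (SERVICE_NAME = mydb)
      (FAILOVER_MODE =
        (TYPE = SELECT)      # re-position open SELECTs after failover
        (METHOD = BASIC)     # connect to the surviving node only on failure
        (RETRIES = 20)
        (DELAY = 3)
      )
    )
  )
```

The FAILOVER_MODE block is what distinguishes a TAF-enabled alias from a plain load-balanced one.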

Note that most databases can (be made to) run on RAC, provided there are no obvious bottlenecks, such as an ordered no-cache sequence or some very hot blocks with running totals. And those bottlenecks can generally be un-designed.
NB: Oracle currently seems to have the following position on RAC: if it doesn’t run (or scale) on RAC, your design is wrong.
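To make the sequence example above concrete, a sketch (object names are illustrative):

```sql
-- An ORDERed, uncached sequence forces the RAC instances to coordinate
-- on every nextval call - a classic cluster-wide serialization point:
CREATE SEQUENCE order_seq ORDER NOCACHE;

-- The un-designed version: each instance hands out numbers from its own
-- cache. The numbers are no longer gap-free or strictly ordered, but in
-- most designs they never needed to be.
CREATE SEQUENCE order_seq_fixed CACHE 1000 NOORDER;
```

The fix is a design change, not a RAC parameter, which is rather the point.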

Suffice to say that where Very Fast Failover (TAF or FCF/FAN) is needed, RAC has no equal. And for systems that have extreme hardware (CPU) requirements, the RAC scale-out model is also beneficial.

Question : Now what to choose ?

The classic answer: It Depends.
However, I will try to provide some guidance and some opinions.

The only factor that really matters is “The Business” (duh).
What does your business need, and what can it afford? For simple or undemanding SLAs, the traditional m:1 configuration is often sufficient and cost-effective. For businesses that have more stringent demands, or for providers that risk being sued by their (business-)customers over a broken SLA, a 1:1 is advised, possibly with some cold-failover mechanism. And finally, if you really need the additional 10 minutes, or if you need the scale-out features, AND if you can afford the resources for testing, training and ongoing maintenance, a 1:m configuration (aka RAC or grid) can be your choice.

To figure out what your business needs (and can afford), you can either think for yourself, or you can give yourself and your department more credibility by engaging (principal-, business-)consultants to do cost-benefit analysis, risk-assessments or FEMA (that is: Failure Escalation Mode Analysis, not the other FEMA). They will especially stress the business-cost of downtime and any related loss of data/productivity/customers/orders/MegaWatts. That is enough FUD for the moment. Back to more practical matters.

On the practical level, there are some factors that come into play. There is a) the preference (eeh: dictate) of your system administrators, your SAN engineers, your ASP, or your hosting provider. Then there is b) your commercial relationship with Oracle which will determine how high your licensing cost will turn out. But there may be others, such as c) the capabilities and preferences of your DBA. We will not even go into items like d) the availability of test-systems to prove and maintain your architecture, or e) the available rack-space.

The first important factor that often comes into play is the preference (or the pricing-policy) of the system-admin team or the hosting organization (the ASP). Are they capable of handling many systems, or do they prefer a low number of unixes? Can they quickly build and clone systems for provisioning? What price do your ASPs charge for additional systems? This may determine your capability to run “multiple” systems.
Some organizations, by choice or by force, still get away with running just one large unix box with everything on it: HR, CRM, Logistics; sometimes they even have their dev/test/uat environments on the very same box. Feasible, but with most of the drawbacks of an m:1 configuration.

The next factor is often License cost. How is your relation with Oracle, commercially? If you have to pay list-price, you will want to stick with “conventional”. Here, Oracle shoots itself in the foot: a lower price on RAC would speed up acceptance of the RAC and GRID model.
Machiavelli did suspect this was done to buy sufficient Beta-time to find all quirks, possibly to find a solution for instance-affinity, and to give customers the time to come up with a solution for the friendly-delivered “Your Design is Wrong” consultancy-audit outcomes.

And last but not least, what do your DBA’s prefer, and how trained and comfortable are they with RAC/grid? The traditional choice, m:1, despite its disadvantages (contention, domino-effects) is still the easiest to maintain for a DBA. Choosing a 1:1 configuration brings on a slightly higher workload, but has the advantage of more robustness and easier, isolated, troubleshooting since databases and systems do not affect one-another when trouble or maintenance occurs (yes, yes, someone must shout “utilization” now, thank you).

The choice for a RAC or grid configuration tends to create a significant overhead. We politely disagree with Oracle at this point that the new grid-control alleviates all problems.
And even if GC and its agents do try to take away a lot of the routine-tasks, Knowledge and Experience can never be completely replaced by a GUI. This aspect, the Human- or Operator dimension, tends to be the most under-estimated factor when (prematurely) implementing RAC/Grid.

Recommendations:

For customers and end-user businesses:
Move carefully from m:1 to 1:1. The 1:1 configuration is at this moment arguably the most robust way to run a database. Consider using virtual systems to support a 1:1 deployment, but beware of the possible contention and SPoF on the underlying physical layers. Move on to 1:m (RAC) for cases with specific needs (failover or scale-out). Only use RAC where you must, but then don’t hesitate to use it. It can (be made to) work, it will work (eventually), and you may have to learn these tricks eventually. Start on the first valid occasion.

For ASP and hosting organizations:
Learn how to handle clusters, clones and virtual systems. These tools will give you an edge in flexibility. Then offer your customers the possibility to host many _identical_ systems at a rebate. Your customers will buy more as 1:1 and 1:m systems proliferate, and in the long run you will benefit. Hosting companies and some vendor pricing-policies are the largest obstacle when moving from m:1 to 1:1, or even on to 1:m. Innovative customers will try to move to cheaper and more flexible platforms, and even lagging customers will eventually follow. If the hosting provider can quickly “provision” at acceptable cost, he can be seen as a partner in commoditization, rather than as an obstacle to flexibility.

For System-admins:
Learn how to handle a multitude of systems, learn how to keep them in sync, and how to clone or (re-)build systems quickly.
NB: for the addicts: investigate the use of Clustered File Systems (CFS).

For DBA’s and system-admins:
Aim to deploy databases and systems in a 1:1 fashion. The “isolation” of each database and system greatly facilitates admin- and troubleshooting activities.
Also get used to replicating or sharing software through OUI or other mechanisms.
And if possible, start to work with a CFS.
Sharing storage at the “filesystem” level can greatly facilitate the juggling of multiple systems. Even NFS (supported, but not recommended) is a usable alternative.
A CFS can offer great advantages by sharing files across nodes, and this can simplify software-deployment and distribution. You will always need two copies of the software for redundancy purposes, but please think before making the 3rd, 4th or 42nd copy. Sharing is better than copying, especially at high numbers. It is easier to manage a small number of shared oracle_home trees than to have 42 or more copies that need to be rsynced or otherwise kept identical.
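To illustrate why the 42 copies hurt: every copy is a chance to drift. A small shell sketch of a drift-check between two miniature, made-up "software trees" (on a real system these would be the oracle_home copies, and rsync would do the actual re-sync):

```shell
#!/bin/sh
# Sketch: detect drift between replicated software trees.
master=./oh_master
copy=./oh_copy

# Build two hypothetical miniature trees; a "patch" lands on master only.
mkdir -p "$master/bin" "$copy/bin"
echo "oracle binary v1" > "$master/bin/oracle"
echo "oracle binary v1" > "$copy/bin/oracle"
echo "patched sqlplus"  > "$master/bin/sqlplus"

# Any difference means the copies have drifted and need re-syncing.
if diff -rq "$master" "$copy" > /dev/null 2>&1; then
  echo "copies identical"
else
  echo "copies have drifted - re-sync needed"
fi
```

With a shared tree on a CFS, this whole class of check (and the re-sync that follows it) simply disappears.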

For Oracle:
It is appalling to find that a grid (aka an OPS database) was easier to build and maintain under VMS or Tru64 with Oracle 8174 than it is with the current 10g versions.
Please pursue the development of OCFS and facilitate shared-binary installs on OCFS and other CFS platforms. This will help proliferation of your GRID strategy, and will get you more market-share and revenue in the long run. The oracle-inventory mechanism and the configuration of agents tend to make life difficult for deployment of shared-binaries. Any viable grid should IMHO include the shared use of a software tree and not depend on endless replication of executables.

Shutdown (normal)

By adopting a grid-strategy, Oracle has greatly increased the options for its customers. And for those of you who don’t know it yet: find the Oracle Sponsored GRID-INDEX. Various hard- and software vendors have added to the palette of choices by implementing their own versions of Grid, clusters, or virtualization. All this new (eh, apologies: innovative but proven) technology can be put to good use, but only when making the correct choices.

We hope the preceding information can offer some help, and we would like to close by adding one more item of advice: try to base your choices on simplicity.

Relevant links

The usual brownie-point partner-links and marketing buzzwords on grid etc :

http://www.oracle.com/global/nl/corporate/press/200634.html
http://www.oracle.com/global/eu/pressroom/emeagridreport4.pdf
http://www.hp.com/techservers/grid/index.html
http://www.ibm.com
http://www.sun.com

More on server-virtualisation (start here!):
http://encyclopedia.thefreedictionary.com/virtualization

An introduction to Clustered File Systems:
http://www.beedub.com/clusterfs.html
http://www.oracle.com/technology/pub/articles/calish_filesys2.html

Some good, albeit biased, arguments for CFS can also be found here:
http://www.polyserver.com (look for articles by Kevin Closson).

A classic on RAC:
http://www.miracleas.dk/WritingsFromMogens/YouProbablyDontNeedRACUSVersion.pdf

Just for fun:
http://www.userfriendly.org
http://www.dilbert.com

Last bootnote for all Alliance Managers and other people in control of party-invites: take note of my cross-links to our “powerful partners”. And don’t worry, since you didn’t bother to read all of the text, neither did the real decision-makers, hence no damage is done. Oh, and please keep me on the invite-lists, I still appreciate good food and quality entertainment.

Now shoot me (over lunch?).

Monday, 7 May 2007

sitemap of SimpleOracleDBA

Here are the introductions to PdvFirstBlog and SimpleOracleDBA



Various Oracle Tips and Rants and Opinions, all to do with practicalities.

  • Simple Oracle DBA - Availability, Scalability, Manageability... It has to be Simple

  • Bitmap Indexes on an OLTP system - ORA-00060... Classic Deadlock

  • Backup on a different Level - in the SOA layer.

  • How To Pool Connections - One of the challenges

  • Databases Everywhere - Systems and Databases, how many databases.

  • ILM, Information LifeCycle Management - another Three Letter Acronym.

  • Index Organized Tables - Many Benefits





  • Various Rants, all to do with practicalities.

  • In times of Crisis you call the DBA - Who you gonna call...

  • Rationalize TOAD - Toad will find a way...

  • M O O W - Miracle Oracle Open World...

  • My blog-plug for the Conference - UK Oracle Usergroup...




  • Travel related stuff.

  • Angel of the North - My most visited leisure page...




  • The usual links to groups etc..



    My own linked in profile (shameless plug)


    my agenda (normally at the right hand side of the blog)




    UKOUG main url



    Miracle BV



    the home of Oracle-L : Best Mailing list



    link to Oracle Forum.