Tuesday, 4 July 2023

Use SQL to find problem-statements

 TL;DR: There is a lot of interesting information in v$sqlarea, but beware of jumping to conclusions. 



On a troubled system, one of the first things I look at is v$sqlarea.


Have a look: 


Select * from v$sqlarea s order by buffer_gets ;


This view has grown over time to include a lot of columns but I recommend you start by looking at the ones you (think you...) understand: executions, time, disk_reads.


Depending on what you (think you) know to be your bottleneck, you can order by buffer_gets, disk_reads, or simply by elapsed time. 


My starting point for investigating a system is often:


select sql_id

, executions

, elapsed_time as microsec

, elapsed_time / ( (executions +1) * 1000 ) msec_p_exec

, buffer_gets

, disk_reads

, substr ( sql_text, 1, 60) txt

from v$sqlarea s

where 1=1

-- and parsing_schema_name like 'APP_OWN%'

order by 4 desc /* heaviest on top */

, elapsed_time desc

, buffer_gets desc ;



When run in SQLDeveloper, this puts the "slowest-per-execute" on top of the list. But by editing the query, I can quickly look for the most-buffer-gets, or the most-disk-reads as well. 




In the screenshot above, it is clear that the first 3 lines, a PL/SQL block and two select-statements, are consuming the majority of the elapsed-time. Those are probably the pieces we should investigate.


Notice several things about this way of "finding heavies":


Depending on what I want to look for, I can add/remove columns to the select. Some lines are read-to-uncomment, in case I need them. In reality, the stmnt I have saved in a ready-to-use txtfile is much larger, but it would look really messy to put that up in a blog.


The +1 is my dirty-trick to avoid div-by-zero. Notice how several stmnts have Zero executes: those were still running while we did the sample.


When run in sqlcl or sql*plus, I remove the DESC from the order-by, to make sure my relevant stmnt appears at the bottom (e.g. I dont have to scroll back).




Some warnings are due: 


There are a lot of caveats to this quick-and-dirty method.


It will only find statements that are currently in the shared_pool, e.g. recent or often-used statements. 


The numbers are "averages", which may or may not be impacted by outliers. One slow stmnt, an accidental lock, or one user looking for "Everyone called Garcia" may impact the results. Hence in interpreting the results: also check against your AWR or statspack, and check against common sense.


For example, if you know or suspect your database has an IO bottleneck, you should also order by disk-reads rather than by buffer_gets or by elapsed.  And if the slowest SQL does not concur with the most disk-reads,  you should question whether you really have an IO bottleneck.


Once you have identified a few relevant or "heavy statements", also go and verify that those statements actually are involved in relevant activity. Sometimes, the "load" is caused by something you didnt expect. (I have a funny story about an uptime-measuring script that impacted a running system, causing a hiccup every 15 minutes).


And next, once you think you have the SQL_ID of a problem-stmnt, you can use that sql_id do do further inspection.


For example, you can dig into v$sql for child-cursors, or into v$sql_plan and v$sql_bind_capture. Lots of possibilities there, and there is always more to discover.


At this point, I strongly encourage you to experiment and do some further research by yourself. You are welcome to Copy/Paste my code, but you learn much more from discovering things by yourself.


Happy Searching !



No comments: