Tuesday 11 July 2023

The History of a Query

 TL;DR: The dba_hist_sqlstat view can tell you what has changed in the frequency or in the execution plan of a given query.



We have this dashboard-query that suddenly became very slow. The qry is a sum (something) to indicate "how much work was done". Normally it uses an index and is lighting fast. This morning it seem to have suddenly slowed down.


Let's dig in and use this as an example on "How to Investigate".


We know what the query looks like, and using the v$sqlarea, we can find the SQL_ID: 7292jtjypdyvt.


Next is the DBA_HIST_SQLSTAT : this can tell you how an SQL performed over the snapshot-intervals. 


If you are using "statspack", you will find similar information in STATS$SQL_SUMMARY, but you will need to subtract the preceding value to find the difference since the previous snapshot.


We start with this:


select * from dba_hist_sqlstat  sq

where sq.sql_id = '7292jtjypdyvt'

order by sq.snap_id ;


Since v12, the DBA_HIST...  views contain "delta" columns which save you the effort of subtracting.  


And we need to join to the snapshot view to find at what time each snapshot was taken. The query evolves into something like this:


select to_char ( sn.end_interval_time , 'DDMON HH24:MI') as Time

, sq.executions_delta     execs

, sq.buffer_gets_delta    buff_gets

, rows_processed_delta nr_rows

-- , sq.*

from dba_hist_sqlstat  sq

, dba_hist_snapshot sn

where sq.sql_id = '7292jtjypdyvt'

  and sn.snap_id = sq.snap_id

  and sn.dbid = sq.dbid

  and sn.instance_number = sq.instance_number

order by sn.dbid, sn.snap_id, sn.instance_number ;


To place it all in perspective, I also tend to look at the elapsed-time per execution, and the buffer_gets per execution:


elapsed_time_delta / executions_delta  as  sec_p_exe


But here I need to be careful, some decode is needed to avoid divide-by-zero.


And another interesting item is the plan_hash_value: if the plan has changed over time, it means the optimiser has chosen another plan, which may or may not be advantageous.


In my case, the inspection-query has turned into :



Or, in my case, I often run it from a script, that I have called 

SQL> @sql_freq <sql_id> <enter>




Let me discuss the relevant columns here:


TIME: Notice that my awr-snapshots are only 10min apart. I tend to set that interval narrow on critical systems or on systems under investigation.


EXECS: notice that this SQL is fired 10404 times between 07:30 and 07:40. That is quite a lot of executes in a 10min interval. Busy time-window?


BUFF_GETS and GET_PX (gets per execute) : At 07:50, the nr of buffer-gets suddenly goes up. And the nr of gets-per-execute goes from 3 (very efficient) to 7000 (not so efficient). 


SEC_PX: the nr of seconds per execute is still way below 0.5 sec. When rounded to whole-seconds, it ends up as zero-seconds. In some versions of the script, I display milliseconds, but when dealing with big/slow SQL, those numbers tend to become unpractically wide.


NR_ROWS: Because this is a sum-query, each execute only returns only 1 row.


G_PR: the nr of block-gets per row-returned. For some queries, I want to know the nr of blocks processed (gets) in relation to the nr of rows returned. In an OLTP system, you want the nr of blocks processed to be low. But if for some reasons the nr of rows is high, I can tolerate the processing of more blocks (e.g. bulk-shipments with a lot of items...)


PLN_HV: the Plan-Hash-Value. If this value changes, the execution-plan has changed over time. In this case, we we notice 3 intervals where the plan was indeed different. From the nr of Buffer-gets and the gets-per-row, we deduce that the "other plan" is probably a lot less efficient because it requires 7000 gets to return the result, whereas the "efficient plan" only needed 3 gets (a typical index-lookup) to return the result.


To further inspect the SQL, and display the execution-plans used, you can use something like:


select plan_table_output from table (dbms_xplan.display_awr('<sql_id>'));


And there is a lot more to discover, for example:

The DBA_HIST_SQL_PLAN contains details about the execution-plans, and DBA_HIST_SQLBIND can be used to hunt down the values of bind-variables from queries.


It is all there for you to explore ... 


In this particular case, the re-calculation of statistics for the relevant index was sufficient to "fix" this problem. At 09:00 the SQL had returned to its normal, efficient, PHV and 3 GET_PX.


No comments: