How to understand statistics of trace file in Oracle. Such as CPU, elapsed time, query...etc

How to understand statistics of trace file in Oracle. Such as CPU, elapsed time, query...etc - sql

I am learning query optimization in Oracle and I know that trace file will create statistic about the query execution and EXPLAIN Plan of the query.
At the bottom of the trace file, it is EXPLAIN PLAN of the query. My first question is , does the part "time = 136437 us" show the time duration for the steps of query execution? what does "us" mean ? Is it unit of time?
In addition, can anyone explain what statistics such as count, cpu, elapsed , disk and query mean? I google and read Oracle doc about them already but I still can not understand it. Can anyone clarify the meaning of those stats more clearly?
Thanks in advance. I am new and sorry for my English.

The smallest unit of data access in Oracle Database is a block. Not a row.
Each block can store many rows.
The database can access a block in current or consistent mode.
Current = as the block exists "right now".
Consistent = as the blocked existed at the time your query started.
The query and current columns report how many times the database accessed a block in consistent (query) and current mode.
When accessing a block it may already be in the buffer cache (memory). If so, no disk access is needed. If not, it has to do a physical read (pr). The disk column is a count of the total physical reads.
The stats for each line in the plan are the figures for that operation. Plus the sum of all its child operations.
In simple terms, the database processes the plan by accessing the first child first. Then passes the rows up to the parent. Then all the other child ops of that parent in order. Child operations are indented from their parent in the display.
So the database processed your query like so:
Read 2,000 rows from CUSTOMER. This required 749 consistent block gets and 363 disk reads (cr and pr values on this row). This took 10,100 microseconds.
Read 112,458 rows from BOOKING. This did 8,203 consistent reads and zero disk reads. This took 337,595 microseconds
Joined these two tables together using a hash join. The CR, PR, PW (physical writes) and time values are the sum of the operations below this. Plus whatever work this operation did. So the hash join:
did 8,952 - ( 749 + 8,203 ) = zero consistent reads
did 363 - ( 363 + 0 ) = zero physical reads
took 1,363,447 - ( 10,100 + 337,595 ) = 1,015,752 microseconds to execute
Notice that the CR & PR totals for the hash join match the query and disk totals in the fetch line?
The count column reports the number of times that operation happened. A fetch is a call to the database to get rows. So the client called the database 7,499 times. Each time it received ceil( 112,458 / 7,499 ) = 15 rows.
CPU is the total time in seconds the server's processors were executing that step. Elapsed is the total wall clock time. This is the CPU time + any extra work. Such as disk reads, network time, etc.

Related

Check the execution time of a query accurate to the microsecond

I have a query in SQL Server 2019 that does a SELECT on the primary key fields of a table. This table has about 6 million rows of data in it. I want to know exactly how fast my query is down to the microsecond (or at least the 100 microsecond). My query is faster than a millisecond, but all I can find in SQL server is query measurements accurate to the millisecond.
What I've tried so far:
SET STATISTICS TIME ON
This only shows milliseconds
Wrapping my query like so:
SELECT #Start=SYSDATETIME()
SELECT TOP 1 b.COL_NAME FROM BLAH b WHERE b.key = 0x1234
SELECT #End=SYSDATETIME();
SELECT DATEDIFF(MICROSECOND,#Start,#End)
This shows that no time has elapsed at all. But this isn't accurate because if I add WAITFOR DELAY '00:00:00.001', which should add a measurable millisecond of delay, it still shows 0 for the datediff. Only if I wat for 2 milliseconds do I see it show up in the datediff
Looking up the execution plan and getting the total_worker_time from the sys.dm_exec_query_stats table.
Here I see 600 microseconds, however the microsoft docs seem to indicate that this number cannot be trusted:
total_worker_time ... Total amount of CPU time, reported in microseconds (but only accurate to milliseconds)
I've run out of ideas and could use some help. Does anyone know how I can accurately measure my query in microseconds? Would extended events help here? Is there another performance monitoring tool I could use? Thank you.

This is too long for a comment.
In general, you don't look for performance measurements measured in microseconds. There is just too much variation, based on what else is happening in the database, on the server, and in the network.
Instead, you set up a loop and run the query thousands -- or even millions -- of times and then average over the executions. There are further nuances, such as clearing caches if you want to be sure that the query is using cold caches.

BigQuery. Long execution time on small datasets

I created a new Google cloud project and set up BigQuery data base. I tried different queries, they all are executing too long. Currently we don't have a lot of data, so high performance was expected.
Below are some examples of queries and their execution time.
Query #1 (Job Id bquxjob_11022e81_172cd2d59ba):
select date(installtime) regtime
,count(distinct userclientid) users
,sum(fm.advcost) advspent
from DWH.DimUser du
join DWH.FactMarketingSpent fm on fm.date = date(du.installtime)
group by 1
The query failed in 1 hour + with error "Query exceeded resource limits. 14521.457814668494 CPU seconds were used, and this query must use less than 12800.0 CPU seconds."
Query execution plan: https://prnt.sc/t30bkz
Query #2 (Job Id bquxjob_41f963ae_172cd41083f):
select fd.date
,sum(fd.revenue) adrevenue
,sum(fm.advcost) advspent
from DWH.FactAdRevenue fd
join DWH.FactMarketingSpent fm on fm.date = fd.date
group by 1
Execution time ook 59.3 sec, 7.7 MB processed. What is too slow.
Query Execution plan: https://prnt.sc/t309t4
Query #3 (Job Id bquxjob_3b19482d_172cd31f629)
select date(installtime) regtime
,count(distinct userclientid) users
from DWH.DimUser du
group by 1
Execution time 5.0 sec elapsed, 42.3 MB processed. What is not terrible but must be faster for such small volumes of data.
Tables used :
DimUser - Table size 870.71 MB, Number of rows 2,771,379
FactAdRevenue - Table size 6.98 MB, Number of rows 53,816
FaceMarketingSpent - Table size 68.57 MB, Number of rows 453,600
The question is what am I doing wrong so that query execution time is so big? If everything is ok, I would be glad to hear any advice on how to reduce execution time for such simple queries. If anyone from google reads my question, I would appreciate if jobids are checked.
Thank you!
P.s. Previously I had experience using BigQuery for other projects and the performance and execution time were incredibly good for tables of 50+ TB size.

Posting same reply i've given in the gcp slack workspace:
Both your first two queries looks like you have one particular worker who is overloaded. Can see this because in the compute section, the max time is very different from the avg time. This could be for a number of reasons, but i can see that you are joining a table of 700k+ rows (looking at the 2nd input) to a table of ~50k (looking at the first input). This is not good practice, you should switch it so the larger table is the left most table. see https://cloud.google.com/bigquery/docs/best-practices-performance-compute?hl=en_US#optimize_your_join_patterns
You may also have a heavily skew in your join keys (e.g. 90% of rows are on 1/1/2020, or NULL). check this.
For the third query, that time is expected, try a approx count instead to speed it up. Also note BQ starts to get better if you perform the same query over and over, so this will get quicker.

BigQuery Count Appears to be Processing Data

I noticed that running a SELECT count(*) FROM myTable on my larger BQ tables yields long running times, upwards of 30/40 seconds despite the validator claiming the query processes 0 bytes. This doesn't seem quite right when 500 GB queries run faster. Additionally, total row counts are listed under details -> Table Info. Am I doing something wrong? Is there a way to get total row counts instantly?

When you run a count BigQuery still needs to allocate resources (such as: slot units, shards etc). You might be reaching some limits which cause a delay. For example, the slots default per project is 2,000 units.
BigQuery execution plan provides very detail information about the process which can help you better understand the source of the delay.
One way to overcome this is to use an approximate method described in this link
This Slide by Google might also help you
For more details see this video about how to understand the execution plan

SQL Server 2008: I/O Wait Time per Database File

I am running SQL Server 2008 Enterprise Edition and want to monitor the following performance metrics i.e. via dynamic management views (from within SQL):
Average/Maximum Read/Write I/O Waits in ms per database file
for sliding time window.
That is: 4 numbers per database file: avg read wait, max read wait, avg write wait, max write wait. All in ms, and all for some sane (or even better configurable) sliding time window.
How can I do that?
PS: I have the VIEW SERVER STATE permission and can read sys.dm_os_performance_counters, sys.database_files, sys.dm_io_virtual_file_stats etc etc
PS2: At least 1 tool (Quest Spotlight 7 for SQL Server) is able to provide Max I/O Wait in ms per database file. So there has to be some way ..

Below is the query that SSMS's Activie Monitor uses. They label the io_stall field as total wait time. You could add the fs.io_stall_read_ms and fs.io_stall_write_ms fields to get the read/write specific numbers.
SELECT
d.name AS [Database],
f.physical_name AS [File],
(fs.num_of_bytes_read / 1024.0 / 1024.0) [Total MB Read],
(fs.num_of_bytes_written / 1024.0 / 1024.0) AS [Total MB Written],
(fs.num_of_reads + fs.num_of_writes) AS [Total I/O Count],
fs.io_stall AS [Total I/O Wait Time (ms)],
fs.size_on_disk_bytes / 1024 / 1024 AS [Size (MB)],
fs.io_stall_read_ms
FROM sys.dm_io_virtual_file_stats(default, default) AS fs
INNER JOIN sys.master_files f ON fs.database_id = f.database_id AND fs.file_id = f.file_id
INNER JOIN sys.databases d ON d.database_id = fs.database_id;
This query only gives you the totals. You'd have to run it at some interval and record the results in a temp table with a time stamp. You could then query this table to get your min/max/avg as needed. The sliding time window would just be a function of how much data you keep in that table and what time period you query.

The problem you are going to have is that SQL doesn't necessarily track the level of detail you are looking to get per file. You are probably going to have to use Performance monitor as well. You will have to use a combination approach looking at both performance monitor for details on I/O on the disk over the course of time. As well as the aforementioned SQL monitoring techniques to see get a more complete complete picture. I hope that helps.

You can use a few scripts to pull the metrics. I have enebled the Data Collection/Data Collection server built in to SQL Server 8. It collects the metrics from multiple instances and stores them in a mssql server you designate as the collector. The reports provided for each instance are adequate for most purposes. I am sure there are 3rd party tools that go beyond the capabilities of the data collector as it just reports on performance/disk usage/and query statistics without performance hints.
This gives you a data file growth projection and growth event summary for both logs and data files, however, I do not know if it will give you the metrics you are looking for per file or filegroup:)
NOTE: If you use the data collection warehouse you should consider rebuilding indexes periodically as the size grows. It collects approx. 20 MB/day of data in my senario.

See the following for collecting per file wait statistics http://msdn.microsoft.com/en-us/library/ms187309(v=sql.105).aspx

Is my execution plan trying to trick me?

I am trying to speed up a long running query that I have (takes about 10 minutes to run...). In order to track down what part of the query is costing me the most time I included the Actual Execution Plan when I ran it and found a particular section that was taking up 55% (screen shot below)
alt text http://img109.imageshack.us/img109/9571/53218794.png
This didn't quite seem right to me so I added Print '1' and Print '2' before and after this trouble section. When I run the query for a mere 17 seconds and then cancel it the 1 and 2 print out which I'm assuming means it's getting through that section in the first 17 seconds.
alt text http://img297.imageshack.us/img297/4739/66797633.png
Am I doing something wrong here or is my Execution plan misleading me?

Metrics from perfmon will also help figure out what's going wrong... you could be running into some serious IO issues with the drive your tempDB is residing on. Additionally, run a trace and look at the CPU & IO of the actual run.
Good perfmon metrics to look at are disk queue length (avg & writes).
If you don't have access to perfmon or don't want to trace things, use "SET STATISTICS IO ON" at the beginning of your query and allow it to complete...don't stop it. Just because an execution plan says it's taking over have the cost doesn't mean it will run for half of the query time...it could be much more (or less).

It says Query 10: Query cost (relative to the batch): 55%. Are 100% positive that it is the 10th statement in the batch that you surounded with Print statements? Could the INSERT ... INTO #mpProgramSet2 execute multiple times, some times in under 17 seconds other time for 5 minutes, depending on how much data was selected/inserted?
As a side note you should run with SET STATISTICS TIME ON rather that prints, this will give you exact compile/time and execution time of each statement in the batch.

I wouldn't trust that printing the '1' and '2' will prove anything about what has executed and what has not. I do the same thing, but I just wouldn't rely on it as proof. You could print the ##rowcount from that first insert query - that would indicate for sure that the insert has occurred.
Although the plan says that query may take 55% of the cost, it may not be 55% of the execution time, especially if the query results are cached.
Another advantage of printing the ##rowcount is to compare the actual number of rows to the estimated rows (51K). If they differ by a lot then you might investigate the statistics for your indexes.

We would need the full query to understand what's going on; but I would probably start with setting MAXDOP to 1 in order to limit the number of processors it's running on.
Note that sometimes queries need to be limited to only 1 processor due to locks etc.
Further you might try adding NOLOCKs to any of your selects which can get away with dirty reads.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas