Is my execution plan trying to trick me? - sql

I am trying to speed up a long-running query (it takes about 10 minutes to run). To track down which part of the query costs the most time, I included the Actual Execution Plan when I ran it and found a particular section that was taking up 55% (screenshot below).
alt text http://img109.imageshack.us/img109/9571/53218794.png
This didn't quite seem right to me, so I added Print '1' and Print '2' before and after this trouble section. When I run the query for a mere 17 seconds and then cancel it, the 1 and 2 print out, which I'm assuming means it's getting through that section in the first 17 seconds.
alt text http://img297.imageshack.us/img297/4739/66797633.png
Am I doing something wrong here or is my Execution plan misleading me?

Metrics from perfmon will also help figure out what's going wrong... you could be running into some serious IO issues with the drive your tempDB is residing on. Additionally, run a trace and look at the CPU & IO of the actual run.
Good perfmon metrics to look at are disk queue length (avg & writes).
If you don't have access to perfmon or don't want to trace things, use "SET STATISTICS IO ON" at the beginning of your query and allow it to complete...don't stop it. Just because an execution plan says a statement accounts for over half the cost doesn't mean it will run for half of the query time...it could be much more (or less).
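A minimal sketch of that setup in T-SQL. The batch itself is not shown in full in the question, so it appears here only as a placeholder comment:

```sql
-- Report page reads per table for every statement that follows.
SET STATISTICS IO ON;

-- ... run the full long-running batch here and let it finish ...

-- Turn the reporting off again once the batch has completed.
SET STATISTICS IO OFF;
```

The Messages output then lists scan counts and logical/physical reads per table for each statement, which makes it easier to see which statement is doing the heavy IO.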

It says Query 10: Query cost (relative to the batch): 55%. Are you 100% positive that it is the 10th statement in the batch that you surrounded with Print statements? Could the INSERT ... INTO #mpProgramSet2 execute multiple times, sometimes in under 17 seconds and other times for 5 minutes, depending on how much data was selected/inserted?
As a side note, you should run with SET STATISTICS TIME ON rather than prints; this will give you the exact compile time and execution time of each statement in the batch.
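Something along these lines, as a sketch only; the placeholder comment stands in for the real batch:

```sql
-- Report parse/compile time and execution time for every statement that follows.
SET STATISTICS TIME ON;

-- ... run the batch here; each statement gets its own timing lines ...

SET STATISTICS TIME OFF;
```

For each statement, the Messages output shows "SQL Server parse and compile time" and "SQL Server Execution Times" with CPU and elapsed milliseconds, so the 10th statement can be compared directly against the rest.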

I wouldn't trust that printing the '1' and '2' will prove anything about what has executed and what has not. I do the same thing, but I just wouldn't rely on it as proof. You could print the @@ROWCOUNT from that first insert query - that would indicate for sure that the insert has occurred.
Although the plan says that query may take 55% of the cost, it may not be 55% of the execution time, especially if the query results are cached.
Another advantage of printing the @@ROWCOUNT is to compare the actual number of rows to the estimated rows (51K). If they differ by a lot then you might investigate the statistics for your indexes.
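A rough T-SQL sketch of the row-count check. The SELECT feeding #mpProgramSet2 is not shown in full in the question, so dbo.Programs and ProgramId below are hypothetical stand-ins:

```sql
DECLARE @Rows int;

PRINT '1';

-- Hypothetical stand-in for the real INSERT ... INTO #mpProgramSet2 statement.
SELECT p.ProgramId
INTO #mpProgramSet2
FROM dbo.Programs AS p;

-- Capture @@ROWCOUNT right away: the next statement would overwrite it.
SET @Rows = @@ROWCOUNT;

-- Severity 0 with NOWAIT sends the message immediately instead of buffering it like PRINT.
RAISERROR('2: rows inserted = %d', 0, 1, @Rows) WITH NOWAIT;
```

If the reported count is far from the estimated 51K rows in the plan, that points toward stale or missing statistics.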

We would need the full query to understand what's going on; but I would probably start with setting MAXDOP to 1 in order to limit the number of processors it's running on.
Note that sometimes queries need to be limited to only 1 processor due to locks etc.
Further, you might try adding NOLOCK hints to any of your selects that can get away with dirty reads.
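As an illustration only (the real query is not available), both hints on a hypothetical dbo.Orders table would look roughly like this:

```sql
-- dbo.Orders and its columns are hypothetical; substitute the tables from the real query.
SELECT o.CustomerId, COUNT(*) AS OrderCount
FROM dbo.Orders AS o WITH (NOLOCK)   -- allows dirty reads of uncommitted data
GROUP BY o.CustomerId
OPTION (MAXDOP 1);                   -- restricts this statement to a single processor
```

NOLOCK can return rows that are later rolled back, so it only suits selects where approximate results are acceptable.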

Related

BigQuery - how to decrease slot time of Coalesce execution step?

I have a pretty complex query, with about 70 execution steps. The query was somewhat optimized for performance, so most of the steps in the execution plan run pretty fast - except Coalesce steps, which take about 10 to 100 times more slot time compared to the others. As far as I understand, it prepares data for the following Join step, but why does it take so long even if the actual number of records processed by this step is low? The most extreme case I saw looks like this (ZERO records processed by this step, but it still takes 8 seconds of slot time!):
S46: Coalesce
Slot time: 8223 ms
Duration: 92 ms
Bytes Shuffled: 0 B
I wasn't able to find any hints regarding this "Coalesce" step and ways to optimize it in Google documentation, so perhaps you can give me some advice about it or point to actual documentation that explains it.

Check the execution time of a query accurate to the microsecond

I have a query in SQL Server 2019 that does a SELECT on the primary key fields of a table. This table has about 6 million rows of data in it. I want to know exactly how fast my query is, down to the microsecond (or at least to the nearest 100 microseconds). My query is faster than a millisecond, but all I can find in SQL Server is query measurements accurate to the millisecond.
What I've tried so far:
SET STATISTICS TIME ON
This only shows milliseconds
Wrapping my query like so:
DECLARE @Start datetime2(7), @End datetime2(7);
SELECT @Start = SYSDATETIME();
SELECT TOP 1 b.COL_NAME FROM BLAH b WHERE b.[key] = 0x1234;
SELECT @End = SYSDATETIME();
SELECT DATEDIFF(MICROSECOND, @Start, @End);
This shows that no time has elapsed at all. But this isn't accurate because if I add WAITFOR DELAY '00:00:00.001', which should add a measurable millisecond of delay, it still shows 0 for the datediff. Only if I wait for 2 milliseconds do I see it show up in the datediff.
Looking up the execution plan and getting the total_worker_time from the sys.dm_exec_query_stats table.
Here I see 600 microseconds; however, the Microsoft docs seem to indicate that this number cannot be trusted:
total_worker_time ... Total amount of CPU time, reported in microseconds (but only accurate to milliseconds)
I've run out of ideas and could use some help. Does anyone know how I can accurately measure my query in microseconds? Would extended events help here? Is there another performance monitoring tool I could use? Thank you.
This is too long for a comment.
In general, you don't look for performance measurements measured in microseconds. There is just too much variation, based on what else is happening in the database, on the server, and in the network.
Instead, you set up a loop and run the query thousands -- or even millions -- of times and then average over the executions. There are further nuances, such as clearing caches if you want to be sure that the query is using cold caches.
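A minimal T-SQL sketch of such a loop, reusing the table and column names from the question. The iteration count and the type of @Dummy are assumptions to adjust for the real table:

```sql
DECLARE @Iterations int = 100000;     -- assumed; choose enough runs to take a few seconds in total
DECLARE @i int = 0;
DECLARE @Dummy nvarchar(128);         -- assumed type; match it to the real column
DECLARE @Start datetime2(7), @End datetime2(7);

SET @Start = SYSDATETIME();

WHILE @i < @Iterations
BEGIN
    -- Assigning to a variable avoids returning a result set to the client on every pass.
    SELECT TOP (1) @Dummy = b.COL_NAME FROM BLAH b WHERE b.[key] = 0x1234;
    SET @i += 1;
END;

SET @End = SYSDATETIME();

-- Mean elapsed time per execution, in microseconds (loop overhead included).
SELECT DATEDIFF(MICROSECOND, @Start, @End) * 1.0 / @Iterations AS avg_microseconds_per_run;
```

Because the total run is long compared with the clock's resolution, the coarse timer matters much less than it does for a single execution. Keep the total under about 35 minutes, since DATEDIFF in microseconds returns an int and overflows beyond that.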

BigQuery Count Appears to be Processing Data

I noticed that running a SELECT count(*) FROM myTable on my larger BQ tables yields long running times, upwards of 30/40 seconds despite the validator claiming the query processes 0 bytes. This doesn't seem quite right when 500 GB queries run faster. Additionally, total row counts are listed under details -> Table Info. Am I doing something wrong? Is there a way to get total row counts instantly?
When you run a count BigQuery still needs to allocate resources (such as: slot units, shards etc). You might be reaching some limits which cause a delay. For example, the slots default per project is 2,000 units.
The BigQuery execution plan provides very detailed information about the process, which can help you better understand the source of the delay.
One way to overcome this is to use an approximate method described in this link
This Slide by Google might also help you
For more details see this video about how to understand the execution plan

SQL SentryOne Plan Explorer, What is Duration?

I've just started using the SentryOne Plan Explorer to help tune my SQL Server queries, and I have a question I can't seem to find an answer for: what is Duration?
I would think it's the total time it took for the query to run. However, every query I am testing goes much longer in real-time than what ends up showing under Duration.
Below is a screenshot of what I'm seeing. Watching the query run takes over 2 minutes, but the final duration ends up being .770?
Thanks for any insight!
This is the answer provided by SentryOne:
While a query is running, we show clock time on the status bar. However, at the end, we sum up the total duration, in milliseconds, as reported by the trace rows we collected. We subtract duration from any trace rows that are discarded (e.g. events that don't generate plans, like WAITFOR).

Measure query execution time excluding start-up cost in postgres

I want to measure the total time taken by postgres to execute my query excluding the start-up cost. Earlier I was using \timing but now I found \timing includes start-up cost.
I also tried: "explain analyze" in which I found that actual time is specified in a particular format like: actual time=12.04..12.09
So, does this mean that the time taken to execute the postgres query, excluding start-up time, is 0.05? If not, then is there a way to exclude start-up costs and measure query execution time?
What you want is actually quite ill-defined.
"Startup cost" could mean:
network connection overhead and back-end start cost of establishing a new connection. Avoided by re-using the same session.
network round-trip times for sending the query and getting the results. Avoided by measuring the timing server-side with log_min_duration_statement = 0 or (with timing overhead) using explain analyze or the auto_explain module.
Query planning time. Avoided by PREPAREing the query, then timing only the subsequent EXECUTE.
Lock acquisition time. There is not currently any way to exclude this.
Note that using EXPLAIN ANALYZE may not be ideal for your purposes: it throws the query result away, and it adds its own costs because of the detailed timing it does. I would set log_min_duration_statement = 0, set client_min_messages appropriately, and capture the timings from the log output.
So it sounds like you want to PREPARE a query then EXPLAIN ANALYZE EXECUTE it or just EXECUTE it with log_min_duration_statement set to 0.
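A sketch of that sequence in PostgreSQL. The table my_table, its id column, and the statement name timed_query are hypothetical:

```sql
-- Log the duration of every statement in this session (normally requires superuser).
SET log_min_duration_statement = 0;

-- Parse the statement once; my_table and id are hypothetical names.
PREPARE timed_query (int) AS
    SELECT * FROM my_table WHERE id = $1;

-- Option 1: per-node timings, at the cost of instrumentation overhead.
EXPLAIN ANALYZE EXECUTE timed_query(42);

-- Option 2: plain execution; read the duration from the server log.
EXECUTE timed_query(42);

DEALLOCATE timed_query;
```

Running both in the same session also avoids the connection start-up cost from the list above.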
To explore planning and execution costs separately, you need to turn on several postgresql.conf parameters:
log_planner_stats = on
log_executor_stats = on
and explore your log file.
Update:
1. find your config file location with executing:
SHOW config_file;
2. Set parameters. Don't forget to remove the comment symbol '#'.
3. Restart postgresql service
4. Execute your query
5. Explore your log file.
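As a possible shortcut for a quick experiment, the same parameters can also be changed for the current session only, assuming a superuser connection. The query on my_table is a hypothetical placeholder:

```sql
-- Locate the configuration file (step 1 above).
SHOW config_file;

-- Session-level alternative to editing the file; assumes superuser privileges.
SET log_planner_stats = on;
SET log_executor_stats = on;

-- Also send LOG-level messages to this client so the stats appear alongside the result.
SET client_min_messages = log;

-- Hypothetical query to measure.
SELECT count(*) FROM my_table;
```

The settings revert when the session ends, so nothing has to be undone in the config file afterwards.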