Sporadic Execution Times for Query in SQL Server 2008 - sql

I have been running some speed tests on a query where I insert 10,000 records into a table that has millions (over 24mil) of records. The query (below) will not insert duplicate records.
MERGE INTO [dbo].[tbl1] AS tbl
USING (SELECT col2,col3, max(col4) col4, max(col5) col5, max(col6) col6 FROM #tmp group by col2, col3) AS src
ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN
INSERT (col2,col3,col4,col5,col6)
VALUES (src.col2,src.col3,src.col4,src.col5,src.col6);
The execution times of the above query are sporadic; ranging anywhere from 0:02 seconds to 2:00 minutes.
I am running these tests within SQL Server Management Studio via a script that creates the 10,000 rows of data (in the #tmp table), then fires the MERGE query above. The point being, the exact same script is executed for each test that I run.
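For context, the test harness looks roughly like this; the column types and the row generator below are placeholders, not the actual script:
CREATE TABLE #tmp (col2 INT, col3 INT, col4 INT, col5 INT, col6 INT);

-- Generate 10,000 rows of test data (placeholder values, not the real data).
INSERT INTO #tmp (col2, col3, col4, col5, col6)
SELECT TOP (10000)
       ABS(CHECKSUM(NEWID())) % 1000000,
       ABS(CHECKSUM(NEWID())) % 1000000,
       ABS(CHECKSUM(NEWID())) % 100,
       ABS(CHECKSUM(NEWID())) % 100,
       ABS(CHECKSUM(NEWID())) % 100
FROM master..spt_values v1 CROSS JOIN master..spt_values v2;

-- ...the MERGE statement above runs here against [dbo].[tbl1] and its elapsed time is recorded...

DROP TABLE #tmp;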
The execution times bounce around from seconds to minutes as in:
Test #1: 0:10 seconds
Test #2: 1:13 minutes
Test #3: 0:02 seconds
Test #4: 1:56 minutes
Test #5: 0:05 seconds
Test #6: 1:22 minutes
One pattern that I find interesting is that the seconds/minutes alternation is fairly consistent - i.e. every other test the result is in seconds.
Can you give me any clues as to what may be causing this query to have such sporadic execution times?

I wish I could say what the cause of the sporadic execution times was, but I can say what I did to work around the problem...
I created a new database and target table and added 25 million records to the target table. Then I ran my original tests on the new database/table by repeatedly inserting 10k records into the target table. The results were consistent execution times of approx 0:07 seconds (for each 10k insert).
For kicks I did the exact same testing on a machine that has twice as much CPU/memory as my dev laptop. The results were consistent execution times of 0:00 seconds (It's time for a new dev machine ;))
I dislike not discovering the cause of the problem, but in this case I'm going to have to call it good and move on. Hopefully, someday, a StackO die-hard can update this question with a good answer.

Related

Check the execution time of a query accurate to the microsecond

I have a query in SQL Server 2019 that does a SELECT on the primary key fields of a table. This table has about 6 million rows of data in it. I want to know exactly how fast my query is, down to the microsecond (or at least to the nearest 100 microseconds). My query is faster than a millisecond, but all I can find in SQL Server are query measurements accurate to the millisecond.
What I've tried so far:
SET STATISTICS TIME ON
This only shows milliseconds
Wrapping my query like so:
DECLARE @Start DATETIME2, @End DATETIME2;
SELECT @Start = SYSDATETIME();
SELECT TOP 1 b.COL_NAME FROM BLAH b WHERE b.key = 0x1234;
SELECT @End = SYSDATETIME();
SELECT DATEDIFF(MICROSECOND, @Start, @End);
This shows that no time has elapsed at all. But this isn't accurate, because if I add WAITFOR DELAY '00:00:00.001', which should add a measurable millisecond of delay, it still shows 0 for the DATEDIFF. Only if I wait for 2 milliseconds does it show up in the DATEDIFF.
Looking up the execution plan and getting the total_worker_time from the sys.dm_exec_query_stats table.
Here I see 600 microseconds; however, the Microsoft docs seem to indicate that this number cannot be trusted:
total_worker_time ... Total amount of CPU time, reported in microseconds (but only accurate to milliseconds)
I've run out of ideas and could use some help. Does anyone know how I can accurately measure my query in microseconds? Would extended events help here? Is there another performance monitoring tool I could use? Thank you.
This is too long for a comment.
In general, you don't look for performance measurements measured in microseconds. There is just too much variation, based on what else is happening in the database, on the server, and in the network.
Instead, you set up a loop and run the query thousands -- or even millions -- of times and then average over the executions. There are further nuances, such as clearing caches if you want to be sure that the query is using cold caches.
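A minimal sketch of that approach, reusing the table and predicate from the question (the iteration count is arbitrary, and the result is assigned to a variable so SSMS rendering time is excluded):
DECLARE @i INT = 0;
DECLARE @iterations INT = 10000;
DECLARE @dummy NVARCHAR(200);
DECLARE @Start DATETIME2 = SYSDATETIME();

WHILE @i < @iterations
BEGIN
    -- Assign into a variable so no result sets are sent to the client.
    SELECT TOP 1 @dummy = b.COL_NAME FROM BLAH b WHERE b.key = 0x1234;
    SET @i += 1;
END;

-- Average elapsed microseconds per execution (includes a small amount of loop overhead).
SELECT DATEDIFF(MICROSECOND, @Start, SYSDATETIME()) / @iterations AS avg_microseconds;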

BigQuery. Long execution time on small datasets

I created a new Google Cloud project and set up a BigQuery database. I tried different queries; they all take too long to execute. Currently we don't have a lot of data, so high performance was expected.
Below are some examples of queries and their execution time.
Query #1 (Job Id bquxjob_11022e81_172cd2d59ba):
select date(installtime) regtime
,count(distinct userclientid) users
,sum(fm.advcost) advspent
from DWH.DimUser du
join DWH.FactMarketingSpent fm on fm.date = date(du.installtime)
group by 1
The query failed in 1 hour + with error "Query exceeded resource limits. 14521.457814668494 CPU seconds were used, and this query must use less than 12800.0 CPU seconds."
Query execution plan: https://prnt.sc/t30bkz
Query #2 (Job Id bquxjob_41f963ae_172cd41083f):
select fd.date
,sum(fd.revenue) adrevenue
,sum(fm.advcost) advspent
from DWH.FactAdRevenue fd
join DWH.FactMarketingSpent fm on fm.date = fd.date
group by 1
Execution time took 59.3 sec, 7.7 MB processed, which is too slow.
Query Execution plan: https://prnt.sc/t309t4
Query #3 (Job Id bquxjob_3b19482d_172cd31f629)
select date(installtime) regtime
,count(distinct userclientid) users
from DWH.DimUser du
group by 1
Execution time 5.0 sec elapsed, 42.3 MB processed, which is not terrible but should be faster for such small volumes of data.
Tables used :
DimUser - Table size 870.71 MB, Number of rows 2,771,379
FactAdRevenue - Table size 6.98 MB, Number of rows 53,816
FactMarketingSpent - Table size 68.57 MB, Number of rows 453,600
The question is: what am I doing wrong that makes the query execution times so long? If everything is OK, I would be glad to hear any advice on how to reduce execution time for such simple queries. If anyone from Google reads my question, I would appreciate it if the job IDs were checked.
Thank you!
P.s. Previously I had experience using BigQuery for other projects and the performance and execution time were incredibly good for tables of 50+ TB size.
Posting the same reply I've given in the GCP Slack workspace:
Your first two queries both look like you have one particular worker that is overloaded. You can see this because, in the compute section, the max time is very different from the avg time. This could be for a number of reasons, but I can see that you are joining a table of 700k+ rows (looking at the 2nd input) to a table of ~50k (looking at the first input). This is not good practice; you should switch it so the larger table is the leftmost table. See https://cloud.google.com/bigquery/docs/best-practices-performance-compute?hl=en_US#optimize_your_join_patterns
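For instance, in Query #2 FactMarketingSpent (~454k rows) is larger than FactAdRevenue (~54k rows), so a rewrite along these lines puts it on the left; this is only a sketch of the best-practice ordering, not a tested fix:
select fm.date
,sum(fd.revenue) adrevenue
,sum(fm.advcost) advspent
from DWH.FactMarketingSpent fm
join DWH.FactAdRevenue fd on fd.date = fm.date
group by 1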
You may also have heavy skew in your join keys (e.g. 90% of rows are on 1/1/2020, or NULL). Check this.
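A quick way to check that, using the join key from Query #2 (a sketch only):
select fm.date, count(*) rows_per_date
from DWH.FactMarketingSpent fm
group by 1
order by 2 desc
limit 10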
For the third query, that time is expected; try an approximate count instead to speed it up. Also note BQ starts to get better if you perform the same query over and over, so this will get quicker.
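The approximate variant of Query #3 would look roughly like this, using BigQuery's APPROX_COUNT_DISTINCT; the rest is unchanged from the question:
select date(installtime) regtime
,approx_count_distinct(userclientid) users
from DWH.DimUser du
group by 1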

how to handle query execution time (performance issue ) in oracle

I have a situation where I need to execute a patch script for a million rows of data. The current query execution time does not meet expectations even for a small number of rows (18,000), which takes around 4 hours (testing data before deploying to live).
The patch script selects a million rows of data in a loop and updates them according to the specification. I'm just wondering how long it could take for a million rows, since it takes around 4 hours for just 18,000 rows.
To overcome this problem I decided to create a temp table holding the entire select statement's data and proceed with the patch process using the temp table, where the process could be a bit faster compared to select-and-update.
Are there any other ways I can use to handle this situation? Any suggestions and ways to solve this?
(Due to company policy I'm unable to post the PL/SQL script here.)
Since it seems no one can answer my question here, I'm posting my own answer.
In Oracle there is Parallel Execution, which allows spreading the processing of a single SQL statement across multiple threads.
By using this method I reduced my long-running query from 4 hours to about 6 minutes.
For more information :
https://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel002.htm
http://www.oracle.com/technetwork/articles/database-performance/geist-parallel-execution-1-1872400.html
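As an illustration only (the real PL/SQL script can't be posted), a parallel DML update follows this general shape; the table, column, and degree of parallelism here are placeholders:
-- Enable parallel DML for the session, then hint the desired degree of parallelism.
ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(t, 8) */ patch_target t
   SET t.patched_flag = 'Y'
 WHERE t.patched_flag = 'N';

COMMIT;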

SQL Server Timeout based on locking?

I've got a SQL query that normally runs in about 0.5 seconds and now, as our server gets slightly busier, it is timing out. It involves a join on a table that is updated every couple of seconds as a matter of course.
That is, the EmailDetails table gets updated by a mail service to notify me of reads, clicks, etc., and that table gets updated every couple of seconds as those notifications come through. I'm wondering if the join below is timing out because those updates are locking the table and blocking my SQL join.
If so, any suggestions how to avoid my timeout would be appreciated.
SELECT TOP 25
dbo.EmailDetails.Id,
dbo.EmailDetails.AttendeesId,
dbo.EmailDetails.Subject,
dbo.EmailDetails.BodyText,
dbo.EmailDetails.EmailFrom,
dbo.EmailDetails.EmailTo,
dbo.EmailDetails.EmailDetailsGuid,
dbo.Attendees.UserFirstName,
dbo.Attendees.UserLastName,
dbo.Attendees.OptInTechJobKeyWords,
dbo.Attendees.UserZipCode,
dbo.EmailDetails.EmailDetailsTopicId,
dbo.EmailDetails.EmailSendStatus,
dbo.EmailDetails.TextTo
FROM
dbo.EmailDetails
LEFT OUTER JOIN
dbo.Attendees ON (dbo.EmailDetails.AttendeesId = dbo.Attendees.Id)
WHERE
(dbo.EmailDetails.EmailSendStatus = 'NEEDTOSEND' OR
dbo.EmailDetails.EmailSendStatus = 'NEEDTOTEXT')
AND
dbo.EmailDetails.EmailDetailsTopicId IS NOT NULL
ORDER BY
dbo.EmailDetails.EmailSendPriority,
dbo.EmailDetails.Id DESC
Adding Execution Plan:
https://dl.dropbox.com/s/bo6atz8bqv68t0i/emaildetails.sqlplan?dl=0
It takes 0.5 seconds on my fast MacBook, but on my real server with magnetic media it takes 8 seconds.
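One way to confirm whether the EmailDetails updates are actually blocking this SELECT is to look at the blocking session while the query is running. A minimal sketch using standard DMVs (not part of the original question):
-- Show requests that are currently blocked and the statement they are waiting on.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id <> 0;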

Understanding the Sql Server execution time

I'm using SQL Server 2012 and SET STATISTICS TIME ON to measure the CPU time for my SQL statements. I use this because I only want to get the time the database needs to execute the statement.
When returning large amounts of data from a SELECT, I noticed the CPU time going up pretty high; using TOP 2000 needs about 400 ms, but without it the same query needs about 10000 ms of CPU time.
What I'm not sure about is:
Is it possible that the CPU time I get back includes something like the time needed to display the millions of rows returned in my SQL Server Management Studio? That would be a pretty bad situation.
Update:
The time I want to receive is the execution time of the SQL Server without the time needed for SSMS to display the rows. There are several time statistics displayed in the Client Statistics, but after searching for a long time it's really hard to find good references explaining what they are. Any suggestions?
Idea: elapsed time (SQL Server execution time) - client processing time (Client Statistics)
Maybe this is an option?
In a multi-threaded world, CPU time is increasingly less helpful for simple tuning. Execution time is worth looking at.
To see if execution time (elapsed time) spent on displaying results is included you could SELECT TOP 2000 * INTO #temp to compare execution times.
Update:
My quick tests suggest the overhead of creating and inserting into a #temp table outweighs that of displaying the results (at 5,000 rows). When I go to 50,000 results, the SELECT INTO runs more quickly. The count at which the two become equivalent depends on how many and what type of fields are returned. I tested with:
SET STATISTICS TIME ON
SELECT TOP 50000 NEWID()
FROM master..spt_values v1, master..spt_values v2
WHERE v1.number > 100
SET STATISTICS TIME OFF
-- CPU time = 32 ms, elapsed time = 121 ms.
SET STATISTICS TIME ON
SELECT TOP 50000 NEWID() col1
INTO #test
FROM master..spt_values v1, master..spt_values v2
WHERE v1.number > 100
SET STATISTICS TIME OFF
-- CPU time = 15 ms, elapsed time = 87 ms.
CPU time in SET STATISTICS TIME ON only counts the time that SQL Server needs to execute the query. It doesn't include any time the client takes to render the results. It also excludes any time SQL Server spends waiting for buffers to clear. In short, it really is pretty independent of the client.