I need to measure the duration of several queries, like this:
declare @dtStart1 as datetime;
declare @dtStart2 as datetime;
declare @dtStart3 as datetime;
declare @dtStart4 as datetime;
declare @dtStart5 as datetime;
declare @dtStart6 as datetime;
declare @dtStart7 as datetime;
declare @dtStart8 as datetime;
declare @dtStart9 as datetime;
declare @dtStart10 as datetime;
declare @duration1 as int;
declare @duration2 as int;
declare @duration3 as int;
declare @duration4 as int;
declare @duration5 as int;
declare @duration6 as int;
declare @duration7 as int;
declare @duration8 as int;
declare @duration9 as int;
declare @duration10 as int;
set @dtStart1 = (select getutcdate());
--query1
set @duration1 = (select datediff(millisecond, @dtStart1, GETUTCDATE()));
set @dtStart2 = (select getutcdate());
--query2
set @duration2 = (select datediff(millisecond, @dtStart2, GETUTCDATE()));
set @dtStart3 = (select getutcdate());
--query3
set @duration3 = (select datediff(millisecond, @dtStart3, GETUTCDATE()));
set @dtStart4 = (select getutcdate());
--query4
set @duration4 = (select datediff(millisecond, @dtStart4, GETUTCDATE()));
set @dtStart5 = (select getutcdate());
--query5
set @duration5 = (select datediff(millisecond, @dtStart5, GETUTCDATE()));
set @dtStart6 = (select getutcdate());
--query6
set @duration6 = (select datediff(millisecond, @dtStart6, GETUTCDATE()));
set @dtStart7 = (select getutcdate());
--query7
set @duration7 = (select datediff(millisecond, @dtStart7, GETUTCDATE()));
set @dtStart8 = (select getutcdate());
--query8
set @duration8 = (select datediff(millisecond, @dtStart8, GETUTCDATE()));
set @dtStart9 = (select getutcdate());
--query9
set @duration9 = (select datediff(millisecond, @dtStart9, GETUTCDATE()));
set @dtStart10 = (select getutcdate());
--query10
set @duration10 = (select datediff(millisecond, @dtStart10, GETUTCDATE()));
select @duration1 / 1000.0 as q1,
@duration2 / 1000.0 as q2,
@duration3 / 1000.0 as q3,
@duration4 / 1000.0 as q4,
@duration5 / 1000.0 as q5,
@duration6 / 1000.0 as q6,
@duration7 / 1000.0 as q7,
@duration8 / 1000.0 as q8,
@duration9 / 1000.0 as q9,
@duration10 / 1000.0 as q10;
The problem is that, besides the durations I am actually interested in, I also get the result sets of the queries themselves. I tried using a cursor for each query, but it completed instantly even for long queries; possibly the cursor was only defined and never executed. I also tried setting FMTONLY to ON and then back to OFF, but that was also instant for long queries and even returned the column names in the result. I would like the queries to execute fully, so that the durations match what they would be if the queries ran normally, but without sending their results back to my application server. Handling the millions of records the queries might return would be problematic there, not to mention a huge waste of memory compared to the ideal result of getting just the one row I am interested in, namely the durations.
A few options come to mind.
1. One obvious way to suppress the query result set is to insert the result of the query into a #temp table and then drop the temp table. This will affect the query run time, but it is relatively easy to implement. Simply add an INTO #temp clause after the SELECT list of your query. The calling application doesn't need to change.
2. Change the calling application and make it expect these result sets. Measure the "time-to-first-row" and, once the application receives the first row, stop the query. This would be a rather significant task to implement.
3. Change the query so that its results are stored in variables, not in a temp table. Use one variable per column.
Note: As Martin Smith pointed out in the comments, assigning column values into variables may change the shape of the plan, as shown in his answer to the question: sql execution latency when assign to a variable, so you should use option 3 with caution.
For example, if you have a query
SELECT
Col1
,Col2
,Col3
FROM YourTable
... some complex logic
;
Change it to the following:
DECLARE @VarCol1 bigint;
DECLARE @VarCol2 int;
DECLARE @VarCol3 datetime2(0);
-- use appropriate types that match the query columns
SELECT
@VarCol1 = Col1
,@VarCol2 = Col2
,@VarCol3 = Col3
FROM YourTable
... some complex logic
;
Such a query will run in full (as opposed to wrapping the query in SELECT COUNT(*)), but its results will be stored in the local variables. Each new row will overwrite the variable values, but this should be less overhead than using a #temp table.
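As a rough illustration of combining this with the timing pattern from the question, the sketch below times a throwaway query whose rows are assigned to variables, so only the one-row duration result reaches the client. The cross join over sys.all_objects is just a stand-in workload, the @Sink... names are made up for this example, and it assumes SQL Server 2008 or later (for SYSUTCDATETIME and inline DECLARE initialisation).

```sql
-- Start time, captured on the server.
DECLARE @dtStart datetime2(3) = SYSUTCDATETIME();

-- Sink variables: every row overwrites them, so no rows are sent to the client.
DECLARE @SinkId int;
DECLARE @SinkName sysname;

-- Stand-in for the query being measured (assumption: replace with your own query).
SELECT
     @SinkId = a.object_id
    ,@SinkName = b.name
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- The only result set returned: one row with the elapsed time in seconds.
SELECT DATEDIFF(millisecond, @dtStart, SYSUTCDATETIME()) / 1000.0 AS q1;
```

The same caveat applies as for the approach in general: assigning to variables may change the plan, so treat the number as an approximation.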
You can easily verify and compare methods 1 and 3 by adding
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
before the query and
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
after the query.
Try running your original query, the query that saves its result into the #temp table, and the query that saves its result into variables, then compare the CPU time and reads.
In my tests the number of reads was the same for the normal query and the query that saved results into variables. The query with variables was much faster in elapsed time but had similar CPU time, because there was no network traffic.
The query that saved results into a temp table had more reads and was a bit slower than the query that saved results into variables.
I have a large table and my test query simply reads 1M rows from it:
SELECT
TOP (1000000)
[ID]
,[ElevatorID]
,[TimestampUTC]
FROM [dbo].[ArchivePlaybackStatsDay];
DECLARE @VarID bigint;
DECLARE @VarElevatorID int;
DECLARE @VarTimestampUTC int;
SELECT
TOP (1000000)
@VarID = [ID]
,@VarElevatorID = [ElevatorID]
,@VarTimestampUTC = [TimestampUTC]
FROM [dbo].[ArchivePlaybackStatsDay];
SELECT
TOP (1000000)
[ID]
,[ElevatorID]
,[TimestampUTC]
INTO #Temp
FROM [dbo].[ArchivePlaybackStatsDay];
DROP TABLE #Temp;
I ran it in SQL Sentry Plan Explorer and got these stats:
You can see that Reads of the first and second rows are the same, CPU is close, but Duration is very different, because first query actually transfers 1M rows to the client. The third query that uses #temp table has some extra overhead compared to the second query that uses variables.
I added another variant that converts all columns into varbinary variable to unify variable declarations. Unfortunately, conversion into varbinary and especially varbinary(max) had a noticeable overhead.
DECLARE @VarBin varbinary(8000);
SELECT
TOP (1000000)
@VarBin = [ID]
,@VarBin = [ElevatorID]
,@VarBin = [TimestampUTC]
FROM [dbo].[ArchivePlaybackStatsDay];
DECLARE @VarBinMax varbinary(max);
SELECT
TOP (1000000)
@VarBinMax = [ID]
,@VarBinMax = [ElevatorID]
,@VarBinMax = [TimestampUTC]
FROM [dbo].[ArchivePlaybackStatsDay];
You can get a very rough measure by wrapping the original query in a COUNT(*):
declare @dummyCounter as int;
set @dummyCounter = (select count(*)
from (
/* original query */
) t);
Be aware that this will most likely get a different plan from the original query, so the timing is only approximate.
Related
I wrote a simple loop in SQL Server to measure the performance of my SQL query. In about 95% of runs the measured times are as expected and very similar to each other, but sometimes the time is 3-4 times higher than normal. No other process is accessing the database at the same time.
declare @tTotal int = 0
declare @i integer = 0
declare @itrs integer = 100
while @i < @itrs
begin
CHECKPOINT; DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE;
declare @t0 datetime2 = GETDATE()
-- Here goes the query to measure its performance
SELECT COUNT(*) FROM [Comments] AS [c] WHERE [c].[Score] > 100
declare @t1 datetime2 = GETDATE()
set @tTotal = DATEDIFF(MILLISECOND,@t0,@t1)
select @tTotal as TimeT
set @i = @i + 1
end
If that is not the perfect way to measure query performance do you recommend any tool or way to do simple measurements as I want to do? (I need to get result of every single measurement, not only averages ...)
New to stored procedures. Can anyone explain the following SQL sample which appears at the start of a stored procedure?
Begin/End - Encloses a series of SQL statements so that a group of SQL statements can be executed
SET NOCOUNT ON - the count (indicating the number of rows affected by a SQL statement) is not returned.
DECLARE - setting local variables
While - loops round
With - unsure
Update batch - unsure
SET @Rowcount = @@ROWCOUNT; - unsure
BEGIN
SET NOCOUNT ON;
--UPDATE, done in batches to minimise locking
DECLARE @Batch INT= 100;
DECLARE @Rowcount INT= @Batch;
WHILE @Rowcount > 0
BEGIN
WITH t
AS (
SELECT [OrganisationID],
[PropertyID],
[QuestionID],
[BaseAnsweredQuestionID]
FROM dbo.Unioned_Table
WHERE organisationid = 1),
s
AS (
SELECT [OrganisationID],
[PropertyID],
[QuestionID],
[BaseAnsweredQuestionID]
FROM dbo.table
WHERE organisationid = 1),
batch
AS (
SELECT TOP (@Batch) T.*,
s.BaseAnsweredQuestionID NewBaseAnsweredQuestionID
FROM T
INNER JOIN s ON t.organisationid = s.organisationid
AND t.PropertyID = s.PropertyID
AND t.QuestionID = s.QuestionID
WHERE t.BaseAnsweredQuestionID <> s.BaseAnsweredQuestionID)
UPDATE batch
SET
BaseAnsweredQuestionID = NewBaseAnsweredQuestionID
SET @Rowcount = @@ROWCOUNT;
END;
The clue is in the comment --UPDATE, done in batches to minimise locking.
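The intent is to update the BaseAnsweredQuestionID column of dbo.Unioned_Table with the equivalent column from dbo.table, in batches of 100. (The UPDATE targets the batch CTE, whose BaseAnsweredQuestionID column comes from t, i.e. dbo.Unioned_Table, while the new value comes from s, i.e. dbo.table.) The comment suggests the batching logic is there to minimise locking.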
In detail:
DECLARE @Batch INT= 100; sets the batch size.
DECLARE @Rowcount INT= @Batch; initializes the loop.
WHILE @Rowcount > 0 starts the loop. @Rowcount will become zero when the update statement affects no rows (see below).
with a as () is a common table expression (commonly abbreviated to CTE) - it creates a temporary result set which you can effectively treat as a table. The next few queries define CTEs t, s and batch.
CTE batch contains at most 100 rows because of the SELECT TOP (@Batch) term. With no ORDER BY, these are an arbitrary (not truly random) set of joined rows from the two other CTEs whose BaseAnsweredQuestionID values differ.
The next statement:
UPDATE batch
SET BaseAnsweredQuestionID = NewBaseAnsweredQuestionID
SET @Rowcount = @@ROWCOUNT
updates the (up to) 100 rows in the batch CTE (which in turn is a join on two other CTEs), and populates the loop variable @Rowcount with the number of rows affected by the update statement (@@ROWCOUNT). Because the CTE only selects rows whose values still differ, each pass picks up rows that have not been fixed yet; once no such rows remain, @@ROWCOUNT is zero and the loop ends.
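The loop mechanics can be tried in isolation with a small sketch like the one below. It is not the procedure from the question: the #Demo table, its Flag column and the @Iterations counter are invented for illustration, and it assumes sys.all_columns holds at least 250 rows (it normally holds thousands).

```sql
-- Throwaway table with 250 rows that all need updating.
CREATE TABLE #Demo (Id int NOT NULL PRIMARY KEY, Flag bit NOT NULL);
INSERT INTO #Demo (Id, Flag)
SELECT TOP (250) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 0
FROM sys.all_columns;

DECLARE @Batch int = 100;
DECLARE @Rowcount int = @Batch;   -- non-zero so the loop is entered
DECLARE @Iterations int = 0;

WHILE @Rowcount > 0
BEGIN
    -- The CTE picks at most @Batch rows that still need work...
    WITH batch AS (
        SELECT TOP (@Batch) Id, Flag
        FROM #Demo
        WHERE Flag = 0
    )
    -- ...and updating the CTE updates the underlying table.
    UPDATE batch SET Flag = 1;

    -- Must be read immediately after the UPDATE.
    SET @Rowcount = @@ROWCOUNT;
    SET @Iterations = @Iterations + 1;
END;

-- With 250 rows and a batch of 100 the passes touch 100, 100, 50 and 0 rows.
SELECT @Iterations AS Iterations;
DROP TABLE #Demo;
```

Note that the final pass affects zero rows; that empty pass is what terminates the loop.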
I'm a C# developer trying to become more familiar with SQL Server stored procedures.
I'm a little confused as to why the syntax in "A" works and "B" does not work with Set @id. What is happening here that makes "B" require Select instead of Set?
Example A (works)
DECLARE @currDateTime DateTime
SET @currDateTime = GetDate()
SELECT @currDateTime
Example B (does not work)
DECLARE @id int
SET @id = ID FROM [MyTable] WHERE [Field1] = 'Test'
Example C (works)
DECLARE @id int
SELECT @id = ID
FROM [MyTable]
WHERE [Field1] = 'Test'
SELECT is a statement that runs a query and either returns a result set in the form of a table or assigns values from the query to variables.
SET is a statement that assigns a value to a variable.
The two are very different. SELECT has various other associated clauses, such as FROM, WHERE and so on; SET does not. SELECT returns values as a result table in its normal usage; SET does not.
Admittedly, both look the same in an expression such as:
set @currDateTime = GetDate();
select @currDateTime = GetDate();
However, it is really a coincidence that the syntax for setting a single value happens to look the same.
It doesn't work because it's incorrect SQL syntax. You need SELECT when fetching data from a table, view or table function.
You could still use SET if the right-hand side is an expression, for example a scalar subquery:
DECLARE @Id bigint
SET @Id = (SELECT TOP 1 Id
FROM MyTable
WHERE Field1 = 'Test')
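One practical difference between the two forms shows up when the query matches no rows. The following sketch uses sys.objects with an always-false predicate purely as a stand-in for a table with no matching row; the variable names are illustrative.

```sql
DECLARE @viaSelect int = -1;
DECLARE @viaSet int = -1;

-- SELECT assignment: no rows are returned, so the variable is left untouched.
SELECT @viaSelect = object_id
FROM sys.objects
WHERE 1 = 0;

-- SET with a scalar subquery: an empty subquery evaluates to NULL.
SET @viaSet = (SELECT TOP (1) object_id
               FROM sys.objects
               WHERE 1 = 0);

-- @viaSelect is still -1, @viaSet is now NULL.
SELECT @viaSelect AS viaSelect, @viaSet AS viaSet;
```

So the SET form can be the safer choice when a stale value from an earlier assignment would be misleading.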
Is it possible to count the number of results that a query returns using sp_executesql without executing the query?
What I mean:
I have a procedure that gets a sql query in string.
Example:
SELECT KolumnaA FROM Users WHERE KolumnaA > 5
I would like to assign count of how many results this query will return, and store it in a variable, but I do not want to actually execute the query.
I cannot use this solution:
EXECUTE sp_executesql @sql
SET @allCount = @@rowcount
because it returns the query result, in addition to getting the count of returned rows.
Can you somehow generate another query from the above one like this
SELECT count(*) FROM Uzytkownicy WHERE KolumnaA > 5
and then execute that?
In the general case...
SELECT COUNT(*) FROM ( <your query> ) AS q
...which in your case can be simplified into:
SELECT COUNT(*) FROM Users WHERE KolumnaA > 5
The reason it can't be done cheaper is that there are no hidden "counters" inside the data managed by the DBMS. The DBMS won't even know the total number of rows in the table, let alone the number of rows fulfilling a criterion that is not known in advance (such as KolumnaA > 5).
So, counting requires actually finding the data, so it requires the "real" query. Fortunately, all this happens on the server and only a minute amount of data is transferred to the client (the count itself), so assuming your data is properly indexed it should be pretty fast.
Be careful about consistency though: just because the counting query returned certain count, does not mean that the "real" query will return the same number of rows (in the environment where multiple clients may be modifying the data concurrently).
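For a query that arrives as a string, the general form can be put together along the following lines. This is only a sketch: sys.objects stands in for the real table, @countSql and @cnt are names chosen for this example, and it assumes the incoming query is a single SELECT that is valid as a derived table (named columns, no ORDER BY without TOP, no trailing semicolon).

```sql
-- The query text as it might be passed to the procedure (stand-in example).
DECLARE @sql nvarchar(max) = N'SELECT name FROM sys.objects WHERE object_id > 5';

-- Wrap it in a derived table and capture the count in an OUTPUT parameter.
DECLARE @countSql nvarchar(max) =
    N'SELECT @cnt = COUNT(*) FROM (' + @sql + N') AS q;';

DECLARE @allCount int;
EXEC sys.sp_executesql
     @countSql,
     N'@cnt int OUTPUT',
     @cnt = @allCount OUTPUT;

-- Only this single-row result is returned to the caller.
SELECT @allCount AS allCount;
```

Concatenating query text like this is only reasonable when the string comes from a trusted source.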
I think there are two parts to the question, which I will address.
The first part of the question is how to return row counts instead of the query results. This is done using Count(item). Using Count(1) instead of Count(KolumnaA) can be slightly faster, since it just counts the number of rows to be returned, instead of retrieving a specific column.
SELECT Count(1) FROM Users WHERE KolumnaA > 5
The second part is assigning this to a variable. If you need to use sp_executesql, you can do as follows:
Declare @sql nvarchar(4000)
Declare @allCount int
Set @sql = N'SELECT 1 FROM Users WHERE KolumnaA > 5'
EXEC sp_executesql @sql
SET @allCount = @@rowcount
Alternatively, you can try to use the sp_executesql output feature:
DECLARE @allCount int
EXEC sp_executesql
N'SELECT @allCount = Count(1) FROM Users WHERE KolumnaA > 5',
N'@allCount int OUTPUT',
@allCount = @allCount OUTPUT
Use:
SELECT COUNT(1) FROM Uzytkownicy WHERE KolumnaA > 5
Yes, it does execute a query. But it doesn't return results other than the number of rows.
Otherwise, I don't see how you can avoid returning results.
It will execute the query but it will return only the count not the actual result.
SELECT count(*) FROM Uzytkownicy WHERE KolumnaA > 5
Here is the result that I have found:
Getting Rowcount within sp_executesql
Basically, re-write your query as follows:
DECLARE @SQL NVARCHAR(1000)
DECLARE @Count INT
SET @SQL = N'SELECT KolumnaA FROM Users WHERE KolumnaA > 5; SELECT @Count = @@ROWCOUNT;'
DECLARE @Params NVARCHAR(100)
SET @Params = N'@Count INT OUTPUT'
EXEC sp_executesql @SQL, @Params, @Count = @Count OUTPUT
PRINT @Count --should return the number of rows
Cheers.
I have a table in SQL Server 2005 which has approx 4 billion rows in it. I need to delete approximately 2 billion of these rows. If I try and do it in a single transaction, the transaction log fills up and it fails. I don't have any extra space to make the transaction log bigger. I assume the best way forward is to batch up the delete statements (in batches of ~ 10,000?).
I can probably do this using a cursor, but is there a standard/easy/clever way of doing this?
P.S. This table does not have an identity column as a PK. The PK is made up of an integer foreign key and a date.
You can 'nibble' the deletes, which also means that you don't put a massive load on the database. If your t-log backups run every 10 minutes, then you should be OK to run this once or twice over the same interval. You can schedule it as a SQL Agent job.
Try something like this:
DECLARE @count int
SET @count = 10000
DELETE FROM table1
WHERE table1id IN (
SELECT TOP (@count) table1id
FROM table1
WHERE x='y'
)
What distinguishes the rows you want to delete from those you want to keep? Will this work for you:
while exists (select 1 from your_table where <your_condition>)
delete top(10000) from your_table
where <your_condition>
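A variation on this loop drives it from @@ROWCOUNT instead of a separate EXISTS check, which saves one scan per pass. The sketch below is self-contained and uses a made-up #ToPurge table with an Obsolete flag in place of the real table and condition; it assumes SQL Server 2008 or later for the inline DECLARE initialisation.

```sql
-- Throwaway table standing in for the large table.
CREATE TABLE #ToPurge (Id int NOT NULL PRIMARY KEY, Obsolete bit NOT NULL);
INSERT INTO #ToPurge (Id, Obsolete)
SELECT TOP (2500) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 1
FROM sys.all_columns;

DECLARE @BatchSize int = 1000;
DECLARE @Deleted int = @BatchSize;   -- non-zero so the loop is entered

WHILE @Deleted > 0
BEGIN
    -- Each DELETE is its own autocommitted transaction, so the log
    -- only ever has to hold one batch at a time.
    DELETE TOP (@BatchSize) FROM #ToPurge
    WHERE Obsolete = 1;

    SET @Deleted = @@ROWCOUNT;
END;

-- Every matching row is gone once the loop has finished.
SELECT COUNT(*) AS RemainingRows FROM #ToPurge;
DROP TABLE #ToPurge;
```

In the FULL recovery model the log space is only reused after log backups, so the batches still need to be paced against the backup schedule.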
In addition to putting this in a batch with a statement to truncate the log, you also might want to try these tricks:
Add criteria that matches the first column in your clustered index in addition to your other criteria
Drop any indexes from the table and then put them back after the delete is done if that's possible and won't interfere with anything else going on in the DB, but KEEP the clustered index
For the first point above, for example, if your PK is clustered then find a range which approximately matches the number of rows that you want to delete each batch and use that:
DECLARE @max_id INT, @start_id INT, @end_id INT, @interval INT
SELECT @start_id = MIN(id), @max_id = MAX(id) FROM My_Table
SET @interval = 100000 -- You need to determine the right number here
SET @end_id = @start_id + @interval
WHILE (@start_id <= @max_id)
BEGIN
DELETE FROM My_Table WHERE id BETWEEN @start_id AND @end_id AND <your criteria>
SET @start_id = @end_id + 1
SET @end_id = @end_id + @interval
END
Sounds like this is a one-off operation (I hope, for your sake) and you don't need to be able to go back to a state halfway through the batched delete. If that's the case, why don't you just switch to the SIMPLE recovery model before running it and then back to FULL when you're done?
This way the transaction log won't grow as much. This might not be ideal in most situations but I don't see anything wrong here (assuming as above you don't need to go back to a state that's in between your deletes).
You can do this in your script with something like:
ALTER DATABASE myDB SET RECOVERY SIMPLE -- before the delete
ALTER DATABASE myDB SET RECOVERY FULL -- after the delete
Alternatively you can setup a job to shrink the transaction log every given interval of time - while your delete is running. This is kinda bad but I reckon it'd do the trick.
Well, if you were using SQL Server Partitioning, say based on the date column, you would have possibly switched out the partitions that are no longer required. A consideration for a future implementation perhaps.
I think the best option may be as you say, to delete the data in smaller batches, rather than in one hit, so as to avoid any potential blocking issues.
You could also consider the following method:
Copy the data to keep into a temporary table
Truncate the original table to purge all data
Move everything from the temporary table back into the original table
Your indexes would also be rebuilt as the data was added back to the original table.
I would do something similar to the temp table suggestions but I'd select into a new permanent table the rows you want to keep, drop the original table and then rename the new one. This should have a relatively low tran log impact. Obviously remember to recreate any indexes that are required on the new table after you've renamed it.
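A minimal sketch of that copy-and-swap idea is shown below. All object names (dbo.Demo_Big, dbo.Demo_Big_Keep, PK_Demo_Big) and the cut-off date are hypothetical, and the tiny demo table merely mirrors the question's key of an integer plus a date.

```sql
-- Demo "original" table with a composite primary key.
CREATE TABLE dbo.Demo_Big (
    ParentId int NOT NULL,
    LoggedOn datetime NOT NULL,
    CONSTRAINT PK_Demo_Big PRIMARY KEY (ParentId, LoggedOn)
);
INSERT INTO dbo.Demo_Big (ParentId, LoggedOn) VALUES (1, '20190101');
INSERT INTO dbo.Demo_Big (ParentId, LoggedOn) VALUES (1, '20210101');
INSERT INTO dbo.Demo_Big (ParentId, LoggedOn) VALUES (2, '20180601');
INSERT INTO dbo.Demo_Big (ParentId, LoggedOn) VALUES (2, '20220315');

-- 1. Copy only the rows to keep into a new permanent table.
SELECT ParentId, LoggedOn
INTO dbo.Demo_Big_Keep
FROM dbo.Demo_Big
WHERE LoggedOn >= '20200101';

-- 2. Drop the original and give the copy its name.
DROP TABLE dbo.Demo_Big;
EXEC sp_rename 'dbo.Demo_Big_Keep', 'Demo_Big';

-- 3. Recreate keys and indexes, which SELECT ... INTO does not copy.
ALTER TABLE dbo.Demo_Big
    ADD CONSTRAINT PK_Demo_Big PRIMARY KEY (ParentId, LoggedOn);

-- Two of the four demo rows fall on or after the cut-off.
SELECT COUNT(*) AS KeptRows FROM dbo.Demo_Big;
DROP TABLE dbo.Demo_Big;   -- clean up the demo
```

On a real table, permissions, foreign keys, triggers and any other dependent objects would also have to be recreated, and enough free space is needed for the copy.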
Just my two p'enneth.
Here is my example:
-- configure script
-- Script limits - transaction per commit (default 10,000)
-- And time to allow script to run (in seconds, default 2 hours)
--
DECLARE @MAX INT
DECLARE @MAXT INT
--
-- These 4 variables are substituted by shell script.
--
SET @MAX = $MAX
SET @MAXT = $MAXT
SET @TABLE = $TABLE
SET @WHERE = $WHERE
-- step 1 - Main loop
DECLARE @continue INT
-- deleted in one transaction
DECLARE @deleted INT
-- deleted total in script
DECLARE @total INT
SET @total = 0
DECLARE @max_id INT, @start_id INT, @end_id INT, @interval INT
SET @interval = @MAX
SELECT @start_id = MIN(id), @max_id = MAX(id) from @TABLE
SET @end_id = @start_id + @interval
-- timing
DECLARE @start DATETIME
DECLARE @now DATETIME
DECLARE @timee INT
SET @start = GETDATE()
--
SET @continue = 1
IF OBJECT_ID (N'EntryID', 'U') IS NULL
BEGIN
CREATE TABLE EntryID (startid INT)
INSERT INTO EntryID(startid) VALUES(@start_id)
END
ELSE
BEGIN
SELECT @start_id = startid FROM EntryID
END
WHILE (@continue = 1 AND @start_id <= @max_id)
BEGIN
PRINT 'Start issued: ' + CONVERT(varchar(19), GETDATE(), 120)
BEGIN TRANSACTION
DELETE
FROM @TABLE
WHERE id BETWEEN @start_id AND @end_id AND @WHERE
SET @deleted = @@ROWCOUNT
UPDATE EntryID SET EntryID.startid = @end_id + 1
COMMIT
PRINT 'Deleted issued: ' + STR(@deleted) + ' records. ' + CONVERT(varchar(19), GETDATE(), 120)
SET @total = @total + @deleted
SET @start_id = @end_id + 1
SET @end_id = @end_id + @interval
IF @end_id > @max_id
SET @end_id = @max_id
SET @now = GETDATE()
SET @timee = DATEDIFF (second, @start, @now)
if @timee > @MAXT
BEGIN
PRINT 'Time limit exceeded for the script, exiting'
SET @continue = 0
END
-- ELSE
-- BEGIN
-- SELECT @total 'Removed now', @timee 'Total time, seconds'
-- END
END
SELECT @total 'Removed records', @timee 'Total time sec' , @start_id 'Next id', @max_id 'Max id', @continue 'COMPLETED? '
SELECT * from EntryID next_start_id
GO
The short answer is, you can't delete 2 billion rows without incurring some kind of major database downtime.
Your best option may be to copy the data to a temp table and truncate the original table, but this will fill your tempDB and would use no less logging than deleting the data.
You will need to delete as many rows as you can until the transaction log fills up, then truncate it each time. The answer provided by Stanislav Kniazev could be modified to do this by increasing the batch size and adding a call to truncate the log file.
I agree with the people who want you to loop over a smaller set of records; this will be faster than trying to do the whole operation in one step. You may need to experiment with the number of records you include in each iteration of the loop. About 2000 at a time seems to be the sweet spot in most of the tables I do large deletes from, although a few need smaller amounts like 500. It depends on the number of foreign keys, the size of the record, triggers etc., so it really will take some experimenting to find what you need. It also depends on how heavily the table is used. A heavily accessed table will need each iteration of the loop to run for a shorter amount of time. If you can run during off hours, or best yet in single user mode, then you can have more records deleted in one loop.
If you don't think you can do this in one night during off hours, it might be best to design the loop with a counter and only do a set number of iterations each night until it is done.
Further, if you use an implicit transaction rather than an explicit one, you can kill the loop query at any time and records already deleted will stay deleted, except those in the current round of the loop. That is much faster than trying to roll back half a million records because you've brought the system to a halt.
It is usually a good idea to backup a database immediately before undertaking an operation of this nature.