I have a process in SQL Server that seems never to end. To check whether this process is blocked, I ran EXEC sp_who2 and saw that its SPID is 197.
The status is RUNNABLE and there is no blocking. The command is INSERT. The odd thing is the CPU time, which is the highest of all: 68891570, and the DiskIO: 16529185.
This process truncates two tables and then inserts the data from another table into these two tables. Admittedly there is a lot of data (101,962,758 rows in the source table), but I think it is taking far too long.
What can I do to speed up this process, or to find out what is happening?
Thank you
It depends on the scenario. I recommend the following steps to decide where to go next.
Find most expensive queries
Use the following SQL to find the most expensive queries:
SELECT TOP 10 SUBSTRING(qt.TEXT, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(qt.TEXT)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)+1),
qs.execution_count,
qs.total_logical_reads, qs.last_logical_reads,
qs.total_logical_writes, qs.last_logical_writes,
qs.total_worker_time,
qs.last_worker_time,
qs.total_elapsed_time/1000000 total_elapsed_time_in_S,
qs.last_elapsed_time/1000000 last_elapsed_time_in_S,
qs.last_execution_time,
qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_logical_reads DESC -- logical reads
-- ORDER BY qs.total_logical_writes DESC -- logical writes
-- ORDER BY qs.total_worker_time DESC -- CPU time
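The query above reads cached statistics, which are updated when an execution completes. For the request that is still running (SPID 197 from the question), a sketch like the following reads its current wait, blocking, CPU and I/O counters from sys.dm_exec_requests. Running it a few times and comparing reads/writes gives a rough idea of whether the INSERT is still making progress. The value assigned to @spid is an assumption; substitute your own session id.

```sql
-- 197 is the SPID reported by sp_who2 in the question; substitute your own.
DECLARE @spid int;
SET @spid = 197;

SELECT r.session_id,
       r.status,
       r.command,
       r.wait_type,            -- what the request is waiting on right now (NULL if not waiting)
       r.last_wait_type,
       r.wait_time,            -- ms spent in the current wait
       r.blocking_session_id,  -- 0 if not blocked
       r.cpu_time,             -- ms
       r.total_elapsed_time / 1000 AS elapsed_s,
       r.reads,
       r.writes,
       r.logical_reads,
       SUBSTRING(t.text, (r.statement_start_offset / 2) + 1,
                 ((CASE r.statement_end_offset
                       WHEN -1 THEN DATALENGTH(t.text)
                       ELSE r.statement_end_offset
                   END - r.statement_start_offset) / 2) + 1) AS current_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id = @spid;
```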
Execution plan
The execution plan can help you determine what is actually going on with your query. More information can be found here.
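A minimal sketch for pulling the plan of the running request (197 is again the SPID from the question, used here as an assumption). This is the compiled plan without runtime row counts, but it shows which operators (scans, sorts, index maintenance) the INSERT is using.

```sql
-- 197 is the SPID from the question; substitute your own.
SELECT r.session_id,
       r.command,
       p.query_plan  -- in SSMS, click the XML to open it as a graphical plan
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS p
WHERE r.session_id = 197;
```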
Performance tips
Indexes. Remove all indexes, except for those needed by the insert (SELECT INTO)
Constraints and triggers. Remove them from the table.
Choosing a good clustered index. Pick one where new records are inserted at the end of the table.
Fill factor. Set it to 0 or 100 (the same as 0). This will reduce the number of pages that the data is spread across.
Recovery model. Change it to Simple.
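A sketch of how several of these tips can combine for the truncate-and-reload in the question. All object and column names (dbo.SourceTable, dbo.TargetTable, Col1, Col2, IX_TargetTable_Col2) are hypothetical, and whether the INSERT is actually minimally logged depends on the SQL Server version, the recovery model and the target's indexes.

```sql
-- Hypothetical names: dbo.SourceTable, dbo.TargetTable, Col1, Col2, IX_TargetTable_Col2.
TRUNCATE TABLE dbo.TargetTable;

-- Disable nonclustered indexes so they are not maintained during the load.
-- (Do not disable the clustered index: that makes the table unreadable.)
ALTER INDEX IX_TargetTable_Col2 ON dbo.TargetTable DISABLE;

-- TABLOCK is one of the preconditions for a minimally logged INSERT...SELECT
-- under the SIMPLE or BULK_LOGGED recovery model.
INSERT INTO dbo.TargetTable WITH (TABLOCK) (Col1, Col2)
SELECT Col1, Col2
FROM dbo.SourceTable;

-- Rebuilding re-enables the disabled nonclustered index.
ALTER INDEX IX_TargetTable_Col2 ON dbo.TargetTable REBUILD;
```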
Also
Consider reviewing Insert into table select * from table vs bulk insert.
From the info you provided, it seems the query is still running...
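Could you try the following process instead of your approach? Assume your tables are Main, T1 and T2.
-- SELECT INTO creates T1_dup/T2_dup (without the original indexes or constraints)
SELECT * INTO T1_dup FROM Main;
SELECT * INTO T2_dup FROM Main;
EXEC sp_rename 'T1', 'T1dup';
EXEC sp_rename 'T2', 'T2dup';
EXEC sp_rename 'T1_dup', 'T1';
EXEC sp_rename 'T2_dup', 'T2';
DROP TABLE T1dup, T2dup;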
Consider an INSERT statement running against a table TABLE_A that takes a long time. I would like to see how far it has progressed.
I opened a new session (a new query window in SSMS) while the long-running statement was still in progress and ran the query
SELECT COUNT(1) FROM TABLE_A WITH (nolock)
hoping it would return right away with the current number of rows every time I ran it. But even with (nolock), it only returns after the INSERT statement has completed.
What have I missed? Do I add (nolock) to the INSERT statement as well? Or is this not achievable?
(Edit)
OK, I have found what I missed. If you first CREATE TABLE TABLE_A and then INSERT INTO TABLE_A, the SELECT COUNT works. If you use SELECT * INTO TABLE_A FROM xxx without first creating TABLE_A, then none of the following will work (not even sysindexes).
Short answer: You can't do this.
Longer answer: A single INSERT statement is an atomic operation. As such, the query has either inserted all the rows or has inserted none of them. Therefore you can't get a count of how far through it has progressed.
Even longer answer: Martin Smith has given you a way to achieve what you want. Whether you still want to do it that way is up to you, of course. Personally, if I really needed to track the progress of something like this, I would still prefer to insert in manageable batches, so I would rewrite the INSERT as multiple smaller statements. Depending on your implementation, that may be trivial to do.
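Along the lines of that suggestion, a batched version might look like this. It assumes a hypothetical source table dbo.SourceTable with an integer key Id and a column Payload. In the default autocommit mode each INSERT commits on its own, so a SELECT COUNT(*) from another session sees the rows accumulate, and RAISERROR ... WITH NOWAIT prints progress immediately instead of buffering it.

```sql
-- Hypothetical source: dbo.SourceTable(Id int, Payload ...); target TABLE_A from the question.
DECLARE @BatchSize int, @LastId int, @MaxId int, @Total bigint, @Msg nvarchar(200);
SET @BatchSize = 100000;
SET @Total = 0;

-- Start just below the smallest key; both stay NULL (and the loop is skipped) if the source is empty
SELECT @LastId = MIN(Id) - 1, @MaxId = MAX(Id) FROM dbo.SourceTable;

WHILE @LastId < @MaxId
BEGIN
    INSERT INTO dbo.TABLE_A (Id, Payload)
    SELECT Id, Payload
    FROM dbo.SourceTable
    WHERE Id > @LastId AND Id <= @LastId + @BatchSize;

    SET @Total = @Total + @@ROWCOUNT;
    SET @LastId = @LastId + @BatchSize;

    -- WITH NOWAIT sends the message to the client immediately
    SET @Msg = N'Rows inserted so far: ' + CAST(@Total AS nvarchar(20));
    RAISERROR (@Msg, 0, 1) WITH NOWAIT;
END;
```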
If you are using SQL Server 2016, the Live Query Statistics feature lets you see the progress of the insert in real time.
The screenshot below was taken while inserting 10 million rows into a table with a clustered index and a single nonclustered index.
It shows that the insert was 88% complete on the clustered index, and this will be followed by a sort operator to put the values into nonclustered index key order before inserting into the NCI. The sort is a blocking operator: it cannot output any rows until all input rows are consumed, so the operators to the left of it are 0% done.
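Live Query Statistics is built on the sys.dm_exec_query_profiles DMV (SQL Server 2014 and later), which can also be queried directly. This is a sketch: 55 is a hypothetical session_id, and the DMV only returns rows if profiling is enabled for the target query (for example, that session is capturing the actual execution plan, or lightweight profiling is on). Run it in the same database as the target table so OBJECT_NAME resolves.

```sql
-- 55 is a hypothetical session_id of the session running the INSERT.
SELECT p.node_id,
       p.physical_operator_name,
       OBJECT_NAME(p.object_id)  AS object_name,
       SUM(p.row_count)          AS rows_so_far,     -- summed over threads for parallel plans
       SUM(p.estimate_row_count) AS estimated_rows,
       CAST(100.0 * SUM(p.row_count) / NULLIF(SUM(p.estimate_row_count), 0)
            AS decimal(18, 2))   AS pct_of_estimate
FROM sys.dm_exec_query_profiles AS p
WHERE p.session_id = 55
GROUP BY p.node_id, p.physical_operator_name, p.object_id
ORDER BY p.node_id;
```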
With respect to your question on NOLOCK
It is trivial to test:
Connection 1
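USE tempdb;
CREATE TABLE T2
(
    X INT IDENTITY PRIMARY KEY,
    F CHAR(8000)
);
-- Busy-wait until Connection 2 has inserted its first row
-- (the label LOOP: is the do-nothing body of the WHILE)
WHILE NOT EXISTS(SELECT * FROM T2 WITH (NOLOCK))
    LOOP:
-- Count the rows so far, both by scanning and from metadata
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting for 10 seconds',0,1) WITH NOWAIT;
WAITFOR DELAY '00:00:10';
-- Both counts should have grown by now
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting to drop table',0,1) WITH NOWAIT;
-- DROP needs a SCH-M lock, so it waits until the INSERT has finished
DROP TABLE T2;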
Connection 2
use tempdb;
--Insert 2000 * 2000 = 4 million rows
WITH T
AS (SELECT TOP 2000 'x' AS x
FROM master..spt_values)
INSERT INTO T2
(F)
SELECT 'X'
FROM T v1
CROSS JOIN T v2
OPTION (MAXDOP 1)
Example Results - Showing row count increasing
SELECT queries with NOLOCK allow dirty reads. They don't actually take no locks, and they can still be blocked: they still need a SCH-S (schema stability) lock on the table (and on a heap they will also take a HoBT lock).
The only thing incompatible with a SCH-S lock is a SCH-M (schema modification) lock. Presumably you also performed some DDL on the table in the same transaction (e.g. perhaps created it in the same transaction).
For the use case of a large insert, where an approximate in-flight result is fine, I generally just poll sysindexes as shown above to retrieve the count from metadata rather than actually counting the rows (non-deprecated alternative DMVs are available).
When an insert has a wide update plan, you can even see it inserting into the various indexes in turn that way.
If the table is created inside the inserting transaction, this sysindexes query will still block, though, as the OBJECT_ID function won't return a result based on uncommitted data regardless of the isolation level in effect. It's sometimes possible to get around that by getting the object_id from sys.tables with NOLOCK instead.
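A sketch of that workaround, using the T2 test table from Connection 1: look up the object_id from sys.tables with NOLOCK, then sum the rows from sys.partitions (one of the non-deprecated alternatives to sysindexes). As noted above, this only sometimes avoids the block.

```sql
USE tempdb;  -- the test table T2 lives in tempdb
DECLARE @object_id int;

-- Read the catalog with NOLOCK instead of calling OBJECT_ID(),
-- which cannot see a table created by another, still-open transaction.
SELECT @object_id = t.object_id
FROM sys.tables AS t WITH (NOLOCK)
WHERE t.name = N'T2';

-- Approximate row count from metadata: heap (0) or clustered index (1) only.
SELECT SUM(p.rows) AS approx_rows
FROM sys.partitions AS p WITH (NOLOCK)
WHERE p.object_id = @object_id
  AND p.index_id IN (0, 1);
```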
Use the query below to get the row count of any large table, locked table, or table currently being inserted into, within seconds. Just replace TABLENAME with the name of the table you want to check.
SELECT
Total_Rows= SUM(st.row_count)
FROM
sys.dm_db_partition_stats st
WHERE
object_name(object_id) = 'TABLENAME' AND (index_id < 2)
For those who just need to see the record count while a long-running INSERT script is executing, I found you can see the current record count in SSMS by right-clicking the destination table, choosing Properties -> Storage, and viewing the "Row Count" value.
Close the window and repeat to see the updated record count.
Say we have a payments table with 35 columns, a primary key (auto-increment bigint) and 3 nonclustered, non-unique indexes (each on one int column).
Among the table's columns we have two datetime fields:
payment_date datetime NOT NULL
edit_date datetime NULL
The table has about 1,200,000 rows.
Only ~1,000 rows have edit_date = NULL.
~9,000 rows have edit_date not null and not equal to payment_date.
The rest have edit_date = payment_date.
When we run the following query 1:
select top 1 *
from payments
where edit_date is not null and (payment_date=edit_date or payment_date<>edit_date)
order by payment_date desc
the server needs a couple of seconds. But if we run query 2:
select top 1 *
from payments
where edit_date is not null
order by payment_date desc
the execution ends with the error "The log file for database 'tempdb' is full. Back up the transaction log for the database to free up some log space."
If we replace * with a specific column, as in query 3:
select top 1 payment_date
from payments
where edit_date is not null
order by payment_date desc
it also finishes in a couple of seconds.
Where is the magic?
EDIT
I've changed query 1 so that it operates over exactly the same number of rows as the 2nd query. And still it returns in a second, while query 2 fills tempdb.
ANSWER
I followed the advice to add an index, and did so for both date fields; everything started working quickly, as expected. Still, my question was why SQL Server behaves differently in this exact situation on such similar queries (query 1 vs query 2); I wanted to understand the logic of the server's optimization. I would accept it if both queries used tempdb similarly, but they didn't.
In the end I marked as the answer the first one, where I saw the likely symptoms of my problem and also the first thoughts on how to avoid it (i.e. indexes).
This happens because certain steps in an execution plan, in particular certain sorts and joins involving lots of data, can trigger writes to tempdb.
Since you are sorting a table with a boatload of columns, SQL Server decides it would be crazy to perform the sort alone in tempdb without the associated data. If it did that, it would need to do a gazillion inefficient bookmark lookups on the underlying table.
Follow these rules:
Try to select only the data you need
Size tempdb appropriately: if you need to run crazy queries that sort a gazillion rows, you had better have an appropriately sized tempdb.
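For the payments example, a sketch of the fix the asker eventually applied. An index on the ORDER BY column lets the optimizer read rows already in payment_date order and stop at the first qualifying one, instead of sorting every wide row (the sort is what spills to tempdb). The index name and the dbo schema are hypothetical, and INCLUDE (edit_date) is an added assumption so the WHERE clause can be checked from the index itself.

```sql
-- Hypothetical index name; the asker reports indexing both date columns.
CREATE NONCLUSTERED INDEX IX_payments_payment_date
    ON dbo.payments (payment_date DESC)
    INCLUDE (edit_date);

-- Query 2 unchanged: with the index, the plan can read rows in payment_date order
-- and stop at the first one whose edit_date is not NULL, instead of sorting all
-- ~1,200,000 wide rows.
SELECT TOP 1 *
FROM dbo.payments
WHERE edit_date IS NOT NULL
ORDER BY payment_date DESC;
```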
Usually, tempdb fills up when you are low on disk space, or when you have set an unreasonably low maximum size for database growth.
Many people think that tempdb is only used for #temp tables, when in fact you can easily fill up tempdb without ever creating a single temp table. Some other scenarios that can cause tempdb to fill up:
any sorting that requires more memory than has been allocated to SQL Server will be forced to do its work in tempdb;
if the sorting requires more space than you have allocated to tempdb, one of the above errors will occur;
DBCC CheckDB('any database') will perform its work in tempdb, and on larger databases this can consume quite a bit of space;
DBCC DBREINDEX or similar DBCC commands with the 'Sort in tempdb' option set will also potentially fill up tempdb;
large resultsets involving unions, order by / group by, cartesian joins, outer joins, cursors, temp tables, table variables, and hashing can often require help from tempdb;
any transactions left uncommitted and not rolled back can leave objects orphaned in tempdb;
use of an ODBC DSN with the option 'create temporary stored procedures' set can leave objects there for the life of the connection.
USE tempdb
GO
SELECT name
FROM tempdb..sysobjects
SELECT OBJECT_NAME(id), rowcnt
FROM tempdb..sysindexes
WHERE OBJECT_NAME(id) LIKE '#%'
ORDER BY rowcnt DESC
Higher rowcnt values will likely indicate the biggest temporary tables that are consuming space.
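The sysobjects/sysindexes query only shows temporary tables. Since sorts, hashes and spools (as in the payments question) use internal objects rather than #tables, a sketch using the tempdb space-usage DMVs (SQL Server 2005 and later) shows where the space actually goes. Pages are 8 KB each, so * 8 / 1024 converts page counts to MB.

```sql
-- Space in tempdb by category
SELECT SUM(user_object_reserved_page_count)     * 8 / 1024 AS user_objects_mb,      -- #temp tables, table variables
       SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,  -- sorts, hashes, spools, etc.
       SUM(version_store_reserved_page_count)   * 8 / 1024 AS version_store_mb,
       SUM(unallocated_extent_page_count)       * 8 / 1024 AS free_mb
FROM tempdb.sys.dm_db_file_space_usage;

-- Sessions that have allocated the most tempdb pages
SELECT TOP 10
       session_id,
       user_objects_alloc_page_count     * 8 / 1024 AS user_objects_mb,
       internal_objects_alloc_page_count * 8 / 1024 AS internal_objects_mb
FROM sys.dm_db_session_space_usage
ORDER BY user_objects_alloc_page_count + internal_objects_alloc_page_count DESC;
```

sys.dm_db_session_space_usage only counts tasks that have finished; for requests that are still running, sys.dm_db_task_space_usage has the in-flight counts.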
Short-term fix
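DBCC OPENTRAN -- or DBCC OPENTRAN('tempdb'): reports the oldest active transaction and its SPID
DBCC INPUTBUFFER(<number>) -- shows the last statement sent by that SPID
KILL <number> -- ends that session and rolls back its open transaction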
Long-term prevention
-- SQL Server 7.0, should show 'trunc. log on chkpt.'
-- or 'recovery=SIMPLE' as part of status column:
EXEC sp_helpdb 'tempdb'
-- SQL Server 2000, should yield 'SIMPLE':
SELECT DATABASEPROPERTYEX('tempdb', 'recovery')
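-- On current SQL Server versions tempdb always uses SIMPLE recovery and this setting cannot be changed
ALTER DATABASE tempdb SET RECOVERY SIMPLE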
Reference: https://web.archive.org/web/20080509095429/http://sqlserver2000.databases.aspfaq.com:80/why-is-tempdb-full-and-how-can-i-prevent-this-from-happening.html
Other references: http://social.msdn.microsoft.com/Forums/is/transactsql/thread/af493428-2062-4445-88e4-07ac65fedb76
Given the example queries below (simplified examples only):
DECLARE @DT int; SET @DT=20110717; -- yes this is an INT
WITH LargeData AS (
SELECT * -- This is a MASSIVE table indexed on dt field
FROM mydata
WHERE dt=@DT
), Ordered AS (
SELECT TOP 10 *
, ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
FROM LargeData
)
SELECT * FROM Ordered
and ...
DECLARE @DT int; SET @DT=20110717;
BEGIN TRY DROP TABLE #LargeData END TRY BEGIN CATCH END CATCH; -- drop the table if it already exists
SELECT * -- This is a MASSIVE table indexed on dt field
INTO #LargeData -- put smaller results into temp
FROM mydata
WHERE dt=@DT;
WITH Ordered AS (
SELECT TOP 10 *
, ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
FROM #LargeData
)
SELECT * FROM Ordered
Both produce the same results: a limited, ranked list of values based on a field's data.
When these queries get considerably more complicated (many more tables, lots of criteria, multiple levels of "with" table aliases, etc.), the bottom query executes MUCH faster than the top one, sometimes on the order of 20x-100x faster.
The Question is...
Is there some kind of query hint or other SQL option that would tell SQL Server to perform the same kind of optimization automatically, or another formulation that would be a cleaner approach (keeping the format as close to query 1 as possible)?
Note that the "Ranking" or secondary queries are just fluff for this example; the actual operations performed don't really matter much.
This is sort of what I was hoping for (or something similar, but I hope the idea is clear). Remember that the query below does not actually work.
DECLARE @DT int; SET @DT=20110717;
WITH LargeData AS (
SELECT * -- This is a MASSIVE table indexed on dt field
FROM mydata
WHERE dt=@DT
OPTION (USE_TEMP_OR_HARDENED_OR_SOMETHING) -- EXAMPLE ONLY
), Ordered AS (
SELECT TOP 10 *
, ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
FROM LargeData
)
SELECT * FROM Ordered
EDIT: Important follow up information!
If in your subquery you add
TOP 999999999 -- improves speed dramatically
Your query will behave much like using a temp table in a previous query. I found the execution times improved in almost exactly the same way. This is FAR SIMPLER than using a temp table and is basically what I was looking for.
However
TOP 100 PERCENT -- does NOT improve speed
Does NOT perform the same way (you must use a literal number, as in TOP 999999999).
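For comparison with query 1, this is a sketch of where the TOP 999999999 goes. It relies on the behavior the asker observed, not on a documented optimizer guarantee, so it may not hold across SQL Server versions or plan shapes.

```sql
DECLARE @DT int;
SET @DT = 20110717;

WITH LargeData AS (
    -- Large literal TOP: the asker's workaround (TOP 100 PERCENT did not help)
    SELECT TOP 999999999 *
    FROM mydata
    WHERE dt = @DT
), Ordered AS (
    SELECT TOP 10 *
         , ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
    FROM LargeData
)
SELECT * FROM Ordered;
```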
Explanation:
From what I can tell from the actual execution plans of the query in both forms (the original with normal CTEs, and one with each subquery having TOP 999999999):
The normal query joins everything together as if all the tables were in one massive query, which is what is expected. The filtering criteria are applied almost at the join points in the plan, which means many more rows are evaluated and joined together all at once.
In the version with TOP 999999999, the actual execution plan clearly separates the subqueries from the main query in order to apply the TOP, thus forcing the creation of an in-memory "Bitmap" of the subquery that is then joined to the main query. This appears to do exactly what I wanted, and it may even be more efficient, since servers with large amounts of RAM can execute the query entirely in memory without any disk IO. In my case we have 280 GB of RAM, far more than could ever really be used.
Not only can you use indexes on temp tables, but they also allow the use of statistics and hints. I can find no reference in the documentation on CTEs to being able to use statistics, and it specifically says you can't use hints.
Temp tables are often the most performant choice for a large data set when deciding between temp tables and table variables, even when you don't use indexes (possibly because statistics are used to develop the plan), and I suspect the implementation of a CTE is more like a table variable than a temp table.
I think the best thing to do, though, is to see how the execution plans differ, to determine whether it is something that can be fixed.
What exactly is your objection to using the temp table when you know it performs better?
The problem is that in the first query SQL Server query optimizer is able to generate a query plan. In the second query a good query plan isn't able to be generated because you're inserting the values into a new temporary table. My guess is there is a full table scan going on somewhere that you're not seeing.
What you may want to do in the second query is insert the values into the #LargeData temporary table like you already do and then create a non-clustered index on the "valuefield" column. This might help to improve your performance.
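A sketch of that suggestion applied to the second query from the question (the index name is hypothetical). The ORDER BY inside the CTE is an addition so that TOP 10 picks the highest values rather than arbitrary rows; the outer ORDER BY fixes the order of the final result.

```sql
DECLARE @DT int;
SET @DT = 20110717;

IF OBJECT_ID('tempdb..#LargeData') IS NOT NULL
    DROP TABLE #LargeData;

SELECT *
INTO #LargeData
FROM mydata
WHERE dt = @DT;

-- Hypothetical index name; supports the ORDER BY valuefield DESC below.
CREATE NONCLUSTERED INDEX IX_LargeData_valuefield ON #LargeData (valuefield DESC);

WITH Ordered AS (
    SELECT TOP 10 *
         , ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
    FROM #LargeData
    ORDER BY valuefield DESC
)
SELECT * FROM Ordered
ORDER BY Rank_Number;
```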
It is quite possible that SQL is optimizing for the wrong value of the parameters.
There are a couple of options:
Try using OPTION (RECOMPILE). There is a cost to this, as it recompiles the query every time, but if different plans are needed it might be worth it.
You could also try using OPTION (OPTIMIZE FOR (@DT = SomeRepresentativeValue)). The problem with this is that you might pick the wrong value.
See I Smell a Parameter! from The SQL Server Query Optimization Team blog
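A sketch of where these hints go: the OPTION clause is attached to the end of the whole statement, not to an individual CTE (which is also why the OPTION inside LargeData in the question doesn't parse). Using 20110717 as the "representative" value in the commented alternative is purely an assumption.

```sql
DECLARE @DT int;
SET @DT = 20110717;

WITH LargeData AS (
    SELECT *
    FROM mydata
    WHERE dt = @DT
), Ordered AS (
    SELECT TOP 10 *
         , ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
    FROM LargeData
)
SELECT * FROM Ordered
OPTION (RECOMPILE);
-- Alternative: OPTION (OPTIMIZE FOR (@DT = 20110717));
```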
In Oracle, there's a view called V$SQLAREA that lists statistics on shared SQL area and contains one row per SQL string. It provides statistics on SQL statements that are in memory, parsed, and ready for execution.
One column, ROWS_PROCESSED, holds the total number of rows processed on behalf of this SQL statement.
I'm looking for equivalent information in SQL Server 2005.
I looked in some of the DMVs (like sys.dm_exec_query_stats), but I haven't found anything related.
@@ROWCOUNT won't be useful to me, as I want cumulative statistics that sum the rows processed by the top CPU/IO/execution-count consuming queries in the database.
I would appreciate any help on this subject.
Thanks in advance,
Roni.
I saw that when I run the following query, I receive the query plan as XML.
Inside the XML plan, there's an "EstimateRows" attribute with a number that corresponds to the estimated number of rows of the query.
I'm considering parsing the query_plan column to retrieve only that information (unless I can find it in some system view or table).
Where can I find the actual number of rows of the query? Where is it stored?
SELECT
case when sql_handle IS NULL
then ' '
else ( substring(st.text,(statement_start_offset+2)/2,
(case when qs.statement_end_offset = -1
then len(convert(nvarchar(MAX),st.text))*2
else statement_end_offset
end - statement_start_offset) /2 ) )
end as query_text ,
query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle)
cross apply sys.dm_exec_sql_text(sql_handle) st;
As far as I know there isn't a direct equivalent, especially for row counts. The relevant DMVs track IO cost, which is used to show missing indexes, most expensive queries, etc.
This will give you stats per SQL statement for current sessions. YMMV: I've just put it together based on scripts I have lying around.
SELECT
*
FROM
sys.dm_exec_query_stats QS
CROSS APPLY
sys.dm_exec_sql_text(sql_handle) ST
My scripts are based on the links I mentioned here
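Later versions of SQL Server (not 2005; SQL Server 2012 and newer certainly have them) added row-count columns (total_rows, last_rows, min_rows, max_rows) to sys.dm_exec_query_stats, which comes close to Oracle's ROWS_PROCESSED. These count rows returned by the query. A sketch ordering the top statements by CPU:

```sql
-- Needs a version whose sys.dm_exec_query_stats has total_rows/last_rows (not SQL Server 2005)
SELECT TOP 10
       qs.total_rows,          -- rows returned, summed over all executions
       qs.last_rows,
       qs.execution_count,
       qs.total_worker_time,
       qs.total_logical_reads,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```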
You can use the system variable @@ROWCOUNT to get the number of rows affected by the last statement.
I have a very large database (~100 GB) consisting primarily of two tables that I want to reduce in size (both have approx. 50 million records). I have an archive DB set up on the same server with these two tables, using the same schema. I'm trying to determine the best conceptual way of removing the rows from the live DB and inserting them into the archive DB. In pseudocode, this is what I'm doing now:
DECLARE @NextIDs TABLE (UniqueID int)
DECLARE @twoYearsAgo datetime -- two years before today's date
SET @twoYearsAgo = DATEADD(year, -2, GETDATE())
INSERT INTO @NextIDs
SELECT TOP 100 UniqueID FROM myLargeTable WHERE myLargeTable.actionDate < @twoYearsAgo
INSERT INTO myArchiveTable
(<fields>)
SELECT <fields>
FROM myLargeTable INNER JOIN @NextIDs AS n ON myLargeTable.UniqueID = n.UniqueID
DELETE myLargeTable
FROM myLargeTable INNER JOIN @NextIDs AS n ON myLargeTable.UniqueID = n.UniqueID
Right now this takes a horrifically slow 7 minutes to complete 1,000 records. I've tested the DELETE and the INSERT, and each takes approx. 3.5 minutes, so it's not that one is drastically less efficient than the other. Can anyone suggest some optimization ideas?
Thanks!
This is SQL Server 2000.
Edit: On the large table there is a clustered index on the ActionDate field. There are two other indexes, but neither is referenced in any of the queries. The archive table has no indexes. On my test server this is the only query hitting SQL Server, so it should have plenty of processing power.
Code (this loops in batches of 1,000 records at a time):
DECLARE @NextIDs TABLE(UniqueID int primary key)
DECLARE @TwoYearsAgo datetime
SELECT @TwoYearsAgo = DATEADD(d, (-2 * 365), GetDate())
WHILE EXISTS(SELECT TOP 1 UserName FROM [ISAdminDB].[dbo].[UserUnitAudit] WHERE [ActionDateTime] < @TwoYearsAgo)
BEGIN
BEGIN TRAN
--get all records to be archived
INSERT INTO @NextIDs(UniqueID)
SELECT TOP 1000 UniqueID FROM [ISAdminDB].[dbo].[UserUnitAudit] WHERE [UserUnitAudit].[ActionDateTime] < @TwoYearsAgo
--insert into archive table
INSERT INTO [ISArchive].[dbo].[userunitaudit]
(<Fields>)
SELECT <Fields>
FROM [ISAdminDB].[dbo].[UserUnitAudit] AS a
INNER JOIN @NextIDs AS b ON a.UniqueID = b.UniqueID
--remove from Admin DB
DELETE [ISAdminDB].[dbo].[UserUnitAudit]
FROM [ISAdminDB].[dbo].[UserUnitAudit] AS a
INNER JOIN @NextIDs AS b ON a.UniqueID = b.UniqueID
DELETE FROM @NextIDs
COMMIT
END
You effectively have three selects which need to be run before your insert/delete commands are executed:
for the 1st insert:
SELECT top 100 from myLargeTable Where myLargeTable.actionDate < twoYearsAgo
for the 2nd insert:
SELECT <fields> FROM myLargeTable INNER JOIN NextIDs
on myLargeTable.UniqueID = NextIDs.UniqueID
for the delete:
(select *)
FROM MyLargeTable INNER JOIN NextIDs on myLargeTable.UniqueID = NextIDs.UniqueID
I'd try to optimize these, and if they are all quick, then the indexes may be slowing down your writes. Some suggestions:
start Profiler and see what's happening with the reads/writes etc.
check index usage for all three statements.
try running the SELECTs returning only the PK, to see whether the delay is in query execution or in fetching the data (do you have, e.g., any full-text-indexed fields, TEXT fields etc.?)
Do you have an index on the source table for the column which you're using to filter the results? In this case, that would be the actionDate.
Also, it can often help to remove all indexes from the destination table before doing massive inserts, but in this case you're only doing hundreds at a time.
You would also probably be better off doing this in larger batches. With one hundred at a time the overhead of the queries is going to end up dominating the costs/time.
Is there any other activity on the server during this time? Is there any blocking happening?
Hopefully this gives you a starting point.
If you can provide the exact code that you're using (maybe without the column names if there are privacy issues) then maybe someone can spot other ways to optimize.
EDIT:
Have you checked the query plan for your block of code? I've run into issues with table variables like this where the query optimizer couldn't figure out that the table variable would be small in size so it always tried to do a full table scan on the base table.
In my case it eventually became a moot point, so I'm not sure what the ultimate solution is. You can certainly add a condition on the actionDate to all of your select queries, which would at least minimize the effects of this.
The other option would be to use a normal table to hold the IDs.
The INSERT and DELETE statements are joining on
[ISAdminDB].[dbo].[UserUnitAudit].UniqueID
If there's no index on this, and you indicate there isn't, you're doing two table scans. That's likely the source of the slowness, because without an index on UniqueID each statement has to read the entire table to find the matching rows.
I think you need to add an index on UniqueID. The performance hit for maintaining it has got to be less than table scans. And you can drop it after your archive is done.
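A sketch of that suggestion (the index name is hypothetical). It creates the index before the archive loop and drops it afterwards; the table.index form of DROP INDEX is used because the asker is on SQL Server 2000, and later versions still accept it.

```sql
USE ISAdminDB;

-- Hypothetical index name. Supports the joins on UniqueID in the INSERT and DELETE.
CREATE NONCLUSTERED INDEX IX_UserUnitAudit_UniqueID
    ON dbo.UserUnitAudit (UniqueID);

-- ... run the archive loop here ...

-- table.index form of DROP INDEX, which SQL Server 2000 also accepts
DROP INDEX UserUnitAudit.IX_UserUnitAudit_UniqueID;
```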
Are there any indexes on myLargeTable.actionDate and .UniqueID?
Have you tried larger batch sizes than 100?
What is taking the most time? The INSERT, or the DELETE?
You might try doing this using the output clause:
DECLARE @items TABLE (
<field list just like source table> )
DELETE TOP (100) source_table
OUTPUT deleted.first_field, deleted.second_field, etc
INTO @items
WHERE <conditions>
INSERT archive_table (<fields>)
SELECT <fields> FROM @items
You also might be able to do this in a single query, by doing 'output into' directly into the archive table (eliminating the need for the table var)
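A sketch of the single-statement variant, using the table names from the question's code. OUTPUT requires SQL Server 2005 or later, so it would not run on the asker's SQL Server 2000. The column list (UniqueID, UserName, ActionDateTime) is an assumption standing in for the real <Fields>. It also assumes the archive table has no enabled triggers, foreign keys or CHECK constraints (OUTPUT INTO does not allow them on its target) and that UniqueID is not an IDENTITY column there.

```sql
-- Hypothetical column list standing in for <Fields>; needs SQL Server 2005+.
USE ISAdminDB;

DECLARE @TwoYearsAgo datetime, @Rows int;
SET @TwoYearsAgo = DATEADD(d, (-2 * 365), GETDATE());
SET @Rows = 1;

WHILE @Rows > 0
BEGIN
    -- Delete one batch and write the deleted rows straight into the archive table
    DELETE TOP (1000) FROM dbo.UserUnitAudit
    OUTPUT deleted.UniqueID, deleted.UserName, deleted.ActionDateTime
    INTO ISArchive.dbo.UserUnitAudit (UniqueID, UserName, ActionDateTime)
    WHERE ActionDateTime < @TwoYearsAgo;

    SET @Rows = @@ROWCOUNT;  -- 0 once nothing older than two years is left
END;
```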