Performance issue when inserting large XML parameters into temp table [duplicate] - sql

I'm trying to insert some data from an XML document into a table variable. What blows my mind is that the same select-into (bulk) runs in no time, while insert-select takes ages and pegs the SQL Server process at 100% CPU for as long as the query executes.
I took a look at the execution plans, and indeed there's a difference. The insert-select adds an extra "Table Spool" node, even though it is assigned no cost; the "Table Valued Function [XML Reader]" then gets 92%. With select-into, the two "Table Valued Function [XML Reader]" nodes get 49% each.
Please explain why this is happening and how to resolve it (elegantly). I can indeed bulk insert into a temporary table and then in turn insert into the table variable, but that's just creepy.
I tried this on SQL Server 10.50.1600 and 10.00.2531 with the same results.
Here's a test case:
declare @xColumns xml
declare @columns table(name nvarchar(300))
if OBJECT_ID('tempdb.dbo.#columns') is not null drop table #columns
insert @columns select name from sys.all_columns
set @xColumns = (select name from @columns for xml path('columns'))
delete @columns
print 'XML data size: ' + cast(datalength(@xColumns) as varchar(30))
--raiserror('selecting', 10, 1) with nowait
--select ColumnNames.value('.', 'nvarchar(300)') name
--from @xColumns.nodes('/columns/name') T1(ColumnNames)
raiserror('selecting into #columns', 10, 1) with nowait
select ColumnNames.value('.', 'nvarchar(300)') name
into #columns
from @xColumns.nodes('/columns/name') T1(ColumnNames)
raiserror('inserting @columns', 10, 1) with nowait
insert @columns
select ColumnNames.value('.', 'nvarchar(300)') name
from @xColumns.nodes('/columns/name') T1(ColumnNames)
Thanks a bunch!!

This is a bug in SQL Server 2008.
Use
insert @columns
select ColumnNames.value('.', 'nvarchar(300)') name
from @xColumns.nodes('/columns/name') T1(ColumnNames)
OPTION (OPTIMIZE FOR ( @xColumns = NULL ))
This workaround is from an item on the Microsoft Connect site, which also mentions that a hotfix for this Eager Spool / XML Reader issue is available (enabled under trace flag 4130).
The reason for the performance regression is explained in a different Connect item:
The spool was introduced by general Halloween protection logic (which is not needed for XQuery expressions).
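For completeness, here is a minimal sketch (not part of the original answer) of what enabling the fix via that trace flag could look like once the hotfix is applied; turning a trace flag on globally requires sysadmin rights, and the OPTIMIZE FOR workaround above needs no hotfix at all:
DBCC TRACEON (4130, -1);   -- -1 = enable globally; per the Connect item, the hotfix behavior activates under trace flag 4130
insert @columns
select ColumnNames.value('.', 'nvarchar(300)') name
from @xColumns.nodes('/columns/name') T1(ColumnNames)
DBCC TRACEOFF (4130, -1);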

Looks to be an issue specific to SQL Server 2008. When I run the code in SQL Server 2005, both inserts run quickly and produce identical execution plans that start with the fragment shown below as Plan 1. In 2008, the first insert uses Plan 1, but the second insert produces Plan 2. The remainder of each plan beyond the fragment shown is identical.
Plan 1
Plan 2

Related

How do you show the actual query plan for a single query in a batch in SSMS?

I have a stored procedure I'm trying to optimize which accepts a table variable as a parameter. The table variable has a single column and is used to pass in a list of primary keys. When testing, I create a dummy table variable, run a bunch of INSERT statements to add dummy values to it, and pass it into the stored procedure. This works fine, but when I want to look at the actual query execution plan, SSMS captures a query plan for every one of the INSERT statements as well as the stored procedure. So, if I have 1000 dummy INSERT statements, SSMS will crash. Unfortunately, the table variable has to be declared and populated in the same batch as the stored procedure call, so I can't enable/disable query plan capture per batch.
Is there a way to capture the actual query plan for the stored procedure only, ignoring all the INSERT statements in the batch?
Here's what it looks like:
DECLARE @dummyIds AS PrimaryKeyTable
INSERT INTO @dummyIds VALUES(1)
INSERT INTO @dummyIds VALUES(2)
...
INSERT INTO @dummyIds VALUES(1000)
EXEC MyStoredProc @dummyIds
If I executed that batch including the actual query plan, it would generate 1001 query plans, when I really only want the one.
The SQL Server instance is running SQL Server 2014.
One way to achieve this is not to rely on SSMS to generate the query plans, but to request them yourself at the required points in your test batch. To do this, manually place the SET STATISTICS XML instruction before the query you want to analyze and disable it afterwards, keeping the corresponding option in SSMS turned off.
Something like this will do:
DECLARE @dummyIds AS PrimaryKeyTable
INSERT INTO @dummyIds VALUES(1)
INSERT INTO @dummyIds VALUES(2)
...
INSERT INTO @dummyIds VALUES(1000)
SET STATISTICS XML ON --Enable plan generation
EXEC MyStoredProc @dummyIds
SET STATISTICS XML OFF --Disable plan generation
That way the server returns query plans only for the code in between; internally, SSMS does exactly this for the whole batch. Note that with this change SSMS won't show an "Execution plan" tab in the results pane; instead, an extra result set containing a lone XML document is returned (or, more precisely, one per plan). Click those manually and SSMS will show the graphical plan for each.
Use something like a recursive CTE to generate your values and then insert them into a table. This results in one plan to generate the values and one plan for the insert into the table. Don't do individual inserts; you could also have put all those values into a single INSERT.
DECLARE @vals TABLE (val INT)
;with cte AS (
SELECT 1 AS col_
UNION ALL
SELECT col_ + 1
FROM cte
WHERE col_ < 1000
)
INSERT INTO @vals
SELECT *
FROM cte
OPTION (MAXRECURSION 0)
SELECT *
FROM @vals
If you want to remove the insert plan entirely, you need to insert into a persistent table first, like a user table or a temp table, and then run the non-insert T-SQL by itself, as sketched below.
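For example, a rough sketch combining the two suggestions above (PrimaryKeyTable and MyStoredProc are the type and procedure from the question): populate a temp table in its own batch, then fill the table variable with a single INSERT...SELECT so that plan capture covers only two statements.
-- Batch 1: generate the ids once into a temp table (plan capture stays off here)
CREATE TABLE #ids (val INT PRIMARY KEY);
;WITH cte AS (
SELECT 1 AS val
UNION ALL
SELECT val + 1 FROM cte WHERE val < 1000
)
INSERT INTO #ids SELECT val FROM cte OPTION (MAXRECURSION 0);
GO
-- Batch 2: one INSERT...SELECT plus the procedure call
DECLARE @dummyIds AS PrimaryKeyTable;
INSERT INTO @dummyIds SELECT val FROM #ids;
SET STATISTICS XML ON;   -- capture the plan for the procedure only
EXEC MyStoredProc @dummyIds;
SET STATISTICS XML OFF;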

T-SQL parameter sniffing recompile plan

I have the SQL command
exec sp_executesql N'SELECT TOP (10) * FROM mytableView WHERE ([Name]) LIKE (''%'' +
(@Value0) + ''%'') ORDER BY [Id] DESC',N'@Value0 varchar(5)',@Value0='value'
This SQL command executes in nearly 22 seconds. I found that this happens because of parameter sniffing.
If I add OPTION (RECOMPILE) to the end of the SQL command, it runs fast: 0 seconds shown in Management Studio.
exec sp_executesql N'SELECT TOP (10) * FROM mytableView WHERE ([Name]) LIKE
(''%'' + (@Value0) + ''%'') ORDER BY [Id] DESC
option(recompile)',N'@Value0 varchar(5)',@Value0='value'
Is it possible to recompile/recreate/erase/update the execution plan for my SQL command so that it works fast without OPTION (RECOMPILE)?
I have tried to apply
UPDATE STATISTICS
sp_recompile
DBCC FREEPROCCACHE
DBCC UPDATEUSAGE (0)
DBCC FREESYSTEMCACHE ('ALL')
ALTER INDEX ... REBUILD
Unfortunately, none of these actions helped.
You could try the OPTIMIZE FOR UNKNOWN hint instead of RECOMPILE:
exec sp_executesql N'SELECT TOP (10) *
FROM mytableView
WHERE ([Name]) LIKE (''%'' + (@Value0) + ''%'')
ORDER BY [Id] DESC
option(OPTIMIZE FOR UNKNOWN);',
N'@Value0 varchar(5)',
@Value0 = 'value';
The MSDN page for Query Hints states that OPTIMIZE FOR UNKNOWN:
Instructs the query optimizer to use statistical data instead of the initial values for all local variables when the query is compiled and optimized, including parameters created with forced parameterization.
This hint instructs the optimizer to use the total number of rows in the table divided by the number of distinct values in the column (i.e. the average rows per value) as the row estimate, instead of using the statistics for any particular value. As pointed out by @GarethD in a comment below: since this will possibly benefit some queries and possibly hurt others, it needs to be tested to see whether the overall gain is a net saving over the cost of doing the RECOMPILE. For more details, check out: How OPTIMIZE FOR UNKNOWN Works.
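As a rough way to see the figure that estimate is based on, you can look at the density vector of the statistics on the column (the base table and statistics names below are hypothetical, not from the question); the "All density" value multiplied by the table's row count approximates the row estimate OPTIMIZE FOR UNKNOWN would use:
-- Hypothetical base table and statistics name; substitute ones that cover [Name]
DBCC SHOW_STATISTICS ('dbo.mytable', 'IX_mytable_Name') WITH DENSITY_VECTOR;
-- Or compute the same average-rows-per-value figure directly
SELECT COUNT(*) * 1.0 / COUNT(DISTINCT [Name]) AS avg_rows_per_value
FROM dbo.mytable;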
And just to have it stated: depending on the distribution of the data and what values are passed in, if there is a particular value whose distribution is fairly representative of most of the values that will be passed in (even if wildly different from some values that will never be passed in), then you can target that value using OPTIMIZE FOR (@Value0 = 'representative value') rather than OPTIMIZE FOR UNKNOWN, as in the sketch below.
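A minimal sketch of that variant ('smith' stands in for whatever value is representative of your data; it is not from the question):
exec sp_executesql N'SELECT TOP (10) *
FROM mytableView
WHERE ([Name]) LIKE (''%'' + (@Value0) + ''%'')
ORDER BY [Id] DESC
OPTION (OPTIMIZE FOR (@Value0 = ''smith''));',
N'@Value0 varchar(5)',
@Value0 = 'value';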
Please note that this query hint is only needed for queries where:
parameters are supplied via variables, and
the field(s) in question do not have a fairly even distribution of values (so that different values passed in via the variable could warrant different plans).
The following scenarios were identified in comments below and do not all require this hint, so here is how to address each situation:
select top 80 * from table order by id desc
There is no variable being passed in here so no query hint needed.
select top 80 * from table where id < @lastid order by id desc
There is a variable being passed in here, but the [id] field, by its very nature, is evenly distributed, even if sparse due to some deletes, hence no query hint needed (or at least should not be needed).
SELECT TOP (10) * FROM mytableView WHERE ([Name]) LIKE (''%'' + (@Value0) + ''%'') ORDER BY [Id] DESC
There is a variable being passed in here, and used in such a way that there could be no indication of consistent numbers of matching rows for different values, especially due to not being able to use an index as a result of the leading %. THIS is a good opportunity for the OPTION (OPTIMIZE FOR UNKNOWN) hint as discussed above.
If a variable is passed in that has a greatly varying distribution of matching rows, but only a few possible values that are reused frequently, then those values can be concatenated directly into the dynamic SQL (after escaping them with REPLACE(@var, '''', '''''')). This allows each of those values to have its own separate yet reusable query plan. Other variables should be sent in as parameters as usual.
For example, a lookup value for [StatusID] will only have a few possible values and they will be reused frequently, but each particular value can match a vastly different number of rows. In that case, something like the following allows for separate execution plans that need neither the RECOMPILE nor the OPTIMIZE FOR UNKNOWN hint, as each execution plan will be optimized for that particular value:
IF (TRY_CONVERT(INT, @StatusID) IS NULL)
BEGIN
;THROW 50505, '@StatusID was not a valid INT', 55;
END;
DECLARE @SQL NVARCHAR(MAX);
SET @SQL = N'SELECT TOP (10) * FROM myTableView WHERE [StatusID] = '
+ REPLACE(@StatusID, N'''', N'''''') -- really only needed for strings
+ N' AND [OtherField] = @OtherFieldVal;';
EXEC sp_executesql
@SQL,
N'@OtherFieldVal VARCHAR(50)',
@OtherFieldVal = @OtherField;
Assuming two different values of @StatusID are passed in (e.g. 1 and 2), there will be two execution plans cached, matching the following queries:
SELECT TOP (10) * FROM myTableView WHERE [StatusID] = 1 AND [OtherField] = @OtherFieldVal;
SELECT TOP (10) * FROM myTableView WHERE [StatusID] = 2 AND [OtherField] = @OtherFieldVal;
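If you want to confirm that the two plans really are cached separately, a quick sketch (not part of the original answer) against the plan cache DMVs:
-- One row per cached statement whose text references myTableView;
-- expect one entry per distinct StatusID value that has been executed.
SELECT st.text, cp.usecounts, cp.objtype
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%myTableView%'
AND st.text NOT LIKE N'%dm_exec_cached_plans%';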

Performance Dynamic SQL vs Temporary Tables

I'm wondering whether copying an existing table into a temporary table results in worse performance compared to dynamic SQL.
To be concrete, I wonder whether I should expect a performance difference between the following two SQL Server stored procedures:
CREATE PROCEDURE UsingDynamicSQL
(
@ID INT ,
@Tablename VARCHAR(100)
)
AS
BEGIN
DECLARE @SQL VARCHAR(MAX)
SELECT @SQL = 'Insert into Table2 Select Sum(ValColumn) From '
+ @Tablename + ' Where ID=' + CAST(@ID AS VARCHAR(20))
EXEC(@SQL)
END
CREATE PROCEDURE UsingTempTable
(
@ID INT ,
@Tablename VARCHAR(100)
)
AS
BEGIN
Create Table #TempTable (ValColumn float, ID int)
DECLARE @SQL VARCHAR(MAX)
SELECT @SQL = 'Select ValColumn, ID From ' + @Tablename
+ ' Where ID=' + CAST(@ID AS VARCHAR(20))
INSERT INTO #TempTable
EXEC ( @SQL );
INSERT INTO Table2
SELECT SUM(ValColumn)
FROM #TempTable;
DROP TABLE #TempTable;
END
I'm asking because I'm currently using a procedure built in the latter style, where I create many temporary tables at the beginning as simple extracts of existing tables and then work with those temporary tables afterwards.
Could I improve the performance of the stored procedure by getting rid of the temporary tables and using dynamic SQL instead? In my opinion the dynamic SQL version is a lot uglier to program, which is why I used temporary tables in the first place.
Table variables suffer performance problems because the query optimizer always assumes there will be exactly one row in them. If you have table variables holding > 100 rows, I'd switch them to temp tables.
Using dynamic SQL with EXEC(@sql) instead of exec sp_executesql @sql will prevent the query plan from being reused, which will probably hurt performance.
However, you are using dynamic SQL in both procedures. The only difference is that the second one has the unnecessary extra step of loading into a temp table first and then loading into the final table. Go with the first stored procedure you have, but switch to sp_executesql, roughly as sketched below.
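A rough sketch of that suggestion (the procedure name UsingDynamicSQLParameterized is made up; the table name still has to be concatenated, so it is wrapped in QUOTENAME, while @ID is passed as a real parameter so the plan can be reused):
CREATE PROCEDURE UsingDynamicSQLParameterized
(
@ID INT,
@Tablename SYSNAME
)
AS
BEGIN
DECLARE @SQL NVARCHAR(MAX);
-- QUOTENAME guards the concatenated table name; @ID stays a typed parameter
SET @SQL = N'INSERT INTO Table2 SELECT SUM(ValColumn) FROM '
+ QUOTENAME(@Tablename)
+ N' WHERE ID = @ID;';
EXEC sp_executesql @SQL, N'@ID INT', @ID = @ID;
END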
In the posted query the temporary table is an extra write, so it is not going to help.
Don't just time a query - look at the query plan. If you have two queries, the query plan will tell you the split.
And there is a difference between a table variable and a temp table: the temp table is faster, because the query optimizer does more with a temp table.
A temporary table can help in a few situations:
The output from a select is going to be used more than once. You materialize the output so it is only executed once. Where you see this is with an expensive CTE that is evaluated many times; people falsely think a CTE is only executed once, but it is just syntax.
The query optimizer needs help. An example: you are doing a self join on a large table with multiple conditions, and some of the conditions eliminate most of the rows. A query into a #temp table can pre-filter the rows and also reduce the number of join conditions; a rough sketch follows.
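A hedged sketch of that pattern (the Orders table and its columns are made up for illustration):
-- Materialize the selective filter once; the optimizer then has real statistics
-- on #active instead of re-evaluating the predicates on both sides of the join.
SELECT OrderID, CustomerID, OrderDate
INTO #active
FROM dbo.Orders
WHERE Status = 'Open'          -- eliminates most of the rows
AND OrderDate >= '20150101';
SELECT a.OrderID, b.OrderID AS RelatedOrderID
FROM #active AS a
JOIN #active AS b
ON a.CustomerID = b.CustomerID
AND a.OrderID < b.OrderID;
DROP TABLE #active;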
I agree with everyone else that you always need to test both... I'm putting it in an answer here so it's more clear.
If you have an index setup that is perfect for the final query, going to temp tables could be nothing but extra work.
If that's not the case, pre-filtering to a temp table may or may not be faster.
You can predict it at the extremes - if you're filtering down from a million to a dozen rows, I would bet it helps.
But otherwise it can be genuinely difficult to know without trying.
I agree with you that maintenance is also an issue and lots of dynamic sql is a maintenance cost to consider.

Statement 'SELECT INTO' is not supported in this version of SQL Server - SQL Azure

I am getting
Statement 'SELECT INTO' is not supported in this version of SQL Server
in SQL Azure
for the query below, inside a stored procedure:
DECLARE @sql NVARCHAR(MAX)
,@sqlSelect NVARCHAR(MAX) = ''
,@sqlFrom NVARCHAR(MAX) = ''
,@sqlTempTable NVARCHAR(MAX) = '#itemSearch'
,@sqlInto NVARCHAR(MAX) = ''
,@params NVARCHAR(MAX)
SET @sqlSelect ='SELECT
IT.ITEMNR
,IT.USERNR
,IT.ShopNR
,IT.ITEMID'
SET @sqlFrom =' FROM dbo.ITEM AS IT'
SET @sqlInto = ' INTO ' + @sqlTempTable + ' ';
IF (@cityId > 0)
BEGIN
SET @sqlFrom = @sqlFrom +
' INNER JOIN dbo.CITY AS CI2
ON CI2.CITYID = @cityId'
SET @sqlSelect = @sqlSelect +
',CI2.LATITUDE AS CITYLATITUDE
,CI2.LONGITUDE AS CITYLONGITUDE'
END
SELECT @params = N'@cityId int'
SET @sql = @sqlSelect + @sqlInto + @sqlFrom
EXEC sp_executesql @sql, @params, @cityId = @cityId
I have around 50,000 records, so I decided to use a temp table. But I was surprised to see this error.
How can I achieve the same in SQL Azure?
Edit: the blog post http://blogs.msdn.com/b/sqlazure/archive/2010/05/04/10007212.aspx suggests creating a regular table inside the stored procedure for storing the data instead of a temp table. Is that safe under concurrency? Will it hurt performance?
Adding some points taken from http://blog.sqlauthority.com/2011/05/28/sql-server-a-quick-notes-on-sql-azure/
Each table must have a clustered index. Tables without a clustered index are not supported.
Each connection can use a single database. Multiple databases in a single transaction are not supported.
'USE DATABASE' cannot be used in Azure.
Global temp tables (or global temp objects) are not supported.
As there is no concept of cross-database connections, linked servers are not supported in Azure at this moment.
SQL Azure is a shared environment, and because of that there is no concept of Windows logins.
Always drop tempdb objects when they are no longer needed, as they create pressure on tempdb.
During bulk insert, use the BATCHSIZE option to limit the number of rows to be inserted; this limits the usage of transaction log space.
Avoid unnecessary use of grouping or blocking ORDER BY operations, as they lead to high memory usage.
SELECT INTO is one of the many things that you unfortunately cannot perform in SQL Azure.
What you'd have to do is first create the temporary table, then perform the insert. Something like:
CREATE TABLE #itemSearch (ITEMNR INT, USERNR INT, ShopNR INT, ITEMID INT)
INSERT INTO #itemSearch
SELECT IT.ITEMNR, IT.USERNR, IT.ShopNR ,IT.ITEMID
FROM dbo.ITEM AS IT
The new Azure DB Update preview has this problem resolved:
The V12 preview enables you to create a table that has no clustered
index. This feature is especially helpful for its support of the T-SQL
SELECT...INTO statement which creates a table from a query result.
http://azure.microsoft.com/en-us/documentation/articles/sql-database-preview-whats-new/
Create the table using the # prefix, e.g. CREATE TABLE #itemSearch, then use INSERT INTO. The scope of the temp table is limited to the session, so there will be no concurrency problems.
Well, as we all know, a SQL Azure table must have a clustered index; that is why SELECT INTO fails when copying data from one table into another.
If you want to migrate, you must first create a table with the same structure and then execute an INSERT INTO statement.
For a temporary table (prefixed with #) you don't need to create an index.
How do you create the index and execute the INSERT INTO for a temp table?
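In reply to that comment, a minimal sketch (the column list comes from the question; the index name is made up):
-- Create the temp table, optionally give it a clustered index, then insert into it.
CREATE TABLE #itemSearch (ITEMNR INT, USERNR INT, ShopNR INT, ITEMID INT);
-- Not required for temp tables, but often useful anyway:
CREATE CLUSTERED INDEX IX_itemSearch_ITEMNR ON #itemSearch (ITEMNR);
INSERT INTO #itemSearch (ITEMNR, USERNR, ShopNR, ITEMID)
SELECT IT.ITEMNR, IT.USERNR, IT.ShopNR, IT.ITEMID
FROM dbo.ITEM AS IT;
DROP TABLE #itemSearch;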

Is my stored procedure executing out of order?

Brief history:
I'm writing a stored procedure to support a legacy reporting system (using SQL Server Reporting Services 2000) on a legacy web application.
In keeping with the original implementation style, each report has a dedicated stored procedure in the database that performs all the querying necessary to return a "final" dataset that can be rendered simply by the report server.
Due to the business requirements of this report, the returned dataset has an unknown number of columns (it depends on the user who executes the report, but may have 4-30 columns).
Throughout the stored procedure, I keep a column UserID to track the user's ID to perform additional querying. At the end, however, I do something like this:
UPDATE #result
SET Name = ppl.LastName + ', ' + ppl.FirstName
FROM #result r
LEFT JOIN Users u ON u.id = r.userID
LEFT JOIN People ppl ON ppl.id = u.PersonID
ALTER TABLE #result
DROP COLUMN [UserID]
SELECT * FROM #result r ORDER BY Name
Effectively I set the Name varchar column (that was previously left NULL while I was performing some pivot logic) to the desired name format in plain text.
When finished, I want to drop the UserID column as the report user shouldn't see this.
Finally, the data set returned has one column for the username, and an arbitrary number of INT columns with performance totals. For this reason, I can't simply exclude the UserID column since SQL doesn't support "SELECT * EXCEPT [UserID]" or the like.
With this known (any style pointers are appreciated but not central to this problem), here's the problem:
When I execute this stored procedure, I get an execution error:
Invalid column name 'userID'.
However, if I comment out my DROP COLUMN statement and retain the UserID, the stored procedure performs correctly.
What's going on? It certainly looks like the statements are executing out of order and it's dropping the column before I can use it to set the name strings!
[Edit 1]
I defined UserID previously (the whole stored procedure is about 200 lines of mostly irrelevant logic, so I'll paste snippets):
CREATE TABLE #result ([Name] NVARCHAR(256), [UserID] INT);
Case sensitivity isn't the problem, but it did point me to the right line - there was one place where I had userID instead of UserID. Now that I've fixed the case, the error message complains about UserID.
My "broken" stored procedure also works properly in SQL Server 2008, so this is either a 2000 bug or I'm severely misunderstanding how SQL Server used to work.
Thanks everyone for chiming in!
For anyone searching this in the future, I've added an extremely crude workaround to be 2000-compatible until we update our production version:
DECLARE @workaroundTableName NVARCHAR(256), @workaroundQuery NVARCHAR(2000)
SET @workaroundQuery = 'SELECT [Name]';
DECLARE cur_workaround CURSOR FOR
SELECT COLUMN_NAME FROM [tempdb].INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME LIKE '#result%' AND COLUMN_NAME NOT IN ('UserID', 'Name') -- Name is already in the seed string above
OPEN cur_workaround;
FETCH NEXT FROM cur_workaround INTO @workaroundTableName
WHILE @@FETCH_STATUS = 0
BEGIN
SET @workaroundQuery = @workaroundQuery + ',[' + @workaroundTableName + ']'
FETCH NEXT FROM cur_workaround INTO @workaroundTableName
END
CLOSE cur_workaround;
DEALLOCATE cur_workaround;
SET @workaroundQuery = @workaroundQuery + ' FROM #result ORDER BY Name ASC'
EXEC(@workaroundQuery);
Thanks everyone!
A much easier solution would be not to drop the column, but simply not return it in the final select.
There are all sorts of reasons why you shouldn't be returning select * from your procedure anyway.
EDIT: I see now that you have to do it this way because of an unknown number of columns.
Based on the error message: is the database case-sensitive, so that there's a difference between userID and UserID?
This works for me:
CREATE TABLE #temp_t
(
myInt int,
myUser varchar(100)
)
INSERT INTO #temp_t(myInt, myUser) VALUES(1, 'Jon1')
INSERT INTO #temp_t(myInt, myUser) VALUES(2, 'Jon2')
INSERT INTO #temp_t(myInt, myUser) VALUES(3, 'Jon3')
INSERT INTO #temp_t(myInt, myUser) VALUES(4, 'Jon4')
ALTER TABLE #temp_t
DROP Column myUser
SELECT * FROM #temp_t
DROP TABLE #temp_t
It says invalid column for you. Did you check the spelling and make sure that the column even exists in your temp table?
You might try wrapping everything preceding the DROP COLUMN in a BEGIN...COMMIT transaction.
At compile time, SQL Server is probably expanding the * into the full list of columns. Thus, at run time, SQL Server executes "SELECT UserID, Name, LastName, FirstName, ..." instead of "SELECT *". Dynamically assembling the final SELECT into a string and then EXECing it at the end of the stored procedure may be the way to go.
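A cursor-free sketch of that idea (the same approach as the workaround posted in the question, just built with a single concatenating SELECT; the concatenation-into-a-variable trick is not guaranteed by the engine, but works in practice on these versions):
DECLARE @cols NVARCHAR(2000), @finalQuery NVARCHAR(2000);
-- Build the column list from tempdb metadata, skipping the column to hide.
SELECT @cols = COALESCE(@cols + ',', '') + QUOTENAME(COLUMN_NAME)
FROM tempdb.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME LIKE '#result%'
AND COLUMN_NAME <> 'UserID'
ORDER BY ORDINAL_POSITION;
SET @finalQuery = 'SELECT ' + @cols + ' FROM #result ORDER BY [Name]';
EXEC(@finalQuery);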