SQL Server Indexing - sql-server-2005

I am trying to understand what is going on with CREATE INDEX internally. When I create a NONCLUSTERED index it shows as an INSERT in the execution plan, as well as when I retrieve the query text:
DECLARE @sqltext VARBINARY(128)
SELECT @sqltext = sql_handle
FROM sys.sysprocesses s
WHERE spid = 73 --73 is the process creating the index
SELECT TEXT
FROM sys.dm_exec_sql_text(@sqltext)
GO
This shows:
insert [dbo].[tbl] select * from [dbo].[tbl] option (maxdop 1)
This is consistent in the execution plan. Any info is appreciated.

This was my lack of knowledge of indexes; what a difference 4 months of experience makes! :)
The index creation causes writes to the new index to insert the leaf-level rows as needed.
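As a side note, here is a minimal sketch of watching the same internal statement with the newer dm_exec DMVs instead of the deprecated sys.sysprocesses; session id 73 is still just the placeholder for the session building the index:
-- Run from a second connection while the CREATE INDEX is executing.
SELECT r.session_id, r.command, t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id = 73;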


Performance Dynamic SQL vs Temporary Tables

I'm wondering if copying an existing table into a temporary table results in worse performance compared to dynamic SQL.
To be concrete, I wonder if I should expect a performance difference between the following two SQL Server stored procedures:
CREATE PROCEDURE UsingDynamicSQL
(
@ID INT ,
@Tablename VARCHAR(100)
)
AS
BEGIN
DECLARE @SQL VARCHAR(MAX)
SELECT @SQL = 'Insert into Table2 Select Sum(ValColumn) From '
+ @Tablename + ' Where ID=' + CAST(@ID AS VARCHAR(20))
EXEC(@SQL)
END
CREATE PROCEDURE UsingTempTable
(
@ID INT ,
@Tablename VARCHAR(100)
)
AS
BEGIN
Create Table #TempTable (ValColumn float, ID int)
DECLARE @SQL VARCHAR(MAX)
SELECT @SQL = 'Select ValColumn, ID From ' + @Tablename
+ ' Where ID=' + CAST(@ID AS VARCHAR(20))
INSERT INTO #TempTable
EXEC ( @SQL );
INSERT INTO Table2
SELECT SUM(ValColumn)
FROM #TempTable;
DROP TABLE #TempTable;
END
I'm asking this because I'm currently using a procedure built in the latter style, where I create many temporary tables at the beginning as simple extracts of existing tables and afterwards work with these temporary tables.
Could I improve the performance of the stored procedure by getting rid of the temporary tables and using dynamic SQL instead? In my opinion the dynamic SQL version is a lot uglier to program, which is why I used temporary tables in the first place.
Table variables suffer performance problems because the query optimizer always assumes there will be exactly one row in them. If you have table variables holding > 100 rows, I'd switch them to temp tables.
Using dynamic SQL with EXEC(@sql) instead of EXEC sp_executesql @sql will prevent the query plan from being cached, which will probably hurt performance.
However, you are using dynamic SQL in both procedures. The only difference is that the second one has the unnecessary step of loading into a temp table first and then loading into the final table. Go with the first stored procedure you have, but switch to sp_executesql.
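For illustration, a sketch of what the first procedure might look like with sp_executesql; the procedure name is made up, and QUOTENAME is added because the table name still has to be concatenated into the string:
CREATE PROCEDURE UsingDynamicSQLParameterized
(
@ID INT ,
@Tablename SYSNAME
)
AS
BEGIN
-- Parameterize @ID so the plan can be cached and reused;
-- the table name cannot be a parameter, so quote it instead.
DECLARE @SQL NVARCHAR(MAX)
SET @SQL = N'INSERT INTO Table2 SELECT SUM(ValColumn) FROM '
+ QUOTENAME(@Tablename) + N' WHERE ID = @ID'
EXEC sp_executesql @SQL, N'@ID INT', @ID = @ID
END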
In the posted query the temporary table is an extra write, so it is not going to help.
Don't just time a query - look at the query plan. If you have two queries, the query plan will tell you the split.
And there is a difference between a table variable and a temp table: the temp table is faster, because the query optimizer can do more with a temp table.
A temporary table can help in a few situations.
One is when the output of a select is going to be used more than once: you materialize the output so it is only executed once. Where you see this is with an expensive CTE that is evaluated many times - people often falsely think a CTE is only executed once, but no, it is just syntax.
Another is when the query optimizer needs help.
An example: you are doing a self join on a large table with multiple conditions, and some of the conditions eliminate most of the rows.
A query into a #temp table can pre-filter the rows and also reduce the number of join conditions.
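As a rough illustration of that self-join case, with invented table and column names (BigTable, GroupKey, Amount):
-- Filter once into a temp table; the predicate eliminates most rows.
SELECT ID, GroupKey, Amount
INTO #filtered
FROM dbo.BigTable
WHERE Amount > 1000;
-- Self join the much smaller set instead of the large base table.
SELECT a.ID, b.ID
FROM #filtered AS a
JOIN #filtered AS b ON a.GroupKey = b.GroupKey AND a.ID < b.ID;
DROP TABLE #filtered;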
I agree with everyone else that you always need to test both... I'm putting it in an answer here so it's more clear.
If you have an index setup that is perfect for the final query, going to temp tables could be nothing but extra work.
If that's not the case, pre-filtering to a temp table may or may not be faster.
You can predict it at the extremes - if you're filtering down from a million to a dozen rows, I would bet it helps.
But otherwise it can be genuinely difficult to know without trying.
I agree with you that maintenance is also an issue and lots of dynamic sql is a maintenance cost to consider.

Performance issue when inserting large XML parameters into temp table [duplicate]

I'm trying to insert some data from an XML document into a table variable. What blows my mind is that the same select-into (bulk) runs in no time, while the insert-select takes ages and pins the SQL Server process at 100% CPU while the query executes.
I took a look at the execution plan and INDEED there's a difference. The insert-select adds an extra "Table Spool" node, even though no cost is assigned to it; the "Table Valued Function [XML Reader]" then gets 92%. With select-into, the two "Table Valued Function [XML Reader]" nodes get 49% each.
Please explain "WHY is this happening" and "HOW to resolve this (elegantly)", as I can indeed bulk insert into a temporary table and then in turn insert into the table variable, but that's just creepy.
I tried this on SQL Server builds 10.50.1600 and 10.00.2531 with the same results.
Here's a test case:
declare @xColumns xml
declare @columns table(name nvarchar(300))
if OBJECT_ID('tempdb.dbo.#columns') is not null drop table #columns
insert @columns select name from sys.all_columns
set @xColumns = (select name from @columns for xml path('columns'))
delete @columns
print 'XML data size: ' + cast(datalength(@xColumns) as varchar(30))
--raiserror('selecting', 10, 1) with nowait
--select ColumnNames.value('.', 'nvarchar(300)') name
--from @xColumns.nodes('/columns/name') T1(ColumnNames)
raiserror('selecting into #columns', 10, 1) with nowait
select ColumnNames.value('.', 'nvarchar(300)') name
into #columns
from @xColumns.nodes('/columns/name') T1(ColumnNames)
raiserror('inserting @columns', 10, 1) with nowait
insert @columns
select ColumnNames.value('.', 'nvarchar(300)') name
from @xColumns.nodes('/columns/name') T1(ColumnNames)
Thanks a bunch!!
This is a bug in SQL Server 2008.
Use
insert @columns
select ColumnNames.value('.', 'nvarchar(300)') name
from @xColumns.nodes('/columns/name') T1(ColumnNames)
OPTION (OPTIMIZE FOR ( @xColumns = NULL ))
This workaround is from an item on the Microsoft Connect site, which also mentions that a hotfix for this Eager Spool / XML Reader issue is available (under trace flag 4130).
The reason for the performance regression is explained in a different Connect item:
The spool was introduced due to the general Halloween protection logic
(which is not needed for XQuery expressions).
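For completeness, if that hotfix is installed, the trace-flag route would presumably look something like the sketch below; DBCC TRACEON as written here is session scoped, and whether it removes the spool depends on the hotfix being present:
DBCC TRACEON (4130);  -- enable the hotfix behaviour for this session
insert @columns
select ColumnNames.value('.', 'nvarchar(300)') name
from @xColumns.nodes('/columns/name') T1(ColumnNames)
DBCC TRACEOFF (4130);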
Looks to be an issue specific to SQL Server 2008. When I run the code in SQL Server 2005, both inserts run quickly and produce identical execution plans that start with the fragment shown below as Plan 1. In 2008, the first insert uses Plan 1 but the second insert produces Plan 2. The remainder of both plans beyond the fragment shown are identical.
Plan 1
Plan 2

SQL Server: what will trigger execution plan?

If I have statement
DECLARE @i INT;
DECLARE @d NUMERIC(9,3);
SET @i = 123;
SET @d = @i;
SELECT @d;
and I enable Include Actual Execution Plan and run this query, I don't get an execution plan. Will the query produce an execution plan only when there is a FROM clause in the batch?
The simple answer is you don't get execution plans without table access.
Execution plans are what the optimiser produces: it works out the best way to satisfy the query based on indexes, statistics, etc.
What you have above is trivial and has no table access. Why do you need a plan?
Edit:
A derived table counts as table access, as per Lucero's example in the comments
Edit 2:
"Trivial" table access gives constant scans, not real scans or seeks:
SELECT * FROM sys.tables WHERE 1=0
See Lucero's examples in the comments.
What do you mean by "what will trigger execution plan"? I also didn't understand "I include actual execution plan and run this query, I don't get an execution plan". Hope this link may help you:
SQL Tuning Tutorial - Understanding a Database Execution Plan (1)
I would assume that a query plan is generated whenever a set-based operation needs to be performed.
Yes, you need a FROM clause.
You can do it like this:
declare @i int
declare @d numeric(9,3)
set @i = 123
select @d = @i
from (select 1) as x(x)
select @d
And in the execution plan you see this
<ScalarOperator ScalarString="CONVERT_IMPLICIT(numeric(9,3),[@i],0)">
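If you prefer to capture that without the graphical plan, here is a small sketch (same variables as above) using SET STATISTICS XML, which returns plan XML for the statements that actually get a plan - here the derived-table SELECT:
SET STATISTICS XML ON;
GO
declare @i int
declare @d numeric(9,3)
set @i = 123
select @d = @i
from (select 1) as x(x)   -- the derived table is what produces a plan
select @d
GO
SET STATISTICS XML OFF;
GO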

SQL Server table perf comparison -- temp tables, or table variables? Or something else?

In SQL Server, I'm trying to do a comparative analysis between two different table structures with regard to insert performance given different keys. Does it matter if I use a table variable to do this testing, or should I use a temporary table? Or do I need to go to the trouble of actually creating the tables and indexes?
Specifically, I'm currently using the following script:
DECLARE @uniqueidentifierTest TABLE
(
--yes, this is terrible, but I am looking for numbers on how bad this is :)
tblIndex UNIQUEIDENTIFIER PRIMARY KEY CLUSTERED,
foo INT,
blah VARCHAR(100)
)
DECLARE @intTest TABLE
(
tblindex INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
foo INT,
blah VARCHAR(100)
)
DECLARE @iterations INT = 250000
DECLARE @ctrl INT = 1
DECLARE @guidKey UNIQUEIDENTIFIER
DECLARE @intKey INT
DECLARE @foo INT = 1234
DECLARE @blah VARCHAR(100) = 'asdfjifsdj fds89fsdio23r'
SET NOCOUNT ON
--test uniqueidentifier pk inserts
PRINT 'begin uniqueidentifier insert test at ' + CONVERT(VARCHAR(50), GETDATE(), 109)
WHILE @ctrl < @iterations
BEGIN
SET @guidKey = NEWID()
INSERT INTO @uniqueidentifierTest (tblIndex, foo, blah)
VALUES (@guidKey, @foo, @blah)
SET @ctrl = @ctrl + 1
END
PRINT 'end uniqueidentifier insert test at ' + CONVERT(VARCHAR(50), GETDATE(), 109)
SET @ctrl = 1
--test int pk inserts
PRINT 'begin int insert test at ' + CONVERT(VARCHAR(50), GETDATE(), 109)
WHILE @ctrl < @iterations
BEGIN
INSERT INTO @intTest (foo, blah)
VALUES (@foo, @blah)
SET @ctrl = @ctrl + 1
END
PRINT 'end int insert test at ' + CONVERT(VARCHAR(50), GETDATE(), 109)
SET NOCOUNT OFF
If you want to compare actual performance, you need to create the tables and indexes (and everything else involved). While a temp table will be a much better analog than a table variable, neither is a substitute for an actual permanent table structure if you're seeking performance metrics.
All of that being said, however, you should avoid using uniqueidentifier as a primary key, or, at the very least, use newsequentialid() rather than newid(). Having a clustered index means that the rows will actually be stored in physical order. If an inserted value is out of sequence, SQL Server will have to rearrange the rows in order to insert it into its proper place.
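A sketch of what the permanent-table version of the GUID test could look like (table and constraint names are invented), with the default switched to newsequentialid() so new rows land at the end of the clustered index:
CREATE TABLE dbo.uniqueidentifierTest
(
tblIndex UNIQUEIDENTIFIER NOT NULL
CONSTRAINT DF_uniqueidentifierTest DEFAULT NEWSEQUENTIALID()
CONSTRAINT PK_uniqueidentifierTest PRIMARY KEY CLUSTERED,
foo INT,
blah VARCHAR(100)
);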
First of all, never cluster on a uniqueidentifier populated by newid(); it will cause fragmentation and thus page splits. If you have to use a GUID then do it like this:
create table #test (id uniqueidentifier primary key default newsequentialid())
newsequentialid() won't cause page splits.
Still, an int is better as the PK, since all your non-clustered indexes and foreign keys will be smaller, and you need less I/O to get the same number of rows back.
I dunno why but I'd like to cite Remus Rusanu [1]:
First of all, you need to run the query repeatedly under each [censored] and average the result, discarding the one with the maximum time. This will eliminate the buffer warm up impact: you want all runs to be on a warm cache, not have one query warm the cache and pay the penalty in comparison.
Next, you need to make sure you measure under realistic concurrency scenario. IF you will have updates/inserts/deletes occur under real life, then you must add them to your test, since they will impact tremendously the reads under various isolation level. The last thing you want is to conclude 'serializable reads are fastest, lets use them everywhere' and then watch the system melt down in production because everything is serialized.
1) Running the query on a cold cache is not accurate. Your production queries will not run on a cold cache, you'll be optimizing an unrealistic scenario and you don't measure the query, you are really measuring the disk read throughput. You need to measure the performance on a warm cache as well, and keep track of both (cold run time, warm run times).
How relevant is the cache for a large query (millions of rows) that under normal circumstances runs only once for particular data?
Still very relevant. Even if the data is so large that it never fits in memory and each run has to re-read every page of the table, there is still the caching of non-leaf pages (i.e. hot pages in the table, root or near-root), the cache of narrower non-clustered indexes, and the cache of table metadata. Don't think of your table as an ISAM file.
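To make the cold-versus-warm distinction concrete, here is a rough sketch of how you might time both (requires sysadmin; never drop clean buffers on a production server):
CHECKPOINT;                -- flush dirty pages so they can be dropped
DBCC DROPCLEANBUFFERS;     -- the next run starts from a cold cache
SET STATISTICS TIME ON;
-- <query under test goes here>
SET STATISTICS TIME OFF;
-- repeat the query a few more times WITHOUT dropping buffers
-- and average those timings for the warm-cache number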
[1] Why better isolation level means better performance in SQL Server

Stored Procedure; Insert Slowness

I have an SP that takes 10 seconds to run about 10 times (about a second every time it is run). The platform is ASP.NET, and the server is SQL Server 2005. I have indexed the table (and not only on the PK), so that is not the issue. Some caveats:
usp_SaveKeyword is not the issue. I commented out that entire SP and it made no difference.
I set @SearchID to 1 and the time was significantly reduced, only taking about 15 ms on average for the transaction.
I commented out the entire stored procedure except the insert into tblSearches and, strangely, it took more time to execute.
Any ideas of what could be going on?
set ANSI_NULLS ON
go
ALTER PROCEDURE [dbo].[usp_NewSearch]
@Keyword VARCHAR(50),
@SessionID UNIQUEIDENTIFIER,
@time SMALLDATETIME = NULL,
@CityID INT = NULL
AS
BEGIN
SET NOCOUNT ON;
IF @time IS NULL SET @time = GETDATE();
DECLARE @KeywordID INT;
EXEC @KeywordID = usp_SaveKeyword @Keyword;
PRINT 'KeywordID : '
PRINT @KeywordID
DECLARE @SearchID BIGINT;
SELECT TOP 1 @SearchID = SearchID
FROM tblSearches
WHERE SessionID = @SessionID
AND KeywordID = @KeywordID;
IF @SearchID IS NULL BEGIN
INSERT INTO tblSearches
(KeywordID, [time], SessionID, CityID)
VALUES
(@KeywordID, @time, @SessionID, @CityID)
SELECT Scope_Identity();
END
ELSE BEGIN
SELECT @SearchID
END
END
Why are you using TOP 1 @SearchID instead of MAX(SearchID) or WHERE EXISTS in this query? TOP requires you to run the query and retrieve the first row from the result set. If the result set is large, this could consume quite a lot of resources before you get to the final result.
SELECT TOP 1 @SearchID = SearchID
FROM tblSearches
WHERE SessionID = @SessionID
AND KeywordID = @KeywordID;
I don't see any obvious reason for this - either of aforementioned constructs should get you something semantically equivalent to this with a very cheap index lookup. Unless I'm missing something you should be able to do something like
select @SearchID = isnull (max (SearchID), -1)
from tblSearches
where SessionID = @SessionID
and KeywordID = @KeywordID
This ought to be fairly efficient and (unless I'm missing something) semantically equivalent.
Enable "Display Estimated Execution Plan" in SQL Management Studio - where does the execution plan show you spending the time? It'll guide you on the heuristics being used to optimize the query (or not in this case). Generally the "fatter" lines are the ones to focus on - they're ones generating large amounts of I/O.
Unfortunately even if you tell us the table schema, only you will be able to see actually how SQL chose to optimize the query. One last thing - have you got a clustered index on tblSearches?
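To see where the time and I/O go per statement, one sketch (the keyword and GUID below are placeholder values) is to wrap a call to the procedure in STATISTICS IO/TIME alongside the actual plan:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
EXEC dbo.usp_NewSearch
@Keyword = 'test',
@SessionID = 'D95A97D1-5C0C-4A7B-8D32-82F32C6C7F3B';  -- placeholder GUID
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;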
Triggers!
They are insidious indeed.
What is the clustered index on tblSearches? If the clustered index is not on primary key, the database may be spending a lot of time reordering.
How many other indexes do you have?
Do you have any triggers?
Where does the execution plan indicate the time is being spent?
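A quick sketch of how to answer those questions from the catalog views (tblSearches is the table from the question):
-- What indexes exist, and which one is clustered?
SELECT name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.tblSearches');
-- Are there any triggers that fire on insert?
SELECT name, is_disabled, is_instead_of_trigger
FROM sys.triggers
WHERE parent_id = OBJECT_ID('dbo.tblSearches');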