I am trying to speed up my monster of a stored procedure that works on millions of records across many tables.
I've stumbled on this:
Is it possible to use a Stored Procedure as a subquery in SQL Server 2008?
My question is: why would using a table-valued function be better than using a temp table?
Suppose my stored procedure #SP1
declare @temp table(a int)
insert into @temp
select a from BigTable
where someRecords like 'blue%'
update t
set someRecords = 'were blue'
from AnotherBigTable t
inner join
@temp tmp
on t.RecordID = tmp.a
After reading the above link, it seems the consensus is that instead of using my @temp as a temp table, I should rather create a table-valued function that does that select
(and inline it if it's a simple select like the one in this example). But my actual selects are multiple and often not simple (i.e. with subqueries, etc.).
What is the benefit?
Thanks
Generally, you would use a temporary table (#) instead of a table variable. Table variables are really only useful for
functions, which cannot create temporary objects
passing table-valued data (sets) as read-only parameters
gaming statistics for certain query edge-cases
execution plan stability (related to statistics and also the fact that INSERT INTO table variables cannot use a parallel plan)
prior to SQL Server 2012, #temp tables inherit collation from tempdb, whereas @table variables use the current database collation
Other than those, a #temporary table will work as well as if not better than a variable.
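As a rough sketch of what that could look like for the procedure in the question: the table and column names (BigTable, AnotherBigTable, someRecords, RecordID) come from the question, while the temp table name #BlueIds and the index name are made up for illustration. Whether the index pays off depends on the data, so treat this as something to test rather than a guaranteed improvement.

```sql
-- Sketch only: #BlueIds and IX_BlueIds_a are hypothetical names.
CREATE TABLE #BlueIds (a int NOT NULL);

INSERT INTO #BlueIds (a)
SELECT a
FROM BigTable
WHERE someRecords LIKE 'blue%';

-- Unlike a table variable, a temp table accepts CREATE INDEX after it is loaded,
-- and the optimizer can build column statistics on it.
CREATE CLUSTERED INDEX IX_BlueIds_a ON #BlueIds (a);

UPDATE t
SET someRecords = 'were blue'
FROM AnotherBigTable AS t
INNER JOIN #BlueIds AS b
    ON t.RecordID = b.a;

DROP TABLE #BlueIds;
```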
Further reading: What's the difference between a temp table and table variable in SQL Server?
Probably no longer relevant... but two things I might suggest that take two different approaches.
Simple approach 1:
Try a primary key on your table valued variable:
declare @temp table(a int, primary key(a))
Simple approach 2:
In this particular case try a common table expression (CTE)...
;with
temp as (
    SELECT a as Id
    FROM BigTable
    WHERE someRecords like 'blue%'
)
UPDATE AnotherBigTable
SET someRecords = 'were blue'
FROM AnotherBigTable
JOIN temp
    ON temp.Id = AnotherBigTable.RecordId
CTEs are great: they help isolate the specific data sets you actually want to work on from the myriad records contained in larger tables. If you find yourself using the same CTE declaration repeatedly, consider formalizing that expression into a view. Views are an often overlooked and very valuable tool for DBAs and database programmers to manage large, complex data sets with many records and relationships.
Related
Let's say I have an Account table and several other tables which reference Account.
Is there actually a performance difference between:
DECLARE @accountIDs table (id int);
INSERT INTO @accountIDs
SELECT id
FROM Account
DELETE TableA
WHERE accountFk IN (SELECT id FROM @accountIDs)
and
DELETE TableA
WHERE accountFk IN (SELECT id FROM Account)
Thanks.
Sure!
The declared table variable needs quite some overhead to be created and filled and there won't be any indexes (following your sample code). You could create your declared table with an index (Thx Gordon for your hint!), but this would just add even more overhead...
The id in your table Account will be indexed (probably / hopefully). The direct access will be much faster - for sure!
Furthermore you need a procedural approach to perform the first. Declaring a table valued variable will not be allowed in views or inline TVF.
As with any question about performance, you should test the two methods.
In both cases, the performance is likely to be dominated by the deletion, not the finding of the records. The one advantage of a table variable/temporary table is that the statistics are more accurate when it is created.
The queries that you have provided suggest that you would want two separate indexes on (id) and on (accountFk). With those indexes, I see little advantage to using a temporary table.
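A minimal sketch of those two indexes, followed by the direct delete. TableA, accountFk, Account and id come from the question; the index names are hypothetical, and the second index is only needed if Account.id is not already covered by a primary key.

```sql
-- Hypothetical index names; adjust to your own naming convention.
CREATE INDEX IX_TableA_accountFk ON TableA (accountFk);

-- Skip this if Account.id is already the primary key, since the key
-- constraint is backed by an index. Assumes id values are unique.
CREATE UNIQUE INDEX IX_Account_id ON Account (id);

-- With both indexes in place, the direct form needs no intermediate table.
DELETE FROM TableA
WHERE accountFk IN (SELECT id FROM Account);
```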
All are used to store data temporarily.
Is there any performance difference (time complexity and space complexity) between these 3 types of temporary table?
Performance issue should depend on whether the result is saved on disk or memory.
I have searched a lot but did not get satisfactory answer.
CTE - Common Table Expressions
CTE stands for Common Table Expression. It was introduced with SQL Server 2005. It is a temporary named result set, typically the result of a complex sub-query. Unlike a temporary table, its life is limited to the current query. It is defined using the WITH clause. A CTE improves the readability and maintainability of complex queries and sub-queries. If the CTE is not the first statement in the batch, the preceding statement must end with a semicolon, which is why many people simply write ;WITH.
With CTE1(Address, Name, Age)--Column names for CTE, which are optional
AS
(
SELECT Addr.Address, Emp.Name, Emp.Age from Address Addr
INNER JOIN EMP Emp ON Emp.EID = Addr.EID
)
SELECT * FROM CTE1 --Using CTE
WHERE CTE1.Age > 50
ORDER BY CTE1.NAME
When to use CTE?
This is used to store result of a complex sub query for further use.
This is also used to create a recursive query.
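The recursive case can be sketched with a small self-contained example that needs no tables. The CTE name Numbers and the column n are made up for illustration: the anchor member produces the first row, and the recursive member keeps adding rows until its WHERE condition stops it.

```sql
-- Generates the integers 1 through 10.
WITH Numbers (n) AS
(
    SELECT 1                 -- anchor member: the starting row
    UNION ALL
    SELECT n + 1             -- recursive member: refers back to the CTE itself
    FROM Numbers
    WHERE n < 10             -- termination condition
)
SELECT n
FROM Numbers;
```

The same pattern is commonly used to walk parent/child hierarchies, with the anchor member selecting the root rows and the recursive member joining children to the rows found so far.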
Temporary Tables
In SQL Server, temporary tables are created at run-time and you can do all the operations which you can do on a normal table. These tables are created inside Tempdb database. Based on the scope and behavior temporary tables are of two types as given below-
Local Temp Table
Local temp tables are only available to the SQL Server session or connection (meaning a single user) that created them. They are automatically deleted when the session that created them is closed. A local temporary table name starts with a single hash ("#") sign.
CREATE TABLE #LocalTemp
(
UserID int,
Name varchar(50),
Address varchar(150)
)
GO
insert into #LocalTemp values ( 1, 'Shailendra','Noida');
GO
Select * from #LocalTemp
The scope of a local temp table is the current session of the current user, which in practice means the current query window. If you close the current query window, or open a new query window and try to query the temp table created above, you will get an error.
Global Temp Table
Global temp tables are available to all SQL Server sessions or connections (meaning all users). They can be created by any SQL Server connection and are automatically deleted once the session that created them ends and no other session is still referencing them. A global temporary table name starts with a double hash ("##") sign.
CREATE TABLE ##GlobalTemp
(
UserID int,
Name varchar(50),
Address varchar(150)
)
GO
insert into ##GlobalTemp values ( 1, 'Shailendra','Noida');
GO
Select * from ##GlobalTemp
Global temporary tables are visible to all SQL Server connections while Local temporary tables are visible to only current SQL Server connection.
Table Variable
This acts like a variable and exists only for the batch in which it is declared; it is dropped when the batch ends. Like a temp table, it is stored in the tempdb database, not purely in memory. You can declare a primary key and an identity column as part of the table variable declaration, but you cannot run CREATE INDEX against it afterwards (from SQL Server 2014 on, indexes can be declared inline).
GO
DECLARE @TProduct TABLE
(
SNo INT IDENTITY(1,1),
ProductID INT,
Qty INT
)
--Insert data to Table variable @TProduct
INSERT INTO @TProduct(ProductID,Qty)
SELECT DISTINCT ProductID, Qty FROM ProductsSales ORDER BY ProductID ASC
--Select data
Select * from @TProduct
--Next batch
GO
Select * from @TProduct --gives error in next batch
Note
Temp Tables are physically created in the Tempdb database. These tables act as the normal table and also can have constraints, index like normal tables.
A CTE is a named temporary result set used to simplify complex sub-queries. It exists only for the scope of a single statement. It is not created as a separate object in tempdb; the optimizer expands it into the enclosing query, much like a sub-query. You cannot create an index on a CTE.
A table variable acts like a variable and exists only for the batch in which it is declared; it is dropped when the batch ends. It is also stored in the tempdb database, not purely in memory.
Fairly broad topic to cover all the ins and outs. Here are a few high level differences which would give you more ideas for researching this.
CTEs are part of the same query and should be thought of as being very similar to a sub-query. A CTE allows for better readability and code-reuse (same CTE can be reused in different parts of the overall query).
Table variables and temporary tables should be thought of as being similar to real tables, but with optimizations that enable SQL Server to make operations against them fast, especially when used with relatively small data sets. Note that although these operate against tempdb, that doesn't automatically mean data stored there is actually persisted to disk. With each new version of SQL Server there have been additional optimizations (memory-optimized tables, for example) to make these constructs faster, especially for their mainline use case of simplifying complex queries.
See this for more information on this topic:
https://www.brentozar.com/archive/2014/06/temp-tables-table-variables-memory-optimized-table-variables/
I need to do a very complex query.
At one point, this query must have a join to a view that cannot be indexed unfortunately.
This view is also a complex view joining big tables.
View's output can be simplified as this:
PID (int), Kind (int), Date (date), D1,D2..DN
where PID and Date and Kind fields are not unique (there may be more than one row having same combination of pid,kind,date), but are those that will be used in join like this
left join ComplexView mkcs on mkcs.PID=q4.PersonID and mkcs.Date=q4.date and mkcs.Kind=1
left join ComplexView mkcl on mkcl.PID=q4.PersonID and mkcl.Date=q4.date and mkcl.Kind=2
left join ComplexView mkco on mkco.PID=q4.PersonID and mkco.Date=q4.date and mkco.Kind=3
Now, if I just do it like this, the query takes significant time to execute because the complex view is run three times, I assume, and out of its huge number of rows only some are actually used (for example, out of 40000 only 2000 are used).
What I did was declare @temptable and insert into @temptable select * from ComplexView where Date...; once per query I select only the rows I am going to use from ComplexView, and then I join to this @temptable.
This reduced execution time significantly.
However, I noticed that if I make a permanent table in my database with a non-unique clustered index on PID, Kind, Date, then deleting all rows from this table and inserting into it from the complex view takes a few seconds (3 or 4), and using this table in my query (left joining it three times) cuts the query time in half, from 1 minute to 30 seconds!
So my question is, first of all: is it possible to create indexes on declared @temptables?
And then: I've seen people talk about "create #temptable" syntax. Maybe this is what I need? Where can I read about the difference between declare @temptable and create #temptable? What should I use for a query like mine? (This query is for an MS Reporting Services report, if it matters.)
#tablename is a physical table stored in tempdb that the server drops automatically when the connection that created it is closed. @tablename is a table variable (often described as in-memory, although it is also backed by tempdb) that lives for the lifetime of the batch/procedure that created it, just like a local variable.
You can only add a (non PK) index to a #temp table.
create table #blah (fld int)
create nonclustered index idx on #blah (fld)
It's not a complete answer, but #table creates a temporary table, which lives until you drop it or until the session that created it ends. @table is a table variable that will not persist longer than your script.
Also, I think this post will answer the other part of your question.
Creating an index on a table variable
Yes, you can create indexes on temp tables or table variables. http://sqlserverplanet.com/sql/create-index-on-table-variable/
The @tableName syntax is a table variable. They are rather limited. The syntax is described in the documentation for DECLARE @local_variable. You can kind of have indexes on table variables, but only indirectly by specifying PRIMARY KEY and UNIQUE constraints on columns. So, if the data in the columns that you need an index on happens to be unique, you can do this. See this answer. This may be “enough” for many use cases, but only for small numbers of rows. The optimizer will generally treat table variables as if they contain one row (regardless of how many rows there actually are), which can result in terrible query plans if you have hundreds or thousands of rows in them instead.
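One way the constraint approach can be stretched to cover non-unique columns is to append a surrogate column to the key, so the combination becomes unique. This is only a sketch: PID, Kind and Date are the columns named in the question, while @ViewRows and RowId are made-up names and the D1..DN columns are left out.

```sql
-- Sketch only: the identity column makes the key unique, so a UNIQUE CLUSTERED
-- constraint can serve as an index led by the (non-unique) join columns.
DECLARE @ViewRows TABLE
(
    RowId  int IDENTITY(1,1) NOT NULL,   -- hypothetical surrogate column
    PID    int  NOT NULL,
    Kind   int  NOT NULL,
    [Date] date NOT NULL,
    UNIQUE CLUSTERED (PID, Kind, [Date], RowId)
);
```

The row-count estimate problem described above still applies, so this mainly helps when the table variable stays small.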
The #tableName syntax is a locally-scoped temporary table. You can create these using either SELECT…INTO #tableName or CREATE TABLE #tableName syntax. The scope of these tables is a little more complex than that of variables. If you have CREATE TABLE #tableName in a stored procedure, all references to #tableName in that stored procedure will refer to that table. If you simply reference #tableName in the stored procedure (without creating it), it will look into the caller’s scope. So you can create #tableName in one procedure, call another procedure, and in that other procedure read/update #tableName. However, once the procedure that created #tableName runs to completion, that table will be automatically unreferenced and cleaned up by SQL Server. So there is no reason to manually clean up these tables unless you have a procedure which is meant to loop/run indefinitely or for long periods of time.
You can define complex indexes on temporary tables, just as if they are permanent tables, for the most part. So if you need to index columns but have duplicate values which prevents you from using UNIQUE, this is the way to go. You do not even have to worry about name collisions on indexes. If you run something like CREATE INDEX my_index ON #tableName(MyColumn) in multiple sessions which have each created their own table called #tableName, SQL Server will do some magic so that the reuse of the global-looking identifier my_index does not explode.
Additionally, temporary tables will automatically build statistics, etc., like normal tables. The query optimizer will recognize that temporary tables can have more than just 1 row in them, which can in itself result in great performance gains over table variables. Of course, this also adds a small amount of overhead, though it is likely worth it and not noticeable if your query’s runtime is longer than one second.
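Applied to the query in the question, that could look roughly like the sketch below. ComplexView, PID, Kind, Date, D1, PersonID and the aliases q4/mkcs/mkcl/mkco come from the question; the temp table name, the index name, the date range variables and the Q4Source table standing in for whatever produces q4 are all assumptions made for illustration.

```sql
-- Hypothetical date range; the question only says the rows are filtered by Date.
DECLARE @FromDate date = '20240101',
        @ToDate   date = '20240201';

-- Run the expensive view once and keep only the rows that will be joined.
SELECT *
INTO #ViewRows
FROM ComplexView
WHERE [Date] >= @FromDate AND [Date] < @ToDate;

-- Non-unique clustered index on the join columns; duplicates are allowed.
CREATE CLUSTERED INDEX IX_ViewRows ON #ViewRows (PID, Kind, [Date]);

SELECT q4.PersonID, q4.[date],
       mkcs.D1 AS D1_Kind1, mkcl.D1 AS D1_Kind2, mkco.D1 AS D1_Kind3
FROM Q4Source AS q4   -- placeholder for the rest of the original query
LEFT JOIN #ViewRows AS mkcs ON mkcs.PID = q4.PersonID AND mkcs.[Date] = q4.[date] AND mkcs.Kind = 1
LEFT JOIN #ViewRows AS mkcl ON mkcl.PID = q4.PersonID AND mkcl.[Date] = q4.[date] AND mkcl.Kind = 2
LEFT JOIN #ViewRows AS mkco ON mkco.PID = q4.PersonID AND mkco.[Date] = q4.[date] AND mkco.Kind = 3;

DROP TABLE #ViewRows;
```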
To extend Alex K.'s answer, you can create the PRIMARY KEY on a temp table
IF OBJECT_ID('tempdb..#tempTable') IS NOT NULL
DROP TABLE #tempTable
CREATE TABLE #tempTable
(
Id INT PRIMARY KEY
,Value NVARCHAR(128)
)
INSERT INTO #tempTable
VALUES
(1, 'first value')
,(3, 'second value')
-- will cause Violation of PRIMARY KEY constraint 'PK__#tempTab__3214EC071AE8C88D'. Cannot insert duplicate key in object 'dbo.#tempTable'. The duplicate key value is (1).
--,(1, 'first value one more time')
SELECT * FROM #tempTable
What are the alternatives to using cursors in SQL Server?
I already know a trick which involves using the Row_Number() function to number the rows so that I can loop over them one by one. Any other ideas?
When I don't want to complicate things with SQL cursors I often populate temporary tables or table variables, then do a while loop to go through them.
For example:
declare @someresults table (
    id int,
    somevalue varchar(10)
)
insert into @someresults
select
    id,
    somevalue
from
    whatevertable
declare @currentid int
declare @currentvalue varchar(10)
while exists(select 1 from @someresults)
begin
    select top 1 @currentid = id, @currentvalue = somevalue from @someresults
    --work with those values here
    delete from @someresults where id = @currentid
end
Several options:
Best is to re-analyze the problem from a mathematical, set-based perspective. If this can be done, it will most likely provide the best solution in both clarity and performance.
Second, use a table variable to store only the keys. Insert the keys into this table variable using a recursive common table expression if possible, or failing that, use a T-SQL programming loop (a WHILE loop or a constructed iterative loop of some kind). Then, when the table variable has all the key values in it, join it to the real tables in the appropriate way to execute whatever your real SQL design goal happens to be. Use only the keys as you recursively or iteratively build the table variable, to keep it as narrow as possible during the expensive construction phase.
Third, use a temporary table (on disk) in a similar way to the above. This is a better choice when you need the table to contain more than a few columns and/or a very large (> 1M) number of rows, or if you need it to have more than a primary key index.
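To illustrate the first option, here is a sketch of a row-by-row loop collapsed into a single set-based statement. It assumes the per-row work is "copy somevalue into a matching row of another table"; whatevertable, id and somevalue come from the loop example earlier in the thread, while SomeTarget is a made-up table name.

```sql
-- One statement replaces the whole WHILE loop: every matching row is updated
-- in a single pass instead of one iteration per id.
UPDATE tgt
SET tgt.somevalue = src.somevalue
FROM SomeTarget AS tgt            -- hypothetical target table
INNER JOIN whatevertable AS src
    ON src.id = tgt.id;
```

Not every loop body can be rewritten this way (calling a stored procedure per row, for example), which is when the key-table approaches in the other two options come into play.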
This answer had me slightly confused. What is a 'select to a temp table' and can someone show me a simple example of it?
A temp table is a table that exists just for the duration of the stored procedure and is commonly used to hold temporary results on the way to a final calculation.
In SQL Server, all temp tables are prefixed with a # so if you issue a statement like
Create table #tmp(id int, columnA varchar(50)) -- column type assumed for illustration
Then SQL Server will automatically know that the table is temporary, and it will be destroyed when the stored procedure goes out of scope unless the table is explicitly dropped like
drop table #tmp
I commonly use them in stored procedures that run against huge tables with a high transaction volume, because I can insert the subset of data that I need into the temp table as a temporary copy and work on the data without fear of bringing down a production system if what I'm doing with the data is a fairly intense operation.
In SQL Server all temp tables live in the tempdb database.
See this article for more information.
If you have a complex set of results that you want to use again and again, do you keep querying the main tables (where data will be changing, and which may impact performance), or do you store the results in a temporary table for further processing? It is often better to use a temporary table.
Or, if you really need to iterate through rows in a non-set fashion, you can use a temp table (or a CURSOR).
If you do simple CRUD against a DB then you probably have no need for temp tables
You have:
table variables: DECLARE @foo TABLE (bar int...)
explicit temp tables: CREATE TABLE #foo (bar int...)
inline created: SELECT ... INTO #foo FROM...
A temp table is a table that is dynamically created by using some such syntax:
SELECT [columns] INTO #MyTable FROM SomeExistingTable
What you then have is a table that is populated with the values that you selected into it. Now you can select against it, update it, whatever.
SELECT FirstName FROM #MyTable WHERE...
The table lives for some predetermined scope of time, for example, for the duration of the stored procedure in which it lives. Then it's gone from memory and never accessible again. Temporary.
HTH
You can use SELECT ... INTO to both create a temp table and populate it like so:
SELECT Col1, Col2...
INTO #Table
FROM ...
WHERE ...
(BTW, this syntax is for SQL Server and Sybase. )
EDIT Once you have created the table as I did above, you can then use it in other queries on the same connection:
Select ...
From OtherTable
Join #Table
On #Table.Col = OtherTable.Col
The key here is that it all happens on the same connection. Thus, to create and use a temp table from a client script would be awkward in that you would have to ensure that all subsequent uses of the table were on the same connection. Instead, most people use temp tables in stored procedures where they create the table on one line and then use a few lines later in the same procedure.
Think of temp tables as SQL variables of type 'table'. Use them in scripts and stored procedures. They come in handy when you need to manipulate data that is not a simple value but a subset of a database table (both vertical and horizontal).
Once you realize these benefits, you can take advantage of the additional power that comes with the various sharing models (scopes) for temp tables: private, global, transaction, etc. All major RDBMS engines support temp tables, but there are no standard features or syntax for them.
For example of usage see answer.