Difference between CTE, Temp Table and Table Variable in MSSQL

All three are used to store data temporarily.
Is there any performance difference (in time and space) between these three kinds of temporary storage?
I assume performance depends on whether the result is kept on disk or in memory.
I have searched a lot but have not found a satisfactory answer.

CTE - Common Table Expressions
CTE stands for Common Table Expression. It was introduced with SQL Server 2005. It is a temporary named result set, typically the result of a complex sub-query. Unlike a temporary table, its life is limited to the current statement. It is defined with the WITH clause. A CTE improves the readability and maintainability of complex queries and sub-queries. If the CTE is not the first statement in a batch, the preceding statement must be terminated with a semicolon, which is why many people habitually write ;WITH.
With CTE1(Address, Name, Age)--Column names for CTE, which are optional
AS
(
SELECT Addr.Address, Emp.Name, Emp.Age from Address Addr
INNER JOIN EMP Emp ON Emp.EID = Addr.EID
)
SELECT * FROM CTE1 --Using CTE
WHERE CTE1.Age > 50
ORDER BY CTE1.NAME
When to use CTE?
Use it to name the result of a complex sub-query so that it can be referenced later in the same statement.
It is also used to write recursive queries.
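The recursive use mentioned above can be sketched as follows. This is a minimal illustration, not taken from the original answer: the CTE name Numbers and the upper bound of 10 are made up for the example.

```sql
-- Hypothetical example: generate the integers 1..10 with a recursive CTE.
WITH Numbers (n) AS
(
    SELECT 1            -- anchor member: the starting row
    UNION ALL
    SELECT n + 1        -- recursive member: references the CTE itself
    FROM Numbers
    WHERE n < 10        -- termination condition
)
SELECT n
FROM Numbers
ORDER BY n;
```

SQL Server stops a recursive CTE after 100 levels by default; deeper recursion needs an OPTION (MAXRECURSION n) hint on the outer query.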
Temporary Tables
In SQL Server, temporary tables are created at run time, and you can perform all the operations on them that you can perform on a normal table. These tables are created in the tempdb database. Based on scope and behavior, temporary tables are of two types, as given below.
Local Temp Table
Local temp tables are only available to the SQL Server session or connection (that is, a single user) that created them. They are automatically dropped when the session that created them is closed. A local temporary table name starts with a single hash ("#") sign.
CREATE TABLE #LocalTemp
(
UserID int,
Name varchar(50),
Address varchar(150)
)
GO
insert into #LocalTemp values ( 1, 'Shailendra','Noida');
GO
Select * from #LocalTemp
A local temp table is scoped to the current session of the current user, which in practice means the current query window. If you close that query window, or open a new one, and try to query the temp table created above, you will get an error.
Global Temp Table
Global temp tables are available to all SQL Server sessions or connections (that is, to all users). They can be created by any SQL Server connection, and they are dropped automatically when the session that created them ends and all other sessions have stopped referencing them. A global temporary table name starts with a double hash ("##") sign.
CREATE TABLE ##GlobalTemp
(
UserID int,
Name varchar(50),
Address varchar(150)
)
GO
insert into ##GlobalTemp values ( 1, 'Shailendra','Noida');
GO
Select * from ##GlobalTemp
Global temporary tables are visible to all SQL Server connections while Local temporary tables are visible to only current SQL Server connection.
Table Variable
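This acts like a variable and exists for a particular batch of query execution. It is dropped once execution leaves the batch. It is also created in the tempdb database rather than purely in memory. You can define a primary key and an identity column when declaring a table variable, but you cannot run CREATE INDEX against it afterwards; any index has to come from the declaration itself (PRIMARY KEY or UNIQUE constraints, or inline index definitions from SQL Server 2014 onward).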
GO
DECLARE @TProduct TABLE
(
SNo INT IDENTITY(1,1),
ProductID INT,
Qty INT
)
--Insert data to Table variable @TProduct
INSERT INTO @TProduct(ProductID,Qty)
SELECT DISTINCT ProductID, Qty FROM ProductsSales ORDER BY ProductID ASC
--Select data
Select * from @TProduct
--Next batch
GO
Select * from @TProduct --gives error in next batch
Note
Temp tables are physically created in the tempdb database. They act like normal tables and can also have constraints and indexes like normal tables.
A CTE is a named temporary result set used to simplify complex sub-queries. It exists only for the scope of a single statement. It is not stored as a separate object in tempdb; SQL Server expands its definition into the query that references it, much like a sub-query or view. You cannot create an index on a CTE.
A table variable acts like a variable and exists for a particular batch of query execution. It is dropped once execution leaves the batch. It is also created in the tempdb database rather than purely in memory.

Fairly broad topic to cover all the ins and outs. Here are a few high level differences which would give you more ideas for researching this.
CTEs are part of the same query and should be thought of as being very similar to a sub-query. A CTE allows for better readability and code-reuse (same CTE can be reused in different parts of the overall query).
Table variables and temporary tables should be thought of as similar to real tables, but with optimizations that let SQL Server make operations against them fast, especially when they are used with relatively small data sets. Note that although these operate against tempdb, that doesn't automatically mean the data stored there is actually persisted to disk. With each new version of SQL Server there have been additional optimizations (memory-optimized tables, for example) to make these constructs faster, especially for their mainline use case of simplifying complex queries.
See this for more information on this topic:
https://www.brentozar.com/archive/2014/06/temp-tables-table-variables-memory-optimized-table-variables/

Related

Poor performance of SQL query with Table Variable or User Defined Type

I have a SELECT query on a view, that contains 500.000+ rows. Let's keep it simple:
SELECT * FROM dbo.Document WHERE MemberID = 578310
The query runs fast, ~0s
Let's rewrite it to work with the set of values, which reflects my needs more:
SELECT * FROM dbo.Document WHERE MemberID IN (578310)
This is just as fast, ~0s
But now the set of IDs needs to be variable; let's define it as:
DECLARE @AuthorizedMembers TABLE
(
MemberID BIGINT NOT NULL PRIMARY KEY, --primary key
UNIQUE NONCLUSTERED (MemberID) -- and index, as if it could help...
);
INSERT INTO @AuthorizedMembers SELECT 578310
The set contains the same single value, but it is a table variable now. The execution time of such a query rises to 2s, and in more complicated queries it goes as high as 25s and more, while with a fixed id it stays around ~0s.
SELECT *
FROM dbo.Document
WHERE MemberID IN (SELECT MemberID FROM @AuthorizedMembers)
is just as slow as:
SELECT *
FROM dbo.Document
WHERE EXISTS (SELECT MemberID
FROM @AuthorizedMembers
WHERE [@AuthorizedMembers].MemberID = Document.MemberID)
or as slow as this:
SELECT *
FROM dbo.Document
INNER JOIN @AuthorizedMembers AS AM ON AM.MemberID = Document.MemberID
The performance is the same for all of the above and always much worse than with a fixed value.
Dynamic SQL helps easily: building an nvarchar like (id1,id2,id3) and a fixed query around it keeps my query times at ~0s. But I would like to avoid dynamic SQL as much as possible, and if I do use it, I would like the string to stay the same regardless of the values (using parameters, which the above method does not allow).
Any ideas how to get the performance of the table variable close to that of a fixed list of values, or how to avoid building different dynamic SQL code for each run?
P.S. I have tried the above with a user defined type with same results
Edit:
The results with a temporary table, defined as:
CREATE TABLE #AuthorizedMembers
(
MemberID BIGINT NOT NULL PRIMARY KEY
);
INSERT INTO #AuthorizedMembers SELECT 578310
improved the execution time roughly threefold (13s -> 4s), which is still significantly slower than dynamic SQL (<1s).
Your options:
Use a temporary table instead of a TABLE variable
If you insist on using a TABLE variable, add OPTION(RECOMPILE) at the end of your query
Explanation:
When the compiler compiles your statement, the TABLE variable has no rows in it and therefore doesn't have the proper cardinalities. This results in an inefficient execution plan. OPTION(RECOMPILE) forces the statement to be recompiled when it is run. At that point the TABLE variable has rows in it and the compiler has better cardinalities to produce an execution plan.
The general rule of thumb is to use temporary tables when operating on large datasets and table variables for small datasets with frequent updates. Personally I only very rarely use TABLE variables because they generally perform poorly.
I can recommend this answer on the question "What's the difference between temporary tables and table variables in SQL Server?" if you want an in-depth analysis on the differences.
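The second option above can be sketched against the query from the question. This is only an illustration that reuses the question's table and column names; whether it brings the run time close to the fixed-value case depends on the view behind dbo.Document.

```sql
DECLARE @AuthorizedMembers TABLE
(
    MemberID BIGINT NOT NULL PRIMARY KEY
);

INSERT INTO @AuthorizedMembers (MemberID) VALUES (578310);

SELECT d.*
FROM dbo.Document AS d
WHERE d.MemberID IN (SELECT am.MemberID FROM @AuthorizedMembers AS am)
OPTION (RECOMPILE);  -- compile the statement at run time, when the row count is known
```

The hint applies only to the statement it is attached to, so each query that reads the table variable needs its own OPTION (RECOMPILE).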

Creating temporary tables in SQL

I am trying to create a temporary table that selects only the data for a certain register_type. I wrote this query but it does not work:
CREATE TABLE temp1
(Select
egauge.dataid,
egauge.register_type,
egauge.timestamp_localtime,
egauge.read_value_avg
from rawdata.egauge
where register_type like '%gen%'
order by dataid, timestamp_localtime )
I am using PostgreSQL.
Could you please tell me what is wrong with the query?
You probably want CREATE TABLE AS - also works for TEMPORARY (TEMP) tables:
CREATE TEMP TABLE temp1 AS
SELECT dataid
, register_type
, timestamp_localtime
, read_value_avg
FROM rawdata.egauge
WHERE register_type LIKE '%gen%'
ORDER BY dataid, timestamp_localtime;
This creates a temporary table and copies data into it. A static snapshot of the data, mind you. It's just like a regular table, but resides in RAM if temp_buffers is set high enough. It is only visible within the current session and dies at the end of it. When created with ON COMMIT DROP it dies at the end of the transaction.
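The ON COMMIT DROP variant mentioned above might look like the sketch below (PostgreSQL syntax). It reuses the table and columns from the question; the count(*) query is only there to show that the table is usable inside the transaction.

```sql
BEGIN;

CREATE TEMP TABLE temp1 ON COMMIT DROP AS
SELECT dataid, register_type, timestamp_localtime, read_value_avg
FROM   rawdata.egauge
WHERE  register_type LIKE '%gen%';

SELECT count(*) FROM temp1;  -- the temp table is visible inside the transaction

COMMIT;                      -- temp1 is dropped here
```

Outside an explicit transaction block every statement commits on its own, so a table created with ON COMMIT DROP would disappear immediately.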
Temp tables come first in the default schema search path, hiding other visible tables of the same name unless schema-qualified:
How does the search_path influence identifier resolution and the "current schema"
If you want dynamic, you would be looking for CREATE VIEW - a completely different story.
The SQL standard also defines, and Postgres also supports: SELECT INTO. But its use is discouraged:
It is best to use CREATE TABLE AS for this purpose in new code.
There is really no need for a second syntax variant, and SELECT INTO is used for assignment in plpgsql, where the SQL syntax is consequently not possible.
Related:
Combine two tables into a new one so that select rows from the other one are ignored
ERROR: input parameters after one with a default value must also have defaults in Postgres
CREATE TABLE LIKE (...) only copies the structure from another table and no data:
The LIKE clause specifies a table from which the new table
automatically copies all column names, their data types, and their
not-null constraints.
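For comparison, a minimal sketch of the LIKE form (PostgreSQL syntax). The name temp2 is hypothetical; the second statement shows that the data has to be copied separately.

```sql
-- Copies column names, data types and not-null constraints, but no rows.
CREATE TEMP TABLE temp2 (LIKE rawdata.egauge);

-- Rows must be inserted in a separate step.
INSERT INTO temp2
SELECT *
FROM   rawdata.egauge
WHERE  register_type LIKE '%gen%';
```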
If you need a "temporary" table just for the purpose of a single query (and then discard it) a "derived table" in a CTE or a subquery comes with considerably less overhead:
Change the execution plan of query in postgresql manually?
Combine two SELECT queries in PostgreSQL
Reuse computed select value
Multiple CTE in single query
Update with results of another sql
http://www.postgresql.org/docs/9.2/static/sql-createtable.html
CREATE TEMP TABLE temp1 (LIKE ...)

Why use table valued function instead of a temp table in SQL?

I am trying to speed up my monster of a stored procedure that works on millions of records across many tables.
I've stumbled on this:
Is it possible to use a Stored Procedure as a subquery in SQL Server 2008?
My question is why using a table valued function would be better than using a temp table.
Suppose my stored procedure #SP1
declare @temp table(a int)
insert into @temp
select a from BigTable
where someRecords like 'blue%'
update AnotherBigTable
set someRecords = 'were blue'
from AnotherBigTable t
inner join
@temp tmp
on t.RecordID = tmp.a
After reading the above link it seems that the consensus is, instead of using my @temp table variable, to create a table valued function that will do that select.
(and inline it if it's a simple select like the one in this example) But my actual selects are multiple and often not simple (i.e. with subqueries, etc.)
What is the benefit?
Thanks
Generally, you would use a temporary table (#) instead of a table variable. Table variables are really only useful for
functions, which cannot create temporary objects
passing table-valued data (sets) as read-only parameters
gaming statistics for certain query edge-cases
execution plan stability (related to statistics and also the fact that INSERT INTO table variables cannot use a parallel plan)
prior to SQL Server 2012, #temp tables inherit collation from tempdb whereas @table variables use the current database collation
Other than those, a #temporary table will work as well as if not better than a variable.
Further reading: What's the difference between a temp table and table variable in SQL Server?
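The second item in the list above, passing sets as read-only parameters, can be sketched as follows. The type name dbo.IdList and the procedure name dbo.MarkBlueRecords are hypothetical; the tables and columns come from the question, and the sketch assumes BigTable.a holds integer IDs.

```sql
-- Hypothetical table type and procedure names, for illustration only.
CREATE TYPE dbo.IdList AS TABLE
(
    Id INT NOT NULL PRIMARY KEY
);
GO

CREATE PROCEDURE dbo.MarkBlueRecords
    @Ids dbo.IdList READONLY   -- table-valued parameters must be declared READONLY
AS
BEGIN
    UPDATE t
    SET    someRecords = 'were blue'
    FROM   AnotherBigTable AS t
    INNER JOIN @Ids AS i ON t.RecordID = i.Id;
END
GO

-- The caller fills a table variable of the same type and passes it in.
DECLARE @Ids dbo.IdList;

INSERT INTO @Ids (Id)
SELECT DISTINCT a
FROM   BigTable
WHERE  someRecords LIKE 'blue%'
  AND  a IS NOT NULL;

EXEC dbo.MarkBlueRecords @Ids = @Ids;
```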
Probably no longer relevant... but two things I might suggest that take two different approaches.
Simple approach 1:
Try a primary key on your table valued variable:
declare @temp table(a int, primary key(a))
Simple approach 2:
In this particular case try a common table expression (CTE)...
;with
temp as (
SELECT a as Id
FROM BigTable
WHERE someRecords like 'blue%'
)
UPDATE AnotherBigTable
SET someRecords = 'were Blue'
FROM AnotherBigTable
JOIN temp
ON temp.Id = AnotherBigTable.RecordId
CTEs are really great and help to isolate the specific data sets you actually want to work on from the myriad of records contained in larger tables... and if you find yourself using the same CTE declaration repeatedly, consider formalizing that expression into a view. Views are an often overlooked and very valuable tool for DBAs and DB programmers to manage large, complex data sets with lots of records and relationships.

Is it possible to add index to a temp table? And what's the difference between create #t and declare @t

I need to do a very complex query.
At one point, this query must have a join to a view that cannot be indexed unfortunately.
This view is also a complex view joining big tables.
View's output can be simplified as this:
PID (int), Kind (int), Date (date), D1,D2..DN
where PID and Date and Kind fields are not unique (there may be more than one row having same combination of pid,kind,date), but are those that will be used in join like this
left join ComplexView mkcs on mkcs.PID=q4.PersonID and mkcs.Date=q4.date and mkcs.Kind=1
left join ComplexView mkcl on mkcl.PID=q4.PersonID and mkcl.Date=q4.date and mkcl.Kind=2
left join ComplexView mkco on mkco.PID=q4.PersonID and mkco.Date=q4.date and mkco.Kind=3
Now, if I just do it like this, execution of the query takes significant time because the complex view is run three times, I assume, and out of its huge number of rows only some are actually used (like, out of 40000 only 2000 are used)
What I did is declare @temptable, and insert into @temptable select * from ComplexView where Date... - one time per query I select only the rows I am going to use from my ComplexView, and then I join this @temptable.
This reduced execution time significantly.
However, I noticed that if I make a table in my database, add a clustered index on PID,Kind,Date (non-unique clustered) and take data from this table, then deleting everything from this table and inserting into it from the complex view takes some seconds (3 or 4), and using this table in my query (left joining it three times) cuts the query time in half, from 1 minute to 30 seconds!
So, my question is, first of all - is it possible to create indexes on declared @temptables?
And then - I've seen people talk about "create #temptable" syntax. Maybe this is what I need? Where can I read about the difference between declare @temptable and create #temptable? What shall I use for a query like mine? (this query is for MS Reporting Services report, if it matters).
#tablename is a physical table, stored in tempdb, that the server will drop automatically when the connection that created it is closed. @tablename is a table variable that lives for the lifetime of the batch/procedure that created it, just like a local variable; despite the common belief that it is held purely in memory, it is backed by tempdb as well.
You can only add a (non PK) index to a #temp table.
create table #blah (fld int)
create nonclustered index idx on #blah (fld)
It's not a complete answer, but #table will create a temporary table that stays in tempdb until you drop it or your session ends. @table is a table variable that will not persist longer than your script.
Also, I think this post will answer the other part of your question.
Creating an index on a table variable
Yes, you can create indexes on temp tables or table variables. http://sqlserverplanet.com/sql/create-index-on-table-variable/
The @tableName syntax is a table variable. They are rather limited. The syntax is described in the documentation for DECLARE @local_variable. You can kind of have indexes on table variables, but only indirectly by specifying PRIMARY KEY and UNIQUE constraints on columns. So, if your data in the columns that you need an index on happens to be unique, you can do this. See this answer. This may be “enough” for many use cases, but only for small numbers of rows. If you don’t have indexes on your table variable, the optimizer will generally treat table variables as if they contain one row (regardless of how many rows there actually are) which can result in terrible query plans if you have hundreds or thousands of rows in them instead.
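A sketch of the constraint-based approach for the question's columns, which are not unique on their own. The variable name @Rows and the surrogate RowId column are hypothetical, and the NOT NULL markers are an assumption about the view's output; appending the identity column is one common way to make the key unique so that a UNIQUE constraint can act as the index.

```sql
DECLARE @Rows TABLE
(
    RowId  INT IDENTITY(1,1) NOT NULL,
    PID    INT  NOT NULL,
    Kind   INT  NOT NULL,
    [Date] DATE NOT NULL,
    PRIMARY KEY NONCLUSTERED (RowId),
    -- (PID, Kind, Date) may repeat; adding RowId makes the combination unique.
    UNIQUE CLUSTERED (PID, Kind, [Date], RowId)
);
```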
The #tableName syntax is a locally-scoped temporary table. You can create these either using SELECT…INTO #tableName or CREATE TABLE #tableName syntax. The scope of these tables is a little bit more complex than that of variables. If you have CREATE TABLE #tableName in a stored procedure, all references to #tableName in that stored procedure will refer to that table. If you simply reference #tableName in the stored procedure (without creating it), it will look into the caller’s scope. So you can create #tableName in one procedure, call another procedure, and in that other procedure read/update #tableName. However, once the procedure that created #tableName runs to completion, that table will be automatically unreferenced and cleaned up by SQL Server. So, there is no reason to manually clean up these tables unless you have a procedure which is meant to loop/run indefinitely or for long periods of time.
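The caller-scope behaviour described above can be sketched with two procedures. The names dbo.InnerProc, dbo.OuterProc and #Work are hypothetical.

```sql
CREATE PROCEDURE dbo.InnerProc
AS
BEGIN
    -- #Work is not created in this procedure, so the name resolves
    -- to the table created by the caller.
    UPDATE #Work SET Value = Value * 2;
END
GO

CREATE PROCEDURE dbo.OuterProc
AS
BEGIN
    CREATE TABLE #Work (Id INT PRIMARY KEY, Value INT);
    INSERT INTO #Work (Id, Value) VALUES (1, 10), (2, 20);

    EXEC dbo.InnerProc;           -- reads and updates the caller's #Work

    SELECT Id, Value FROM #Work;  -- sees the changes made by InnerProc
END                               -- #Work is cleaned up when OuterProc finishes
GO

EXEC dbo.OuterProc;
```

Calling dbo.InnerProc directly, from a session that has not created #Work, fails at run time because the table does not exist.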
You can define complex indexes on temporary tables, just as if they are permanent tables, for the most part. So if you need to index columns but have duplicate values which prevents you from using UNIQUE, this is the way to go. You do not even have to worry about name collisions on indexes. If you run something like CREATE INDEX my_index ON #tableName(MyColumn) in multiple sessions which have each created their own table called #tableName, SQL Server will do some magic so that the reuse of the global-looking identifier my_index does not explode.
Additionally, temporary tables will automatically build statistics, etc., like normal tables. The query optimizer will recognize that temporary tables can have more than just 1 row in them, which can in itself result in great performance gains over table variables. Of course, this also adds a tiny amount of overhead, though it is likely worth it and not noticeable if your query’s runtime is longer than one second.
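Applied to the question, the temp-table approach might look like the sketch below. The date variables, their values, the table name #mk and the index name are all hypothetical; the real filter is whatever the report already uses.

```sql
-- Hypothetical date range; replace with the report's own filter.
DECLARE @FromDate DATE = '2015-01-01',
        @ToDate   DATE = '2015-02-01';

-- Evaluate the complex view once and keep only the rows that are needed.
SELECT *
INTO   #mk
FROM   ComplexView
WHERE  [Date] >= @FromDate AND [Date] < @ToDate;

-- Non-unique clustered index on the join columns.
CREATE CLUSTERED INDEX IX_mk_PID_Kind_Date ON #mk (PID, Kind, [Date]);
```

The three LEFT JOINs in the report query would then target #mk instead of ComplexView.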
To extend Alex K.'s answer, you can create the PRIMARY KEY on a temp table
IF OBJECT_ID('tempdb..#tempTable') IS NOT NULL
DROP TABLE #tempTable
CREATE TABLE #tempTable
(
Id INT PRIMARY KEY
,Value NVARCHAR(128)
)
INSERT INTO #tempTable
VALUES
(1, 'first value')
,(3, 'second value')
-- will cause Violation of PRIMARY KEY constraint 'PK__#tempTab__3214EC071AE8C88D'. Cannot insert duplicate key in object 'dbo.#tempTable'. The duplicate key value is (1).
--,(1, 'first value one more time')
SELECT * FROM #tempTable

What does 'select to a temp table' mean?

This answer had me slightly confused. What is a 'select to a temp table' and can someone show me a simple example of it?
A temp table is a table that exists just for the duration of the stored procedure and is commonly used to hold temporary results on the way to a final calculation.
In SQL Server, all temp tables are prefixed with a # so if you issue a statement like
Create table #tmp(id int, columnA varchar(50)) -- column type assumed for illustration
Then SQL Server will automatically know that the table is temporary, and it will be destroyed when the stored procedure goes out of scope unless the table is explicitly dropped like
drop table #tmp
I commonly use them in stored procedures that run against huge tables with a high transaction volume, because I can insert the subset of data that I need into the temp table as a temporary copy and work on the data without fear of bringing down a production system if what I'm doing with the data is a fairly intense operation.
In SQL Server all temp tables live in the tempdb database.
See this article for more information.
If you have a complex set of results that you want to use again and again, do you keep querying the main tables (where the data will be changing, and where you may hurt performance), or do you store the results in a temporary table for further processing? It's often better to use a temporary table.
Or, if you really need to iterate through rows in a non-set fashion, you can use a temp table (or a CURSOR).
If you only do simple CRUD against a DB then you probably have no need for temp tables.
You have:
table variables: DECLARE @foo TABLE (bar int...)
explicit temp tables: CREATE TABLE #foo (bar int...)
inline created: SELECT ... INTO #foo FROM...
A temp table is a table that is dynamically created by using some such syntax:
SELECT [columns] INTO #MyTable FROM SomeExistingTable
What you then have is a table that is populated with the values that you selected into it. Now you can select against it, update it, whatever.
SELECT FirstName FROM #MyTable WHERE...
The table lives for some predetermined scope of time, for example, for the duration of the stored procedure in which it lives. Then it's gone from memory and never accessible again. Temporary.
HTH
You can use SELECT ... INTO to both create a temp table and populate it like so:
SELECT Col1, Col2...
INTO #Table
FROM ...
WHERE ...
(BTW, this syntax is for SQL Server and Sybase. )
EDIT Once you have created the table as I did above, you can then use it in other queries on the same connection:
Select
From OtherTable
Join #Table
On #Table.Col = OtherTable.Col
The key here is that it all happens on the same connection. Thus, to create and use a temp table from a client script would be awkward in that you would have to ensure that all subsequent uses of the table were on the same connection. Instead, most people use temp tables in stored procedures where they create the table on one line and then use a few lines later in the same procedure.
Think of temp tables as SQL variables of type 'table'. Use them in scripts and stored procedures. They come in handy when you need to manipulate data that is not a simple value but a subset of a database table (both vertical and horizontal).
Once you realize these benefits, you can take advantage of the extra power that comes with the various sharing models (scopes) for temp tables: private, global, transaction, etc. All major RDBMS engines support temp tables, but there are no standard features or syntax for them.
For example of usage see answer.