Performance impact of chained CTE vs temp table

I have the following Chained CTE query (simplified):
;WITH CTE1
AS(
SELECT * FROM TableA
),
CTE2
AS(
SELECT * FROM TableB b INNER JOIN CTE1 c ON b.id = c.id
)
SELECT * FROM CTE2
If I break CTE chain and store data of CTE1 into a temp table then the performance of the overall query improves (from 1 minute 20 seconds to 8 seconds).
;WITH CTE1
AS(
SELECT * FROM TableA
)
SELECT * INTO #Temp FROM CTE1
;WITH CTE2
AS(
SELECT * FROM TableB b INNER JOIN #Temp c ON b.id = c.id
)
SELECT * FROM CTE2
DROP TABLE #Temp
There are complex queries in CTE1 and CTE2. I have just created a simplified version to explain here.
Should breaking the CTE chain improve the performance?
SQL Server version: 2008 R2

Obviously, it can, as you yourself have shown.
Why? The most obvious reason is that the optimizer knows the size of a temporary table, which gives it more information for optimizing the query. With a CTE, it has only an estimate. So the improvement you are seeing is due to a better query plan.
Another reason would be if the CTE is referenced multiple times in the query. SQL Server does not materialize CTEs, so the definition code would be run multiple times.
Sometimes, you purposely materialize CTEs as temporary tables so you can add indexes to them. That can also improve performance.
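For instance, a minimal sketch of materializing and indexing the first CTE from the question (the index name and the choice of a clustered index are illustrative):
-- Materialize CTE1's result set, then index it for the join
SELECT * INTO #CTE1 FROM TableA
CREATE CLUSTERED INDEX IX_CTE1_id ON #CTE1 (id)
-- The join now has real statistics and an index to work with
SELECT * FROM TableB b INNER JOIN #CTE1 c ON b.id = c.id
DROP TABLE #CTE1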
All that said, I prefer to avoid temporary tables. The optimizer is usually pretty good.

Consider the case where CTE1 is expensive:
;WITH CTE1
AS(
SELECT * FROM TableA
)
SELECT * INTO #Temp FROM CTE1
The above guarantees CTE1 is run only once, whereas the chained CTE can evaluate CTE1 multiple times.
And even with #Temp you should consider an index or primary key, and sort the insert to match.
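A sketch of that pattern, assuming id uniquely identifies rows in TableA (the column list is illustrative):
CREATE TABLE #Temp
(
    id INT NOT NULL PRIMARY KEY
    -- plus whichever other columns CTE1 produces
)
-- Inserting in index order keeps the insert cheap and gives the
-- optimizer a unique, ordered structure to join against
INSERT INTO #Temp (id)
SELECT id
FROM TableA
ORDER BY id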

This depends upon many factors. Always try to write a single statement if you can. Premature optimization is the root of a lot of evil.
If you do experience a performance problem, these are some of the advantages to decomposing your single statement:
It can increase maintainability, which is one of many non-functional requirements, by reducing complexity.
It can yield a better plan, so long as the cost of the intermediate materialization and the time saved is less than the original cost.
The intermediate tables can be indexed.
Indexes, primary keys, and unique constraints are very helpful to the optimizer, not only for choosing join types, but also for estimating cardinality, which has a large effect on memory grants.
You can choose to apply optimizer hints, such as MAXDOP, to individual statements rather than one gigantic statement; a sketch follows this list. This is especially helpful when you need to manipulate memory grants.
You can tune individual statements to eliminate spill to tempdb.
Depending upon the complexity and total execution time of your process, you can potentially release resource locks earlier, depending also upon which isolation level your statements run under.
If your query plan is poor, due to an optimizer time-out, using less complex individual statements may yield better overall results.
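As a hedged illustration of hinting just one step of a decomposed process (#Stage, dbo.BigTable, dbo.Dim and the MAXDOP value are all invented names for the sketch):
-- Hypothetical staging step; the hint applies to this statement only
SELECT b.id, SUM(b.amount) AS total
INTO #Stage
FROM dbo.BigTable b
GROUP BY b.id
OPTION (MAXDOP 1)
-- The follow-up statement runs with the server's default parallelism
SELECT s.id, s.total
FROM #Stage s
INNER JOIN dbo.Dim d ON d.id = s.id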

Related

How does select * on a subquery affect performance

I am writing a query that involves several subqueries using a WITH clause.
i.e.
WITH z as
(WITH x AS
(SELECT col1, col2 FROM foo LEFT JOIN bar on foo.col1 = bar.col1)
SELECT foo, bar
FROM x
INNER JOIN table2
ON x.col2 = table2.col2)
SELECT *
FROM z
LEFT JOIN table3
ON z.col1 = table3.col2
In reality, there are a few more subqueries and a lot more columns. Are there any performance issues with using the SELECT * on the subquery table (in this case, x or z)?
I want to avoid re-typing the same column names multiple times within one query but also need to optimize performance.
The answer depends on the database. CTEs can be handled by:
materializing an intermediate table and storing the results
merging the CTE code with the rest of the query
combining these two approaches
In the first approach, additional columns could have a small effect on performance. In the second, there should be no effect.
That said, what usually dominates query performance is the work done for joins and group bys. Assuming the columns are not unreasonably large, I wouldn't worry about the performance implications of using select * in a CTE.
I would question how you write the CTEs. There is no need for nested CTEs, because they can be defined sequentially.
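For example, the nested query above can be written with sequential CTE definitions (keeping the question's loose column references):
WITH x AS
(
    SELECT col1, col2
    FROM foo LEFT JOIN bar ON foo.col1 = bar.col1
),
z AS
(
    SELECT foo, bar
    FROM x INNER JOIN table2 ON x.col2 = table2.col2
)
SELECT *
FROM z
LEFT JOIN table3 ON z.col1 = table3.col2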

Is CTE really a view?

I always thought that CTEs should be considered as an inline view macro. So my thinking is: if the CTE is not referenced/used, it is not executed. It is just a definition, nothing more.
But, take the following query:
create table t
(
id int primary key
);
with
a as
(
insert into t(id) values(1)
)
select false;
select * from t;
It seems that after the CTE-based query, select * from t returns the tuple as inserted in the CTE. Why is this tuple inserted, despite the fact that the CTE is not used?
Is this by design or specification? Is it safe to rely on this behavior? It would allow executing multiple totally uncorrelated queries in one single statement.
This seems to contradict the following information: https://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/#comment-19121
syntactically a CTE behaves like any other table expression.
semantically it is different. [in Postgres] it will always be executed exactly once, even if it is referenced more than once.
[in Postgres] a CTE will act as an optimisation barrier; query terms cannot be moved between (into or out of) the CTE and the main query.
The second and third points can have serious implications. Because of the barrier and the exactly-once evaluation, a CTE scan can hardly make use of implicit order or the presence of indexes inside the CTE. A CTE scan more or less behaves like a sequential scan on an unordered table or materialised view. For small CTEs this will be no problem, since a hash join can be used. Large CTEs will need materialising plus sorting to join the CTE to the main query.
In Postgres, a CTE should not be considered an inline view, although thinking of it as a materialized view that lives within the scope of a statement is useful. A CTE will be materialized if it is referenced in another part of the query, or if it alters data (INSERT / UPDATE / DELETE).
So, since your example alters data, the CTE is evaluated, whereas the link you refer to has a CTE that doesn't alter data.
Whereas in other databases predicates from the outer query will be pushed down into the CTE by the optimizer, in PostgreSQL the CTE will be fully materialized.
e.g.
WITH cte AS (SELECT * FROM foo WHERE foo.bar = True)
SELECT * FROM cte WHERE cte.id > 10 AND cte.id < 20
is slower in PostgreSQL than
SELECT * FROM (SELECT * FROM foo WHERE bar = TRUE) cte
WHERE cte.id > 10 AND cte.id < 20
This is a consideration where one has optional or dynamic predicates in the outer query. The slightly faster CTE version would be
WITH cte AS (SELECT * FROM foo WHERE foo.bar = True AND foo.id > 10 AND foo.id < 20)
SELECT * FROM cte
This is by design, and you can rely on this behaviour to create an optimization barrier.
CTEs are allowed wherever a SELECT clause is allowed. So, it is possible to use CTEs inside of INSERT, UPDATE or DELETE statements. I'm not sure if this is part of the SQL standard.
For instance, until version 9.5 and the introduction of the INSERT ... ON CONFLICT syntax, we could use a CTE to perform an UPSERT. Here's a SO thread that illustrates it with an example.
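The pattern generally looked something like this (a sketch with an invented counters table; note it is not fully race-free under concurrency, which is part of why ON CONFLICT was welcome):
-- Try the update first; its RETURNING output tells us whether it hit a row
WITH upsert AS (
    UPDATE counters
    SET value = value + 1
    WHERE name = 'hits'
    RETURNING *
)
-- Insert only if the update found nothing
INSERT INTO counters (name, value)
SELECT 'hits', 1
WHERE NOT EXISTS (SELECT 1 FROM upsert);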
There is a second, more interesting type of CTE, the RECURSIVE CTE, which is composed of a union of a non-recursive (anchor) part and a recursive part that works on rows generated in previous iterations. I don't think this type of query could be inlined anyway.
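A minimal example of the shape (PostgreSQL syntax, generating the integers 1 through 10):
WITH RECURSIVE nums(n) AS (
    SELECT 1                -- the non-recursive (anchor) part
    UNION ALL
    SELECT n + 1            -- the recursive part consumes rows from the previous iteration
    FROM nums
    WHERE n < 10
)
SELECT n FROM nums;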

How to Convert a SQL Query using Common Table Expressions to One Without (for SQL Server 2000)

I just found an adequate solution to How to Find Rows which are Duplicates by a Key but Not Duplicates in All Columns?, coded the stored procedure, then learned that the database is stuck at SQL Server 2000.
My solution, of course, relies heavily on Common Table Expressions.
Can anyone provide me a set of rules for converting back to the SQL Server 2000 dialect?
Note that I have things like this:
;
WITH CTE1 AS ( ... ),
CTE2 AS (SELECT ... FROM CTE1 ... ),
CTE3 AS (SELECT ... FROM CTE1 INNER JOIN CTE2 ...)
SELECT * FROM CTE3
WHERE criteria
ORDER BY sequence
This would appear to make things more interesting...
Update: None of the CTEs are recursive.
Two options (granted, neither is pretty -- that's why we like CTEs)
OPTION 1
Create a temp table (a #temp table or, if small enough, a @table variable) and refer to it as you would the CTE. Drop it when you are done.
OPTION 2
Put the entire CTE SELECT as a derived table in the FROM portion of the query.
SELECT *
FROM (SELECT *
FROM table1) oldCTE
I don't think it is possible to come up with rules that would easily convert any CTE into a non-CTE statement, as the possibilities are too open-ended (particularly if you're working with recursive CTEs). The closest I can think of would be to take each CTE in order, break it into its own query, and use it to populate a temporary table that's used by the following queries. Hardly efficient, and not guaranteed to work in all possible situations.
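For the shape in the question, that approach would look roughly like this (the elided column lists are kept as in the original):
SELECT ... INTO #CTE1 FROM ...
SELECT ... INTO #CTE2 FROM #CTE1 ...
SELECT ... INTO #CTE3 FROM #CTE1 INNER JOIN #CTE2 ...
SELECT * FROM #CTE3
WHERE criteria
ORDER BY sequence
DROP TABLE #CTE3
DROP TABLE #CTE2
DROP TABLE #CTE1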

Is derived table executed once or three times?

Every time you make use of a derived table, that query is going to be executed. When using a CTE, that result set is pulled back once and only once within a single query.
Does the quote suggest that the following query will cause the derived table to be executed three times (once for each aggregate function call):
SELECT
AVG(OrdersPlaced),MAX(OrdersPlaced),MIN(OrdersPlaced)
FROM (
SELECT
v.VendorID,
v.[Name] AS VendorName,
COUNT(*) AS OrdersPlaced
FROM Purchasing.PurchaseOrderHeader AS poh
INNER JOIN Purchasing.Vendor AS v ON poh.VendorID = v.VendorID
GROUP BY v.VendorID, v.[Name]
) AS x
Thanks.
No, that should be one pass; take a look at the execution plan.
Here is an example where something will run for every row in table2:
select *,(select COUNT(*) from table1 t1 where t1.id <= t2.id) as Bla
from table2 t2
Stuff like this with running counts will fire for each row in table2.
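As an aside, when such a running count is over the table's own rows, SQL Server 2012 and later can compute it in a single pass with a window function (a sketch, not an exact equivalent of the two-table query above):
SELECT *, COUNT(*) OVER (ORDER BY id) AS Bla
FROM table2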
A CTE and a nested (uncorrelated) subquery will generally produce the same execution plan. Whether a CTE or a subquery is used has never had an effect on whether my intermediate queries are spooled.
With regard to the Tony Rogerson link: the explicit temp table performs better than the self-join to the CTE because it is indexed better. Many times, when you go beyond declarative SQL and start to anticipate the work process for the engine, you can get better results.
Sometimes, the benefit of a simpler and more maintainable query with many layered CTEs instead of a complex multi-temp-table process outweighs the performance benefits of a multi-table process. A CTE-based approach is a single SQL statement, which cannot be as quietly broken by a step being accidentally commented out or a schema changing.
Probably not, but it may spool the derived results so it only needs to access it once.
In this case, there should be no difference between a CTE and derived table.
Where is the quote from?

Why is inserting into and joining #temp tables faster?

I have a query that looks like
SELECT
P.Column1,
P.Column2,
P.Column3,
...
(
SELECT
A.ColumnX,
A.ColumnY,
...
FROM
dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
WHERE
A.Key = P.Key
FOR XML AUTO, TYPE
),
(
SELECT
B.ColumnX,
B.ColumnY,
...
FROM
dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B
WHERE
B.Key = P.Key
FOR XML AUTO, TYPE
)
FROM
(
<joined tables here>
) AS P
FOR XML AUTO,ROOT('ROOT')
P has ~ 5000 rows
A and B ~ 4000 rows each
This query has a runtime performance of ~10+ minutes.
Changing it to this however:
SELECT
P.Column1,
P.Column2,
P.Column3,
...
INTO #P
SELECT
A.ColumnX,
A.ColumnY,
...
INTO #A
FROM
dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
SELECT
B.ColumnX,
B.ColumnY,
...
INTO #B
FROM
dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B
SELECT
P.Column1,
P.Column2,
P.Column3,
...
(
SELECT
A.ColumnX,
A.ColumnY,
...
FROM
#A AS A
WHERE
A.Key = P.Key
FOR XML AUTO, TYPE
),
(
SELECT
B.ColumnX,
B.ColumnY,
...
FROM
#B AS B
WHERE
B.Key = P.Key
FOR XML AUTO, TYPE
)
FROM #P AS P
FOR XML AUTO,ROOT('ROOT')
Has a performance of ~4 seconds.
This doesn't make a lot of sense, as it would seem the cost to insert into a temp table and then do the join should be higher by default. My inclination is that SQL Server is doing the wrong type of "join" with the subquery, but as far as I can tell there is no way to specify the join type to use with correlated subqueries.
Is there a way to achieve this without using #temp tables/#table variables via indexes and/or hints?
EDIT: Note that dbo.TableReturningFunc1 and dbo.TableReturningFunc2 are inline TVFs, not multi-statement ones; i.e., they are "parameterized" view statements.
Your functions are being reevaluated for each row in P.
What you do with the temp tables is in fact caching the result set generated by the functions, thus removing the need to reevaluate.
Inserting into a temp table is fast because it does not generate redo / rollback.
Joins are also fast, since having a stable result set makes it possible to create a temporary index with an Eager Spool or a Worktable.
You can reuse the functions without temp tables by using CTEs, but for this to be efficient, SQL Server needs to materialize the results of the CTE.
You may try to force it to do this using TOP with an ORDER BY inside the CTE:
WITH f1 AS
(
SELECT TOP 1000000000
A.ColumnX,
A.ColumnY
FROM dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
ORDER BY
A.key
),
f2 AS
(
SELECT TOP 1000000000
B.ColumnX,
B.ColumnY
FROM dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B
ORDER BY
B.Key
)
SELECT …
This may result in an Eager Spool being generated by the optimizer.
However, this is far from being guaranteed.
The guaranteed way is to add an OPTION (USE PLAN) to your query and wrap the corresponding CTE into the Spool clause.
See this entry in my blog on how to do that:
Generating XML in subqueries
This is hard to maintain, since you will need to rewrite your plan each time you rewrite the query, but this works well and is quite efficient.
Using the temp tables will be much easier, though.
This answer needs to be read together with Quassnoi's article
http://explainextended.com/2009/05/28/generating-xml-in-subqueries/
With judicious application of CROSS APPLY, you can force the caching or shortcut evaluation of inline TVFs. This query returns instantaneously.
SELECT *
FROM (
SELECT (
SELECT f.num
FOR XML PATH('fo'), ELEMENTS ABSENT
) AS x
FROM [20090528_tvf].t_integer i
cross apply (
select num
from [20090528_tvf].fn_num(9990) f
where f.num = i.num
) f
) q
--WHERE x IS NOT NULL -- covered by using CROSS apply
FOR XML AUTO
You haven't provided real structures so it's hard to construct something meaningful, but the technique should apply as well.
If you change the multi-statement TVF in Quassnoi's article to an inline TVF, the plan becomes even faster (at least one order of magnitude) and the plan magically reduces to something I cannot understand (it's too simple!).
CREATE FUNCTION [20090528_tvf].fn_num(@maxval INT)
RETURNS TABLE
AS RETURN
SELECT num + @maxval num
FROM t_integer
Statistics
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(10 row(s) affected)
Table 't_integer'. Scan count 2, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
It is a problem with your sub-query referencing your outer query, meaning the sub query has to be compiled and executed for each row in the outer query.
Rather than using explicit temp tables, you can use a derived table.
To simplify your example:
SELECT P.Column1,
(SELECT [your XML transformation etc] FROM A where A.ID = P.ID) AS A
If P contains 10,000 records then SELECT A.ColumnX FROM A where A.ID = P.ID will be executed 10,000 times.
You can instead use a derived table as thus:
SELECT P.Column1, A2.Column FROM
P LEFT JOIN
(SELECT A.ID, [your XML transformation etc] FROM A) AS A2
ON P.ID = A2.ID
Okay, that's not the most illustrative pseudo-code, but the basic idea is the same as the temp table, except that SQL Server does the whole thing in memory: it first selects all the data in "A2" and constructs a temp table in memory, then joins on it. This saves you having to select it into a temp table yourself.
Just to give you an example of the principle in another context where it may make more immediate sense. Consider employee and absence information where you want to show the number of days absence recorded for each employee.
Bad: (runs as many queries as there are employees in the DB)
SELECT EmpName,
(SELECT SUM(absdays) FROM Absence where Absence.PerID = Employee.PerID) AS Abstotal
FROM Employee
Good: (Runs only two queries)
SELECT EmpName, AbsSummary.Abstotal
FROM Employee LEFT JOIN
(SELECT PerID, SUM(absdays) As Abstotal
FROM Absence GROUP BY PerID) AS AbsSummary
ON AbsSummary.PerID = Employee.PerID
There are several possible reasons why using intermediate temp tables might speed up a query, but the most likely in your case is that the functions being called (but not listed) are probably multi-statement TVFs and not in-line TVFs. Multi-statement TVFs are opaque to the optimization of their calling queries, so the optimizer cannot tell if there are any opportunities for re-use of data, or other logical/physical operator re-ordering optimizations. Thus, all it can do is re-execute the TVFs every time the containing query is supposed to produce another row with the XML columns.
In short, multi-statement TVFs frustrate the optimizer.
The usual solutions, in order of (typical) preference are:
Re-write the offending multi-statement TVF to be an in-line TVF (a sketch follows this list)
In-line the function code into the calling query, or
Dump the offending TVF's data into a temp table, which is what you've done...
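As a hedged illustration of the first option (dbo.Orders, the function name, and the columns are all invented), a multi-statement TVF such as:
CREATE FUNCTION dbo.fn_CustomerOrders (@CustomerID INT)
RETURNS @result TABLE (OrderID INT, Total MONEY)
AS
BEGIN
    -- Populates an intermediate table variable; opaque to the outer query's optimizer
    INSERT INTO @result (OrderID, Total)
    SELECT OrderID, Total
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
    RETURN
END
can often be rewritten as an in-line TVF, whose body the optimizer can expand into the calling query:
CREATE FUNCTION dbo.fn_CustomerOrders (@CustomerID INT)
RETURNS TABLE
AS RETURN
    -- A single SELECT; effectively a parameterized view
    SELECT OrderID, Total
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID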
Consider using the WITH common_table_expression construct for what you now have as sub-selects or temporary tables, see http://msdn.microsoft.com/en-us/library/ms175972(SQL.90).aspx .
This doesn't make a lot of sense, as it would seem the cost to insert into a temp table and then do the join should be higher by default.
With temporary tables, you explicitly instruct SQL Server which intermediate storage to use. But if you stash everything in a big query, SQL Server will decide for itself. The difference is not really that big; at the end of the day, temporary storage is used, whether you specify it as a temp table or not.
In your case, temporary tables work faster, so why not stick to them?
I agree that temp tables are a good concept, but not always. When the row count grows, say to 40 million rows, and I want to update multiple columns by joining to another table, I would rather use a common table expression that computes the updated columns in its SELECT with CASE statements, so the CTE's result set already contains the updated rows. In my case, inserting 40 million records into a temp table with a SELECT using CASE statements took 21 minutes, and creating an index took another 10, so the insert plus index creation took 30 minutes. Applying the update by joining the temp table back to the main table then took 5 more minutes to update 10 million of the 40 million records, for an overall time of almost 35 minutes, versus 5 minutes with the common table expression. My choice in that case is the common table expression.
If temp tables turn out to be faster in your particular instance, you should instead use a table variable.
There is a good article here on the differences and performance implications:
http://www.codeproject.com/KB/database/SQP_performance.aspx
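For reference, a table variable version of the first question's pattern would look something like this (the column list is illustrative; note that table variables carry no statistics, so they tend to suit smaller row counts):
-- Table variable with a primary key to support the join
DECLARE @Temp TABLE (id INT PRIMARY KEY)
INSERT INTO @Temp (id)
SELECT id FROM TableA
SELECT * FROM TableB b INNER JOIN @Temp c ON b.id = c.id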