What are the pros/cons of using SQL variables versus subqueries? - sql

I'm wondering if there is a difference between SQL variables and subqueries: whether one uses more processing power, one is quicker, or one is merely more readable.
For (a very basic) example, I like to use variables to hold polygons and transformations in PostGIS:
WITH region_polygon AS (
    SELECT ST_Transform(wkb_geometry, %(fishnet_srid)d) geom
    FROM regions
    LIMIT 1
), raster_pixels AS (
    SELECT (ST_PixelAsPolygons(rast)).*
    FROM test_regions_raster
    LIMIT 1
)
SELECT x, y
FROM raster_pixels a, region_polygon b
WHERE ST_Within(a.geom, b.geom)
But would it be better in any way to use subqueries?
SELECT x, y
FROM (
    SELECT ST_Transform(wkb_geometry, %(fishnet_srid)d) geom
    FROM regions
    LIMIT 1
) a, (
    SELECT (ST_PixelAsPolygons(rast)).*
    FROM test_regions_raster
    LIMIT 1
) b
WHERE ST_Within(a.geom, b.geom)
Note that I'm using PostgreSQL.

There's an important syntactic advantage of common table expressions over derived tables when it comes to reuse. Consider the following, equivalent examples using self-joins:
Using common table expressions
WITH a(v) AS (SELECT 1 UNION SELECT 2)
SELECT *
FROM a AS x, a AS y
Using derived tables
SELECT *
FROM (SELECT 1 UNION SELECT 2) x(v),
(SELECT 1 UNION SELECT 2) y(v)
As you can see, using common table expressions, the view (SELECT 1 UNION SELECT 2) can be reused multiple times in your query. With derived tables, you will have to repeat your view declaration. In my example, this is still OK. In your own example, this starts getting a bit more hairy.
It's all about scope
Views in SQL are all about scoping. There are essentially four levels of declaring views:
As derived tables. They can be consumed exactly once.
As common table expressions. They can be consumed several times, but only in one query.
As views. They can be consumed several times in several queries.
As materialized views. Same as views, but the data is pre-calculated.
Some databases (in particular PostgreSQL) also know table-valued functions. From a mere syntax perspective, they're just like views - parameterised views.
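For instance, here is a minimal PostgreSQL sketch of such a parameterised view, reusing the regions table and wkb_geometry column from the question (the function name and the SRID in the usage line are made up for illustration, and PostGIS is assumed to be installed):

CREATE FUNCTION region_in_srid(target_srid int)
RETURNS TABLE (geom geometry) AS $$
    SELECT ST_Transform(wkb_geometry, target_srid)
    FROM regions;
$$ LANGUAGE sql STABLE;

-- Consumed just like a view, but with a parameter:
SELECT * FROM region_in_srid(3857);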
Performance
Note that these thoughts only focus on syntax, not query planning. The different approaches may have very different performance implications, depending on the database vendor.

Those aren't variables; they're common table expressions (CTEs). In your query above, the execution plans are likely identical, because the optimizer should recognize that they are equivalent queries. I prefer to use CTEs because I think they're easier to read than subqueries, but that's it.
Edit: Upon further reading, it looks like PostgreSQL does treat common table expressions differently from other databases; you can't update a CTE in PostgreSQL, for instance. I'll leave my answer here because I believe there won't be a difference for your query, but I'm not terribly familiar with PostgreSQL.

As pointed out, this construct is called a common table expression (CTE), not a variable.
I prefer to use a CTE rather than a subquery because it is way easier for me to read and write, especially when there are several nested CTEs.
You can write a CTE once and refer to it several times in the rest of the query. With a subquery you'd have to repeat the code several times.
An important difference between PostgreSQL and other databases (at least MS SQL Server) is that PostgreSQL evaluates each CTE only once:
A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)
MS SQL Server would inline each reference to the CTE into the main query and optimize the whole result, but PostgreSQL doesn't. In some sense PostgreSQL is more flexible here: if you want the subquery to be evaluated only once, put it in a CTE; if you don't, put it in a subquery and repeat the code. In SQL Server you'd have to use a temporary table explicitly.
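Here is a sketch of that choice; expensive_fn, t and x are placeholder names, and this reflects the pre-v12 PostgreSQL behaviour described here:

-- Evaluated once, then read twice:
WITH e AS (SELECT expensive_fn(x) AS v FROM t)
SELECT a.v
FROM e AS a
JOIN e AS b ON a.v = b.v;

-- Repeated subqueries: inlined and open to predicate push-down,
-- but the expensive work may run twice:
SELECT a.v
FROM (SELECT expensive_fn(x) AS v FROM t) AS a
JOIN (SELECT expensive_fn(x) AS v FROM t) AS b ON a.v = b.v;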
Your example in the question is too simple, and most likely both variants are equivalent - check the execution plan.
The official docs mention it, as I quoted above, but Nick Barnes gave a link to a good article explaining it in more detail, and I thought it worth putting in an answer rather than a comment:
When optimising queries in PostgreSQL (true at least in 9.4 and older), it's worth keeping in mind that, unlike newer versions of various other databases, PostgreSQL will always materialise a CTE term in a query.
This can have quite surprising effects for those used to working with DBs like MS SQL:
A query that should touch a small amount of data instead reads a whole table and possibly spills it to a tempfile;
You cannot UPDATE or DELETE FROM a CTE term, because it's more like a read-only temp table rather than a dynamic view.
So, there is no definite answer whether a CTE is better than a subquery in PostgreSQL. In some cases it can be faster, in some cases slower. But, IMHO, in most cases a CTE is easier to write, read and maintain.
And, obviously, there are cases where you have no option but to use a so-called recursive CTE (recursive queries are typically used to deal with hierarchical or tree-structured data).
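For completeness, a minimal recursive CTE sketch in PostgreSQL syntax; the employees(id, manager_id) table is assumed purely for illustration:

WITH RECURSIVE subordinates AS (
    SELECT id, manager_id          -- anchor member: the root of the tree
    FROM employees
    WHERE id = 1
    UNION ALL
    SELECT e.id, e.manager_id      -- recursive member: walk down one level
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;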

Related

performance difference between queries with inner queries

Let's use a simple "redundant" query like this.
SELECT * FROM
    (SELECT * FROM
        (SELECT * FROM mytable) AS X) AS Y
Are there any optimisations (prior to execution) in the database engine that collapse it to the innermost query, without loss in performance? Which database engines do that?
As suggested, I will put the real question, and a possible bet on the performance of each query :)
Here it goes:
SELECT * FROM t1 JOIN t2 ON t1.chkId = t2.xchkId;
and
SELECT * FROM
(SELECT * FROM t1) AS X
JOIN
(SELECT * FROM t2) AS Y
ON X.chkId = Y.xchkId;
Of course I could narrow down the subqueries in the second version (but that's not the point here).
They do the same thing! Is there any difference in performance?
Examining the query plan on SQL Server 2012 shows that it compiles to a single scan. Pretty much every serious database should exhibit the same behaviour.
The database that you are talking about is very important.
As mentioned in the comments, pretty much every reasonable database would ignore the subqueries and compile the queries to the same underlying code. An easy way to understand this is that SQL is not a procedural language; a SQL query specifies the structure of the output, not how it is generated.
In general, the underlying engine is a dataflow engine that contains a bunch of algorithms for different tasks, such as joining tables, using indexes, and aggregation. What gets executed is pretty far from the SQL statement itself.
All that said, not all databases are "reasonable". In particular, MySQL (and hence MariaDB) materializes subqueries, so there the structure does make a difference. Other simple databases may do this as well.
Having two sub-queries combined as in your example is a pretty common SQL pattern. Conceptually, each sub-query is a temporary table. You'll see it show up in the query plan as some sort of table operator, like an index seek or scan.
Depending on the complexity of the sub-query:
1. using the sub-query as in your example can be best;
2. placing the sub-query's results into an actual temp table or table variable can be best.
It's helpful to get much more specific in your question and to focus on the query plan.

Is there a performance difference between CTE, Sub-Query, Temporary Table or Table Variable?

In this excellent SO question, differences between CTEs and sub-queries were discussed.
I would like to specifically ask:
In what circumstance is each of the following more efficient/faster?
CTE
Sub-Query
Temporary Table
Table Variable
Traditionally, I've used lots of temp tables when developing stored procedures, as they seem more readable than lots of intertwined sub-queries.
Non-recursive CTEs encapsulate sets of data very well and are very readable, but are there specific circumstances where one can say they will always perform better? Or is it a case of always having to fiddle around with the different options to find the most efficient solution?
EDIT
I've recently been told that in terms of efficiency, temporary tables are a good first choice as they have an associated histogram i.e. statistics.
SQL is a declarative language, not a procedural language. That is, you construct a SQL statement to describe the results that you want. You are not telling the SQL engine how to do the work.
As a general rule, it is a good idea to let the SQL engine and SQL optimizer find the best query plan. There are many person-years of effort that go into developing a SQL engine, so let the engineers do what they know how to do.
Of course, there are situations where the query plan is not optimal. Then you want to use query hints, restructure the query, update statistics, use temporary tables, add indexes, and so on to get better performance.
As for your question. The performance of CTEs and subqueries should, in theory, be the same since both provide the same information to the query optimizer. One difference is that a CTE used more than once could be easily identified and calculated once. The results could then be stored and read multiple times. Unfortunately, SQL Server does not seem to take advantage of this basic optimization method (you might call this common subquery elimination).
Temporary tables are a different matter, because you are providing more guidance on how the query should be run. One major difference is that the optimizer can use statistics from the temporary table to establish its query plan. This can result in performance gains. Also, if you have a complicated CTE (subquery) that is used more than once, then storing it in a temporary table will often give a performance boost. The query is executed only once.
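A small T-SQL sketch of that pattern; dbo.Orders and its columns are illustrative names, not from the question:

SELECT customer_id, SUM(amount) AS total
INTO #totals                 -- computed once; #totals also gets statistics
FROM dbo.Orders
GROUP BY customer_id;

SELECT AVG(total) FROM #totals;                       -- first use
SELECT customer_id FROM #totals WHERE total > 1000;   -- second use

DROP TABLE #totals;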
The answer to your question is that you need to play around to get the performance you expect, particularly for complex queries that are run on a regular basis. In an ideal world, the query optimizer would find the perfect execution path. Although it often does, you may be able to find a way to get better performance.
There is no rule. I find CTEs more readable, and use them unless they exhibit some performance problem, in which case I investigate the actual problem rather than guess that the CTE is the problem and try to re-write it using a different approach. There is usually more to the issue than the way I chose to declaratively state my intentions with the query.
There are certainly cases when you can unravel CTEs or remove subqueries and replace them with a #temp table and reduce duration. This can be due to various things, such as stale stats, the inability to even get accurate stats (e.g. joining to a table-valued function), parallelism, or even the inability to generate an optimal plan because of the complexity of the query (in which case breaking it up may give the optimizer a fighting chance). But there are also cases where the I/O involved with creating a #temp table can outweigh the other performance aspects that may make a particular plan shape using a CTE less attractive.
Quite honestly, there are way too many variables to provide a "correct" answer to your question. There is no predictable way to know when a query may tip in favor of one approach or another - just know that, in theory, the same semantics for a CTE or a single subquery should execute the exact same. I think your question would be more valuable if you present some cases where this is not true - it may be that you have discovered a limitation in the optimizer (or discovered a known one), or it may be that your queries are not semantically equivalent or that one contains an element that thwarts optimization.
So I would suggest writing the query in a way that seems most natural to you, and only deviate when you discover an actual performance problem the optimizer is having. Personally I rank them CTE, then subquery, with #temp table being a last resort.
#temp is materialized and a CTE is not.
A CTE is just syntax, so in theory it is just a subquery that gets executed in place. #temp is materialized, so an expensive CTE that is executed many times in a join may be better in a #temp table. On the other hand, if it is a cheap evaluation that is only executed a few times, it is not worth the overhead of #temp.
There are some people on SO who don't like table variables, but I like them, as they are materialized and faster to create than #temp. There are times when the query optimizer does better with a #temp than with a table variable.
The ability to create a PK on a #temp or table variable gives the query optimizer more information than a CTE (as you cannot declare a PK on a CTE).
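For example (a trivial sketch, names made up):

CREATE TABLE #ids ( id int NOT NULL PRIMARY KEY );    -- PK on a temp table

DECLARE @ids TABLE ( id int NOT NULL PRIMARY KEY );   -- PK on a table variable

There is no equivalent declaration for a CTE.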
Just two things that I think make it ALWAYS preferable to use a #temp table rather than a CTE are:
You cannot put a primary key on a CTE, so the data being accessed by the CTE will have to traverse each one of the indexes on the CTE's tables rather than just accessing the PK or an index on the temp table.
Because you cannot add constraints, indexes and primary keys to a CTE, they are more prone to bugs creeping in and bad data.
Here is an example where temp-table constraints can prevent bad data, which is not the case with CTEs:
DECLARE @BadData TABLE (
    ThisID int
  , ThatID int );

INSERT INTO @BadData (ThisID, ThatID)
VALUES
    (1, 1),
    (1, 2),
    (2, 2),
    (1, 1);   -- note the duplicate (1, 1)

IF OBJECT_ID('tempdb..#This') IS NOT NULL
    DROP TABLE #This;

CREATE TABLE #This (
    ThisID int NOT NULL
  , ThatID int NOT NULL
  , UNIQUE (ThisID, ThatID) );

-- This fails: the UNIQUE constraint rejects the duplicate row.
INSERT INTO #This
SELECT * FROM @BadData;

-- A CTE has no constraints, so the duplicate passes straight through.
WITH This_CTE AS
    (SELECT * FROM @BadData)
SELECT * FROM This_CTE;

Do CTEs improve performance?

WITH ini AS
(
    SELECT ...
)
SELECT ...
FROM ini AS a
JOIN ini AS b ON ...
JOIN ini AS c ON ...
How many times does the SQL Server engine calculate the results of the ini CTE?
My question, which I'm trying to answer (with your help), is whether the WITH statement (CTE) improves performance by aliasing the results.
The CTE ini is simply a macro that gets expanded, and this use is for syntax/clarity only.
MSDN says:
Using a CTE offers the advantages of improved readability and ease in maintenance of complex queries
Nothing about performance.
It is evaluated per mention: so three times here, which you can see from the execution plan.
For recursive CTEs it's somewhat different, as the CTE builds upon itself, but it will still be evaluated once per mention.
A CTE (common table expression, the part that is wrapped in the "with") is essentially a 1-time view. If you think of it in terms of a temporary view, perhaps the answer will become more clear. As far as I know, the interpreter will simply do the equivalent of copy/pasting whatever is within the CTE into the main query wherever it finds the reference.
I'm sure there are outside instances where it appears to help, but more often than not, I'd assume that the mere presence of a CTE itself is not going to improve the performance of a query. It'll help with readability and re-usability within that single select statement (i.e., you won't have to re-type the same sub-query multiple times), but I don't believe it will magically make things run faster (all things being equal). Of course, if your query is structured differently within the CTE than you would have done w/ sub-queries, then it's quite possible the CTE runs faster at that point, but you're now comparing apples to oranges.
I suppose it would also depend on whether you were using it to replace a derived table or a correlated subquery. Performance would be about the same in the first case, and probably significantly better in the second if you joined to the CTE rather than just replacing the subquery code with a reference to the CTE. If you used it to replace a WHERE NOT EXISTS clause with a left join to a CTE (in order to find the records in one table but not the other), I'd expect performance to be worse, as WHERE EXISTS is usually the fastest way to do that type of task. I guess what I'm saying is that performance will still depend on how you use the CTE, not just the fact that you generated one.
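For concreteness, here is a sketch of the two patterns being compared (t1, t2 and the id column are placeholder names): find the rows in t1 that have no match in t2.

SELECT t1.id
FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id);

WITH t2_ids AS (SELECT id FROM t2)
SELECT t1.id
FROM t1
LEFT JOIN t2_ids ON t2_ids.id = t1.id
WHERE t2_ids.id IS NULL;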

SQL Server: Inline Table-Value UDF vs. Inline View

I'm working with a medical-record system that stores data in a construct that resembles a spreadsheet: date/time in the column headers, measurements (e.g. physician name, Rh, blood type) in the first column of each row, and a value in the intersecting cell. Reports that are based on this construct often require 10 or more of these measures to be displayed.
For reporting purposes, the dataset needs to have one row for each patient, the date/time the measurement was taken, and a column for each measurement. In essence, one needs to pivot the construct by 90 degrees.
At one point, I actually used SQL Server's PIVOT functionality to do just that. For a variety of reasons, it became apparent that this approach wouldn't work. I decided that I would use an inline view (IV) to massage the data into the desired format. The simplified query resembles:
SELECT patient_id,
datetime,
m1.value AS physician_name,
m2.value AS blood_type,
m3.value AS rh
FROM patient_table
INNER JOIN ( complex query here
WHERE measure_id=1) m1...
INNER JOIN (complex query here
WHERE measure_id=2) m2...
LEFT OUTER JOIN (complex query here
WHERE measure_id=3) m3...
As you can see, in some cases these IVs are used to restrict the resulting dataset (INNER JOIN); in other cases they do not restrict the dataset (LEFT OUTER JOIN). However, the 'complex query' part is essentially the same for each of these measures, except for the difference in measure_id. While this approach works, it leads to fairly large SQL statements, limits reuse, and exposes the query to errors.
My thought was to replace the 'complex query' and WHERE clause with an inline table-valued UDF. This would simplify the queries quite a bit, reduce errors, and increase code reuse. The only question on my mind is performance: will the UDF approach lead to a significant decrease in performance? Might it improve matters?
Thanks for your time and consideration.
A correctly defined TVF will not introduce any problem. You'll find many claims on the internet blasting TVFs for performance problems compared to views or temp tables and variables. What is usually not understood is that a TVF behaves differently from a view. A view definition is placed into the original query, and then the optimizer will rearrange the query tree as it sees fit (unless the NOEXPAND clause is used on indexed views). A TVF has different semantics, and sometimes, especially when updating data, this results in the TVF output being spooled for Halloween protection. It helps to mark the function WITH SCHEMABINDING; see Improving query plans with the SCHEMABINDING option on T-SQL UDFs.
It is also important to understand the concepts of deterministic and precise functions. Although they apply mostly to scalar-valued functions, TVFs can also be affected. See User-Defined Function Design Guidelines.
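As a minimal sketch of what the asker's inline TVF could look like (the dbo.measurements table and its columns are assumptions standing in for the 'complex query', not the real schema):

CREATE FUNCTION dbo.GetMeasure (@measure_id int)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    SELECT m.patient_id, m.measure_datetime, m.value
    FROM dbo.measurements AS m
    WHERE m.measure_id = @measure_id
);

Each derived table in the report query then becomes a call such as INNER JOIN dbo.GetMeasure(1) AS m1 ON m1.patient_id = p.patient_id.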
Since you need a SQL string and may not have the ability to add a view or UDF to the system, you may want to use WITH ... AS to limit the complex query to one place (at least for this statement):
WITH complex (patient_id, datetime, measure_id, value) AS
    (SELECT ... /* complex query here */)
SELECT patient_id
     , datetime
     , m1.value AS physician_name
     , m2.value AS blood_type
     , m3.value AS rh
FROM patient_table
INNER JOIN (SELECT ... FROM complex WHERE measure_id = 1) m1 ...
INNER JOIN (SELECT ... FROM complex WHERE measure_id = 2) m2 ...
LEFT OUTER JOIN (SELECT ... FROM complex WHERE measure_id = 3) m3 ...
You also have a third option: a traditional VIEW (assuming that you have a key to join on). In theory, there shouldn't be a performance difference between the three options, because SQL Server should evaluate and optimize the plans accordingly. The reality is that sometimes this doesn't happen as well as we'd like.
The benefit of a traditional view is that you could make it an indexed view and give SQL Server another performance aid; however, you'll just have to test and see.
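A sketch of such an indexed view, again with illustrative names; SCHEMABINDING is required, and the clustered index must be unique, so (patient_id, measure_id) is assumed unique here:

CREATE VIEW dbo.v_measures
WITH SCHEMABINDING
AS
SELECT patient_id, measure_id, value
FROM dbo.measurements;
GO
CREATE UNIQUE CLUSTERED INDEX IX_v_measures
    ON dbo.v_measures (patient_id, measure_id);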
SQL Server 2005 answer:
You can reduce the inline views by using temp/var tables. The performance cost of these is the insert into the temp table required per hit on the query, but if the result sets are small enough, they can help. You can use primary keys on var tables, and primary keys/indexes on temp tables. Contrary to common belief, I have found a couple of articles indicating that both temp and var tables are stored in tempdb.
UDF functions we have found to be less performant when you have multi-layer UDFs in complex queries, but they maintain usability.
Be sure to create the function correctly for the various conditions specified: those that WILL be used for inner joins, and those that will be used for left joins.
So, in general: we do use UDFs, but when we find that the performance degrades, we move the query to insert the UDF selections into temp/var tables and join on those.
Create the functionality for ease of use/maintenance, and apply performance enhancements where and when required.
EDIT:
If you are required to run this for Crystal and you plan to use stored procedures, then yes, you can execute SQL statements inside the SP against temp/var tables.
Let me know if you are going to use SPs. SQL Server will then also cache the SP plans with the given params as required.
Also, from previous experience with Crystal, things to avoid are: grouping in Crystal that could be done in the SP; page numbers, if not required; and function calls, if these can be handled on the server.

What are the advantages/disadvantages of using a CTE?

I'm looking at improving the performance of some SQL; currently CTEs are being used and referenced multiple times in the script. Would I get improvements using a table variable instead? (I can't use a temporary table, as the code is within functions.)
You'll really have to performance test - there is no yes/no answer. As the post Andy Living linked to above explains, a CTE is just shorthand for a query or subquery.
If you are calling it twice or more in the same function, you might get better performance if you fill a table variable and then join to/select from that. However, as table variables take up space somewhere and don't have indexes/statistics (with the exception of any primary key declared on the table variable), there's no way of saying which will be faster.
They both have costs and savings, and which is best depends on the data they pull in and what they do with it. I've been in your situation, and after testing for speed under various conditions, some of our functions ended up using CTEs and others used table variables.
A CTE is not much more than syntactic sugar.
It enhances the readability and allows to avoid repetition.
Just think of it as a placeholder for the actual statement specified in the WITH()-clause. The engine will replace any occurrence of the CTE's name in your query with this statement (quite similar to a view). This is the meaning of inline.
Compared to a previously filled table (declared or created) you'll find advantages:
useable in ad-hoc-queries (functions, views)
no unexpected side effects (most narrow scope)
...and disadvantages:
You cannot use the CTE's result in different statements
You cannot use indexes or statistics to optimize your CTE's set (although it will implicitly use existing indexes and statistics of the targeted objects, if appropriate).
In terms of performance, a persisted set (a declared or created table) can be (much!) better in some cases, but it forces you into procedural code. You will have to race your horses to find out which is better...
Example: Various approaches to do the same
The following simple (rather useless) example describes a set of user tables together with their columns. I use several different approaches to tell SQL-Server what I want:
Try this with "include actual execution plan"
USE master; --in my case the master database has just 5 "user tables", you can use any other DB of course
GO
--simple join, first the small set joining to the large set
SELECT o.name AS TableName
,c.name AS ColumnName
FROM sys.objects o
INNER JOIN sys.columns c ON c.object_id=o.object_id
WHERE o.type='U';
GO
--simple join "the other way round" with the filter as part of the ON-clause
SELECT o.name AS TableName
,c.name AS ColumnName
FROM sys.columns c
INNER JOIN sys.objects o ON c.object_id=o.object_id AND o.type='U';
GO
--join from the large set with a sub-query to the small set
SELECT o.name AS TableName
,c.name AS ColumnName
FROM sys.columns c
INNER JOIN (
SELECT o.*
FROM sys.objects o
WHERE o.type='U' --user tables
) o ON c.object_id=o.object_id;
GO
--join for large to small with a row-wise APPLY
SELECT o.name AS TableName
,c.name AS ColumnName
FROM sys.columns c
CROSS APPLY (
SELECT o.*
FROM sys.objects o
WHERE o.type='U' --user tables
AND o.object_id=c.object_id
) o;
GO
--use a CTE to "pre-filter" the small set
WITH cte AS
(
SELECT o.*
FROM sys.objects o
WHERE o.type='U' --user tables
)
SELECT cte.name AS TableName
,c.name AS ColumnName
FROM sys.columns c
INNER JOIN cte ON c.object_id=cte.object_id;
GO
Now look at the result and at the execution plans:
All queries return the same result.
All queries produce the same execution plan
Important hint: This might differ on your machine!
Why is this?
T-SQL is a declarative language. Your statement is a description of WHAT you want to retrieve. It is not your job to tell the engine HOW this is done.
SQL-Server's extremely smart engine will find the best way to get the set you asked for. In the case above all result descriptions point to the same goal. The engine can deduce this from various statements and finds the same plan for all of them.
Well, is it just a matter of taste?
In a way...
There are some important things to keep in mind:
There is no reason for the engine to compute the CTE's result before the rest (although the statement might look like it). Therefore it is wrong to describe a CTE as something like a temp table...
In other words: the visible order of your statement does not predict the actual order of execution!
The smart engine will reach its limits with complexity and nesting level. Imagine various VIEWs, all using CTEs and calling each other...
There are cases where the engine really f**s up. I remember a case where a CTE did not do much more than a TRY_CAST. The idea was to ensure valid values in the query below. But the engine thought "Oh, just a CAST, not expensive!" and moved the actual CAST to a higher position in the execution plan. I remember another case where the engine performed an expensive operation against millions of rows (unnecessarily, as the final result was filtered down to a tiny set), just because the actual order of execution was not as expected.
Okay... So when should I use a CTE?
The following points are good reasons to use a CTE:
A CTE can help you to avoid repeated sub queries.
A CTE can be used multiple times within your statement, e.g. within a JOIN with a dynamic behavior depending on the actual row-count.
You can use multiple CTEs within one statement and you can use the result of one CTE within a later CTE.
There are recursive (or better: iterative) CTEs.
Sometimes I use single-row CTEs to define / pre-compute variables later used in the query, the kind of thing you would do with declared variables in procedural T-SQL. You can use a CROSS JOIN to get them into your query easily (see the sketch after this list).
And also very nice: the updatable CTE allows for very easy-to-read UPDATE statements; the same applies to DELETE.
As above: Nothing one could not do without the CTE, but it is far better to read (I really like speaking names).
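Here is the single-row-CTE trick from the list above as a small sketch (dbo.Orders and its columns are invented for the example):

WITH params AS
(
    SELECT CAST('2024-01-01' AS date) AS StartDate
         , CAST('2024-12-31' AS date) AS EndDate
)
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o
CROSS JOIN params AS p
WHERE o.OrderDate BETWEEN p.StartDate AND p.EndDate;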
Final hints
Well, there are cases where ugly code performs better :-)
It is always good to have clean and readable code. A CTE will help you with this. So give it a try. If the performance is bad, dig into the details, look at the execution plans and try to find a reason where the engine might be deciding wrongly.
In most cases it is a bad idea to try to outsmart the engine with hints such as FORCE ORDER (but it can help).
UPDATE
I was asked to point to advantages and disadvantages specifically:
Uhm, technically there are no real advantages or disadvantages. Disregarding recursive CTEs, there's nothing one couldn't solve without a CTE.
Advantages
The main advantage is readability and maintainability.
Sometimes a CTE can save hundreds of lines of code. Instead of repeating a huge sub-query, one can use just a name (like a variable). Corrections to the sub-query can be made in just one place.
The CTE can serve in ad-hoc queries and make your life easier.
Disadvantages
One possible disadvantage is that it's very easy, even for experienced developers, to mistake a CTE for a temp table, assume that the visible order of steps will be the same as the actual order of execution, and stumble into unexpected results or even errors.
And - of course :-) - the strange syntax error you'll see when you write a CTE after another statement without a separating ;. That's why many people tend to use ;WITH.
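A two-line illustration of that error and the habit that avoids it:

DECLARE @x int = 1            -- forgot the terminating semicolon...
;WITH c AS (SELECT @x AS v)   -- ...the leading ";" keeps WITH happy anyway
SELECT v FROM c;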
Probably not. CTEs are especially good at querying data in tree structures.
The information and quotes are from the following article on mssqltips.com by Eric Blinn: https://www.mssqltips.com/sqlservertip/6618/sql-server-query-performance-cte-view-subquery-temp-table-table-variable/
SQL Server 2019 CTEs, subqueries, and views
The SQL Server [2019] engine optimizes every query that is given to it. When it encounters a CTE, traditional subquery, or view, it sees them all the same way and optimizes them the same way. This involves looking at the underlying tables, considering their statistics, and choosing the best way to proceed. In most cases they will return the same plan and therefore perform exactly the same.
TempDB table
For the query that inserted rows into the temporary table, the optimizer looked at the table statistics and chose the best way forward. It actually made new table statistics for the temporary table and then used them to run the second query. This brings about very similar performance.
Table variable
The table variable has poor performance in the example given in the article due to lack of table statistics.
...the table variable does not have any table statistics generated for it like the TempDB table did. This means the optimizer has to make a wild guess as to how to proceed. In this example it made a very, very poor decision.
This is not to write off table variables. They surely have their place, as will be discussed later in the tip.
Temp table vs table variable
A temporary table will be stored on disk and have statistics calculated on it, and a table variable will not. Because of this difference, temporary tables are best when the expected row count is >100, and the table variable is best for smaller expected row counts, where the lack of statistics will be less likely to lead to a bad query plan.
Advantages of CTE
A CTE can be termed a 'temporary view' and used as a good alternative to a view in some cases.
The main advantage over a view is memory usage. As a CTE's scope is limited to its batch, the memory allocated for it is freed as soon as its batch ends. But once a view is created, it is stored until the user drops it. If the view is not used after creation, it's a mere waste of memory.
The CPU cost of executing a CTE is lower compared to that of a view.
Unlike a view, a CTE doesn't store any metadata for its definition, and it provides better readability.
A CTE can be referred to multiple times in a query.
As the scope is limited to the batch, multiple CTEs can have the same name, which views cannot have.
It can be made recursive.
Disadvantages of CTE
Though using CTEs is advantageous, they do have some limitations to keep in mind:
We know that a CTE is a substitute for a view, but a CTE cannot be nested, while views can be nested.
A view, once created, can be used any number of times, but a CTE cannot; it has to be declared every time you want to use it. For that scenario a CTE is not recommended, as it is a tiring job for the user to declare it again and again.
Between the anchor members there should be set operators like UNION, UNION ALL or EXCEPT.
In recursive CTEs you can define many anchor members and recursive members, but all the anchor members must be defined before the first recursive member. You cannot define an anchor member between two recursive members.
The number of columns and the data types used in the anchor and recursive members should be the same.
In the recursive member, constructs such as TOP, DISTINCT, HAVING, GROUP BY and sub-queries are not allowed, and of the joins only INNER JOIN is allowed (no LEFT, RIGHT or FULL OUTER joins).
The maximum recursion limit is 32,767, set via the MAXRECURSION hint (the default is 100); exceeding it aborts the query with an error, which protects the server from an infinite loop.