SQL Server: Inline Table-Valued UDF vs. Inline View

I'm working with a medical-record system that stores data in a construct that resembles a spreadsheet: date/time in the column headers, measurements (e.g. physician name, Rh, blood type) in the first column of each row, and a value in the intersecting cell. Reports based on this construct often need to display 10 or more of these measures.
For reporting purposes, the dataset needs to have one row for each patient, the date/time the measurement was taken, and a column for each measurement. In essence, one needs to pivot the construct by 90 degrees.
At one point, I actually used SQL Server's PIVOT functionality to do just that. For a variety of reasons, it became apparent that this approach wouldn't work. I decided that I would use an inline view (IV) to massage the data into the desired format. The simplified query resembles:
SELECT patient_id,
datetime,
m1.value AS physician_name,
m2.value AS blood_type,
m3.value AS rh
FROM patient_table
INNER JOIN ( complex query here
WHERE measure_id=1) m1...
INNER JOIN (complex query here
WHERE measure_id=2) m2...
LEFT OUTER JOIN (complex query here
WHERE measure_id=3) m3...
As you can see, in some cases these IVs restrict the resulting dataset (INNER JOIN), and in other cases they do not (LEFT OUTER JOIN). However, the 'complex query' part is essentially the same for each of these measures, except for the difference in measure_id. While this approach works, it leads to fairly large SQL statements, limits reuse, and exposes the query to errors.
My thought was to replace the 'complex query' and WHERE clause with an inline table-valued UDF. This would simplify the queries quite a bit, reduce errors, and increase code reuse. The only question on my mind is performance. Will the UDF approach lead to a significant decrease in performance? Might it improve matters?
Thanks for your time and consideration.

A correctly defined TVF will not introduce any problems. You'll find many claims on the internet blasting TVFs for performance problems compared to views or temp tables and variables. What is usually not understood is that a TVF behaves differently from a view. A view definition is placed into the original query and the optimizer will then rearrange the query tree as it sees fit (unless the NOEXPAND hint is used on indexed views). A TVF has different semantics and sometimes, especially when updating data, this results in the TVF output being spooled for Halloween protection. It helps to mark the function WITH SCHEMABINDING; see Improving query plans with the SCHEMABINDING option on T-SQL UDFs.
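For illustration, here is a minimal sketch of such an inline TVF with SCHEMABINDING, assuming a hypothetical dbo.measurement_table(patient_id, taken_at, measure_id, value) matching the question's construct:

CREATE FUNCTION dbo.GetMeasure (@measure_id int)
RETURNS TABLE
WITH SCHEMABINDING  -- helps the optimizer avoid unnecessary spools
AS
RETURN
(
    -- SCHEMABINDING requires two-part names for referenced objects
    SELECT m.patient_id, m.taken_at, m.value
    FROM dbo.measurement_table AS m
    WHERE m.measure_id = @measure_id
);

Being inline, a reference such as INNER JOIN dbo.GetMeasure(1) AS m1 ON ... is expanded into the outer query before optimization, much like a parameterised view.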
It is also important to understand the concepts of deterministic and precise functions. Although they apply mostly to scalar-valued functions, TVFs can also be affected. See User-Defined Function Design Guidelines.

Since you need a SQL string and may not have the ability to add a view or UDF to the system, you may want to use WITH ... AS to limit the complex query to one place (at least for this statement):
WITH complex(patient_id, datetime, measure_id, value) AS
(SELECT ... complex query ...)
SELECT patient_id
, datetime
, m1.value AS physician_name
, m2.value AS blood_type
, m3.value AS rh
FROM patient_table
INNER JOIN (SELECT ... FROM complex WHERE measure_id=1) m1...
INNER JOIN (SELECT ... FROM complex WHERE measure_id=2) m2...
LEFT OUTER JOIN (SELECT ... FROM complex WHERE measure_id=3) m3...

You also have a third option; a traditional VIEW (assuming that you have a key to join to). In theory, there shouldn't be a performance difference between the three options because SQL Server should evaluate and optimize the plans accordingly. The reality is that sometimes that doesn't happen as well as we'd like.
The benefit of a traditional view is that you could make it an indexed view, and give SQL Server another performance aid; however, you'll just have to test and see.

SQL Server 2005 answer:
You can reduce the inline views by using temp/variable tables. The performance cost of these is the temp-table insert you incur per hit on the query, but if the result sets are small enough, they can help. You can use primary keys on table variables, and primary keys/indexes on temp tables. Contrary to common belief, I have found a couple of articles indicating that both temp tables and table variables are stored in tempdb.
UDFs we have found to be less performant when you have multi-layer UDFs in complex queries, but they maintain usability.
Be sure to create the function correctly for the various conditions specified: those that WILL be used for inner joins, and those that will be used for left joins.
So, in general: we do use UDFs, but when we find that performance degrades, we move the query to insert the UDF selections into temp/variable tables and join on those.
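For example, a sketch of that temp-table pattern (assuming a hypothetical inline TVF dbo.GetMeasure like the one sketched in the earlier answer):

-- Materialize the UDF output once...
SELECT patient_id, taken_at, value
INTO #m1
FROM dbo.GetMeasure(1);

-- ...index it...
CREATE CLUSTERED INDEX ix_m1 ON #m1 (patient_id);

-- ...then join to the temp table instead of calling the UDF per measure.
SELECT p.patient_id, m1.value AS physician_name
FROM patient_table AS p
INNER JOIN #m1 AS m1 ON m1.patient_id = p.patient_id;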
Create functionality for ease of use/maintenance, and apply performance enhancements where and when required.
EDIT:
If you are required to run this for Crystal, and you plan to use stored procedures: yes, you can execute SQL statements inside the SP to fill temp/variable tables.
Let me know if you are going to use SPs. SQL will then also cache the SP plans with the given params as required.
Also, from previous experience with Crystal, things to avoid: grouping in Crystal that could be done in the SP, page numbers if not required, and function calls, if these can be handled on the server.

Related

Materialized view Vs Temp tables in Oracle

I have a base transaction table. Then I have around 15 intermediate steps where I'm combining dimension tables, performing some aggregation, and implementing business logic. The way I'm handling it currently is creating temporary tables for the intermediate stages and, after these 15 steps, populating the final result into a physical table. Is that a better approach, or is using materialized views instead of these intermediate temp tables better? If using materialized views for the intermediate steps is the better approach, can you kindly let me know why?
I have already tried scripting both approaches: the 15 intermediate steps as global temporary tables as well as materialized views. I found a marginal improvement in performance with MVs compared to temp tables, but it comes at the cost of extra physical storage. I'm not sure which is the best practice, and why.
Temporary tables write to disk, so there's I/O costs for both reading and writing. Also most sites don't manage their temporary tables properly and they end up on the default temporary tablespace, which is the same TEMP tablespace everybody uses for sorting, etc. So there's potential for resource contention there.
Materialized views are intended for materializing aspects of our data set which are commonly reused by many different queries. That's why the most common use case is for storing higher level aggregates of low level data. That doesn't sound like the use case you have here. And lo!
I'm doing a complete refresh of MVs and not an incremental refresh
So nope.
Then I have around 15 intermediate steps, where I'm combining dimension tables, performing some aggregation and implementing business logic.
This is a terribly procedural way of querying data. Sometimes there's no way of avoiding this, especially in certain data warehouse scenarios. However, it doesn't follow that we need to materialize the outputs of those queries. An alternative approach is to use WITH clauses. The output from one WITH subquery can feed into lower subqueries.
with sq1 as (
select whatever
, count(*) as t1_tot
from t1
group by whatever
) , sq2 as (
select sq1.whatever
, max(t2.blah) as max_blah
from sq1
join t2 on t2.whatever = sq1.whatever
group by sq1.whatever
) , sq3 as (
select sq2.whatever
,(t3.meh + t3.huh) as qty
from sq2
join t3 on t3.whatever = sq2.whatever
where t3.something >= sq2.max_blah
)
select sq1.whatever
,sq1.t1_tot
,sq2.max_blah
,sq3.qty
from sq1
join sq2 on sq2.whatever = sq1.whatever
join sq3 on sq3.whatever = sq1.whatever
I'm not saying it won't be a monstrous query, the terror of the department. But it will probably perform way better than your MViews or GTTs. (Oracle may choose to materialize those intermediate result sets, but we can use hints to affect that.)
You may even find from taking this approach that some of your steps are unnecessary and you can combine several steps into one query. Certainly, in real life I would write my toy statement above as one query, not a join of three subqueries.
From what you said, I'd say that using (global or private, depending on the database version you use) temporary tables is the better choice. Why? Because you are "calculating" something, storing the results of those calculations in some tables, and reusing them for additional processing. All of that - if it can't be done without temporary tables - is a job for tables.
A materialized view is, as its name says, a view. It is the result of some query, but - as opposed to "normal" views - it actually takes up space. It can be refreshed (on demand, when the source data changes, or on a schedule). Yes, it has its advantages, though I can't see any in what you are currently doing.

What are the pros/cons of using SQL variables versus subqueries?

I'm wondering whether there is a difference between SQL variables and subqueries: whether one uses more processing power, or one is quicker, or even if one is merely more readable.
For (a very basic) example, I like to use variables to hold polygons and transformations in PostGIS:
WITH region_polygon AS (
SELECT ST_Transform(wkb_geometry, %(fishnet_srid)d) geom
FROM regions
LIMIT 1
), raster_pixels AS (
SELECT (ST_PixelAsPolygons(rast)).*
FROM test_regions_raster
LIMIT 1
)
SELECT x, y
FROM raster_pixels a, region_polygon b
WHERE ST_Within(a.geom, b.geom)
But would it be better in any way to use subqueries?
SELECT x, y
FROM (
SELECT ST_Transform(wkb_geometry, %(fishnet_srid)d) geom
FROM regions
LIMIT 1
) a, (
SELECT (ST_PixelAsPolygons(rast)).*
FROM test_regions_raster
LIMIT 1
) b
WHERE ST_Within(a.geom, b.geom)
Note that I'm using PostgreSQL.
There's an important syntactic advantage of common table expressions over derived tables when it comes to reuse. Consider the following, equivalent examples using self-joins:
Using common table expressions
WITH a(v) AS (SELECT 1 UNION SELECT 2)
SELECT *
FROM a AS x, a AS y
Using derived tables
SELECT *
FROM (SELECT 1 UNION SELECT 2) x(v),
(SELECT 1 UNION SELECT 2) y(v)
As you can see, using common table expressions, the view (SELECT 1 UNION SELECT 2) can be reused multiple times in your query. With derived tables, you will have to repeat your view declaration. In my example, this is still OK. In your own example, this starts getting a bit more hairy.
It's all about scope
Views in SQL are all about scoping. There are essentially four levels of declaring views:
As derived tables. They can be consumed exactly once.
As common table expressions. They can be consumed several times, but only in one query.
As views. They can be consumed several times in several queries.
As materialized views. Same as views, but the data is pre-calculated.
Some databases (in particular PostgreSQL) also know table-valued functions. From a mere syntax perspective, they're just like views - parameterised views.
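As a hedged illustration of that 'parameterised view' idea in PostgreSQL, reusing the question's regions table (the function name and the STABLE marking are my own choices):

CREATE FUNCTION region_geom(target_srid int)
RETURNS TABLE (geom geometry) AS $$
    SELECT ST_Transform(wkb_geometry, target_srid)
    FROM regions
    LIMIT 1
$$ LANGUAGE sql STABLE;

-- Consumed just like a view, but with a parameter:
SELECT * FROM region_geom(4326);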
Performance
Note that these thoughts only focus on syntax, not query planning. The different approaches may have very different performance implications, depending on the database vendor.
Those aren't variables, they're common table expressions (CTEs). In your query above, the execution plans are likely identical, because the optimizer should recognize they are equivalent queries. I prefer to use CTEs because I think they're easier to read than subqueries, but that's it.
Edit: Upon further reading, it looks like PostgreSQL does treat common table expressions differently from other databases; you can't update a CTE in PostgreSQL, for instance. I'll leave my answer here because I believe for your query there won't be a difference, but I'm not terribly familiar with PostgreSQL.
As pointed out, this construct is called a common table expression (CTE), not a variable.
I prefer to use CTE, rather than subquery, because it is way easier to read and write for me, especially when you have several nested CTEs.
You can write CTE once and refer to it several times in the rest of the query. With subquery you'll have to repeat the code several times.
An important difference between PostgreSQL and other databases (at least MS SQL Server) is that PostgreSQL evaluates each CTE only once:
A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects.
However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)
MS SQL Server would inline each reference to the CTE into the main query and optimize the whole result, but PostgreSQL doesn't. In some sense PostgreSQL is more flexible here. If you want the subquery to be evaluated only once, put it in a CTE. If you don't, put it in a subquery and repeat the code. In SQL Server you'd have to use a temporary table explicitly.
Your example in the question is too simple and most likely both variants are equivalent - check the execution plan.
The official docs mention it, as I quoted above, but Nick Barnes gave a link to a good article explaining it in more detail, and I thought it worth putting in an answer rather than a comment:
When optimising queries in PostgreSQL (true at least in 9.4 and older), it's worth keeping in mind that - unlike newer versions of various other databases - PostgreSQL will always materialise a CTE term in a query.
This can have quite surprising effects for those used to working with DBs like MS SQL: a query that should touch a small amount of data instead reads a whole table and possibly spills it to a tempfile; and you cannot UPDATE or DELETE FROM a CTE term, because it's more like a read-only temp table rather than a dynamic view.
So, there is no definite answer whether CTE is better than subquery in PostgreSQL. In some cases it can be faster, in some cases it can be slower. But, IMHO, in most cases CTE is easier to write, read and maintain.
And, obviously, there is a case when you have no other option, but to use so-called recursive CTE (recursive queries are typically used to deal with hierarchical or tree-structured data).
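For example, a minimal recursive CTE over a hypothetical employees(id, name, manager_id) table, collecting one manager's whole subtree:

WITH RECURSIVE subordinates AS (
    SELECT id, name, manager_id        -- anchor: the starting manager
    FROM employees
    WHERE id = 1
    UNION ALL
    SELECT e.id, e.name, e.manager_id  -- recursive member: direct reports
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;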

Sql Join on User Defined Function: how to optimize

I'm trying to optimize a query in a database. That query is similar to the following:
select * from Account
inner join udf_Account('user') udfAccount
on Account.Id = udfAccount.AccountId
Actually the real query is much longer, but the most important point is that it contains a few inner joins on user-defined functions (UDFs) which depend on the user id. (So this is a constant parameter which does not change during the query evaluation.)
Due to the large amount of data, my query takes approximately 20 seconds on a production database, which is not acceptable.
I have already seen that storing the results of the functions in temporary tables and using those tables in the query greatly reduces the duration of the query.
I'm asking the following questions:
Can I avoid the temporary tables? Is there a way to tell SQL that the function needs to be evaluated only once? Using temporary tables would imply some important changes in my code, which is why I would be happy to have another solution.
Are there any other ways to optimize my query?
In SQL Server, if your functions are inline rather than multi-statement, SQL Server expands them (macro-like) into your queries. It's just as if they become sub-queries in your main query.
This notionally allows the optimiser to make a 'better' execution plan.
For example, provided that the fields you are joining on are directly derived from their source tables, this should make indexes on those fields available.
Without looking at the whole query and your individual functions, it appears that you're already in a good place with regards to your syntax. The next place to look is at the indexes that exist, and aim for index-seeks rather than table-scans or index-scans.
(That's all a bit simplistic, but it's a good start for query optimisation, which is an immense topic.)
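To illustrate what 'inline' means here, a sketch of udf_Account as an inline TVF (the body and the OwnerName column are invented, since the original function isn't shown):

CREATE FUNCTION dbo.udf_Account (@userName varchar(50))
RETURNS TABLE
AS
RETURN
(
    -- Single-statement body: the optimizer expands this into the
    -- calling query, like a parameterised view.
    SELECT a.Id AS AccountId
    FROM dbo.Account AS a
    WHERE a.OwnerName = @userName
);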
Another option is to look at using CROSS APPLY with your inline table valued functions.
(Available in SQL Server 2005 onwards)
This allows the values from tables in your queries to be used as parameters to your functions. Again, provided that the functions are inline, SQL Server expands the function inline when building the execution plan.
An example could be...
SELECT
Account.AccountID,
subAccount.AccountID AS SubAccountID,
Balance.currentAvailable AS SubAccountBalance
FROM
Account
CROSS APPLY
dbo.getSubAccounts('User', Account.AccountID) AS SubAccount
CROSS APPLY
dbo.getCurrentBalance(SubAccount.AccountID) AS Balance
WHERE
Account.AccountID = 1234
I believe you want to define what MySQL calls a "deterministic" function. Depending on your flavor of SQL this will have different syntax. But ultimately the biggest optimisation would be to not use a function at all, and simply add an account column to the user table.

What generic techniques can be applied to optimize SQL queries?

What techniques can be applied effectively to improve the performance of SQL queries? Are there any general rules that apply?
Use primary keys
Avoid select *
Be as specific as you can when building your conditional statements
De-normalisation can often be more efficient
Table variables and temporary tables (where available) will often be better than using a large source table
Partitioned views
Employ indices and constraints
Learn what's really going on under the hood - you should be able to understand the following concepts in detail:
Indexes (not just what they are but actually how they work).
Clustered indexes vs heap allocated tables.
Text and binary lookups and when they can be in-lined.
Fill factor.
How records are ghosted for update/delete.
When page splits happen and why.
Statistics, and how they affect various query speeds.
The query planner, and how it works for your specific database (for instance on some systems "select *" is slow, on modern MS-Sql DBs the planner can handle it).
The biggest thing you can do is to look for table scans in SQL Server Query Analyzer (make sure you turn on "Show Execution Plan"). Otherwise there are a myriad of articles at MSDN and elsewhere that will give good advice.
As an aside, when I started learning to optimize queries I ran SQL Server Profiler against a trace, looked at the generated SQL, and tried to figure out why that was an improvement. Profiler is far from optimal, but it's a decent start.
There are a couple of things you can look at to optimize your query performance.
Ensure that you just have the minimum of data. Make sure you select only the columns you need. Reduce field sizes to a minimum.
Consider de-normalising your database to reduce joins
Avoid loops (i.e. fetch cursors), stick to set operations.
Implement the query as a stored procedure as this is pre-compiled and will execute faster.
Make sure that you have the correct indexes set up. If your database is used mostly for searching then consider more indexes.
Use the execution plan to see how the processing is done. What you want to avoid is a table scan as this is costly.
Make sure that the Auto Statistics is set to on. SQL needs this to help decide the optimal execution. See Mike Gunderloy's great post for more info. Basics of Statistics in SQL Server 2005
Make sure your indexes are not fragmented. Reducing SQL Server Index Fragmentation
Make sure your tables are not fragmented. How to Detect Table Fragmentation in SQL Server 2000 and 2005
Use a WITH statement to handle query filtering.
Limit each subquery to the minimum number of rows possible.
Then join the subqueries.
WITH
master AS
(
SELECT SSN, FIRST_NAME, LAST_NAME
FROM MASTER_SSN
WHERE STATE = 'PA' AND
GENDER = 'M'
),
taxReturns AS
(
SELECT SSN, RETURN_ID, GROSS_PAY
FROM MASTER_RETURNS
WHERE YEAR < 2003 AND
YEAR > 2000
)
SELECT *
FROM master,
taxReturns
WHERE master.ssn = taxReturns.ssn
Subqueries within a WITH statement may end up being the same as inline views, or automatically generated temp tables. I find in the work I do (retail data) that about 70-80% of the time there is a performance benefit.
100% of the time, there is a maintenance benefit.
I think using SQL Query Analyzer would be a good start.
In Oracle you can look at the explain plan to compare variations on your query
Make sure that you have the right indexes on the table. If you frequently use a column as a way to order or limit your dataset, an index can make a big difference. I saw in a recent article that SELECT DISTINCT can really slow down a query, especially if you have no index.
The obvious optimization for SELECT queries is ensuring you have indexes on columns used for joins or in WHERE clauses.
Since adding indexes can slow down data writes, you do need to monitor performance to ensure you don't kill the DB's write performance, but that's where using a good query analysis tool can help you balance things accordingly.
Indexes
Statistics
On the Microsoft stack: the Database Engine Tuning Advisor.
Some other points (mine are based on SQL Server; since each DB backend has its own implementation, these may or may not hold true for all databases):
Avoid correlated subqueries in the select part of a statement, they are essentially cursors.
Design your tables to use the correct datatypes to avoid having to apply functions on them to get the data out. It is far harder to do date math when you store your data as varchar for instance.
If you find that you are frequently doing joins that have functions in them, then you need to think about redesigning your tables.
If your WHERE or JOIN conditions include OR statements (which are slower), you may get better speed using a UNION statement.
UNION ALL is faster than UNION if (and only if) the two statements are mutually exclusive and return the same results either way.
NOT EXISTS is usually faster than NOT IN or using a left join with a WHERE clause of ID = null (see the sketch after this list).
In an UPDATE query, add a WHERE condition to make sure you are not updating values that are already equal. The difference between updating 10,000,000 records and 4 can be quite significant!
Consider pre-calculating some values if you will be querying them frequently or for large reports. A sum of the values in an order only needs to be computed when the order is made or adjusted, rather than when you are summarizing the results of 10,000,000 orders in a report. Pre-calculations should be done in triggers so that they are always up-to-date as the underlying data changes. And it doesn't have to be just numbers either; we have a calculated field that concatenates names that we use in reports.
Be wary of scalar UDFs, they can be slower than putting the code in line.
Temp tables tend to be faster for large data sets, and table variables faster for small ones. In addition, you can index temp tables.
Formatting is usually faster in the user interface than in SQL.
Do not return more data than you actually need.
This one seems obvious, but you would not believe how often I end up fixing this: do not join to tables that you are not using to filter the records and whose fields you never reference in the select part of the statement. Unnecessary joins can be very expensive.
It is a very bad idea to create views that call other views that call other views. You may find you are joining to the same table six times when you only need to once, and creating 100,000,000 records in an underlying view in order to get the 6 that are in your final result.
In designing a database, think about reporting not just the user interface to enter data. Data is useless if it is not used, so think about how it will be used after it is in the database and how that data will be maintained or audited. That will often change the design. (This is one reason why it is a poor idea to let an ORM design your tables, it is only thinking about one use case for the data.) The most complex queries affecting the most data are in reporting, so designing changes to help reporting can speed up queries (and simplify them) considerably.
Database-specific implementations of features can be faster than using standard SQL (That's one of the ways they sell their product), so get to know your database features and find out which are faster.
And because it can't be said too often, use indexes correctly, not too many or too few. And make your WHERE clauses sargable (Able to use indexes).
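To make the NOT EXISTS tip above concrete, a sketch with hypothetical Customers/Orders tables:

-- Customers with no orders: NOT EXISTS is NULL-safe and typically
-- plans as an anti-semi-join, unlike NOT IN over a nullable column.
SELECT c.CustomerId
FROM dbo.Customers AS c
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.Orders AS o
    WHERE o.CustomerId = c.CustomerId
);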

What are the advantages/disadvantages of using a CTE?

I'm looking at improving the performance of some SQL, currently CTEs are being used and referenced multiple times in the script. Would I get improvements using a table variable instead? (Can't use a temporary table as the code is within functions).
You'll really have to performance test - there is no yes/no answer. As the post Andy Living linked above explains, a CTE is just shorthand for a query or subquery.
If you are calling it twice or more in the same function, you might get better performance if you fill a table variable and then join to/select from that. However, as table variables take up space somewhere, and don't have indexes/statistics (with the exception of any declared primary key on the table variable), there's no way of saying which will be faster.
They both have costs and savings, and which is the best way depends on the data they pull in and what they do with it. I've been in your situation, and after testing for speed under various conditions - Some functions used CTEs, and others used table variables.
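For illustration, a minimal sketch of that fill-then-join pattern (the udf_Account function and Account table are invented):

-- Table variables (unlike temp tables) are allowed inside functions.
DECLARE @accounts TABLE (AccountId int PRIMARY KEY);

-- Evaluate the expensive query once...
INSERT INTO @accounts (AccountId)
SELECT AccountId FROM dbo.udf_Account('user');

-- ...then join to the stored result as often as needed.
SELECT acc.*
FROM dbo.Account AS acc
JOIN @accounts AS a ON a.AccountId = acc.Id;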
A CTE is not much more than syntactic sugar.
It enhances the readability and allows to avoid repetition.
Just think of it as a placeholder for the actual statement specified in the WITH() clause. The engine will replace any occurrence of the CTE's name in your query with this statement (quite similar to a view). This is the meaning of inline.
Compared to a previously filled table (declared or created) you'll find advantages:
useable in ad-hoc-queries (functions, views)
no unexpected side effects (most narrow scope)
...and disadvantages:
You cannot use the CTE's result in different statements
You cannot use indexes or statistics to optimize your CTE's set (although it will implicitly use existing indexes and statistics of the targeted objects, where appropriate).
In terms of performance a persisted set (declared or created table) can be (much!) better in some cases, but it forces you into procedural code. You will have to race your horses to find out which is better...
Example: Various approaches to do the same
The following simple (rather useless) example describes a set of user tables together with their columns. I use various different approaches to tell SQL-Server what I want:
Try this with "include actual execution plan"
USE master; --in my case the master database has just 5 "user tables", you can use any other DB of course
GO
--simple join, first the small set joining to the large set
SELECT o.name AS TableName
,c.name AS ColumnName
FROM sys.objects o
INNER JOIN sys.columns c ON c.object_id=o.object_id
WHERE o.type='U';
GO
--simple join "the other way round" with the filter as part of the ON-clause
SELECT o.name AS TableName
,c.name AS ColumnName
FROM sys.columns c
INNER JOIN sys.objects o ON c.object_id=o.object_id AND o.type='U';
GO
--join from the large set with a sub-query to the small set
SELECT o.name AS TableName
,c.name AS ColumnName
FROM sys.columns c
INNER JOIN (
SELECT o.*
FROM sys.objects o
WHERE o.type='U' --user tables
) o ON c.object_id=o.object_id;
GO
--join for large to small with a row-wise APPLY
SELECT o.name AS TableName
,c.name AS ColumnName
FROM sys.columns c
CROSS APPLY (
SELECT o.*
FROM sys.objects o
WHERE o.type='U' --user tables
AND o.object_id=c.object_id
) o;
GO
--use a CTE to "pre-filter" the small set
WITH cte AS
(
SELECT o.*
FROM sys.objects o
WHERE o.type='U' --user tables
)
SELECT cte.name AS TableName
,c.name AS ColumnName
FROM sys.columns c
INNER JOIN cte ON c.object_id=cte.object_id;
GO
Now look at the result and at the execution plans:
All queries return the same result.
All queries produce the same execution plan
Important hint: This might differ on your machine!
Why is this?
T-SQL is a declarative language. Your statement is a description of WHAT you want to retrieve. It is not your job to tell the engine HOW this is done.
SQL-Server's extremely smart engine will find the best way to get the set you asked for. In the case above all result descriptions point to the same goal. The engine can deduce this from various statements and finds the same plan for all of them.
Well, is it just a matter of taste?
In a way...
There are some important things to keep in mind:
There is no reason for the engine to compute the CTE's result before the rest (although the statement might look so). Therefore it is wrong to describe a CTE as something like a temp table...
In other words: The visible order of your statement does not predict the actual order of execution!
The smart engine will reach its limits with complexity and nesting level. Imagine various VIEWs, all using CTEs and calling each other...
There are cases where the engine really f**s up. I remember a case where a CTE did not much more than a TRY_CAST. The idea was to ensure valid values in the query below. But the engine thought "Oh, just a CAST, not expensive!" and included the actual CAST in the execution plan at a higher position. I remember another case where the engine performed an expensive operation against millions of rows (unnecessarily, as the final result was filtered to a tiny set), just because the actual order of execution was not as expected.
Okay... So when should I use a CTE?
The following points are good reasons to use a CTE:
A CTE can help you to avoid repeated sub queries.
A CTE can be used multiple times within your statement, e.g. within a JOIN with a dynamic behavior depending on the actual row-count.
You can use multiple CTEs within one statement and you can use the result of one CTE within a later CTE.
There are recursive (or better iterative) CTEs.
Sometimes I used single-row CTEs to define / pre-compute variables later used in the query - things you would do with declared variables in procedural T-SQL. You can use a CROSS JOIN to get them into your query easily.
And also very nice: the updatable CTE allows for very easy-to-read UPDATE statements; the same applies for DELETE (see the sketch below).
As above: Nothing one could not do without the CTE, but it is far better to read (I really like speaking names).
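A hedged sketch of such an updatable CTE (the Products table and its columns are invented):

WITH worst_sellers AS
(
    SELECT TOP (10) p.Price
    FROM dbo.Products AS p
    ORDER BY p.UnitsSold ASC
)
UPDATE worst_sellers
SET Price = Price * 0.90;  -- the underlying Products rows are updated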
Final hints
Well, there are cases where ugly code performs better :-)
It is always good to have clean and readable code. A CTE will help you with this. So give it a try. If the performance is bad, get into depth, look at the execution plans and try to find a reason where the engine might have decided wrongly.
In most cases it is a bad idea to try to outsmart the engine with hints such as FORCE ORDER (although it can help).
UPDATE
I was asked to point to advantages and disadvantages specifically:
Uhm, technically there are no real advantages or disadvantages. Disregarding recursive CTEs there's nothing one couldn't solve without a CTE.
Advantages
The main advantage is readability and maintainability.
Sometimes a CTE can save hundreds of lines of code. Instead of repeating a huge sub-query, one can use just a name as a variable. Corrections to the sub-query can be made in just one place.
The CTE can serve in ad-hoc queries and make your life easier.
Disadvantages
One possible disadvantage is that it's very easy, even for experienced developers, to mistake a CTE for a temp table, assume that the visible order of steps will be the same as the actual order of execution, and stumble into unexpected results or even errors.
And - of course :-) - the strange wrong syntax error you'll see when you write a CTE after another statement without a separating ;. That's why many people tend to use ;WITH.
Probably not. CTEs are especially good at querying data for tree structures.
The information and quotes are from the following article on mssqltips.com, "SQL Server Query Performance: CTE, View, Subquery, Temp Table, Table Variable" by Eric Blinn. https://www.mssqltips.com/sqlservertip/6618/sql-server-query-performance-cte-view-subquery-temp-table-table-variable/
SQL Server 2019 CTEs, subqueries, and views
The SQL Server [2019] engine optimizes every query that is given to it. When it encounters a CTE, traditional subquery, or view, it sees them all the same way and optimizes them the same way. This involves looking at the underlying tables, considering their statistics, and choosing the best way to proceed. In most cases they will return the same plan and therefore perform exactly the same.
TempDB table
For the query that inserted rows into the temporary table, the optimizer looked at the table statistics and chose the best way forward. It actually made new table statistics for the temporary table and then used them to run the second. This brings about very similar performance.
Table variable
The table variable has poor performance in the example given in the article due to lack of table statistics.
...the table variable does not have any table statistics generated for it like the TempDB table did. This means the optimizer has to make a wild guess as to how to proceed. In this example it made a very, very poor decision.
This is not to write off table variables. They surely have their place, as will be discussed later in the tip.
Temp table vs Table variable
A temporary table will be stored on disk and have statistics calculated on it, and a table variable will not. Because of this difference, temporary tables are best when the expected row count is >100, and the table variable for smaller expected row counts, where the lack of statistics will be less likely to lead to a bad query plan.
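A quick sketch of the two options being compared (names invented):

-- Temp table: lives in tempdb, gets statistics, can be indexed freely.
CREATE TABLE #accounts (AccountId int PRIMARY KEY, Balance money);

-- Table variable: no statistics beyond the declared key; per the
-- article, best kept to small expected row counts (roughly <100).
DECLARE @accounts TABLE (AccountId int PRIMARY KEY, Balance money);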
Advantages of CTE
A CTE can be termed a 'temporary view' and used as a good alternative to a view in some cases.
The main advantage over a view is usage of memory. As a CTE's scope is limited to its batch, the memory allocated for it is flushed as soon as its batch is crossed. But once a view is created, it is stored until the user drops it. If the view is not used after creation, then it's a mere waste of memory.
CPU cost for CTE execution is lower compared to that of a view.
Unlike a view, a CTE doesn't store any metadata of its definition, and it provides better readability.
A CTE can be referred to multiple times in a query.
As the scope is limited to the batch, multiple CTEs can have the same name, which a view cannot have.
It can be made recursive.
Disadvantages of CTE
Though using CTEs is advantageous, there are some limitations to be kept in mind:
We know that it is a substitute for a view, but a CTE cannot be nested, while views can be nested.
A view, once declared, can be used any number of times, but a CTE cannot: it must be declared every time you want to use it. For this scenario, a CTE is not recommended, as it is a tiring job for the user to declare the batches again and again.
Between the anchor members there should be set operators like UNION, UNION ALL or EXCEPT.
In recursive CTEs, you can define many anchor members and recursive members, but all the anchor members must be defined before the first recursive member. You cannot define an anchor member between two recursive members.
The number of columns and the data types used in the anchor and recursive members should be the same.
In the recursive member, aggregate functions, TOP, the DISTINCT operator, HAVING and GROUP BY clauses, sub-queries, and left, right or full outer joins are not allowed; only inner joins are allowed.
The recursion limit defaults to 100 and can be raised with OPTION (MAXRECURSION n) up to 32,767 (see the sketch below); specifying 0 removes the limit entirely, at the risk of an infinite loop.
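A sketch of that limit in practice - a simple number series that must raise the hint past the default of 100:

WITH nums AS
(
    SELECT 1 AS n            -- anchor member
    UNION ALL
    SELECT n + 1             -- recursive member
    FROM nums
    WHERE n < 1000
)
SELECT n FROM nums
OPTION (MAXRECURSION 1000);  -- default 100, maximum 32767, 0 = unlimited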