PostgreSQL view with many common table expressions is slow

This is a huge simplification of my query, but essentially I have a series of common table expressions that build off of each other which I would like to turn into a view. The problem is it's extremely slow when I try to use a view, but very fast when I run the query.
CREATE VIEW user_view AS
WITH cte AS(
SELECT user_id,first,middle,last FROM user
),
cte2 AS(
SELECT *,first + middle AS first_middle FROM cte
),
cte3 AS(
SELECT *,first_middle + last AS full_name FROM cte2
)
SELECT * from cte3;
Fast query
WITH cte AS(
SELECT first,middle,last FROM user WHERE user_id = 5
),
cte2 AS(
SELECT *,first + middle AS first_middle FROM cte
),
cte3 AS(
SELECT *,first_middle + last AS full_name FROM cte2
)
SELECT * from cte3;
Slow query using the view
SELECT * from user_view WHERE user_id = 5

In versions before 12, Postgres implements something called an "optimization fence" for every CTE. That means that Postgres materializes each CTE for subsequent processing. One nice effect is that a CTE can be referenced multiple times, but the code is only executed once. The downside is that conveniences such as indexes are "forgotten" after the CTE has been materialized.
For your question, the view is actually immaterial (no pun intended). In this version:
WITH cte AS (
SELECT first, middle, last FROM user WHERE user_id = 5
),
cte2 AS (
SELECT *, first || middle AS first_middle FROM cte
),
cte3 AS (
SELECT *, first_middle || last AS full_name FROM cte2
)
SELECT *
FROM cte3;
The first CTE presumably pulls one record out of the table. Presumably, it uses an index on the id, so that operation is very fast. That one record is the only record processed by the remaining CTEs.
In this version:
WITH cte AS (
SELECT user_id, first, middle, last FROM user
),
cte2 AS (
SELECT *, first || middle AS first_middle FROM cte
),
cte3 AS (
SELECT *, first_middle || last AS full_name FROM cte2
)
SELECT *
FROM cte3
WHERE user_id = 5;
The CTEs are processing all the data in the user table. At the end, the row meeting the WHERE condition needs to be found. The materialized CTE no longer has an index, so the data is searched sequentially.
This behavior does not apply to subqueries, so you can try rewriting your logic using subqueries rather than CTEs.
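As a rough sketch of that rewrite (not the asker's real query), the view from the question could be expressed with nested subqueries. It assumes the table has a user_id column, which the simplified example filters on but never selects, and it quotes "user" because that is a reserved word in PostgreSQL. The name user_view_subq is hypothetical.

```sql
-- Hypothetical subquery-based version of the view; no CTEs involved.
-- Assumes a user_id column exists on the table.
CREATE VIEW user_view_subq AS
SELECT t2.*, t2.first_middle || t2.last AS full_name
FROM (
    SELECT t1.*, t1.first || t1.middle AS first_middle
    FROM (
        SELECT user_id, first, middle, last
        FROM "user"
    ) AS t1
) AS t2;

-- The filter is written against the view, as in the slow query.
SELECT * FROM user_view_subq WHERE user_id = 5;
```

Because the planner can flatten plain subqueries, the predicate on user_id can be pushed down to the base table, where an index on user_id (if one exists) is usable. Comparing EXPLAIN output for both forms is the way to confirm this on a given server.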
Postgres optimizes CTEs differently from other databases. For instance, SQL Server never materializes CTEs; the code is always "inserted" into the query and optimized as a whole. In fact, SQL Server forums have the opposite concern: a request for an option to materialize CTEs. Oracle is one database that seems to take both approaches.
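Newer PostgreSQL releases relax this behavior. From PostgreSQL 12 onward, a non-recursive, side-effect-free CTE that is referenced only once is inlined by default, and the keywords MATERIALIZED and NOT MATERIALIZED make the choice explicit. A minimal sketch, assuming PostgreSQL 12 or later, a user_id column on the table, and a hypothetical view name:

```sql
-- Requires PostgreSQL 12+; older versions reject NOT MATERIALIZED.
-- user_view_inlined is a made-up name for illustration.
CREATE VIEW user_view_inlined AS
WITH cte AS NOT MATERIALIZED (
    SELECT user_id, first, middle, last FROM "user"
),
cte2 AS NOT MATERIALIZED (
    SELECT *, first || middle AS first_middle FROM cte
),
cte3 AS NOT MATERIALIZED (
    SELECT *, first_middle || last AS full_name FROM cte2
)
SELECT * FROM cte3;

-- Inspect the plan to see whether the filter reaches the base table.
EXPLAIN SELECT * FROM user_view_inlined WHERE user_id = 5;
```

On such versions the CTE chain is treated much like the equivalent subqueries, so the outer WHERE clause is no longer blocked by the fence.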

Related

Use recursive on temp. view with "with" [duplicate]

Is it possible to combine multiple CTEs in single query?
I am looking for way to get result like this:
WITH cte1 AS (
...
),
WITH RECURSIVE cte2 AS (
...
),
WITH cte3 AS (
...
)
SELECT ... FROM cte3 WHERE ...
As you can see, I have one recursive CTE and two non recursive.
Use the key word WITH once at the top. If any of your Common Table Expressions (CTE) are recursive (rCTE) you have to add the keyword RECURSIVE at the top once also, even if not all CTEs are recursive:
WITH RECURSIVE
cte1 AS (...) -- can still be non-recursive
, cte2 AS (SELECT ...
UNION ALL
SELECT ...) -- recursive term
, cte3 AS (...)
SELECT ... FROM cte3 WHERE ...
The manual:
If RECURSIVE is specified, it allows a SELECT subquery to
reference itself by name.
Bold emphasis mine. And, even more insightful:
Another effect of RECURSIVE is that WITH queries need not be ordered:
a query can reference another one that is later in the list. (However,
circular references, or mutual recursion, are not implemented.)
Without RECURSIVE, WITH queries can only reference sibling WITH
queries that are earlier in the WITH list.
Bold emphasis mine again. Meaning that the order of WITH clauses is meaningless when the RECURSIVE key word has been used.
BTW, since cte1 and cte2 in the example are not referenced in the outer SELECT and are plain SELECT commands themselves (no side effects), they are never executed (unless referenced in cte3).
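To make the pattern concrete, here is a small self-contained sketch. The CTE names seed, counter, and evens are made up for illustration. Only counter is actually recursive, yet RECURSIVE appears once, right after WITH:

```sql
WITH RECURSIVE
  seed(n) AS (
      VALUES (1)                 -- non-recursive CTE
  ),
  counter(n) AS (
      SELECT n FROM seed         -- non-recursive term
      UNION ALL
      SELECT n + 1 FROM counter  -- recursive term, references itself
      WHERE n < 5
  ),
  evens AS (
      SELECT n FROM counter      -- non-recursive CTE built on the recursive one
      WHERE n % 2 = 0
  )
SELECT n FROM evens ORDER BY n;
-- counter yields 1..5, so the query returns 2 and 4
```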
Yes. You don't repeat the WITH. You just use a comma:
WITH cte1 AS (
...
),
cte2 AS (
...
),
cte3 AS (
...
)
SELECT ... FROM cte3 WHERE ...
And: Only use single quotes for string and date constants. Don't use them for column aliases. They are not allowed for CTE names anyway.
As the accepted answer correctly says, the WITH keyword is used only once per CTE chain. However, for the sake of completeness, I would like to add that this does not stop you from nesting CTEs.
If cte2 uses cte1, cte3 uses cte2, and so on, then the dependency chain between the CTEs is linear and is expressed as a single WITH with three CTEs. By contrast, if cte2 doesn't need cte1 and both are needed only in cte3, consider nesting them inside the definition of cte3 (with cte3 as (with cte1 as (...), cte2 as (...) select ...)).
The syntax then reflects the dependency tree between the CTEs and makes the scope of each partial dataset visible, which can improve readability and prevent scope-leakage bugs. Not all database vendors support this, but Postgres does.
Example:
with cte1(id,capital) as (
values(1,'Prague'),(2,'Bratislava')
), cte2(id,country) as (
with cte2inner1(id,code) as (
values(1,'CZ'),(2,'SK')
), cte2inner2(id,country) as (
values(1,'Czech Republic'),(2,'Slovakia')
)
select id,country from cte2inner1 join cte2inner2 using (id)
)
select *
from cte1 join cte2 using (id)
--join cte2inner1 not possible here
Problem: You don't need multiple WITH clauses to combine multiple CTEs.
Solution: You can define multiple common table expressions with a single WITH clause. Write WITH once and separate the CTE definitions with commas.
Sample: multiple CTEs using a single WITH clause
With EmpCount1(DeptName,TotalEmployees)
as
(
Select DeptName, COUNT(*) as TotalEmployees
from Tbl_EmpDetails
join Tbl_Dept Dept
on Tbl_EmpDetails.DeptId = Dept.DeptId
WHERE DeptName IN ('BI','DOTNET')
group by DeptName
),
EmpCount2(DeptName,TotalEmployees)
as
(
Select DeptName, COUNT(*) as TotalEmployees
from Tbl_EmpDetails
join Tbl_Dept Dept
on Tbl_EmpDetails.DeptId = Dept.DeptId
WHERE DeptName IN ('JAVA','AI')
group by DeptName
)
Select * from EmpCount1
UNION
Select * from EmpCount2
This is sample syntax for creating multiple common table expressions with a single WITH clause.

Performance when using distinct and row_number pagination

I have a SQL query something like this:
SELECT A,B,C,FUN(A) AS A FROM someTable
The problem is that FUN() is a function which is quite slow, so if there are a lot of records in someTable, there will be a big performance issue.
If we use pagination, we can resolve this issue. We do the pagination like this:
SELECT * FROM(
SELECT A,B,C,FUN(A), Row_number()OVER( ORDER BY B ASC) AS rownum FROM someTable
)T WHERE T.rownum >=1 AND T.rownum<20
In this script, FUN() will only execute about 20 times, so the performance is OK.
But we need to order by an alias, so we can't write the row number inline; it has to move to a subquery or a CTE. We chose a CTE, and it looks like this:
;WITH CTE AS (
SELECT A,B AS alias,C,FUN(A) FROM someTable
)
SELECT * FROM(
SELECT *,Row_number()OVER( ORDER BY alias ASC) AS rownum FROM CTE
)T WHERE T.rownum >=1 AND T.rownum<20
So far so good: pagination solves the performance issue, and the CTE solves the alias ordering problem. But now we need to add DISTINCT to the query:
;WITH CTE AS (
SELECT DISTINCT A,B AS alias,C,FUN(A) FROM someTable
)
SELECT * FROM(
SELECT *,Row_number()OVER( ORDER BY alias ASC) AS rownum FROM CTE
)T WHERE T.rownum >=1 AND T.rownum<20
After this, the optimization seems to be gone: FUN() executes once for every record in someTable, and we have the performance issue again.
Basically, we are blocked here. Are there any suggestions?
The problem is that, in order to return distinct rows, the database engine must evaluate FUN(A) for every record being selected.
If you call FUN(A) only in the final SELECT, the DISTINCT should not affect it, so it should run only on the final 20 records.
I've also changed the derived table you used into another CTE (a personal preference; it seems tidier to me not to mix derived tables and CTEs):
;WITH CTE1 AS (
SELECT DISTINCT A,B AS alias,C
FROM someTable
),
CTE2 AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY alias) As RowNum
FROM CTE1
)
SELECT *, FUN(A)
FROM CTE2
WHERE RowNum >= 1
AND RowNum < 20
Please note that if the FUN function is not deterministic, you might get results that are different from your original query, so compare the results before adopting this solution.

SQL Server view inside CTE causing poor performance

When I use a view inside of a CTE, each subquery that references the CTE seems to re-query the view. There are large chunks of the execution plan that are repeated for each subquery. This isn't the case when selecting from a table. Is this expected? Is there any way to get around it?
WITH cte AS (
SELECT v.id
FROM test_view AS v
)
SELECT TOP 25 *,
(SELECT COUNT(*) FROM cte) AS subquery
FROM cte
I'm working with SQL Server 2005
EDIT:
I'm trying to get data from a view in pages with the query below. I need the total number of rows in the view, the number of rows that match a search, and a subset of the matching rows. This works well when selecting from tables, but using a view causes repeated execution of the CTE. I attempted to force intermediate materialization a variety of different ways from the link in Martin's answer, but didn't have any luck.
WITH tableRecords AS (
SELECT *
FROM example_view
),
filteredTableRecords AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY id ASC) AS tableRecordNumber
FROM tableRecords
WHERE 1 = 1
)
SELECT *,
(SELECT COUNT(*) FROM tableRecords) AS totalRecords,
(SELECT COUNT(*) FROM filteredTableRecords) AS totalDisplayRecords
FROM filteredTableRecords
WHERE tableRecordNumber BETWEEN 1 AND 25
ORDER BY tableRecordNumber ASC
Yes, it is largely expected.
See "Provide a hint to force intermediate materialization of CTEs or derived tables".
For the query in your question, though, you can do this:
WITH CTE AS
(
SELECT v.id,
count(*) OVER () AS Cnt
FROM test_view AS v
)
SELECT TOP 25 *
FROM CTE
ORDER BY id
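Since the SQL Server versions discussed here offer no hint for materializing a CTE, a common workaround is to materialize the view's rows yourself, for example into a temporary table, and query that instead. Below is a sketch for the paging query from the question. It assumes the view's result fits comfortably in tempdb, and #tableRecords is a hypothetical name:

```sql
-- Run the view once and keep its rows in a temp table.
SELECT *
INTO #tableRecords
FROM example_view;

-- Page over the temp table instead of the view.
WITH filteredTableRecords AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY id ASC) AS tableRecordNumber
    FROM #tableRecords
)
SELECT f.*,
       (SELECT COUNT(*) FROM #tableRecords) AS totalRecords,
       (SELECT COUNT(*) FROM filteredTableRecords) AS totalDisplayRecords
FROM filteredTableRecords AS f
WHERE f.tableRecordNumber BETWEEN 1 AND 25
ORDER BY f.tableRecordNumber ASC;

-- Clean up explicitly rather than waiting for the session to end.
DROP TABLE #tableRecords;
```

The CTE may still be expanded more than once, but each expansion now reads the temp table rather than re-running the view's plan.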
I suggest you rewrite your query as shown below. I made a few improvements over your query:
- Removed WHERE 1 = 1; it is unnecessary because it is always true.
- The subqueries in the SELECT clause may be evaluated again for each row returned, so instead you can compute the counts once in their own CTE and attach them with CROSS APPLY, which may improve performance.
;WITH tableRecords AS(
SELECT *
FROM example_view
),
filteredTableRecords AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY id ASC) AS tableRecordNumber
FROM tableRecords
), TotalNumber AS
(
SELECT (SELECT COUNT(1) FROM tableRecords) AS totalRecords,
(SELECT COUNT(1) FROM filteredTableRecords) AS totalDisplayRecords
)
SELECT *
FROM filteredTableRecords F
CROSS APPLY TotalNumber AS T
WHERE tableRecordNumber BETWEEN 1 AND 25
ORDER BY tableRecordNumber ASC

How to join two equivalent tables which are the result of the previous recursive select in SQL Server

Good day, everyone! First, I'm sorry for my poor English. My question is the one in the title of this message.
SQL Server returns this message (error 253) when I try to select the data I need.
Translated: "The recursive member of the CTE (named 'recurse'; my note) has multiple references in the CTE."
How can I solve this problem?
Can you advise me how to join two tables with two columns each (for example, a and b) that are both the result of the previous recursive select? (I mean the same select, but from another iteration of it.)
with recurse (who_acts,on_whom_influence)
as
(
-------------------------------------------FIRST SELECT
select distinct interface_1.robot_name as who_acts,interface_2.robot_name as on_whom_influence
from INTERFACE as interface_1,INTERFACE as interface_2
where (interface_1.number in ( select INPUT_INTERFACE.source
from INPUT_INTERFACE
)
and interface_2.number in (
select INPUT_INTERFACE.number
from INPUT_INTERFACE
where (INPUT_INTERFACE.source=interface_1.number )
)
)
-------------------------------------------RECURSIVE PART
union all
select rec1.who_acts,rec1.on_whom_influence
from recurse as rec1
inner join
(select rec2.who_acts,rec2.on_whom_influence
from recurse as rec2) as P on (1=1)
)
select * from recurse
The problem is in the recurse CTE. The real join condition is not this simple, but that has no influence on the problem.
Could you post some working code in the comments?
Here's a dummy table
create table tbl1 ( a int, b int );
insert tbl1 select 1,2;
insert tbl1 select 11,12;
insert tbl1 select 2,3;
insert tbl1 select 4,5;
And a similar query to yours
with cte as (
select a,b from tbl1
union all
select x.a,x.b from cte x join cte y on x.a=y.a+1
)
select * from cte;
The error:
Recursive member of a common table expression 'cte' has multiple recursive references.
Basically, the error is exactly what it says. You cannot have a recursive CTE appear more than ONCE in a recursive section. Above, you see CTE aliased as both x and y. There are various reasons for this limitation, such as the fact that CTEs are recursed depth-first and not generation-by-generation.
What you should think about is why you would need it more than once. Your recursive portion doesn't make sense.
select rec1.who_acts,rec1.on_whom_influence
from recurse as rec1
inner join
( select rec2.who_acts,rec2.on_whom_influence
from recurse as rec2) as P on (1=1)
On the surface, the following would be true if recurse were a real table (not a CTE):
1. The number of rows generated is count(recurse as [rec1]) x count(recurse as [rec2]).
2. The rows in recurse (rec1) are each replicated once per row in recurse, hence #1.
3. Columns from rec2 are never used; rec2 serves only to multiply the rows.
If this were permitted to run, the recursive portion of the query would keep quadratically increasing its number of rows and never finish.
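If the underlying goal is to follow chains of influence (who acts on whom, transitively), the usual shape is to reference the CTE once in the recursive member and join it to the base table rather than to itself. Here is a sketch using the dummy table tbl1 from above, under the assumption that a row (a, b) means "a acts on b" and that the data contains no cycles:

```sql
-- SQL Server syntax: no RECURSIVE keyword is needed.
with cte as (
    -- anchor member: every direct link
    select a, b from tbl1
    union all
    -- recursive member: cte appears exactly once, joined to the base table
    select c.a, t.b
    from cte as c
    join tbl1 as t on t.a = c.b
)
select * from cte;
```

With the four sample rows, the recursive member adds one row, (1, 3), because 1 acts on 2 and 2 acts on 3. If the data could contain cycles, the recursion would not terminate on its own; SQL Server stops it with an error after 100 levels by default, and OPTION (MAXRECURSION n) changes that limit.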