Slow query operation with recursive CTE - sql

I have a query that uses a recursive CTE. Unfortunately, as I widen the date range, the run time increases significantly.
Can anyone help me change the code so that the query runs more efficiently?
The problem seems to be in CTE4.
I am surprised the query is so slow, because the calculation is a simple one in Excel, where each row is the previous result multiplied by one plus the current day's change:
Performance = Performance_prev * (1 + x)
DECLARE @BegOfPeriod DATE = '20100101'
,@EndOfPeriod DATE = '20191231'
,@clientID INT = 200010;
WITH
CTE AS
(
SELECT d.Date
,1+COALESCE(twr.day_diff_pct,0) as DayPct_plus1
FROM Days d
LEFT JOIN [dbo].[DailyTWR] TWR ON d.Date=twr.date AND Clientid=@clientID
WHERE d.Date between @BegOfPeriod and @EndOfPeriod AND d.Date>=(SELECT min(date) FROM DailyTWR WHERE ClientiD=@clientID)
),
CTE2 AS
(
SELECT *
,LAG(DayPct_plus1,1,1) OVER (order BY date) as DayPct_plus1_Prev
,ROW_NUMBER() OVER (order by date) as rownum
FROM cte
),
CTE3 AS
(
SELECT *
,c2.DayPct_plus1*c2.DayPct_plus1_Prev Performance
FROM CTE2 c2
),
CTE4 AS
(
SELECT c3.date,c3.DayPct_plus1,c3.DayPct_plus1_prev,c3.rownum,c3.Performance
FROM CTE3 c3
WHERE rownum=1
union all
SELECT c3.date,c3.DayPct_plus1,c3.DayPct_plus1_prev,c3.rownum
,c3.DayPct_plus1*c4.Performance as Performance
FROM CTE4 c4
JOIN CTE3 c3 ON c3.rownum=c4.rownum+1
)
SELECT c4.Date,c4.Performance
FROM CTE4 c4
option (maxrecursion 0)
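The spreadsheet-style formula in the question is a running product of the daily factors, so each value depends only on the previous one. The following Python sketch (not T-SQL, and using made-up daily returns) shows that arithmetic. It also shows that the same series can be obtained as the exponential of a running sum of logarithms, which is one way a running product is sometimes expressed with a windowed SUM, provided every factor is strictly positive. The sample values and variable names are assumptions for illustration only.

```python
import math
from itertools import accumulate
from operator import mul

# Hypothetical daily returns (day_diff_pct); a missing day counts as 0.
day_diff_pct = [0.01, -0.02, 0.0, 0.015]

# DayPct_plus1 in the query: 1 + daily return.
factors = [1 + x for x in day_diff_pct]

# Running product: performance[i] = performance[i-1] * factors[i].
performance = list(accumulate(factors, mul))

# Same series via logarithms: exp(running sum of ln(factor)).
# Only valid while every factor is strictly positive.
log_sums = accumulate(math.log(f) for f in factors)
performance_via_logs = [math.exp(s) for s in log_sums]

for direct, via_logs in zip(performance, performance_via_logs):
    print(f"{direct:.6f}  {via_logs:.6f}")
```

Whether a log-based rewrite is faster than the recursive CTE on a given server would need to be measured against the actual data.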

Related

BigQuery - Cannot join on repeated field

I'm trying to create a table with a single column in which each row is a new date between two given dates. The query works fine until I add a WHERE clause that contains a subquery, i.e. NOT IN (SELECT ....). It works fine if I use something like NOT IN (TIMESTAMP('xyz')).
I keep getting an error saying "Cannot join on repeated field t2.f0__group.SomeDate"
I have no clue why this is happening. Also, I'm fairly new to BQ, so if there is an easier way to do this please let me know. Thanks
SELECT SomeDate FROM
(
SELECT DATE_ADD(Day, i, "DAY") SomeDate
FROM
(
SELECT '2020-01-03' Day
) T1
CROSS JOIN
(
SELECT
POSITION(
SPLIT(
RPAD('', DATEDIFF('2020-01-30','2020-01-03') * 2, 'a,'))) i
FROM
(
SELECT NULL
)
) T2
)
WHERE SomeDate NOT IN (SELECT OtherDate FROM
(
SELECT TIMESTAMP('2020-01-04 00:00:00 UTC') AS OtherDate
),
(
SELECT TIMESTAMP('2020-01-06 00:00:00 UTC') AS OtherDate
),
(
SELECT TIMESTAMP('2020-01-08 00:00:00 UTC') AS OtherDate
)
)
I suggest starting over from scratch using the example below.
I think it does exactly what you are trying to achieve, probably with minor adjustments.
SELECT SomeDate
FROM (
SELECT
DATE(DATE_ADD(TIMESTAMP('2020-01-03'), pos - 1, "DAY")) AS SomeDate
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP('2020-01-30'), TIMESTAMP('2020-01-03')), '.'),'') AS h
FROM (SELECT NULL)),h
))
)
) a
LEFT JOIN (
SELECT OtherDate FROM
(SELECT '2020-01-04' AS OtherDate),
(SELECT '2020-01-06' AS OtherDate),
(SELECT '2020-01-08' AS OtherDate)
) b
ON b.OtherDate = a.SomeDate
WHERE b.OtherDate IS NULL
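To make the intended result easier to check, here is a small Python sketch of the same logic: build one entry per day between the two dates (inclusive), then drop the days that appear in the exclusion list, which is what the LEFT JOIN ... IS NULL anti-join does. The variable names are assumptions chosen for illustration; the dates are the ones used in the query.

```python
from datetime import date, timedelta

start = date(2020, 1, 3)
end = date(2020, 1, 30)
excluded = {date(2020, 1, 4), date(2020, 1, 6), date(2020, 1, 8)}

# One entry per day from start to end, inclusive.
all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Anti-join: keep only the days with no match in the excluded set.
some_dates = [d for d in all_days if d not in excluded]

print(len(all_days), len(some_dates))
```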

concatenate recursive cross join

I need to concatenate the names in a recursive cross-join fashion. I don't know how to do this; I have tried a CTE using WITH RECURSIVE, but without success.
I have a table like this:
group_id | name
---------------
13 | A
13 | B
19 | C
19 | D
31 | E
31 | F
31 | G
Desired output:
combinations
------------
ACE
ACF
ACG
ADE
ADF
ADG
BCE
BCF
BCG
BDE
BDF
BDG
Of course, the results should multiply if I add a 4th (or more) group.
Native PostgreSQL syntax:
SqlFiddleDemo
WITH RECURSIVE cte1 AS
(
SELECT *, DENSE_RANK() OVER (ORDER BY group_id) AS rn
FROM mytable
),cte2 AS
(
SELECT
CAST(name AS VARCHAR(4000)) AS name,
rn
FROM cte1
WHERE rn = 1
UNION ALL
SELECT
CAST(CONCAT(c2.name,c1.name) AS VARCHAR(4000)) AS name
,c1.rn
FROM cte1 c1
JOIN cte2 c2
ON c1.rn = c2.rn + 1
)
SELECT name as combinations
FROM cte2
WHERE LENGTH(name) = (SELECT MAX(rn) FROM cte1)
ORDER BY name;
Before:
I hope you don't mind that I use SQL Server syntax:
Sample:
CREATE TABLE #mytable(
ID INTEGER NOT NULL
,TYPE VARCHAR(MAX) NOT NULL
);
INSERT INTO #mytable(ID,TYPE) VALUES (13,'A');
INSERT INTO #mytable(ID,TYPE) VALUES (13,'B');
INSERT INTO #mytable(ID,TYPE) VALUES (19,'C');
INSERT INTO #mytable(ID,TYPE) VALUES (19,'D');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'E');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'F');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'G');
Main query:
WITH cte1 AS
(
SELECT *, rn = DENSE_RANK() OVER (ORDER BY ID)
FROM #mytable
),cte2 AS
(
SELECT
TYPE = CAST(TYPE AS VARCHAR(MAX)),
rn
FROM cte1
WHERE rn = 1
UNION ALL
SELECT
[Type] = CAST(CONCAT(c2.TYPE,c1.TYPE) AS VARCHAR(MAX))
,c1.rn
FROM cte1 c1
JOIN cte2 c2
ON c1.rn = c2.rn + 1
)
SELECT *
FROM cte2
WHERE LEN(Type) = (SELECT MAX(rn) FROM cte1)
ORDER BY Type;
LiveDemo
I've assumed that the order of the "cross join" follows ascending ID.
cte1 generates DENSE_RANK() because your IDs contain gaps
cte2 is the recursive part, using CONCAT
the main query just filters for the required length and sorts the strings
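As a cross-check of what the recursive query should return, the same result can be described as a Cartesian product over the groups, taken in ascending ID order. The Python sketch below uses the sample rows from the question; the variable names are assumptions for illustration.

```python
from itertools import groupby, product

rows = [(13, "A"), (13, "B"), (19, "C"), (19, "D"),
        (31, "E"), (31, "F"), (31, "G")]

# Group names by ID in ascending ID order (the role DENSE_RANK plays in cte1).
rows_sorted = sorted(rows)
groups = [[name for _, name in grp]
          for _, grp in groupby(rows_sorted, key=lambda r: r[0])]

# Cartesian product across groups: one name per group, concatenated.
combinations = sorted("".join(parts) for parts in product(*groups))

print(combinations)
```

Adding a fourth group to `rows` multiplies the number of combinations by the size of that group, as the question expects.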
The recursive query is a bit simpler in Postgres:
WITH RECURSIVE t AS ( -- to produce gapless group numbers
SELECT dense_rank() OVER (ORDER BY group_id) AS grp, name
FROM tbl
)
, cte AS (
SELECT grp, name
FROM t
WHERE grp = 1
UNION ALL
SELECT t.grp, c.name || t.name
FROM cte c
JOIN t ON t.grp = c.grp + 1
)
SELECT name AS combi
FROM cte
WHERE grp = (SELECT max(grp) FROM t)
ORDER BY 1;
The basic logic is the same as in the SQL Server version provided by @lad2025; I added a couple of minor improvements.
Or you can use a simple version if your maximum number of groups is not too big (can't be very big, really, since the result set grows exponentially). For a maximum of 5 groups:
WITH t AS ( -- to produce gapless group numbers
SELECT dense_rank() OVER (ORDER BY group_id) AS grp, name AS n
FROM tbl
)
SELECT concat(t1.n, t2.n, t3.n, t4.n, t5.n) AS combi
FROM (SELECT n FROM t WHERE grp = 1) t1
LEFT JOIN (SELECT n FROM t WHERE grp = 2) t2 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 3) t3 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 4) t4 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 5) t5 ON true
ORDER BY 1;
Probably faster for few groups. LEFT JOIN .. ON true makes this work even if higher levels are missing. concat() ignores NULL values. Test with EXPLAIN ANALYZE to be sure.
SQL Fiddle showing both.

Efficient way to write this query

I am trying to order the records by 3 columns and then select a particular ID and the record before that plus the row after that. Here is my query:
;With Cte As
(
SELECT ROW_NUMBER() Over(Order By Book, PageINT, [IDAuto]) as RowNum, [IdAuto]
FROM CCWiseInstr2
)
Select * From Cte
Where RowNum = (Select RowNum From Cte
Where IdAuto = 211079)
UNION
Select * From Cte
Where RowNum = (Select RowNum - 1 From Cte
Where IdAuto = 211079)
UNION
Select * From Cte
Where RowNum = (Select RowNum + 1 From Cte
Where IdAuto = 211079)
What would be a more efficient way to write this query? At the moment it takes about 336 ms after creating all the indexes, which seems a bit high to me.
Here is the plan for the query:
http://gyazo.com/9a7f1c37d4433665d0949acf03c4561c
Any help is appreciated.
How about this query:
;With Cte As
(
SELECT ROW_NUMBER() Over(Order By Book, PageINT, [IDAuto]) as RowNum, [IdAuto]
FROM CCWiseInstr2
)
Select RowNum, IDAuto From Cte
Where RowNum IN (
Select RowNumber From
(
Select RowNum - 1 as RowNumPrev,
RowNum as RowNum,
RowNum + 1 as RowNumNext
From Cte
Where IdAuto = 211079
) vw unpivot (
RowNumber For
IdAuto IN (RowNumPrev, RowNum, RowNumNext )
) unpw )
Instead of UNION, use UNPIVOT, which converts the three columns into rows that you can then use in the IN list. Let me know how it goes.
You can use the LEAD and LAG functions with SQL Server. Here's a great article on Simple Talk covering all of the options. (Code below is untested)
https://www.simple-talk.com/sql/t-sql-programming/sql-server-2012-window-function-basics/
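-- Compute LAG/LEAD over the whole table first; filtering in the same
-- SELECT would leave a single row and both functions would return NULL.
SELECT [IdAuto], PreviousSale, NextSale
FROM (
SELECT
[IdAuto],
LAG([IDAuto], 1) OVER(Order By Book, PageINT, [IDAuto]) AS PreviousSale,
LEAD([IDAuto], 1) OVER(Order By Book, PageINT, [IDAuto]) AS NextSale
FROM
CCWiseInstr2
) AS w
WHERE [IdAuto] = 211079;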
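For reference, the result the question asks for (the target row plus the row before and the row after it in Book, PageINT, IdAuto order) can be sketched in Python as follows. The sample rows are invented for illustration; only the target ID comes from the question.

```python
# Hypothetical rows: (Book, PageINT, IdAuto).
rows = [(1, 10, 211080), (1, 5, 211079), (1, 2, 211075), (2, 1, 211090)]
target_id = 211079

# Same ordering as ROW_NUMBER() OVER (ORDER BY Book, PageINT, IdAuto).
ordered = sorted(rows)

# Position of the target row in that ordering.
pos = next(i for i, r in enumerate(ordered) if r[2] == target_id)

# Previous, current and next row, clipped at both ends of the list.
window = ordered[max(pos - 1, 0): pos + 2]
neighbour_ids = [r[2] for r in window]

print(neighbour_ids)
```

The slice returns fewer than three rows when the target is the first or last row, which mirrors what the SQL versions return in those cases.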

SQL stored procedure to add up values and stop once the maximum has been reached

I would like to write a SQL query (SQL Server) that will return rows (in a given order) but only up to a given total. My client has paid me a given amount, and I want to return only those rows whose running total is <= that amount.
For example, if the client paid me $370, and the data in the table is
id amount
1 100
2 122
3 134
4 23
5 200
then I would like to return only rows 1, 2 and 3
This needs to be efficient, since there will be thousands of rows, so a for loop would not be ideal, I guess. Or is SQL Server efficient enough to optimise a stored proc with for loops?
Thanks in advance. Jim.
Here are a couple of options.
1) Triangular Join
SELECT *
FROM YourTable Y1
WHERE (SELECT SUM(amount)
FROM YourTable Y2
WHERE Y1.id >= Y2.id ) <= 370
2) Recursive CTE
WITH RecursiveCTE
AS (
SELECT TOP 1 id, amount, CAST(amount AS BIGINT) AS Total
FROM YourTable
ORDER BY id
UNION ALL
SELECT R.id, R.amount, R.Total
FROM (
SELECT T.*,
T.amount + Total AS Total,
rn = ROW_NUMBER() OVER (ORDER BY T.id)
FROM YourTable T
JOIN RecursiveCTE R
ON R.id < T.id
) R
WHERE R.rn = 1 AND Total <= 370
)
SELECT id, amount, Total
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
The 2nd one will likely perform better.
In SQL Server 2012 you will be able to do something like
;WITH CTE AS
(
SELECT id,
amount,
SUM(amount) OVER(ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM YourTable
)
SELECT *
FROM CTE
WHERE RunningTotal <=370
Though there will probably be a more efficient way (to stop the scan as soon as the total is reached)
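The "stop as soon as the total is reached" idea can be illustrated outside SQL with a short Python sketch. It walks the rows in id order, keeps a running total, and stops at the first row that would exceed the amount paid. It uses the sample data from the question and assumes amounts are never negative (otherwise a later row could bring the total back under the limit). The variable names are illustrative.

```python
from itertools import accumulate, takewhile

# Sample rows from the question: (id, amount), already in id order.
rows = [(1, 100), (2, 122), (3, 134), (4, 23), (5, 200)]
paid = 370

# Lazy running total of the amounts.
running = accumulate(amount for _, amount in rows)

# takewhile stops consuming rows at the first total above the limit.
selected = [row for row, total in
            takewhile(lambda pair: pair[1] <= paid, zip(rows, running))]

print(selected)
```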
Straightforward approach:
SELECT a.id, a.amount
FROM table1 a
INNER JOIN table1 b ON (b.id <=a.id)
GROUP BY a.id, a.amount
HAVING SUM(b.amount) <= 370
Unfortunately, it has an N^2 performance issue, because the self-join pairs every row with all the rows before it.
something like this:
select id from
(
select t1.id, t1.amount, sum( t2.amount ) s
from tst t1, tst t2
where t2.id <= t1.id
group by t1.id, t1.amount
) x
where s <= 370

Optimize select query (inner select + group)

My current version is:
SELECT DT, AVG(DP_H2O) AS Tx,
(SELECT AVG(Abs_P) / 1000000 AS expr1
FROM dbo.BACS_MinuteFlow_1
WHERE (DT =
(SELECT MAX(DT) AS Expr1
FROM dbo.BACS_MinuteFlow_1
WHERE DT <= dbo.BACS_KongPrima.DT ))
GROUP BY DT) AS Px
FROM dbo.BACS_KongPrima
GROUP BY DT
but it runs very slowly.
Basically, in the inner select I pick the latest time that is not after my row's time, then group by that nearest time.
Are there any possible optimizations? Maybe I can join it somehow, but the trouble is that I'm not sure how to group by this nearest date.
Thank you
You could try rearranging it to use CROSS APPLY, as in the code below. I am not sure whether this will improve performance, but I generally try to avoid running a subquery for a single output column, and SQL Server is pretty good at optimising APPLY.
WITH Bacs_MinuteFlow_1 (Abs_P ,DT ) AS
(SELECT 5.3,'2011/10/10'
UNION SELECT 6.2,'2011/10/10'
UNION SELECT 7.8,'2011/10/10'
UNION SELECT 5.0,'2011/03/10'
UNION SELECT 4.3,'2011/03/10'),
BACS_KongPrima (DP_H2O ,DT)AS
(SELECT 2.3,'2011/10/15'
UNION SELECT 2.6,'2011/10/15'
UNION SELECT 10.2,'2011/03/15')
SELECT DT, AVG(DP_H2O) AS Tx,
a.Px
FROM BACS_KongPrima
CROSS APPLY
(
SELECT AVG(Abs_P) / 1000000 AS Px
FROM BACS_MinuteFlow_1
WHERE DT =
(SELECT MAX(DT) AS maxdt
FROM BACS_MinuteFlow_1
WHERE DT <= BACS_KongPrima.DT
)
) a
GROUP BY DT,a.Px
Cheers
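To make the intended result explicit, here is a Python sketch of the lookup the query performs: for each DT in BACS_KongPrima, find the latest DT in BACS_MinuteFlow_1 that is not later, and pair the average DP_H2O with the average Abs_P at that time divided by 1,000,000. It reuses the sample values from the answer, but the dates are rewritten as ISO strings so that they sort chronologically, which is an assumption made for this illustration, as are the variable names.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

# Sample data: (value, DT). ISO date strings sort chronologically.
minute_flow = [(5.3, "2011-10-10"), (6.2, "2011-10-10"), (7.8, "2011-10-10"),
               (5.0, "2011-03-10"), (4.3, "2011-03-10")]
kong_prima = [(2.3, "2011-10-15"), (2.6, "2011-10-15"), (10.2, "2011-03-15")]

# Average Abs_P per DT, computed once instead of once per outer row.
abs_p_by_dt = defaultdict(list)
for abs_p, dt in minute_flow:
    abs_p_by_dt[dt].append(abs_p)
flow_dts = sorted(abs_p_by_dt)
px_by_dt = {dt: mean(vals) / 1_000_000 for dt, vals in abs_p_by_dt.items()}

# Group DP_H2O by DT (the outer GROUP BY DT).
dp_by_dt = defaultdict(list)
for dp_h2o, dt in kong_prima:
    dp_by_dt[dt].append(dp_h2o)

result = {}
for dt, values in dp_by_dt.items():
    i = bisect_right(flow_dts, dt)  # number of flow DTs <= dt
    px = px_by_dt[flow_dts[i - 1]] if i else None  # None when nothing earlier
    result[dt] = (mean(values), px)

print(result)
```

Precomputing the per-DT averages and using a binary search is only one way to avoid repeating the inner lookup; whether an equivalent rewrite helps in SQL Server depends on the indexes available on DT.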