Oracle SQL re-use subquery "WITH" clause assistance - sql

I have an SQL query in which I need to take the output of a subquery and use it more than once. My existing query works, but only if I repeat the subquery each time I need it. Unfortunately the subquery is complex, and takes time to execute - meaning that multiple iterations really slow the whole thing down.
I have read that you can use the "WITH" statement to assign a subquery output to a variable, in order to re-use that variable. However the problem I'm having is that within the subquery, I need to reference values from the main query. And it appears that if I use WITH - before the main query SELECT - then those references are not recognised. I'll give you a simplified example:
WITH
DateX AS
(
SELECT
MAX(TableSub.Date)
FROM
TableA TableSub
WHERE
TableSub.ID = TableMain.ID
AND TableSub.Event = 'AnotherEvent'
AND TableSub.Date BETWEEN '01-Jan-2015' AND '31-Dec-2015'
)
SELECT
TableMain.ID
FROM
TableA TableMain
WHERE
TableMain.Event = 'MainEvent'
AND TableMain.Date >= DateX
AND (
SELECT
TableSub2.ID
FROM
TableA TableSub2
WHERE
TableSub2.ID = TableMain.ID
TableSub2.Event = 'ThirdEvent'
AND TableSub2.Date <= DateX
) IS NULL
I hope this is clear. It's a simplified version of what I have, but you can see that DateX is used in more than one place: within the main query, and within a subquery. However the problem is that when DateX is defined by WITH, I need to link the ID back to the ID of the main query. And it's not working...
I would be grateful for any advice on this. Am I doing it wrong? Is there a way, or is it just impossible? If so, then should I be using another approach entirely? Thanks.

A better way:
SELECT ID
FROM (
SELECT ID,
"Date",
Event,
LAST_VALUE( CASE Event WHEN 'AnotherEvent' THEN "Date" END IGNORE NULLS )
OVER ( PARTITION BY ID ORDER BY "Date"
ROWS BETWEEN UNBOUNDED PRECEEDING AND UNBOUNDED FOLLOWING
) AS another_date,
FIRST_VALUE( CASE Event WHEN 'ThirdEvent' THEN "Date" END IGNORE NULLS )
OVER ( PARTITION BY ID ORDER BY "Date"
ROWS BETWEEN UNBOUNDED PRECEEDING AND UNBOUNDED FOLLOWING
) AS third_date
FROM TableA
WHERE Event IN ( 'MainEvent', 'ThirdEvent' )
OR ( Event = 'AnotherEvent' AND EXTRACT( YEAR FROM "Date" ) = 2015 )
)
WHERE Event = 'MainEvent'
AND "Date" >= another_date
AND ( third_date IS NULL OR third_date > another_date );

You need to join your DateX CTE on the ID column. Something like:
WITH
DateX AS
(
SELECT
TableSub.ID,
MAX(TableSub.Date) AS MaxDate
FROM
TableA TableSub
WHERE
AND TableSub.Event = 'AnotherEvent'
AND TableSub.Date >= DATE '2015-01-01'
AND TableSub.Date < DATE '2016-01-01'
GROUP BY
TableSub.ID
)
SELECT
TableMain.ID
FROM
TableA TableMain
JOIN
DateX
ON
DateX.ID = TableMain.ID
WHERE
TableMain.Event = 'MainEvent'
AND TableMain.Date >= DateX.MaxDate
AND (
SELECT
TableSub2.ID
FROM
TableA TableSub2
JOIN
DateX
ON
DateX.ID = TableSub2.ID
WHERE
TableSub2.ID = TableMain.ID
TableSub2.Event = 'ThirdEvent'
AND TableSub2.Date <= DateX.MaxDate
) IS NULL
The CTE also needs a column alias for the aggregate; and as you need to join in the ID, you need to include that and group by it.
The last subquery looks odd; you might want NOT EXISTS rather than IS NULL if you're looking for no record. Perhaps your real query is using an aggregate, but even so that might be quicker.
This still may not be the best approach but it's hard to tell from your example. Hitting the same table three times may be unnecessarily expensive.

Related

postgres: COUNT, DISTINCT is not implemented for window functions

I am trying to use COUNT(DISTINC column) OVER(PARTITION BY column) when I am using COUNT + window function(OVER).
I get an error like the one in the title and can't get it to work.
I have looked into how to deal with this error, but I have not found an example of how to deal with such a complex query as the one below.
I cannot find an example of how to deal with such a complex query as shown below, and I am not sure how to handle it.
The COUNT part of the problem exists on line 65.
How can such a complex query be resolved without slowing down?
WITH RECURSIVE "cte" AS((
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"videos_productvideocomment"."id" AS "root_id"
FROM
"videos_productvideocomment"
WHERE
(
"videos_productvideocomment"."parent_id" IS NULL
AND "videos_productvideocomment"."video_id" = 'f264433c-c0af-49cc-8b40-84453da71b2d'
)
) UNION(
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"cte"."root_id" AS "root_id"
FROM
"videos_productvideocomment"
INNER JOIN
"cte"
ON "videos_productvideocomment"."parent_id" = "cte"."id"
))
SELECT
*,
EXISTS(
SELECT
(1) AS "a"
FROM
"videos_productvideolikecomment" U0
WHERE
(
U0."comment_id" = t."id"
AND U0."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48'
)
LIMIT 1
) AS "liked"
FROM
(
SELECT DISTINCT
"cte"."id",
"cte"."created_at",
"cte"."updated_at",
"cte"."user_id",
"cte"."text",
"cte"."commented_at",
"cte"."edited_at",
"cte"."parent_id",
"cte"."video_id",
"cte"."root_id" AS "root_id",
COUNT(DISTINCT "cte"."root_id") OVER(PARTITION BY "cte"."root_id") AS "reply_count", <--- here
COUNT("videos_productvideolikecomment"."id") OVER(PARTITION BY "cte"."id") AS "liked_count"
FROM
"cte"
LEFT OUTER JOIN
"videos_productvideolikecomment"
ON (
"cte"."id" = "videos_productvideolikecomment"."comment_id"
)
) t
WHERE
t."id" = t."root_id"
ORDER BY
CASE
WHEN t."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48' THEN 0
ELSE 1
END ASC,
"liked_count" DESC
DISTINCT will look for duplicates and remove it, but in big data it will take a lot of time to process this query, you should process the middle of the record in the programming part I think it will be fast than. Thank

How to avoid rerunning a subquery used multiple times?

The following is a simplified example of a problem I'm running into. Assume I have a query "SomeQuery" (SELECT... FROM... WHERE...) that gives an output that looks like this:
Status
MyDate
A
6/14/2021
A
6/12/2021
B
6/10/2021
A
6/8/2021
B
6/6/2021
A
6/4/2021
I need to get the earliest status A date that is greater than the maximum status B date. In this case 6/12/2021.
I have a query that looks like this:
SELECT
MIN(MyDate) AS DateNeeded
FROM
SomeQuery
WHERE
Status = 'A'
AND MyDate > (
SELECT
MAX(MyDate) AS MaxDateB
FROM
SomeQuery
WHERE
Status = 'B'
)
This works, but I would like to avoid running the subquery twice. I tried assigning an alias to the first instance of the subquery, and then using that alias in place of the second instance, but this results in an "Invalid object name" error.
Any help would be appreciated. Thanks.
but to avoid hitting table twice you could use window function:
select top(1) Mydate from
(
select *, max(case when Status = 'B' then Mydate end) over () MaxBDate from data
) t
where status = 'A'
and MyDate > MaxBDate
order by Mydate
db<>fiddle here
I'm not sure I understand what you're after exactly, but it is possible to do something like this:
;WITH cte (MaxMyDate) as
(
SELECT
MAX(MyDate) AS MaxDateB
FROM
SomeQuery
WHERE
Status = 'B'
)
SELECT
MIN(MyDate) AS DateNeeded
FROM
SomeQuery
WHERE
Status = 'A'
AND MyDate > (SELECT MaxMyDate from cte)
Some may find this a bit easier to read, since some of the complexity is moved to a cte.

Use of MAX function in SQL query to filter data

The code below joins two tables and I need to extract only the latest date per account, though it holds multiple accounts and history records. I wanted to use the MAX function, but not sure how to incorporate it for this case. I am using My SQL server.
Appreciate any help !
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from
Property.dbo.PROP
inner join
Property.dbo.PROP_DATA on Property.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where
(PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
order by
PROP.EffDate DESC
Assuming your DBMS supports windowing functions and the with clause, a max windowing function would work:
with all_data as (
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label,
max (PROP.EffDate) over (partition by PROP.PolNo) as max_date
from Actuarial.dbo.PROP
inner join Actuarial.dbo.PROP_DATA
on Actuarial.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where (PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
)
select
FileName, InsName, Status, FileTime, SubmissionNo,
PolNo, EffDate, ExpDate, Region, UnderWriter, Data, Label
from all_data
where EffDate = max_date
ORDER BY EffDate DESC
This also presupposes than any given account would not have two records on the same EffDate. If that's the case, and there is no other objective means to determine the latest account, you could also use row_numer to pick a somewhat arbitrary record in the case of a tie.
Using straight SQL, you can use a self-join in a subquery in your where clause to eliminate values smaller than the max, or smaller than the top n largest, and so on. Just set the number in <= 1 to the number of top values you want per group.
Something like the following might do the trick, for example:
select
p.FileName
, p.InsName
, p.Status
, p.FileTime
, p.SubmissionNo
, p.PolNo
, p.EffDate
, p.ExpDate
, p.Region
, p.Underwriter
, pd.Data
, pd.Label
from Actuarial.dbo.PROP p
inner join Actuarial.dbo.PROP_DATA pd
on p.FileID = pd.FileID
where (
select count(*)
from Actuarial.dbo.PROP p2
where p2.FileID = p.FileID
and p2.EffDate <= p.EffDate
) <= 1
and (
pd.Label in ('Occupancy' , 'OccupancyTIV')
and p.Status = 'Bound'
)
ORDER BY p.EffDate DESC
Have a look at this stackoverflow question for a full working example.
Not tested
with temp1 as
(
select foo
from bar
whre xy = MAX(xy)
)
select PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from Actuarial.dbo.PROP
inner join temp1 t
on Actuarial.dbo.PROP.FileID = t.dbo.PROP_DATA.FileID
ORDER BY PROP.EffDate DESC

Rank & find difference of value in the same column

I have the below table -
Here, I have created the "Order" column by using the rank function partitioned by case_identifier ordered by audit_date.
Now, I want to create a new column as below -
The logic for the new column would be -
select *,
case when [order] = '1' then [days_diff]
else (val of [days_diff] in rank 2) - (val of [days_diff] in rank 1) ...
end as '[New_Col]'
from TABLE
Can you please help me with the syntax? Thanks.
Take a look at the LAG function. It will provide you with what you want.
something like:
declare #temptable TABLE (case_id varchar(2), row_order int, days_diff float)
INSERT INTO #temptable values ('A',1,5)
INSERT INTO #temptable values ('A',2,3)
INSERT INTO #temptable values ('A',3,2)
INSERT INTO #temptable values ('B',1,5)
INSERT INTO #temptable values ('B',2,1)
--select * from #temptable
SELECT case_id,row_order, LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) AS prev_row,days_diff,
CASE
WHEN row_order = 1 THEN days_diff
ELSE LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) - days_diff
END AS newcolumn
FROM #temptable
order by case_id,row_order asc
SELECT case_id,row_order,LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) AS prev_row, days_diff,
COALESCE(LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) - days_diff , days_diff)
FROM #temptable
order by case_id,row_order asc
Other answers will use a coalesce in place of the CASE statement. It's probably faster, but I feel like this is clearer.
If you run both and look at the execution plans they are the same.
I believe the following query gets you what you want.
SELECT a.*,
'NEW DAYS DIFF' =
CASE
WHEN a.[order] = 1 THEN a.days_diff
ELSE a.days_diff - b.days_diff
END
FROM dbo.tblCaseDaysDiff a
INNER JOIN dbo.tblCaseDaysDiff b
ON
(b.CASE_ID = a.CASE_ID AND b.[order] + 1 = a.[order] ) -- Get the current row and compare with the next highest order
OR (b.CASE_ID = a.CASE_ID AND b.[order] = 1 AND a.[order] = 1) --WHEN ORDER = 1 Get days_diff value
ORDER BY a.CASE_ID, a.[order]
As it happens, you're already hip-deep in window functions, and as others have pointed out, LAG will do the trick. In general, though, you can always get the difference of two rows by making one row: by joining the table to itself.
with T (CASE_IDENTIFIER, AUDIT_DATE, order, days_diff)
as (
... your query ...
)
select a.*,
a.days_diff - coalesce(b.days_diff, 0) as delta_days_diff
from T as a left join T as b
on a.CASE_IDENTIFIER = b.CASE_IDENTIFIER
and b.days_diff = a.days_diff - 1
LAG METHOD
SELECT
CASE_IDENTIFIER
,AUDIT_DATE
,[order]
,days_diff
,days_diff - ISNULL(LAG(days_diff,1) OVER (PARTITION BY CASE_IDENTIFIER ORDER BY [order]),0) AS New_Column
FROM #Table
SELF JOIN METHOD
SELECT
t1.CASE_IDENTIFIER
,AUDIT_DATE
,t1.[order]
,t1.days_diff
,t1.days_diff - ISNULL(t2.days_diff,0) AS New_Column
FROM
#Table t1
LEFT JOIN #Table t2
ON t1.CASE_IDENTIFIER = t2.CASE_IDENTIFIER
AND t1.[order] - 1 = t2.[order]
I feel like a lot of the other answers are on the right track but there are some nuances or easier ways of writing some of them. Or also some of the answer provide the write direction but had something wrong with their join or syntax. Anyways, you don't need the CASE STATEMENT whether you use the LAG of SELF JOIN Method. Next COALESCE() is great but you are only comparing 2 values so ISNULL() works fine too for sql-server but either will do.

Bubbling Up Columns in Sql

Pardon the convoluted example, but I believe there is something fundamental about sql I am missing and I'm not sure what it is. I have this crazy query...
SELECT *
FROM (
SELECT *
FROM (
SELECT #t1 := #t1 +1 AS leaderboard_entry_youngness_rank, 1 - #t1 /100 AS
leaderboard_entry_youngness_based_on_expiry, leaderboard_entry . * ,
NOW( ) - leaderboard_entry_timestamp AS leaderboard_entry_age_in_some_units,
TO_DAYS( NOW( ) ) - TO_DAYS( leaderboard_entry_timestamp )
AS leaderboard_entry_age_in_days
FROM leaderboard_entry) AS inner_temp
NATURAL JOIN leaderboard
NATURAL JOIN user
WHERE (
leaderboard_load_key = 'sk-en-adjectives-1'
OR leaderboard_load_key = '-sk-en-adjectives-1'
)
AND leaderboard_quiz_mode = '0'
ORDER BY leaderboard_entry_age_in_some_units ASC , leaderboard_entry_timestamp ASC
LIMIT 0 , 100
) AS outer_temp
ORDER BY leaderboard_entry_elapsed_time_ms ASC , leaderboard_entry_timestamp ASC
LIMIT 0 , 50
I added the second nested SELECT statement because the user_name in the user table was not being returned in the outermost query. But now the leaderboard_entry_youngness_based_on_expiry field, which is being generated based on a row index ratio, is not working correctly.
If I remove the second nested SELECT statement, the leaderboard_entry_youngness_based_on_expiry works as expected, but the user_name column is not returned.
How can I satisfy both? Why is this happening?
Thanks!
This stems from the following question:
Add a numbered list column to a returned MySQL query
In your inner SELECT statement, you do not have user.user_name, that's why username is not returned. Remove the outer query, do it like earlier but with user.user_name like this:
....
SELECT #t1 := #t1 +1 AS leaderboard_entry_youngness_rank, 1 - #t1 /100 AS
leaderboard_entry_youngness_based_on_expiry, leaderboard_entry . * ,
NOW( ) - leaderboard_entry_timestamp AS leaderboard_entry_age_in_some_units,
TO_DAYS( NOW( ) ) - TO_DAYS( leaderboard_entry_timestamp )
AS leaderboard_entry_age_in_days, user.user_name
....
Try putting a ORDER BY in the inner most query, since there currently is no ORDER BY clause, its wrong to say that "is not working correctly".
Check if you take away the outer SELECT * FROM..., see if there are duplicate user_name columns.
BTW, Since you are not using the row index columns in your query, why not just put this logic in the application itself? it will be more reliable doing so.