How to avoid rerunning a subquery used multiple times? - sql

The following is a simplified example of a problem I'm running into. Assume I have a query "SomeQuery" (SELECT... FROM... WHERE...) that gives an output that looks like this:
Status  MyDate
A       6/14/2021
A       6/12/2021
B       6/10/2021
A       6/8/2021
B       6/6/2021
A       6/4/2021
I need to get the earliest status A date that is greater than the maximum status B date. In this case 6/12/2021.
I have a query that looks like this:
SELECT
MIN(MyDate) AS DateNeeded
FROM
SomeQuery
WHERE
Status = 'A'
AND MyDate > (
SELECT
MAX(MyDate) AS MaxDateB
FROM
SomeQuery
WHERE
Status = 'B'
)
This works, but I would like to avoid running the subquery twice. I tried assigning an alias to the first instance of the subquery, and then using that alias in place of the second instance, but this results in an "Invalid object name" error.
Any help would be appreciated. Thanks.

To avoid hitting the table twice, you could use a window function:
select top(1) Mydate from
(
select *, max(case when Status = 'B' then Mydate end) over () MaxBDate from data
) t
where status = 'A'
and MyDate > MaxBDate
order by Mydate

I'm not sure I understand what you're after exactly, but it is possible to do something like this:
;WITH cte (MaxMyDate) as
(
SELECT
MAX(MyDate) AS MaxDateB
FROM
SomeQuery
WHERE
Status = 'B'
)
SELECT
MIN(MyDate) AS DateNeeded
FROM
SomeQuery
WHERE
Status = 'A'
AND MyDate > (SELECT MaxMyDate from cte)
Some may find this a bit easier to read, since some of the complexity is moved to a cte.
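
As a rough check of the two answers above, here is a minimal sketch that runs both shapes against the sample data. It uses Python's built-in sqlite3 module rather than SQL Server, so TOP(1) becomes LIMIT 1, and the table name some_query, the column names, and the ISO date strings are assumptions made for illustration. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory stand-in for the output of "SomeQuery" (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE some_query (status TEXT, my_date TEXT);
INSERT INTO some_query VALUES
  ('A', '2021-06-14'), ('A', '2021-06-12'), ('B', '2021-06-10'),
  ('A', '2021-06-08'), ('B', '2021-06-06'), ('A', '2021-06-04');
""")

# Window-function shape: the source is read once and the max B date
# is attached to every row.
window_sql = """
SELECT my_date FROM (
    SELECT status, my_date,
           MAX(CASE WHEN status = 'B' THEN my_date END) OVER () AS max_b_date
    FROM some_query
) t
WHERE status = 'A' AND my_date > max_b_date
ORDER BY my_date
LIMIT 1
"""

# CTE shape: easier to read, but the source is still referenced twice.
cte_sql = """
WITH cte (max_b_date) AS (
    SELECT MAX(my_date) FROM some_query WHERE status = 'B'
)
SELECT MIN(my_date) FROM some_query
WHERE status = 'A' AND my_date > (SELECT max_b_date FROM cte)
"""

window_result = conn.execute(window_sql).fetchone()[0]
cte_result = conn.execute(cte_sql).fetchone()[0]
print(window_result, cte_result)
```

Both queries return the same date for this data. Whether the CTE version actually avoids a second evaluation depends on the engine and its optimizer.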

Related

postgres: COUNT, DISTINCT is not implemented for window functions

I am trying to use COUNT(DISTINCT column) OVER(PARTITION BY column), that is, COUNT combined with a window function (OVER).
I get an error like the one in the title and can't get it to work.
I have looked into how to deal with this error, but I have not found an example that handles a query as complex as the one below, and I am not sure how to proceed.
The problematic COUNT is on the line marked "<--- here".
How can such a complex query be fixed without slowing it down?
WITH RECURSIVE "cte" AS((
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"videos_productvideocomment"."id" AS "root_id"
FROM
"videos_productvideocomment"
WHERE
(
"videos_productvideocomment"."parent_id" IS NULL
AND "videos_productvideocomment"."video_id" = 'f264433c-c0af-49cc-8b40-84453da71b2d'
)
) UNION(
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"cte"."root_id" AS "root_id"
FROM
"videos_productvideocomment"
INNER JOIN
"cte"
ON "videos_productvideocomment"."parent_id" = "cte"."id"
))
SELECT
*,
EXISTS(
SELECT
(1) AS "a"
FROM
"videos_productvideolikecomment" U0
WHERE
(
U0."comment_id" = t."id"
AND U0."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48'
)
LIMIT 1
) AS "liked"
FROM
(
SELECT DISTINCT
"cte"."id",
"cte"."created_at",
"cte"."updated_at",
"cte"."user_id",
"cte"."text",
"cte"."commented_at",
"cte"."edited_at",
"cte"."parent_id",
"cte"."video_id",
"cte"."root_id" AS "root_id",
COUNT(DISTINCT "cte"."root_id") OVER(PARTITION BY "cte"."root_id") AS "reply_count", -- <--- here
COUNT("videos_productvideolikecomment"."id") OVER(PARTITION BY "cte"."id") AS "liked_count"
FROM
"cte"
LEFT OUTER JOIN
"videos_productvideolikecomment"
ON (
"cte"."id" = "videos_productvideolikecomment"."comment_id"
)
) t
WHERE
t."id" = t."root_id"
ORDER BY
CASE
WHEN t."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48' THEN 0
ELSE 1
END ASC,
"liked_count" DESC
DISTINCT looks for duplicates and removes them, but on large data sets this takes a long time. I think it would be faster to do the intermediate processing in application code instead. Thanks.
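
One commonly used workaround, which is not taken from the answer above, is to emulate a windowed COUNT(DISTINCT ...) with two DENSE_RANK() calls: the ascending rank plus the descending rank minus one equals the number of distinct non-NULL values in the partition. The sketch below uses Python's sqlite3 module (SQLite also rejects DISTINCT in window functions), and the replies table and its columns are hypothetical. It assumes SQLite 3.25 or newer and no NULLs in the counted column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: one row per reply, tagged with its root comment.
CREATE TABLE replies (root_id INTEGER, user_id TEXT);
INSERT INTO replies VALUES (1, 'u1'), (1, 'u1'), (1, 'u2'), (2, 'u3');
""")

# Distinct users per root comment, computed without COUNT(DISTINCT ...) OVER.
distinct_counts = conn.execute("""
SELECT DISTINCT
    root_id,
    DENSE_RANK() OVER (PARTITION BY root_id ORDER BY user_id)
  + DENSE_RANK() OVER (PARTITION BY root_id ORDER BY user_id DESC)
  - 1 AS distinct_users
FROM replies
ORDER BY root_id
""").fetchall()
print(distinct_counts)
```

The same expression should carry over to PostgreSQL, but whether it is fast enough for the query in the question would need to be measured there.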

Find the difference between 1 column depending on date

When I run this:
SELECT NAME FROM T1
WHERE _LOAD_DATETIME::date = '2022-01-31'
I see 62 rows
but when I do
SELECT NAME FROM T1
WHERE _LOAD_DATETIME::date = '2022-02-01'
I see 59
I want to see what NAME's are missing when it ran for _LOAD_DATETIME::date = '2022-02-01'
I thought this would work but it doesn't:
SELECT NAME FROM table
WHERE _LOAD_DATETIME::date = '2022-02-01'
AND NOT EXISTS (
SELECT NAME FROM
table
WHERE _LOAD_DATETIME::date = '2022-01-31')
You have to use MINUS for your purposes:
SELECT NAME FROM T1
WHERE _LOAD_DATETIME::date = '2022-01-31'
MINUS
SELECT NAME FROM T1
WHERE _LOAD_DATETIME::date = '2022-02-01'
If we are talking about PostgreSQL, you have to use EXCEPT instead of MINUS.
There are two set operators you can use, MINUS or EXCEPT (they are aliases for each other):
SELECT column1 FROM values (1),(2),(3),(4)
MINUS
SELECT column1 FROM values (2),(3),(4),(5);
This gives 1. If you want to see 5, you need to flip the order of the SELECTs.
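
To make the set-difference idea concrete, here is a small sketch using Python's sqlite3 module, which supports EXCEPT but not MINUS. The table t1, its columns, and the rows are made up for illustration. It also shows a correlated NOT EXISTS: the attempt in the question returns nothing because its subquery never refers to the outer row, so it is true for either every row or none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical data: 'beta' was loaded on the first day only.
CREATE TABLE t1 (name TEXT, load_date TEXT);
INSERT INTO t1 VALUES
  ('alpha', '2022-01-31'), ('beta', '2022-01-31'), ('gamma', '2022-01-31'),
  ('alpha', '2022-02-01'), ('gamma', '2022-02-01');
""")

# Names present on 2022-01-31 but absent on 2022-02-01.
missing_except = [row[0] for row in conn.execute("""
SELECT name FROM t1 WHERE load_date = '2022-01-31'
EXCEPT
SELECT name FROM t1 WHERE load_date = '2022-02-01'
ORDER BY name
""")]

# Same result with NOT EXISTS; the b.name = a.name condition is what
# ties the subquery to the outer row.
missing_not_exists = [row[0] for row in conn.execute("""
SELECT DISTINCT a.name
FROM t1 a
WHERE a.load_date = '2022-01-31'
  AND NOT EXISTS (
      SELECT 1 FROM t1 b
      WHERE b.load_date = '2022-02-01' AND b.name = a.name
  )
ORDER BY a.name
""")]
print(missing_except, missing_not_exists)
```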

Why does this SQL cause a type conversion error?

WITH tb_testl AS (
SELECT 1 AS id ,'hehe' AS value
UNION ALL
SELECT 1 AS id, '1' AS value
UNION ALL
SELECT 2 AS id, '2' AS value
UNION ALL
SELECT 2 AS id, '2' AS value
), tb_test2 AS (
SELECT CONVERT(INT , value) AS value FROM tb_testl WHERE id = 2
)
SELECT * FROM tb_test2 WHERE value = 2;
This SQL causes the error:
Conversion failed when converting the varchar value 'hehe' to data type int.
But tb_test2 doesn't contain the value 'hehe', which is only in the other table, tb_testl. I also found that this SQL works if I don't append the WHERE value = 2 clause. I've tried the ISNUMERIC function, but it didn't help.
Version: SQL Server 2008 R2
With respect to why this occurs:
There is a Logical Processing Order, which describes the order in which clauses are evaluated. The order is:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
You can also see the processing order when you SET SHOWPLAN_ALL ON. For this query, the processing is as follows:
Constant scan - this is the FROM clause, which consists of hard coded values, hence the constants.
Filter - this is the WHERE clause. While it looks like there are two WHERE clauses (WHERE id = 2 and WHERE value = 2), SQL Server sees this differently: it considers a single WHERE clause, WHERE CONVERT(INT , value) = 2 AND id = 2.
Compute Scalar - this is the CONVERT function in the SELECT.
Because both WHERE clauses are executed simultaneously, the hehe value is not filtered out of the CONVERT scope.
Effectively, the query is simplified to something like:
SELECT CONVERT(INT, tb_testl.value) AS Cvalue
FROM (
SELECT 1 AS id
, 'hehe' AS value
UNION ALL
SELECT 1 AS id
, '1' AS value
UNION ALL
SELECT 2 AS id
, '2' AS value
UNION ALL
SELECT 2 AS id
, '2' AS value
) tb_testl
WHERE CONVERT(INT, tb_testl.value) = 2
AND tb_testl.id = 2
Which should clarify why the error occurs.
With SQL, you cannot read code in the same way as in imperative languages like C. Lines of SQL code are not necessarily (mostly not at all, in fact) executed in the order they are written. In this case, it's an error to think the inner WHERE is executed before the outer WHERE.
SQL Server does not guarantee the order of processing of statements (with one exception below). That is, there is no guarantee that WHERE filtering happens before the SELECT. Or that one CTE is evaluated before another. This is considered an advantage because it allows SQL Server to rearrange the processing to optimize performance (although I consider the issue that you are seeing a bug).
Obviously, the problem is in this part of the code:
tb_test2 AS (
SELECT CONVERT(INT, value) AS value
FROM tb_testl
WHERE id = 2
)
(Well, actually, it is where tb_test2 is referenced.)
What is happening is that SQL Server pushes the CONVERT() to where the values are being read, so the conversion is attempted before the WHERE clause is processed. Hence, the error.
In SQL Server 2012+, you can easily solve this using TRY_CONVERT():
tb_test2 AS (
SELECT TRY_CONVERT(INT, value) AS value
FROM tb_testl
WHERE id = 2
)
However, that doesn't work in SQL Server 2008. You can use the fact that CASE does have some guarantees on the order of processing:
tb_test2 AS (
SELECT (CASE WHEN value NOT LIKE '%[^0-9]%' THEN CONVERT(INT, value)
END) AS value
FROM tb_testl
WHERE id = 2
)
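
The shape of that CASE guard can be sketched outside SQL Server as follows. This uses Python's sqlite3 module, so two things differ and are assumptions of the sketch: SQLite's LIKE has no character classes, so GLOB '*[^0-9]*' stands in for LIKE '%[^0-9]%', and SQLite's CAST never raises a conversion error, so the example only shows that non-numeric values are routed to NULL instead of being converted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Guarded conversion: only all-digit strings reach the CAST.
guarded_rows = conn.execute("""
WITH tb_testl (id, value) AS (
    VALUES (1, 'hehe'), (1, '1'), (2, '2'), (2, '2')
)
SELECT id,
       value,
       CASE WHEN value NOT GLOB '*[^0-9]*'
            THEN CAST(value AS INTEGER)
       END AS int_value
FROM tb_testl
ORDER BY id, value
""").fetchall()

for row in guarded_rows:
    print(row)
```

Note that an empty string would pass this digit-only test, so a real guard may also need a length check.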
The error is caused by this part of the statement:
), tb_test2 AS (
SELECT CONVERT(INT , value) AS value FROM tb_testl WHERE id = 2
value has type varchar, and the value 'hehe' cannot be converted to an integer:
WITH tb_testl AS (
SELECT 1 AS id ,'hehe' AS value
UPDATE: SQL Server tries to convert all values to integer in your statement. To avoid the error, rewrite the statement as:
WITH tb_testl AS (
SELECT 1 AS id ,'hehe' AS value
UNION ALL SELECT 1 AS id, '1' AS value
UNION ALL SELECT 2 AS id, '2' AS value
UNION ALL SELECT 2 AS id, '2' AS value
), tb_test2 AS (
SELECT value AS value FROM tb_testl WHERE id = 2
),
tb_test3 AS (
SELECT cast(value as int) AS value FROM tb_test2
)
SELECT * FROM tb_test3

Oracle SQL re-use subquery "WITH" clause assistance

I have an SQL query in which I need to take the output of a subquery and use it more than once. My existing query works, but only if I repeat the subquery each time I need it. Unfortunately the subquery is complex, and takes time to execute - meaning that multiple iterations really slow the whole thing down.
I have read that you can use the "WITH" statement to assign a subquery output to a variable, in order to re-use that variable. However the problem I'm having is that within the subquery, I need to reference values from the main query. And it appears that if I use WITH - before the main query SELECT - then those references are not recognised. I'll give you a simplified example:
WITH
DateX AS
(
SELECT
MAX(TableSub.Date)
FROM
TableA TableSub
WHERE
TableSub.ID = TableMain.ID
AND TableSub.Event = 'AnotherEvent'
AND TableSub.Date BETWEEN '01-Jan-2015' AND '31-Dec-2015'
)
SELECT
TableMain.ID
FROM
TableA TableMain
WHERE
TableMain.Event = 'MainEvent'
AND TableMain.Date >= DateX
AND (
SELECT
TableSub2.ID
FROM
TableA TableSub2
WHERE
TableSub2.ID = TableMain.ID
AND TableSub2.Event = 'ThirdEvent'
AND TableSub2.Date <= DateX
) IS NULL
I hope this is clear. It's a simplified version of what I have, but you can see that DateX is used in more than one place: within the main query, and within a subquery. However the problem is that when DateX is defined by WITH, I need to link the ID back to the ID of the main query. And it's not working...
I would be grateful for any advice on this. Am I doing it wrong? Is there a way, or is it just impossible? If so, then should I be using another approach entirely? Thanks.
A better way:
SELECT ID
FROM (
SELECT ID,
"Date",
Event,
LAST_VALUE( CASE Event WHEN 'AnotherEvent' THEN "Date" END IGNORE NULLS )
OVER ( PARTITION BY ID ORDER BY "Date"
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS another_date,
FIRST_VALUE( CASE Event WHEN 'ThirdEvent' THEN "Date" END IGNORE NULLS )
OVER ( PARTITION BY ID ORDER BY "Date"
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS third_date
FROM TableA
WHERE Event IN ( 'MainEvent', 'ThirdEvent' )
OR ( Event = 'AnotherEvent' AND EXTRACT( YEAR FROM "Date" ) = 2015 )
)
WHERE Event = 'MainEvent'
AND "Date" >= another_date
AND ( third_date IS NULL OR third_date > another_date );
You need to join your DateX CTE on the ID column. Something like:
WITH
DateX AS
(
SELECT
TableSub.ID,
MAX(TableSub.Date) AS MaxDate
FROM
TableA TableSub
WHERE
TableSub.Event = 'AnotherEvent'
AND TableSub.Date >= DATE '2015-01-01'
AND TableSub.Date < DATE '2016-01-01'
GROUP BY
TableSub.ID
)
SELECT
TableMain.ID
FROM
TableA TableMain
JOIN
DateX
ON
DateX.ID = TableMain.ID
WHERE
TableMain.Event = 'MainEvent'
AND TableMain.Date >= DateX.MaxDate
AND (
SELECT
TableSub2.ID
FROM
TableA TableSub2
JOIN
DateX
ON
DateX.ID = TableSub2.ID
WHERE
TableSub2.ID = TableMain.ID
AND TableSub2.Event = 'ThirdEvent'
AND TableSub2.Date <= DateX.MaxDate
) IS NULL
The CTE also needs a column alias for the aggregate; and as you need to join in the ID, you need to include that and group by it.
The last subquery looks odd; you might want NOT EXISTS rather than IS NULL if you're looking for no record. Perhaps your real query is using an aggregate, but even so that might be quicker.
This still may not be the best approach but it's hard to tell from your example. Hitting the same table three times may be unnecessarily expensive.
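
A minimal runnable sketch of the grouped-CTE-plus-join idea, using NOT EXISTS as suggested above, might look like this. It runs on Python's sqlite3 module rather than Oracle; the table table_a, the column event_date (standing in for Date), and all rows are hypothetical, and dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER, event TEXT, event_date TEXT);
INSERT INTO table_a VALUES
  (1, 'AnotherEvent', '2015-03-01'),
  (1, 'MainEvent',    '2015-04-01'),
  (2, 'AnotherEvent', '2015-05-01'),
  (2, 'MainEvent',    '2015-06-01'),
  (2, 'ThirdEvent',   '2015-04-15'),
  (3, 'MainEvent',    '2015-02-01');
""")

# The CTE is computed per ID and joined, instead of being correlated
# to the main query from inside the WITH clause.
matching_ids = [row[0] for row in conn.execute("""
WITH date_x AS (
    SELECT id, MAX(event_date) AS max_date
    FROM table_a
    WHERE event = 'AnotherEvent'
      AND event_date >= '2015-01-01'
      AND event_date <  '2016-01-01'
    GROUP BY id
)
SELECT m.id
FROM table_a m
JOIN date_x d ON d.id = m.id
WHERE m.event = 'MainEvent'
  AND m.event_date >= d.max_date
  AND NOT EXISTS (
      SELECT 1 FROM table_a s
      WHERE s.id = m.id
        AND s.event = 'ThirdEvent'
        AND s.event_date <= d.max_date
  )
ORDER BY m.id
""")]
print(matching_ids)
```

Here ID 2 is dropped because it has a ThirdEvent on or before its latest AnotherEvent, and ID 3 is dropped by the inner join because it has no AnotherEvent at all. If IDs without an AnotherEvent should be kept, a LEFT JOIN would be needed instead.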

SQL Server : connect by level for DATE type with union

I've found a basic answer for replacing Oracle's "CONNECT BY LEVEL" in this question, but my case is a little bit more complicated.
Basically, the thing that I want to replace looks like this:
...
UNION ALL
Select
adate, 'ROAD' as TSERV_ID, 0 AS EQ_NBR
from
(SELECT
to_date(sysdate - 732,'dd/mm/yy') + rownum -1 as adate, rownum
FROM
(select rownum
from dual
connect by level <= 732)
WHERE rownum <= 732)
UNION ALL
Select
adate, 'PORTPACK' as TSERV_ID, 0 AS EQ_NBR
from
(SELECT
to_date(sysdate - 732,'dd/mm/yy') + rownum -1 as adate, rownum
FROM
(select rownum from dual connect by level <= 732)
WHERE rownum <= 732)
UNION ALL
....
Now, the single dual connect is easy, even if this is apparently not a very efficient method:
WITH CTE AS (
SELECT dateadd(day,-720,CONVERT (date, GETDATE())) as Datelist
UNION ALL
SELECT dateadd(day,1,Datelist)
FROM CTE
WHERE datelist < getdate() )
SELECT *,'ROAD' as Tserv_ID , 0 as EQ_NBR FROM CTE
option (maxrecursion 0)
Repeating the union is hard, because I get an error:
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
There are more parts of this union that I've provided here; I've tried to use the "WITH" only at start but no luck. Am I missing something obvious here?
EDIT: There is of course the big question of WHY I am even trying to do such a thing. Personally, I wouldn't, but at the other end of the query there is a huge Crystal Report that runs once every month and accepts data in this particular format. The end of the FULL query's output is something like
3 columns of data
3 Columns of data
...
Currentdate-732,"ROAD",0
Currentdate -731,"ROAD",0
...
Currentdate, "ROAD", 0
Currentdate -732, "PORTPAK", 0
Currentdate -731, "PORTPAK", 0
etc.
Are you trying to do:
WITH CTE1 AS (...),
CTE2 AS (...)
SELECT stuff FROM CTE1
UNION ALL
SELECT stuff FROM CTE2;
? This is a common challenge, I guess it is not very discoverable that in order to use more than one CTE, you just separate them with a comma.
That all said, it seems like you are just trying to generate a series of dates. A recursive CTE (never mind a series of many of them) is not the most efficient way to do this. Instead of telling us you want to replace CONNECT BY LEVEL and showing us the syntax you've tried, why don't you just show or describe the output you want? We've already got an appreciation that you've tried something on your own (thanks!) but we'd rather give you an efficient solution than bridging the gap to an inefficient one.
As an example, here is something that requires a lot less redundant code, and (I think) gives you what you're after:
DECLARE @n INT = 722, @d DATE = CURRENT_TIMESTAMP;
;WITH v AS (SELECT v FROM (VALUES('ROAD'),('PORTPACK')) AS v(v)),
n AS (SELECT TOP (@n) n = ROW_NUMBER() OVER (ORDER BY number)
FROM master.dbo.spt_values ORDER BY n)
SELECT Datelist = DATEADD(DAY, 2-n.n, @d), Tserv_ID = v.v, EQ_NBR = 0
FROM n CROSS JOIN v
ORDER BY Tserv_ID, Datelist;
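
The two ideas from this answer (a single WITH holding several comma-separated CTEs, and a cross join between a number list and a label list instead of one UNION ALL branch per label) can be tried out in a small sketch. It uses Python's sqlite3 module, not SQL Server, so the numbers come from a tiny recursive CTE rather than master.dbo.spt_values. The fixed start date and the three-day range are assumptions chosen to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One WITH keyword, two CTEs separated by a comma, then a cross join.
date_rows = conn.execute("""
WITH RECURSIVE
    days (n) AS (
        SELECT 0
        UNION ALL
        SELECT n + 1 FROM days WHERE n < 2
    ),
    services (tserv_id) AS (
        VALUES ('ROAD'), ('PORTPACK')
    )
SELECT date('2022-01-01', '+' || n || ' days') AS adate,
       tserv_id,
       0 AS eq_nbr
FROM days CROSS JOIN services
ORDER BY tserv_id, adate
""").fetchall()

for row in date_rows:
    print(row)
```

Adding another service then only means adding one more value to the services list, not another branch of the union.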