Redshift: Layered correlated subquery pattern not supported


I have a manifest table that has the latest processed timestamp of account/version combinations. I want to filter a raw events table to give me only the newest unprocessed timestamps based on the account/version combinations.
-- ERROR: This type of correlated subquery pattern is not supported due to
-- internal error
FROM events e
WHERE
CASE WHEN (e.account_id, e.app_version, e.app_build)
IN (SELECT DISTINCT account_id, app_version, app_build FROM manifest)
THEN
tstamp > (SELECT last_processed_tstamp FROM manifest m
WHERE m.account_id = e.account_id
AND m.app_version = e.app_version
AND m.app_build = e.app_build)
ELSE
1=1
END
Oddly, if I check only one column in the CASE WHEN, it works:
-- Somehow this works
FROM events e
WHERE
CASE WHEN e.account_id IN (SELECT DISTINCT account_id FROM manifest)
THEN
tstamp > (SELECT last_processed_tstamp FROM manifest m
WHERE m.account_id = e.account_id
AND m.app_version = e.app_version
AND m.app_build = e.app_build)
ELSE
1=1
END
Unfortunately though this is the wrong logic since it isn't filtering by the correct account/version combination. Would appreciate any help. Thanks.

You could use an OR. Note that each IN subquery has to return a single column, and that checking the columns independently is looser than matching the full account/version/build combination.
CASE WHEN
(e.account_id IN (SELECT DISTINCT account_id FROM manifest)
OR e.app_version IN (SELECT DISTINCT app_version FROM manifest)
OR e.app_build IN (SELECT DISTINCT app_build FROM manifest))
THEN ....
I'd break out the subselect to make sure you're only running it once.
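Another option that avoids the correlated subqueries altogether is a LEFT JOIN against manifest: keep an event when no manifest row matches its account/version/build combination, or when its tstamp is newer than the matched last_processed_tstamp. Below is a minimal sketch of that idea. It runs on SQLite through Python's sqlite3 module only so that it is self-contained, the sample rows are made up, it assumes manifest holds at most one row per combination, and it has not been tried on Redshift itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manifest (account_id INT, app_version TEXT, app_build TEXT,
                       last_processed_tstamp INT);
CREATE TABLE events (account_id INT, app_version TEXT, app_build TEXT, tstamp INT);

-- Hypothetical sample data, not taken from the question.
INSERT INTO manifest VALUES (1, '1.0', 'a', 100);
INSERT INTO events VALUES
  (1, '1.0', 'a', 90),   -- older than the manifest entry: dropped
  (1, '1.0', 'a', 110),  -- newer than the manifest entry: kept
  (1, '2.0', 'a', 50),   -- combination not in manifest: kept
  (2, '1.0', 'a', 10);   -- account not in manifest: kept
""")

query = """
SELECT e.account_id, e.app_version, e.app_build, e.tstamp
FROM events e
LEFT JOIN manifest m
  ON  m.account_id  = e.account_id
  AND m.app_version = e.app_version
  AND m.app_build   = e.app_build
WHERE m.account_id IS NULL                  -- no manifest row for this combination
   OR e.tstamp > m.last_processed_tstamp    -- or the event is newer than the last run
ORDER BY e.account_id, e.app_version, e.tstamp
"""
unprocessed = conn.execute(query).fetchall()
print(unprocessed)
```

The join condition does the combination match that the CASE WHEN was trying to express, and the WHERE clause covers both the "not in manifest" and the "newer than last processed" branches.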

Related

postgres: COUNT, DISTINCT is not implemented for window functions

I am trying to use COUNT(DISTINCT column) OVER (PARTITION BY column), that is, COUNT with DISTINCT as a window function.
I get the error in the title and can't get it to work.
I have looked into how to deal with this error, but I have not found an example that handles a query as complex as the one below, and I am not sure how to proceed.
The problematic COUNT is on the line marked "<--- here".
How can such a complex query be fixed without slowing it down?
WITH RECURSIVE "cte" AS((
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"videos_productvideocomment"."id" AS "root_id"
FROM
"videos_productvideocomment"
WHERE
(
"videos_productvideocomment"."parent_id" IS NULL
AND "videos_productvideocomment"."video_id" = 'f264433c-c0af-49cc-8b40-84453da71b2d'
)
) UNION(
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"cte"."root_id" AS "root_id"
FROM
"videos_productvideocomment"
INNER JOIN
"cte"
ON "videos_productvideocomment"."parent_id" = "cte"."id"
))
SELECT
*,
EXISTS(
SELECT
(1) AS "a"
FROM
"videos_productvideolikecomment" U0
WHERE
(
U0."comment_id" = t."id"
AND U0."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48'
)
LIMIT 1
) AS "liked"
FROM
(
SELECT DISTINCT
"cte"."id",
"cte"."created_at",
"cte"."updated_at",
"cte"."user_id",
"cte"."text",
"cte"."commented_at",
"cte"."edited_at",
"cte"."parent_id",
"cte"."video_id",
"cte"."root_id" AS "root_id",
COUNT(DISTINCT "cte"."root_id") OVER(PARTITION BY "cte"."root_id") AS "reply_count", -- <--- here
COUNT("videos_productvideolikecomment"."id") OVER(PARTITION BY "cte"."id") AS "liked_count"
FROM
"cte"
LEFT OUTER JOIN
"videos_productvideolikecomment"
ON (
"cte"."id" = "videos_productvideolikecomment"."comment_id"
)
) t
WHERE
t."id" = t."root_id"
ORDER BY
CASE
WHEN t."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48' THEN 0
ELSE 1
END ASC,
"liked_count" DESC

DISTINCT has to find and remove duplicates, which can take a long time on large data sets. I think it would be faster to do that intermediate processing in application code instead. Thanks.
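If the counting should stay in SQL, a commonly used workaround for the missing COUNT(DISTINCT ...) OVER (...) is to add an ascending and a descending DENSE_RANK() over the same partition and subtract 1. The sketch below shows only that trick on a tiny made-up table c(root_id, id), not the full query from the question. It assumes id is never NULL, and it runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 purely to be self-contained; the window expressions themselves are standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE c (root_id INT, id INT);
-- Hypothetical rows; id 2 repeats the way a LEFT JOIN to a likes table
-- would duplicate a comment row.
INSERT INTO c VALUES (1, 1), (1, 2), (1, 2), (1, 3), (4, 4);
""")

query = """
SELECT DISTINCT
  root_id,
  -- rank from the bottom + rank from the top - 1 = number of distinct ids
  DENSE_RANK() OVER (PARTITION BY root_id ORDER BY id)
  + DENSE_RANK() OVER (PARTITION BY root_id ORDER BY id DESC)
  - 1 AS distinct_ids
FROM c
ORDER BY root_id
"""
counts = conn.execute(query).fetchall()
print(counts)
```

Whether this is faster than aggregating in a separate GROUP BY subquery depends on the data and the planner, so it is worth comparing both with EXPLAIN ANALYZE.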

Compare the same table and fetch the satisfied results

I am trying to achieve the below requirement and need some help.
I created the below query,
SELECT * from
(
select b.extl_acct_nmbr, b.TRAN_DATE, b.tran_time,
case when (a.amount > b.amount) then b.amount
end as amount
,b.ivst_grup, b.grup_prod, b.pensionpymt
from ##pps a
join #pps b
on a.extl_acct_nmbr = b.extl_acct_nmbr
where a.pensionpymt <=2 and b.pensionpymt <=2) rslt
where rslt.amount is not null
Output I am getting,
Requirement is to get
The lowest amount row having same account number. (Completed and getting in the output)
In case both the amounts are same for same account (get the pensionpymt =1) (not sure how to get)
In case only one pensionpymt there add that too in the result set. (not sure how to get)
Could you please help? The expected output should look like this:

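You can use a window function. No self-join is needed: rank the rows of each account by amount, break ties by pensionpymt, and keep the first row. An account with only one row is kept automatically.
select * from (
select p.*, row_number() over (partition by extl_acct_nmbr order by amount asc, pensionpymt asc) rn
from #pps p
) t
where rn = 1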
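To see how row_number() covers all three requirements at once, here is a small sketch on a single table. The table name pps and the rows are invented for illustration, and it runs on SQLite (3.25 or later) through Python's sqlite3 rather than on SQL Server, so treat it as a demonstration of the ranking idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pps (extl_acct_nmbr TEXT, amount INT, pensionpymt INT);
-- Hypothetical rows covering the three cases from the question.
INSERT INTO pps VALUES
  ('A1', 500, 1), ('A1', 300, 2),   -- different amounts: the lowest wins
  ('B2', 200, 1), ('B2', 200, 2),   -- same amount: pensionpymt = 1 wins
  ('C3', 700, 2);                   -- only one row: kept as is
""")

query = """
SELECT extl_acct_nmbr, amount, pensionpymt
FROM (
  SELECT p.*,
         ROW_NUMBER() OVER (PARTITION BY extl_acct_nmbr
                            ORDER BY amount ASC, pensionpymt ASC) AS rn
  FROM pps p
) t
WHERE rn = 1
ORDER BY extl_acct_nmbr
"""
picked = conn.execute(query).fetchall()
print(picked)
```

The ORDER BY inside the window does the work: amount decides first, and pensionpymt only matters when the amounts tie.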

SQL count row where progres not done

I need some help; I don't know what keyword to search for online.
I want to count the progres values that aren't done.
When a progres has step3, it should not be counted.
The desired result for the example is 2. I tried to do it on my own, but it doesn't work.
Any help is appreciated. Thanks in advance.

One method uses count(distinct) and filters in the where clause:
select count(distinct progres)
from t
where not exists (select 1 from t t2 where t2.progres = t.progres and t2.step = 'step3');
Another fun way uses a difference:
select count(distinct progres) - count(distinct case when step = 'step3' then progres end)
from t;
If 'step3' can appear at most once per progres, the above can be simplified to:
select count(distinct progres) - sum(step = 'step3')
from t;
Or using set operations:
select count(*)
from ((select progres from t)
except -- removes duplicates
(select progres from t where step = 'step3')
) t;
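The question's sample data did not survive, so the sketch below uses made-up rows in which one of three progres values has reached step3, and checks that the NOT EXISTS, difference, and EXCEPT variants agree. It runs on SQLite through Python's sqlite3 only to be self-contained. SQLite does not accept parentheses around the individual SELECTs of a compound query, so the EXCEPT version is written without them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (progres TEXT, step TEXT);
-- Hypothetical rows: only p1 has reached step3.
INSERT INTO t VALUES
  ('p1', 'step1'), ('p1', 'step2'), ('p1', 'step3'),
  ('p2', 'step1'),
  ('p3', 'step1'), ('p3', 'step2');
""")

# Variant 1: keep only rows whose progres has no step3 row, then count.
not_exists = conn.execute("""
SELECT COUNT(DISTINCT progres)
FROM t
WHERE NOT EXISTS (SELECT 1 FROM t t2
                  WHERE t2.progres = t.progres AND t2.step = 'step3')
""").fetchone()[0]

# Variant 2: all progres values minus those that have a step3 row.
difference = conn.execute("""
SELECT COUNT(DISTINCT progres)
     - COUNT(DISTINCT CASE WHEN step = 'step3' THEN progres END)
FROM t
""").fetchone()[0]

# Variant 3: set difference; EXCEPT also removes duplicates.
except_count = conn.execute("""
SELECT COUNT(*)
FROM (SELECT progres FROM t
      EXCEPT
      SELECT progres FROM t WHERE step = 'step3') AS x
""").fetchone()[0]

print(not_exists, difference, except_count)
```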

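You can do the following (written so that a progres with several 'step3' rows is only subtracted once):
select
count(distinct progres)
-
(select count(distinct progres) from t where step = 'step3')
from t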

Looking for duplicates based on a few other columns

I am trying to find the rows where a PilotID has used the same shipment number more than once.
I have this so far.
select f_Shipment_ID
,f_date
,f_Pilot_ID
,f_Shipname
,f_SailedFrom
,f_SailedTo
,f_d_m
,f_Shipmentnumber
,f_NumberOfPilots
from t_shipment
where f_Pilot_ID < 10000
and f_NumberOfPilots=1
and f_Shipmentnumber in(select f_Shipmentnumber
from t_shipment
group by f_Shipmentnumber
Having count(*) >1)

Try something like this:
-- The CTE determines the f_Pilot_ID/f_Shipmentnumber combinations that appear more than once.
with DuplicateShipmentNumberCTE as
(
select
f_Pilot_ID,
f_Shipmentnumber
from
t_shipment
where
f_Pilot_ID < 10000 and
f_NumberOfPilots = 1
group by
f_Pilot_ID,
f_Shipmentnumber
having
count(1) > 1
)
select
Shipment.f_Shipment_ID,
Shipment.f_date,
Shipment.f_Pilot_ID,
Shipment.f_Shipname,
Shipment.f_SailedFrom,
Shipment.f_SailedTo,
Shipment.f_d_m,
Shipment.f_Shipmentnumber,
Shipment.f_NumberOfPilots
from
-- The join is used to restrict the result set to the shipments identified by the CTE.
t_shipment Shipment
inner join DuplicateShipmentNumberCTE CTE on
Shipment.f_Pilot_ID = CTE.f_Pilot_ID and
Shipment.f_Shipmentnumber = CTE.f_Shipmentnumber
where
f_NumberOfPilots = 1;
You can also do this with a subquery if you want to—or if you're using an old version of SQL Server that doesn't support CTEs—but I find the CTE syntax to be more natural, if only because it enables you to read and understand the query from the top down, rather than from the inside out.
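To make the difference between grouping by shipment number alone and grouping by pilot plus shipment number concrete, here is a small sketch. The table is cut down to four columns and filled with invented rows, and it runs on SQLite through Python's sqlite3 instead of SQL Server, so it only illustrates the grouping logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_shipment (f_Shipment_ID INT, f_Pilot_ID INT,
                         f_Shipmentnumber TEXT, f_NumberOfPilots INT);
-- Hypothetical rows.
INSERT INTO t_shipment VALUES
  (1, 10, 'S1', 1),
  (2, 10, 'S1', 1),   -- pilot 10 used S1 twice
  (3, 10, 'S2', 1),
  (4, 20, 'S2', 1),   -- S2 is shared by two pilots, not reused by one
  (5, 20, 'S3', 1);
""")

# The question's query: groups by shipment number only.
by_number = [r[0] for r in conn.execute("""
SELECT f_Shipment_ID
FROM t_shipment
WHERE f_Pilot_ID < 10000 AND f_NumberOfPilots = 1
  AND f_Shipmentnumber IN (SELECT f_Shipmentnumber
                           FROM t_shipment
                           GROUP BY f_Shipmentnumber
                           HAVING COUNT(*) > 1)
ORDER BY f_Shipment_ID
""")]

# The CTE approach: groups by pilot and shipment number together.
by_pilot_and_number = [r[0] for r in conn.execute("""
WITH dup AS (
  SELECT f_Pilot_ID, f_Shipmentnumber
  FROM t_shipment
  WHERE f_Pilot_ID < 10000 AND f_NumberOfPilots = 1
  GROUP BY f_Pilot_ID, f_Shipmentnumber
  HAVING COUNT(*) > 1
)
SELECT s.f_Shipment_ID
FROM t_shipment s
JOIN dup ON dup.f_Pilot_ID = s.f_Pilot_ID
        AND dup.f_Shipmentnumber = s.f_Shipmentnumber
WHERE s.f_NumberOfPilots = 1
ORDER BY s.f_Shipment_ID
""")]

print(by_number, by_pilot_and_number)
```

With these rows the first query also returns shipments 3 and 4, because S2 appears twice overall even though no single pilot used it twice.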

In your sub select use:
select f_Shipmentnumber
from t_shipment
group by f_pilot_id, f_Shipmentnumber
Having count(*) >1

How about this
select f_Shipment_ID
,f_date
,f_Pilot_ID
,f_Shipname
,f_SailedFrom
,f_SailedTo
,f_d_m
,f_Shipmentnumber
,f_NumberOfPilots
from t_shipment
where f_Pilot_ID < 10000
and f_NumberOfPilots=1
and f_Pilot_ID IN (select f_Pilot_ID
from t_shipment
group by f_Pilot_ID, f_Shipmentnumber
Having count(*) >1)

Fastest way to check if the most recent result for a patient has a certain value

Mssql < 2005
I have a complex database with lots of tables, but for now only the patient table and the measurements table matter.
What I need is the number of patients where the most recent value of 'code' matches a certain value. Also, datemeasurement has to be after '2012-04-01'. I have solved this in two different ways:
SELECT
COUNT(P.patid)
FROM T_Patients P
WHERE P.patid IN (SELECT patid
FROM T_Measurements M WHERE (M.code ='xxxx' AND result= 'xx')
AND datemeasurement =
(SELECT MAX(datemeasurement) FROM T_Measurements
WHERE datemeasurement > '2012-01-04' AND patid = M.patid
GROUP BY patid)
GROUP by patid)
AND:
SELECT
COUNT(P.patid)
FROM T_Patient P
WHERE 1 = (SELECT TOP 1 case when result = 'xx' then 1 else 0 end
FROM T_Measurements M
WHERE (M.code ='xxxx') AND datemeasurement > '2012-01-04' AND patid = P.patid
ORDER by datemeasurement DESC
)
This works just fine, but it makes the query incredibly slow because it has to join the outer table on the subquery (if you know what I mean). The query takes 10 seconds without the most recent check, and 3 minutes with the most recent check.
I'm pretty sure this can be done a lot more efficiently, so please enlighten me if you will :).
I tried implementing HAVING datemeasurement = MAX(datemeasurement), but that keeps throwing errors at me.

So my approach would be to write a query just getting all the last patient results since 01-04-2012, and then filtering that for your codes and results. So something like
select
count(1)
from
T_Measurements M
inner join (
SELECT PATID, MAX(datemeasurement) as lastMeasuredDate from
T_Measurements M
where datemeasurement > '01-04-2012'
group by patID
) lastMeasurements
on lastMeasurements.lastmeasuredDate = M.datemeasurement
and lastMeasurements.PatID = M.PatID
where
M.Code = 'Xxxx' and M.result = 'XX'

The fastest way may be to use row_number():
SELECT COUNT(m.patid)
from (select m.*,
ROW_NUMBER() over (partition by patid order by datemeasurement desc) as seqnum
FROM T_Measurements m
where datemeasurement > '2012-01-04'
) m
where seqnum = 1 and code = 'XXX' and result = 'xx'
Row_number() enumerates the records for each patient, so the most recent gets a value of 1. The result is just a selection.
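For comparison, the sketch below runs both suggestions, the join on MAX(datemeasurement) and the row_number() version, against a few invented measurements and checks that they return the same count. It uses SQLite (3.25 or later) through Python's sqlite3 with ISO-formatted date strings, so it says nothing about relative speed on SQL Server; it only shows that the two formulations agree on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_Measurements (patid INT, code TEXT, result TEXT, datemeasurement TEXT);
-- Hypothetical rows.
INSERT INTO T_Measurements VALUES
  (1, 'xxxx', 'xx', '2012-05-01'),   -- latest for patient 1 and it matches
  (1, 'xxxx', 'yy', '2012-04-20'),
  (2, 'xxxx', 'xx', '2012-04-10'),
  (2, 'xxxx', 'yy', '2012-06-01'),   -- latest for patient 2 does not match
  (3, 'xxxx', 'xx', '2011-12-31');   -- before the cutoff, ignored
""")

# Join each measurement to the latest date per patient, then filter.
join_count = conn.execute("""
SELECT COUNT(1)
FROM T_Measurements M
JOIN (SELECT patid, MAX(datemeasurement) AS lastMeasuredDate
      FROM T_Measurements
      WHERE datemeasurement > '2012-01-04'
      GROUP BY patid) latest
  ON latest.lastMeasuredDate = M.datemeasurement
 AND latest.patid = M.patid
WHERE M.code = 'xxxx' AND M.result = 'xx'
""").fetchone()[0]

# Number the measurements per patient, newest first, and keep number 1.
rownum_count = conn.execute("""
SELECT COUNT(m.patid)
FROM (SELECT m.*,
             ROW_NUMBER() OVER (PARTITION BY patid
                                ORDER BY datemeasurement DESC) AS seqnum
      FROM T_Measurements m
      WHERE datemeasurement > '2012-01-04') m
WHERE seqnum = 1 AND code = 'xxxx' AND result = 'xx'
""").fetchone()[0]

print(join_count, rownum_count)
```

One difference to keep in mind: if a patient has two measurements with the same latest datemeasurement, the join version can count both, while row_number() picks exactly one of them.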
