Impala SQL Query - sql

Error Message :
select list expression not produced by aggregation output (missing
from GROUP BY clause?): CASE WHEN (flag = 1) THEN date_add(lead_ctxdt,
-1) ELSE ctx_date END lot_endt
code :
select c.enrolid, c.ctx_date, c.ctx_regimen, c.lead_ctx, c.lead_ctxdt, min(c.ctx_date) as lot_stdt,
case when (flag = 1 ) then date_add(lead_ctxdt, -1)
else ctx_date
end as lot_endt
from
(
select p.*,
case when (ctx_regimen <> lead_ctx) then 1
else 0
end as flag
from
(
select a.*, lead(a.ctx_regimen, 1) over(partition by enrolid order by ctx_date) as lead_ctx,
lead(ctx_date, 1) over (partition by enrolid order by ctx_date) as lead_ctxdt
from
(
select enrolid, ctx_date, group_concat(distinct ctx_codes) as ctx_regimen
from lotinfo
where ctx_date between ctx_date and date_add(ctx_date, 5)
group by enrolid, ctx_date
) as a
) as p
) as c
group by c.enrolid, c.ctx_date, c.ctx_regimen, c.lead_ctx, c.lead_ctxdt
I want to get the lead_ctx date minus one as the date when the flag is 1

So i found the answer by executing a couple of times the minor changes. Let me tell you, that when you are trying to min or max alongside you have group_conact in the same query then in Impala this doesn't work. You have to write it in two queries per one more sub query and the min() of something in the outer query or vice versa.
Thank you #dnoeth for letting me understand I have the answer with me already.

Related

postgres: COUNT, DISTINCT is not implemented for window functions

I am trying to use COUNT(DISTINC column) OVER(PARTITION BY column) when I am using COUNT + window function(OVER).
I get an error like the one in the title and can't get it to work.
I have looked into how to deal with this error, but I have not found an example of how to deal with such a complex query as the one below.
I cannot find an example of how to deal with such a complex query as shown below, and I am not sure how to handle it.
The COUNT part of the problem exists on line 65.
How can such a complex query be resolved without slowing down?
WITH RECURSIVE "cte" AS((
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"videos_productvideocomment"."id" AS "root_id"
FROM
"videos_productvideocomment"
WHERE
(
"videos_productvideocomment"."parent_id" IS NULL
AND "videos_productvideocomment"."video_id" = 'f264433c-c0af-49cc-8b40-84453da71b2d'
)
) UNION(
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"cte"."root_id" AS "root_id"
FROM
"videos_productvideocomment"
INNER JOIN
"cte"
ON "videos_productvideocomment"."parent_id" = "cte"."id"
))
SELECT
*,
EXISTS(
SELECT
(1) AS "a"
FROM
"videos_productvideolikecomment" U0
WHERE
(
U0."comment_id" = t."id"
AND U0."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48'
)
LIMIT 1
) AS "liked"
FROM
(
SELECT DISTINCT
"cte"."id",
"cte"."created_at",
"cte"."updated_at",
"cte"."user_id",
"cte"."text",
"cte"."commented_at",
"cte"."edited_at",
"cte"."parent_id",
"cte"."video_id",
"cte"."root_id" AS "root_id",
COUNT(DISTINCT "cte"."root_id") OVER(PARTITION BY "cte"."root_id") AS "reply_count", <--- here
COUNT("videos_productvideolikecomment"."id") OVER(PARTITION BY "cte"."id") AS "liked_count"
FROM
"cte"
LEFT OUTER JOIN
"videos_productvideolikecomment"
ON (
"cte"."id" = "videos_productvideolikecomment"."comment_id"
)
) t
WHERE
t."id" = t."root_id"
ORDER BY
CASE
WHEN t."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48' THEN 0
ELSE 1
END ASC,
"liked_count" DESC
DISTINCT will look for duplicates and remove it, but in big data it will take a lot of time to process this query, you should process the middle of the record in the programming part I think it will be fast than. Thank

Compare the same table and fetch the satisfied results

I am trying to achieve the below requirement and need some help.
I created the below query,
SELECT * from
(
select b.extl_acct_nmbr, b.TRAN_DATE, b.tran_time,
case when (a.amount > b.amount) then b.amount
end as amount
,b.ivst_grup, b.grup_prod, b.pensionpymt
from ##pps a
join #pps b
on a.extl_acct_nmbr = b.extl_acct_nmbr
where a.pensionpymt <=2 and b.pensionpymt <=2) rslt
where rstl.amount is not null
Output I am getting,
Requirement is to get
The lowest amount row having same account number. (Completed and getting in the output)
In case both the amounts are same for same account (get the pensionpymt =1) (not sure how to get)
In case only one pensionpymt there add that too in the result set. (not sure how to get)
could you please help, expected output should be like this,
you can use window function:
select * from (
select * , row_number() over (partition by extl_acct_nmbr order by amount asc,pensionpymt) rn
from ##pps a
join #pps b
on a.extl_acct_nmbr = b.extl_acct_nmbr
) t
where rn = 1

SQL count row where progres not done

Need some help, dont know the keyword of this problem to search online
I want to count some progres that isn't done
when progres has step3 then its not counted
desired result from that example is 2, im trying to do it alone, and it doesnt work
help is needed, Thanks Ahead
One method uses count(distinct) and filters in the where clause:
select count(distinct progres)
from t
where not exists (select 1 from t t2 where t2.progres = t.progres and t2.step = 'step3');
Another fun way uses a difference:
select count(distinct progres) - count(distinct case when step = 'step3' then progres end)
from t;
If 'step3' can appear at most once per progres, the above can be simplified to:
select count(distinct progres) - sum(step = 'step3')
from t;
Or using set operations:
select count(*)
from ((select progres from t)
except -- removes duplicates
(select progres from t where step = 'step3')
) t;
You can do:
select
count(distinct progress)
-
(select count(*) from t where step = 'step3')
from t

troubles with next and previous query

I have a list and the returned table looks like this. I took the preview of only one car but there are many more.
What I need to do now is check that the current KM value is larger then the previous and smaller then the next. If this is not the case I need to make a field called Trustworthy and should fill it with either 1 or 0 (true/ false).
The result that I have so far is this:
validKMstand and validkmstand2 are how I calculate it. It did not work in one list so that is why I separated it.
In both of my tries my code does not work.
Here is the code that I have so far.
FullList as (
SELECT
*
FROM
eMK_Mileage as Mileage
)
, ValidChecked1 as (
SELECT
UL1.*,
CASE WHEN EXISTS(
SELECT TOP(1)UL2.*
FROM FullList AS UL2
WHERE
UL2.FK_CarID = UL1.FK_CarID AND
UL1.KM_Date > UL2.KM_Date AND
UL1.KM > UL2.KM
ORDER BY UL2.KM_Date DESC
)
THEN 1
ELSE 0
END AS validkmstand
FROM FullList as UL1
)
, ValidChecked2 as (
SELECT
List1.*,
(CASE WHEN List1.KM > ulprev.KM
THEN 1
ELSE 0
END
) AS validkmstand2
FROM ValidChecked1 as List1 outer apply
(SELECT TOP(1)UL3.*
FROM ValidChecked1 AS UL3
WHERE
UL3.FK_CarID = List1.FK_CarID AND
UL3.KM_Date <= List1.KM_Date AND
List1.KM > UL3.KM
ORDER BY UL3.KM_Date DESC) ulprev
)
SELECT * FROM ValidChecked2 order by FK_CarID, KM_Date
Maybe something like this is what you are looking for?
;with data as
(
select *, rn = row_number() over (partition by fk_carid order by km_date)
from eMK_Mileage
)
select
d.FK_CarID, d.KM, d.KM_Date,
valid =
case
when (d.KM > d_prev.KM /* or d_prev.KM is null */)
and (d.KM < d_next.KM /* or d_next.KM is null */)
then 1 else 0
end
from data d
left join data d_prev on d.FK_CarID = d_prev.FK_CarID and d_prev.rn = d.rn - 1
left join data d_next on d.FK_CarID = d_next.FK_CarID and d_next.rn = d.rn + 1
order by d.FK_CarID, d.KM_Date
With SQL Server versions 2012+ you could have used the lag() and lead() analytical functions to access the previous/next rows, but in versions before you can accomplish the same thing by numbering rows within partitions of the set. There are other ways too, like using correlated subqueries.
I left a couple of conditions commented out that deal with the first and last rows for every car - maybe those should be considered valid is they fulfill only one part of the comparison (since the previous/next rows are null)?

SELECTing only one copy of a row with a specific key that is coming from multiple tables

I am new to SQL so bear with me. I am returning data from multiple tables. Followed is my SQL (let me know if there is a better approach):
SELECT [NonScrumStory].[IncidentNumber], [NonScrumStory].[Description], [DailyTaskHours].[ActivityDate], [Application].[AppName], [SupportCatagory].[Catagory], [DailyTaskHours].[PK_DailyTaskHours],n [NonScrumStory].[PK_NonScrumStory]
FROM [NonScrumStory], [DailyTaskHours], [Application], [SupportCatagory]
WHERE ([NonScrumStory].[UserId] = 26)
AND ([NonScrumStory].[PK_NonScrumStory] = [DailyTaskHours].[NonScrumStoryId])
AND ([NonScrumStory].[CatagoryId] = [SupportCatagory].[PK_SupportCatagory])
AND ([NonScrumStory].[ApplicationId] = [Application].[PK_Application])
AND ([NonScrumStory].[Deleted] != 1)
AND [DailyTaskHours].[ActivityDate] >= '1/1/1990'
ORDER BY [DailyTaskHours].[ActivityDate] DESC
This is what is being returned:
This is nearly correct. I only want it to return one copy of PK_NonScrumStory though and I can't figure out how. Essentially, I only want it to return one copy so one of the top two rows would not be returned.
You could group by the NonScrumStore columns, and then aggregate the other columns like this:
SELECT [NonScrumStory].[IncidentNumber],
[NonScrumStory].[Description],
MAX( [DailyTaskHours].[ActivityDate]),
MAX( [Application].[AppName]),
MAX([SupportCatagory].[Catagory]),
MAX([DailyTaskHours].[PK_DailyTaskHours]),
[NonScrumStory].[PK_NonScrumStory]
FROM [NonScrumStory],
[DailyTaskHours],
[Application],
[SupportCatagory]
WHERE ([NonScrumStory].[UserId] = 26)
AND ([NonScrumStory].[PK_NonScrumStory] = [DailyTaskHours].[NonScrumStoryId])
AND ([NonScrumStory].[CatagoryId] = [SupportCatagory].[PK_SupportCatagory])
AND ([NonScrumStory].[ApplicationId] = [Application].[PK_Application])
AND ([NonScrumStory].[Deleted] != 1)
AND [DailyTaskHours].[ActivityDate] >= '1/1/1990'
group by [NonScrumStory].[IncidentNumber], [NonScrumStory].[Description],[NonScrumStory].[PK_NonScrumStory]
ORDER BY 3 DESC
From the screenshot it seems DISTINCT should have solved your issue but if not you could use the ROW_NUMBER function.
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [NonScrumStory].[PK_NonScrumStory] ORDER BY [DailyTaskHours].[ActivityDate] DESC) AS RowNum,
[NonScrumStory].[IncidentNumber], [NonScrumStory].[Description], [DailyTaskHours].[ActivityDate], [Application].[AppName], [SupportCatagory].[Catagory], [DailyTaskHours].[PK_DailyTaskHours],n [NonScrumStory].[PK_NonScrumStory]
FROM [NonScrumStory], [DailyTaskHours], [Application], [SupportCatagory]
WHERE ([NonScrumStory].[UserId] = 26)
AND ([NonScrumStory].[PK_NonScrumStory] = [DailyTaskHours].[NonScrumStoryId])
AND ([NonScrumStory].[CatagoryId] = [SupportCatagory].[PK_SupportCatagory])
AND ([NonScrumStory].[ApplicationId] = [Application].[PK_Application])
AND ([NonScrumStory].[Deleted] != 1)
AND [DailyTaskHours].[ActivityDate] >= '1/1/1990'
)
SELECT * FROM CTE WHERE RowNum = 1 ORDER BY [ActivityDate] DESC
I believe if you add DISTINCT to your query that should solve your problem. Like so
SELECT DISTINCT [NonScrumStory].[IncidentNumber], [NonScrumStory].[Description],...