'LAG' function not working in Amazon Redshift - sql

I'm trying to find out the retention rate by using the following query on Amazon Redshift:
WITH t AS (
SELECT ga.ownerid,
DATE_PART('month',ga.creationtime) AS month,
COUNT(*) AS item_transactions,
LAG(DATE_PART('month',ga.creationtime)) OVER (PARTITION BY ownerid ORDER BY DATE_PART('month',ga.creationtime)) = DATE_PART('month',ga.creationtime) -interval '1 month' OR NULL AS repeat_transaction
FROM flx2.groupactivities ga
JOIN auth.members m ON ga.ownerid = m.id
WHERE ga.activitytype = 'assign'
AND ga.groupid NOT IN (SELECT groupid
FROM (SELECT groupid,
COUNT(DISTINCT memberid)
FROM flx2.grouphasmembers
GROUP BY groupid
HAVING COUNT(DISTINCT memberid) = 1))
AND ga.ownerid IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 5)
AND ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
GROUP BY ga.ownerid,
DATE_TRUNC('month',ga.creationtime)
ORDER BY ga.ownerid,
DATE_TRUNC('month',ga.creationtime)
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention
FROM t
GROUP BY 1
ORDER BY 1;
but it gives me the following error:
An error occurred when executing the SQL command:
WITH t AS (
SELECT ga.ownerid,
DATE_TRUNC('month',ga.creationtime) AS month,
COUNT(*) AS item_transactions,
LAG(DATE_TRUNC('month...
[Amazon](500310) Invalid operation: ORDER/GROUP BY expression not found in targetlist;
Execution time: 0.29s
1 statement failed.
I believe there's something wrong with the LAG function here, but I'm not quite sure. I got this query from the post here and I modified it according to my requirements.
Would someone please be able to help me out with what's going wrong here?
I appreciate the help in advance.

From a quick look: LAG by itself is not an aggregate function, so repeat_transaction would need to be included in the GROUP BY.
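One pattern that avoids mixing GROUP BY and LAG in the same SELECT is to aggregate per owner and month first, then apply LAG to the already grouped rows. The sketch below is only an illustration under assumptions: it uses SQLite (3.25 or newer, for window functions) through Python's sqlite3 module as a stand-in for Redshift, and the table name activity and its sample rows are hypothetical. The date helpers strftime() and date() are SQLite-specific and would need to be swapped for DATE_TRUNC and interval arithmetic on Redshift.

```python
import sqlite3

# Hypothetical sample data: (ownerid, creationtime)
ACTIVITY_ROWS = [
    (1, "2015-01-05"), (1, "2015-01-20"), (1, "2015-02-03"),
    (2, "2015-01-10"), (2, "2015-03-02"),
    (3, "2015-02-11"),
]

RETENTION_QUERY = """
WITH monthly AS (
    -- Step 1: one row per owner and month; only plain aggregation here
    SELECT ownerid,
           strftime('%Y-%m-01', creationtime) AS month,
           COUNT(*) AS item_transactions
    FROM activity
    GROUP BY ownerid, strftime('%Y-%m-01', creationtime)
),
flagged AS (
    -- Step 2: LAG runs over the grouped rows, so no GROUP BY is involved
    SELECT ownerid, month, item_transactions,
           CASE WHEN LAG(month) OVER (PARTITION BY ownerid ORDER BY month)
                     = date(month, '-1 month')
                THEN 1 END AS repeat_transaction
    FROM monthly
)
-- Step 3: roll up per month; COUNT skips the NULL flags
SELECT month,
       SUM(item_transactions) AS num_trans,
       COUNT(*) AS num_buyers,
       COUNT(repeat_transaction) AS repeat_buyers
FROM flagged
GROUP BY month
ORDER BY month
"""

def monthly_retention(rows):
    """Load rows into an in-memory table and return the per-month summary."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (ownerid INTEGER, creationtime TEXT)")
    conn.executemany("INSERT INTO activity VALUES (?, ?)", rows)
    result = conn.execute(RETENTION_QUERY).fetchall()
    conn.close()
    return result

retention = monthly_retention(ACTIVITY_ROWS)
for row in retention:
    print(row)
```

Keeping the three steps in separate CTEs means every expression inside the window function is a plain column of the previous step, which is the property the error message complains about.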

Related

Compare the same table and fetch the satisfied results

I am trying to achieve the below requirement and need some help.
I created the below query,
SELECT * from
(
select b.extl_acct_nmbr, b.TRAN_DATE, b.tran_time,
case when (a.amount > b.amount) then b.amount
end as amount
,b.ivst_grup, b.grup_prod, b.pensionpymt
from ##pps a
join #pps b
on a.extl_acct_nmbr = b.extl_acct_nmbr
where a.pensionpymt <=2 and b.pensionpymt <=2) rslt
where rslt.amount is not null
Output I am getting,
The requirement is to get:
The row with the lowest amount among rows with the same account number. (Done; already in the output.)
If both amounts are the same for an account, the row with pensionpymt = 1. (Not sure how to get this.)
If an account has only one pensionpymt row, that row as well. (Not sure how to get this.)
Could you please help? The expected output should look like this:
You can use a window function:
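select * from (
select b.*,
-- number the rows of each account: lowest amount first, lowest pensionpymt on ties
row_number() over (partition by b.extl_acct_nmbr order by b.amount asc, b.pensionpymt asc) rn
from #pps b
-- no self-join needed; joining a and b would make select * and amount ambiguous
where b.pensionpymt <= 2
) t
where rn = 1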
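To see how ROW_NUMBER covers all three requirements at once, here is a small runnable sketch. It assumes SQLite 3.25+ through Python's sqlite3 module instead of SQL Server, a single table named pps (temp-table names like #pps do not exist in SQLite), and made-up sample rows.

```python
import sqlite3

# Hypothetical rows: (extl_acct_nmbr, amount, pensionpymt)
PPS_ROWS = [
    ("A", 100, 1), ("A", 50, 2),   # different amounts: lowest wins
    ("B", 70, 1), ("B", 70, 2),    # same amount: pensionpymt = 1 wins
    ("C", 30, 1),                  # single row: kept as is
]

PICK_QUERY = """
SELECT extl_acct_nmbr, amount, pensionpymt
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (
               PARTITION BY extl_acct_nmbr
               ORDER BY amount ASC, pensionpymt ASC
           ) AS rn
    FROM pps AS p
) AS t
WHERE rn = 1
ORDER BY extl_acct_nmbr
"""

def pick_rows(rows):
    """Return one row per account, chosen by the ORDER BY in the window."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pps (extl_acct_nmbr TEXT, amount INTEGER, pensionpymt INTEGER)")
    conn.executemany("INSERT INTO pps VALUES (?, ?, ?)", rows)
    result = conn.execute(PICK_QUERY).fetchall()
    conn.close()
    return result

picked = pick_rows(PPS_ROWS)
print(picked)
```

An account with a single row always gets rn = 1, so the third requirement needs no special handling.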

Impala SQL Query

Error Message :
select list expression not produced by aggregation output (missing
from GROUP BY clause?): CASE WHEN (flag = 1) THEN date_add(lead_ctxdt,
-1) ELSE ctx_date END lot_endt
code :
select c.enrolid, c.ctx_date, c.ctx_regimen, c.lead_ctx, c.lead_ctxdt, min(c.ctx_date) as lot_stdt,
case when (flag = 1 ) then date_add(lead_ctxdt, -1)
else ctx_date
end as lot_endt
from
(
select p.*,
case when (ctx_regimen <> lead_ctx) then 1
else 0
end as flag
from
(
select a.*, lead(a.ctx_regimen, 1) over(partition by enrolid order by ctx_date) as lead_ctx,
lead(ctx_date, 1) over (partition by enrolid order by ctx_date) as lead_ctxdt
from
(
select enrolid, ctx_date, group_concat(distinct ctx_codes) as ctx_regimen
from lotinfo
where ctx_date between ctx_date and date_add(ctx_date, 5)
group by enrolid, ctx_date
) as a
) as p
) as c
group by c.enrolid, c.ctx_date, c.ctx_regimen, c.lead_ctx, c.lead_ctxdt
I want lot_endt to be the lead_ctxdt date minus one day when the flag is 1.
I found the answer after trying a few minor changes. When you use min() or max() in the same query as group_concat(), it doesn't work in Impala. You have to split the work across two levels with one more subquery: group_concat() in the inner query and min() in the outer query, or vice versa.
Thank you @dnoeth for helping me realize that I already had the answer.
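The two-level restructuring described above might look like the following sketch. It runs on SQLite through Python's sqlite3 module purely for illustration, so it shows the shape of the query and does not verify Impala's behavior; the sample rows are hypothetical and the date filter from the original query is left out.

```python
import sqlite3

# Hypothetical rows: (enrolid, ctx_date, ctx_codes)
LOTINFO_ROWS = [
    (1, "2020-01-01", "A"), (1, "2020-01-01", "B"), (1, "2020-01-01", "A"),
    (1, "2020-01-08", "A"),
    (1, "2020-01-15", "A"),
]

TWO_LEVEL_QUERY = """
SELECT r.enrolid, r.ctx_regimen, MIN(r.ctx_date) AS lot_stdt
FROM (
    -- inner level: only group_concat, one regimen string per patient and date
    SELECT enrolid, ctx_date, GROUP_CONCAT(DISTINCT ctx_codes) AS ctx_regimen
    FROM lotinfo
    GROUP BY enrolid, ctx_date
) AS r
-- outer level: only min, one start date per patient and regimen
GROUP BY r.enrolid, r.ctx_regimen
"""

def regimen_start_dates(rows):
    """Map (enrolid, regimen) to the earliest date that regimen appears."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lotinfo (enrolid INTEGER, ctx_date TEXT, ctx_codes TEXT)")
    conn.executemany("INSERT INTO lotinfo VALUES (?, ?, ?)", rows)
    result = {}
    for enrolid, regimen, lot_stdt in conn.execute(TWO_LEVEL_QUERY):
        # group_concat does not promise an element order, so normalize it
        key = (enrolid, ",".join(sorted(regimen.split(","))))
        result[key] = lot_stdt
    conn.close()
    return result

lots = regimen_start_dates(LOTINFO_ROWS)
print(lots)
```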

Return calculated column values for SELECT DISTINCT query

I have the following table in SQLite:
_id|token|status|timestamp|mood|eta|name|calc_eta
__________________________________________________________________________
168|iqmC.3aHMBGbl|ok|1516625084498|50|-4154|Sample Name|1516625533082
169|iqmC.3aHMBGbl|ok|1516625084498|50|-4214|Sample Name|1516625533108
170|iqmC.3aHMBGbl|ok|1516625084498|50|-4274|Sample Name|1516625533414
171|iqmC.3aHMBGbl|ok|1516625084498|50|-4334|Sample Name|1516625533160
172|iqmC.3aHMBGbl|ok|1516625084498|50|-4394|Sample Name|1516625533680
173|iqmC.3aHMBGbl|ok|1516625084498|50|-4420|Sample Name|1516625533068
174|iqmC.3aHMBGbl|ok|1516625084498|50|-4428|Sample Name|1516625533482
175|iqmC.3aHMBGbl|ok|1516625084498|50|-4483|Sample Name|1516625533155
176|iqmC.3aHMBGbl|ok|1516625084498|50|-4543|Sample Name|1516625533148
177|TFbintkHMBw4H|ok|1516630122485|50|2526|Sample Name|1516632672019
178|TFbintkHMBw4H|ok|1516630122485|50|2520|Sample Name|1516632671903
179|TFbintkHMBw4H|ok|1516630122485|50|2460|Sample Name|1516632672321
180|TFbintkHMBw4H|ok|1516630122485|50|2344|Sample Name|1516632672859
181|TFbintkHMBw4H|ok|1516630122485|50|2336|Sample Name|1516632671939
182|TFbintkHMBw4H|ok|1516630122485|50|2281|Sample Name|1516632672802
183|TFbintkHMBw4H|ok|1516630122485|50|2220|Sample Name|1516632671828
184|TFbintkHMBw4H|ok|1516630122485|50|2161|Sample Name|1516632672625
I'm trying to come up with a query that gives me the difference between the calc_eta values of the two newest rows (based on the auto-increment _id) for each distinct token value.
So in this case the result should be:
iqmC.3aHMBGbl|-7
TFbintkHMBw4H|797
I got this far with the SQL but it is not providing the calculated value for each distinct token currently and I'm not sure how to go further.
SELECT DISTINCT token,
(SELECT calc_eta
FROM DATA s
WHERE
(SELECT count(*)
FROM DATA f
WHERE f.token = s.token
AND f._id >= s._id) <= 1) -
(SELECT calc_eta
FROM
(SELECT calc_eta,
MIN(_id)
FROM DATA s
WHERE
(SELECT count(*)
FROM DATA f
WHERE f.token = s.token
AND f._id >= s._id) <= 2)) AS delay
FROM DATA;
In most SQL dialects, you would use window functions such as lag():
select d.*,
(calc_eta - prev_calc_eta) as diff
from (select d.*,
lag(calc_eta) over (partition by token order by _id) as prev_calc_eta,
row_number() over (partition by token order by _id desc) as seqnum
from data d
) d
where seqnum = 1;
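If the SQLite build at hand predates window functions (they arrived in SQLite 3.25), the same result can be reached with correlated subqueries. The following is a sketch under that assumption, run through Python's sqlite3 module with the three newest rows per token from the question; the remaining columns are omitted for brevity.

```python
import sqlite3

# (_id, token, calc_eta) for the three newest rows of each token
DATA_ROWS = [
    (174, "iqmC.3aHMBGbl", 1516625533482),
    (175, "iqmC.3aHMBGbl", 1516625533155),
    (176, "iqmC.3aHMBGbl", 1516625533148),
    (182, "TFbintkHMBw4H", 1516632672802),
    (183, "TFbintkHMBw4H", 1516632671828),
    (184, "TFbintkHMBw4H", 1516632672625),
]

DELAY_QUERY = """
SELECT cur.token,
       cur.calc_eta - prev.calc_eta AS delay
FROM data AS cur
JOIN data AS prev
  ON prev.token = cur.token
     -- prev is the row with the largest _id below cur for the same token
 AND prev._id = (SELECT MAX(p._id) FROM data AS p
                 WHERE p.token = cur.token AND p._id < cur._id)
-- cur is the newest row of its token
WHERE cur._id = (SELECT MAX(m._id) FROM data AS m WHERE m.token = cur.token)
"""

def newest_delay_per_token(rows):
    """Return {token: newest calc_eta minus second-newest calc_eta}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (_id INTEGER PRIMARY KEY, token TEXT, calc_eta INTEGER)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    result = dict(conn.execute(DELAY_QUERY).fetchall())
    conn.close()
    return result

delays = newest_delay_per_token(DATA_ROWS)
print(delays)
```

Because of the inner join, a token with only one row produces no output row; a LEFT JOIN would keep it with a NULL delay instead.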

Group By & Having vs. SubQuery (Where Count is Greater Than 1)

I'm struggling here trying to write a script that finds where an order was returned multiple times by the same associate (count greater than 1). I'm guessing my syntax with the subquery is incorrect. When I run the script, I get a message back that the "SELECT failed.. [3669] More than one value was returned by the subquery."
I'm not tied to the subquery, and have tried using just the group by and having statements, but I get an error regarding a non-aggregate value. What's the best way to proceed here and how do I fix this?
Thank you in advance - code below:
SEL s.saletran
, s.saletran_dt SALE_DATE
, r.saletran_id RET_TRAN
, r.saletran_dt RET_DATE
, ra.user_id RET_ASSOC
FROM salestrans s
JOIN salestrans_refund r
ON r.orig_saletran_id = s.saletran_id
AND r.orig_saletran_dt = s.saletran_dt
AND r.orig_loc_id = s.loc_id
AND r.saletran_dt between s.saletran_dt and s.saletran_dt + 30
JOIN saletran rt
ON rt.saletran_id = r.saletran_id
AND rt.saletran_dt = r.saletran_dt
AND rt.loc_id = r.loc_id
JOIN assoc ra --Return Associate
ON ra.assoc_prty_id = rt.sls_assoc_prty_id
WHERE
(SELECT count(*)
FROM saletran_refund
GROUP BY ORIG_SLTRN_ID
) > 1
AND s.saletran_dt between '2015-01-01' and current_date - 1
Based on what you've got so far, I think you want to use this instead:
where r.ORIG_SLTRN_ID in
(select
ORIG_SLTRN_ID
from
saletran_refund
group by ORIG_SLTRN_ID
having count (*) > 1)
That will give you the ORIG_SLTRN_IDs that have more than one row.
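A small runnable sketch of the IN plus GROUP BY/HAVING pattern, assuming SQLite through Python's sqlite3 module rather than Teradata, and a cut-down hypothetical saletran_refund table with only two columns:

```python
import sqlite3

# Hypothetical rows: (saletran_id, orig_sltrn_id)
REFUND_ROWS = [
    (1, 100),  # order 100 refunded twice
    (2, 100),
    (3, 200),  # order 200 refunded once
]

MULTI_REFUND_QUERY = """
SELECT r.saletran_id, r.orig_sltrn_id
FROM saletran_refund AS r
WHERE r.orig_sltrn_id IN (
    -- the subquery returns a list of ids, which IN accepts;
    -- comparing it with "> 1" would need a single value instead
    SELECT orig_sltrn_id
    FROM saletran_refund
    GROUP BY orig_sltrn_id
    HAVING COUNT(*) > 1
)
ORDER BY r.saletran_id
"""

def refunds_of_multi_returned_orders(rows):
    """Return refund rows whose original order has more than one refund."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE saletran_refund (saletran_id INTEGER, orig_sltrn_id INTEGER)")
    conn.executemany("INSERT INTO saletran_refund VALUES (?, ?)", rows)
    result = conn.execute(MULTI_REFUND_QUERY).fetchall()
    conn.close()
    return result

multi_refunds = refunds_of_multi_returned_orders(REFUND_ROWS)
print(multi_refunds)
```

To restrict this to returns handled by the same associate, the associate column would have to be added to both the GROUP BY and the comparison; that column is not part of this cut-down table.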
You don't give enough for a full answer, but this is a start:
group by s.saletran
, s.saletran_dt
, r.saletran_id
, r.saletran_dt
, ra.user_id
having count(distinct(ORIG_SLTRN_ID)) > 0
The subquery in your WHERE clause does return more than one row, which is why you get error 3669. Run it on its own to see:
SELECT count(*)
FROM saletran_refund
GROUP BY ORIG_SLTRN_ID

What is the mathematical way of calculating PERCENT_RANK() ,CUME_DIST() ,PERCENTILE_CONT() and PERCENTILE_DISC()

I am trying to figure out how these newly introduced functions in SQL Server Denali CTP 3 work, but I have not understood them properly.
There are of course some articles on this, but none clearly explains how the functions work; in other words, the maths running behind the scenes.
Could anyone please explain that with a simple example?
I found one here, but when I applied that author's logic (from the first link) to get the Percent_Rank and Cume_Dist of the 5th item, I got a different result.
The columns in this query will be equal:
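SELECT value,
PERCENT_RANK() OVER (ORDER BY value),
(
-- rows strictly below the current value, divided by (row count - 1);
-- the cast avoids integer division
SELECT CAST(COUNT(CASE WHEN qo.value < q.value THEN 1 END) AS FLOAT) / (COUNT(*) - 1)
FROM mytable qo
) AS percent_rank_formula,
CUME_DIST() OVER (ORDER BY value),
(
-- rows at or below the current value, divided by the row count
SELECT CAST(COUNT(CASE WHEN qo.value <= q.value THEN 1 END) AS FLOAT) / COUNT(*)
FROM mytable qo
) AS cume_dist_formula
FROM mytable q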
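The arithmetic behind all four functions can be written out in a few lines of plain Python. This is a sketch of the commonly documented definitions, not SQL Server's implementation: PERCENT_RANK and CUME_DIST follow the formulas in the query above, PERCENTILE_CONT is assumed to interpolate linearly at position p * (N - 1) in the sorted values, and PERCENTILE_DISC is assumed to return the smallest value whose CUME_DIST is at least p. The function names and sample values are made up for illustration.

```python
import math

def percent_rank(values, v):
    """(number of values strictly below v) / (N - 1); 0 for a single row."""
    n = len(values)
    if n == 1:
        return 0.0
    return sum(1 for x in values if x < v) / (n - 1)

def cume_dist(values, v):
    """(number of values at or below v) / N."""
    return sum(1 for x in values if x <= v) / len(values)

def percentile_cont(values, p):
    """Linear interpolation between the two sorted values around p * (N - 1)."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)      # zero-based fractional position
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def percentile_disc(values, p):
    """Smallest actual value whose cumulative distribution reaches p."""
    ordered = sorted(values)
    for x in ordered:
        if cume_dist(ordered, x) >= p:
            return x
    return ordered[-1]

tied = [10, 20, 20, 30]       # sample with a tie
spread = [10, 20, 30, 40]     # sample without ties

print([percent_rank(tied, v) for v in tied])
print([cume_dist(tied, v) for v in tied])
print(percentile_cont(spread, 0.5), percentile_disc(spread, 0.5))
```

With the tied sample, both rows with value 20 share a PERCENT_RANK of 1/3 and a CUME_DIST of 0.75. With the second sample, the median via PERCENTILE_CONT is 25.0 (a value not present in the data), while PERCENTILE_DISC returns 20, which is always one of the actual values.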