PostgreSQL/SQL Query

For each "completed_order" activity, I want to get the activity_id of the first "email" activity that happens after it and before the next "completed_order", in a column "first_in_between".
I wrote this query:
SELECT activity_id, customer , activity, ts,
case
when
activity = 'completed_order' and lead(activity) over (partition by customer order by ts) ='email'
then
lead(activity_id) over (partition by customer order by ts)
end as First_in_between
from activity_stream where customer = 'Lehmanns Marktstand'
order by ts
With the above query, I get this result.
My desired result is:

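You can readily get the timestamp of the email using the query below. With order by ts desc, the frame "rows between unbounded preceding and 1 preceding" covers only the rows after the current one, so it finds the next completed order instead of the current order itself; coalesce(..., true) handles the last order, which has no next order:
select activity_id, customer, activity, ts,
(case when activity = 'completed_order' and
coalesce(min(ts) filter (where activity = 'email') over (partition by customer order by ts desc) <
min(ts) filter (where activity = 'completed_order') over (partition by customer order by ts desc rows between unbounded preceding and 1 preceding),
true
)
then min(ts) filter (where activity = 'email') over (partition by customer order by ts desc)
end) as First_in_between
from activity_stream
where customer = 'Lehmanns Marktstand'
order by ts;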
You can then join back to the table or use another level of window functions to get the corresponding activity_id for the timestamp.
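The frame logic can be checked on a small data set. The following is a minimal sketch, assuming SQLite 3.25+ (for window functions) stands in for PostgreSQL, that the rows below are made-up sample data with integer timestamps, and that FILTER (WHERE ...) is rewritten as the equivalent CASE inside MIN, since older SQLite builds lack FILTER on window aggregates. The key point is that the next-order lookup must exclude the current row; otherwise the current order's own timestamp is the minimum and the comparison can never be true.

```python
import sqlite3

# Hypothetical sample rows: (activity_id, customer, activity, ts)
ROWS = [
    (1, "Lehmanns Marktstand", "completed_order", 1),
    (2, "Lehmanns Marktstand", "email", 2),
    (3, "Lehmanns Marktstand", "email", 3),
    (4, "Lehmanns Marktstand", "completed_order", 4),
    (5, "Lehmanns Marktstand", "visit", 5),
    (6, "Lehmanns Marktstand", "completed_order", 6),
    (7, "Lehmanns Marktstand", "email", 7),
]

QUERY = """
SELECT activity_id, activity, ts,
       CASE WHEN activity = 'completed_order' AND
                 COALESCE(
                   -- earliest email at or after this row
                   MIN(CASE WHEN activity = 'email' THEN ts END)
                     OVER (PARTITION BY customer ORDER BY ts DESC)
                   <
                   -- next completed order strictly after this row
                   MIN(CASE WHEN activity = 'completed_order' THEN ts END)
                     OVER (PARTITION BY customer ORDER BY ts DESC
                           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
                   1)  -- no later order: any later email qualifies
            THEN MIN(CASE WHEN activity = 'email' THEN ts END)
                   OVER (PARTITION BY customer ORDER BY ts DESC)
       END AS first_in_between
FROM activity_stream
WHERE customer = 'Lehmanns Marktstand'
ORDER BY ts
"""


def first_email_ts(rows):
    """Return (activity_id, activity, ts, first_in_between) for every row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE activity_stream "
                "(activity_id INTEGER, customer TEXT, activity TEXT, ts INTEGER)")
    con.executemany("INSERT INTO activity_stream VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in first_email_ts(ROWS):
        print(row)
```

Only completed_order rows get a value; the order at ts 4 gets NULL because the next email (ts 7) comes after the next order (ts 6).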
Actually, I think I prefer another method, which is just to count the number of completed orders and then take the minimum ts:
select a.*,
min(ts) filter (where activity = 'email') over (partition by customer, grp) as email_ts
from (select a.*,
count(*) filter (where activity = 'completed_order') over (partition by customer order by ts) as grp
from activity_stream a
where customer = 'Lehmanns Marktstand'
) a;
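The counting method, followed by the "join back" step mentioned earlier to turn the email timestamp into an activity_id, can be sketched like this. It is a minimal sketch assuming SQLite 3.25+ in place of PostgreSQL and made-up sample rows; SUM(CASE ...) stands in for count(*) filter (...), and the helper name first_email_ids is hypothetical. Each completed order starts a new group, so the earliest email in the group is the first email after that order.

```python
import sqlite3

# Hypothetical sample rows: (activity_id, customer, activity, ts)
ROWS = [
    (1, "Lehmanns Marktstand", "completed_order", 1),
    (2, "Lehmanns Marktstand", "email", 2),
    (3, "Lehmanns Marktstand", "email", 3),
    (4, "Lehmanns Marktstand", "completed_order", 4),
    (5, "Lehmanns Marktstand", "visit", 5),
    (6, "Lehmanns Marktstand", "completed_order", 6),
    (7, "Lehmanns Marktstand", "email", 7),
]

QUERY = """
WITH g AS (
  SELECT a.*,
         -- running count of orders: each order starts a new group
         SUM(CASE WHEN activity = 'completed_order' THEN 1 ELSE 0 END)
           OVER (PARTITION BY customer ORDER BY ts) AS grp
  FROM activity_stream a
  WHERE customer = 'Lehmanns Marktstand'
), e AS (
  SELECT g.*,
         -- earliest email timestamp within the group
         MIN(CASE WHEN activity = 'email' THEN ts END)
           OVER (PARTITION BY customer, grp) AS email_ts
  FROM g
)
SELECT o.activity_id, em.activity_id AS first_in_between
FROM e AS o
LEFT JOIN e AS em  -- join back to find the row holding that timestamp
  ON em.customer = o.customer AND em.grp = o.grp AND em.ts = o.email_ts
WHERE o.activity = 'completed_order'
ORDER BY o.ts
"""


def first_email_ids(rows):
    """Return (order activity_id, first email activity_id or None) pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE activity_stream "
                "(activity_id INTEGER, customer TEXT, activity TEXT, ts INTEGER)")
    con.executemany("INSERT INTO activity_stream VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(first_email_ids(ROWS))
```

The join assumes timestamps are unique per customer; with ties you would need an extra tie-breaker.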
This should also allow you to use a twist to get the activity id without an additional subquery:
select a.*,
(array_agg(activity_id) filter (where activity = 'email') over (partition by customer, grp order by ts rows between unbounded preceding and unbounded following))[1] as email_activity_id
from (select a.*,
count(*) filter (where activity = 'completed_order') over (partition by customer order by ts) as grp
from activity_stream a
where customer = 'Lehmanns Marktstand'
) a;

Related

What is the difference in syntax between the following queries?

I have a huge table of policies, and I need to find all policies with invalid movements. For example, if the movement from inforce/premium_paid to terminated/premium_paid is invalid, I need to find all policies with this movement.
My query was initially as follows:
SELECT *,
LEAD(STAT) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS NEXT_STAT,
LEAD(EVENT) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS NEXT_EVENT
FROM TABLE
WHERE STAT = 'inforce'
AND EVENT = 'premium_paid'
AND NEXT_STAT = 'terminated'
AND NEXT_EVENT = 'premium_paid'
ORDER BY STAT, EVENT, NEXT_STAT, NEXT_EVENT
However, when I ran it, the database reported that the column names 'NEXT_STAT' and 'NEXT_EVENT' were invalid. When I tweaked it to the following, it worked:
SELECT *
FROM (
SELECT *,
LEAD(STAT) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS NEXT_STAT,
LEAD(EVENT) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS NEXT_EVENT
FROM TABLE) AS a
WHERE a.STAT = 'inforce'
AND a.EVENT = 'premium_paid'
AND a.NEXT_STAT = 'terminated'
AND a.NEXT_EVENT = 'premium_paid'
ORDER BY a.STAT, a.EVENT, a.NEXT_STAT, a.NEXT_EVENT
Thus, I am just curious to know why my initial query did not work.
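The usual explanation is the logical order of evaluation: WHERE is processed before the SELECT list, so aliases such as NEXT_STAT do not exist yet when WHERE is checked, and window functions such as LEAD are computed only after WHERE has filtered the rows. Wrapping the query in a derived table (or a CTE) turns the window results into ordinary columns that the outer WHERE can use. A minimal sketch of the working form, assuming SQLite 3.25+ in place of the original database, a hypothetical table name policies (TABLE itself is a reserved word), and made-up rows:

```python
import sqlite3

# Hypothetical policy history: (id, procdt, proctime, stat, event)
ROWS = [
    (1, "2021-01-01", "09:00", "inforce", "premium_paid"),
    (1, "2021-02-01", "09:00", "terminated", "premium_paid"),
    (2, "2021-01-01", "09:00", "inforce", "premium_paid"),
    (2, "2021-02-01", "09:00", "inforce", "premium_paid"),
]

QUERY = """
SELECT a.id
FROM (
  SELECT p.*,
         LEAD(stat)  OVER (PARTITION BY id ORDER BY procdt, proctime) AS next_stat,
         LEAD(event) OVER (PARTITION BY id ORDER BY procdt, proctime) AS next_event
  FROM policies p
) AS a
-- next_stat / next_event are plain columns of "a" here, so WHERE can use them
WHERE a.stat = 'inforce'
  AND a.event = 'premium_paid'
  AND a.next_stat = 'terminated'
  AND a.next_event = 'premium_paid'
ORDER BY a.id
"""


def invalid_policy_ids(rows):
    """Return the ids of policies with an inforce -> terminated movement."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE policies "
                "(id INTEGER, procdt TEXT, proctime TEXT, stat TEXT, event TEXT)")
    con.executemany("INSERT INTO policies VALUES (?, ?, ?, ?, ?)", rows)
    ids = [r[0] for r in con.execute(QUERY)]
    con.close()
    return ids


if __name__ == "__main__":
    print(invalid_policy_ids(ROWS))
```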

Convert CTE Query into normal Query

I want to convert my PostgreSQL CTE query into a normal query, because CTEs are mainly used in data warehouse SQL and are not efficient for production Postgres databases.
So I need help converting this CTE query into a normal query:
WITH
cohort AS (
SELECT
*
FROM (
select
activity_id,
ts,
customer,
activity,
case
when activity = 'completed_order' and lag(activity) over (partition by customer order by ts) != 'email'
then null
when activity = 'email' and lag(activity) over (partition by customer order by ts) !='email'
then 1
else 0
end as cndn
from activity_stream where customer in (select customer from activity_stream where activity='email')
order by ts
) AS s
)
(
select
*
from cohort as s
where cndn = 1 OR cndn is null order by ts)
You may just inline the CTE into your outer query:
select *
from
(
select activity_id, ts, customer, activity,
case when activity = 'completed_order' and lag(activity) over (partition by customer order by ts) != 'email'
then null
when activity = 'email' and lag(activity) over (partition by customer order by ts) !='email'
then 1
else 0
end as cndn
from activity_stream
where customer in (select customer from activity_stream where activity = 'email')
) as s
where cndn = 1 OR cndn is null
order by ts;
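Note that your CTE contains an unnecessary subquery with an ORDER BY, which won't "stick" anyway: only the outermost ORDER BY determines the order of the final result. Other than that, you might want to keep your current code as is. Since PostgreSQL 12, a non-recursive CTE that has no side effects and is referenced only once is inlined into the outer query automatically, so both forms are planned the same way.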
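To confirm that inlining does not change the result, both forms can be run against the same data. This is a minimal sketch assuming SQLite 3.25+ as a stand-in for PostgreSQL and made-up rows; the names CNDN, CTE_QUERY, INLINE_QUERY and run_both are hypothetical. The inner ORDER BY is dropped in both forms, since only the outer ORDER BY matters.

```python
import sqlite3

# Hypothetical rows: (activity_id, ts, customer, activity)
ROWS = [
    (1, 1, "A", "email"),
    (2, 2, "A", "email"),
    (3, 3, "A", "completed_order"),
    (4, 4, "A", "completed_order"),
    (5, 5, "A", "email"),
    (6, 6, "B", "completed_order"),  # customer B never got an email
]

# The body shared by both forms: computes cndn with LAG.
CNDN = """
  SELECT activity_id, ts, customer, activity,
         CASE WHEN activity = 'completed_order'
                   AND LAG(activity) OVER (PARTITION BY customer ORDER BY ts) != 'email'
              THEN NULL
              WHEN activity = 'email'
                   AND LAG(activity) OVER (PARTITION BY customer ORDER BY ts) != 'email'
              THEN 1
              ELSE 0
         END AS cndn
  FROM activity_stream
  WHERE customer IN (SELECT customer FROM activity_stream WHERE activity = 'email')
"""

CTE_QUERY = (f"WITH cohort AS ({CNDN}) "
             "SELECT * FROM cohort WHERE cndn = 1 OR cndn IS NULL ORDER BY ts")
INLINE_QUERY = (f"SELECT * FROM ({CNDN}) AS s "
                "WHERE cndn = 1 OR cndn IS NULL ORDER BY ts")


def run_both(rows):
    """Run the CTE form and the inlined form; return both result lists."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE activity_stream "
                "(activity_id INTEGER, ts INTEGER, customer TEXT, activity TEXT)")
    con.executemany("INSERT INTO activity_stream VALUES (?, ?, ?, ?)", rows)
    cte_rows = con.execute(CTE_QUERY).fetchall()
    inline_rows = con.execute(INLINE_QUERY).fetchall()
    con.close()
    return cte_rows, inline_rows


if __name__ == "__main__":
    cte_rows, inline_rows = run_both(ROWS)
    print(cte_rows == inline_rows, cte_rows)
```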

Update value based on value from another record of same table

Here is a sample table of website visitors. As we can see, visitors sometimes don't provide their email, and they may switch to different email addresses over time.
**Original table:**
I want to update this table with the following requirements:
The first time a visitor provides an email, all of their past visits are tagged with that email.
All of their future visits are also tagged with that email until they switch to another email.
**Expected table after update:**
Is there a way to do this in Redshift or T-SQL?
Thanks everyone!
In SQL Server or Redshift, you can use a subquery to calculate the email:
select t.*,
coalesce(email,
max(email) over (partition by visitor_id, grp),
max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
) as imputed_email
from (select t.*,
min(case when email is not null then activity_date end) over
(partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
count(email) over (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
from t
) t;
You can then use this in an update:
update t
set email = tt.imputed_email
from (select t.*,
coalesce(email,
max(email) over (partition by visitor_id, grp),
max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
) as imputed_email
from (select t.*,
min(case when email is not null then activity_date end) over
(partition by visitor_id order by activity_date) as first_email_date,
count(email) over (partition by visitor_id order by activity_date) as grp
from t
) t
) tt
where tt.visitor_id = t.visitor_id and tt.activity_date = t.activity_date and
t.email is null;
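The window-function version above can be checked on a small example. A minimal sketch, assuming SQLite 3.25+ in place of SQL Server or Redshift, a table named t as in the answer, and made-up visits; the helper name imputed_emails is hypothetical. The running COUNT(email) puts each email row and the null rows that follow it into one group, and the partition-wide MAX picks up the visitor's first email for rows before it.

```python
import sqlite3

# Hypothetical visits: (visitor_id, activity_date, email)
ROWS = [
    (1, "2021-01-01", None),
    (1, "2021-01-02", "a@example.com"),
    (1, "2021-01-03", None),
    (1, "2021-01-04", "b@example.com"),
    (1, "2021-01-05", None),
    (2, "2021-01-01", None),  # visitor 2 never gives an email
    (2, "2021-01-02", None),
]

QUERY = """
SELECT visitor_id, activity_date,
       COALESCE(email,
                -- email of the current group (rows since the last email)
                MAX(email) OVER (PARTITION BY visitor_id, grp),
                -- otherwise the visitor's first email, for rows before it
                MAX(CASE WHEN activity_date = first_email_date THEN email END)
                  OVER (PARTITION BY visitor_id)) AS imputed_email
FROM (SELECT t.*,
             MIN(CASE WHEN email IS NOT NULL THEN activity_date END)
               OVER (PARTITION BY visitor_id ORDER BY activity_date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_email_date,
             COUNT(email)
               OVER (PARTITION BY visitor_id ORDER BY activity_date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
      FROM t) AS x
ORDER BY visitor_id, activity_date
"""


def imputed_emails(rows):
    """Return (visitor_id, activity_date, imputed_email) for every visit."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (visitor_id INTEGER, activity_date TEXT, email TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in imputed_emails(ROWS):
        print(row)
```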
Assuming the table is named Visits and its primary key consists of the columns Visitor_id and Activity_Date, you can do the following in T-SQL.
Using a correlated subquery:
update a
set a.Email = coalesce(
-- select the email used previously
(
select top 1 Email from Visits
where Email is not null and Activity_Date < a.Activity_Date and Visitor_id = a.Visitor_id
order by Activity_Date desc
),
-- if there was no email used previously then select the email used next
(
select top 1 Email from Visits
where Email is not null and Activity_Date > a.Activity_Date and Visitor_id = a.Visitor_id
order by Activity_Date
)
)
from Visits a
where a.Email is null;
Using a window function to provide the ordering:
update v
set Email = vv.Email
from Visits v
join (
select
v.Visitor_id,
coalesce(a.Email, b.Email) as Email,
v.Activity_Date,
row_number() over (partition by v.Visitor_id, v.Activity_Date
order by a.Activity_Date desc, b.Activity_Date) as Row_num
from Visits v
-- previous visits with email
left join Visits a
on a.Visitor_id = v.Visitor_id
and a.Email is not null
and a.Activity_Date < v.Activity_Date
-- next visits with email if there are no previous visits
left join Visits b
on b.Visitor_id = v.Visitor_id
and b.Email is not null
and b.Activity_Date > v.Activity_Date
and a.Visitor_id is null
where v.Email is null
) vv
on vv.Visitor_id = v.Visitor_id
and vv.Activity_Date = v.Activity_Date
where
vv.Row_num = 1;
For each visitor_id, you can replace a null email with the previous non-null value; if there is none, use the next non-null value. You can get those values as follows:
select
v.*, v_prev.email prev_email, v_next.email next_email
from
visits v
left join visits v_prev on v.visitor_id = v_prev.visitor_id
and v_prev.activity_date = (select max(v2.activity_date) from visits v2 where v2.visitor_id = v.visitor_id and v2.activity_date < v.activity_date and v2.email is not null)
left join visits v_next on v.visitor_id = v_next.visitor_id
and v_next.activity_date = (select min(v2.activity_date) from visits v2 where v2.visitor_id = v.visitor_id and v2.activity_date > v.activity_date and v2.email is not null)
where
v.email is null
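The "previous non-null, else next non-null" rule can also be applied directly as an UPDATE. A minimal sketch, assuming SQLite in place of T-SQL (SQLite has no TOP, so ORDER BY ... LIMIT 1 takes its place), a table named visits, and made-up rows; the helper name fill_emails is hypothetical. For this rule the result is the same whether or not the subqueries see rows already updated earlier in the statement, because a filled row always carries the same value its nearest original email would supply.

```python
import sqlite3

# Hypothetical visits: (visitor_id, activity_date, email)
ROWS = [
    (1, "2021-01-01", None),
    (1, "2021-01-02", "a@example.com"),
    (1, "2021-01-03", None),
    (1, "2021-01-04", "b@example.com"),
    (1, "2021-01-05", None),
    (2, "2021-01-01", None),  # visitor 2 never gives an email
    (2, "2021-01-02", None),
]

UPDATE = """
UPDATE visits
SET email = COALESCE(
    -- nearest earlier visit of the same visitor that has an email
    (SELECT v2.email FROM visits AS v2
      WHERE v2.visitor_id = visits.visitor_id
        AND v2.email IS NOT NULL
        AND v2.activity_date < visits.activity_date
      ORDER BY v2.activity_date DESC LIMIT 1),
    -- otherwise the nearest later visit that has an email
    (SELECT v2.email FROM visits AS v2
      WHERE v2.visitor_id = visits.visitor_id
        AND v2.email IS NOT NULL
        AND v2.activity_date > visits.activity_date
      ORDER BY v2.activity_date LIMIT 1))
WHERE email IS NULL
"""


def fill_emails(rows):
    """Apply the UPDATE and return the table sorted by visitor and date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE visits (visitor_id INTEGER, activity_date TEXT, email TEXT)")
    con.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    con.execute(UPDATE)
    result = con.execute(
        "SELECT visitor_id, activity_date, email FROM visits "
        "ORDER BY visitor_id, activity_date").fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in fill_emails(ROWS):
        print(row)
```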

I need to write a query to mark previous record as “Not eligible” if a new record comes in within 30 days with same POS Order ID

I need to write a query that finds records whose POS_ORDER_ID appears again in a new record within 30 days with status 'Canceled' or 'Discontinued', and that marks the previous record with that POS_ORDER_ID as not eligible.
Table columns:
POS_ORDER_ID,
Status,
Order_date,
Error_description
A query using the MAX() and ROW_NUMBER() analytic functions, such as the following, might help:
with t as
(
select t.*,
row_number() over (partition by pos_order_id order by Order_date desc ) as rn,
max(Order_date) over (partition by pos_order_id) as mx
from tab t -- your original table
)
select pos_order_id, Status, Order_date, Error_description,
case when rn >1
and t.status in ('Canceled','Discontinued')
and mx - t.Order_date <= 30
then
'Not eligible'
end as "Extra Status"
from t
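The ROW_NUMBER() plus MAX() logic of the query above can be tried out locally. A minimal sketch, assuming SQLite 3.25+ in place of Oracle, a table named tab as in the answer, and made-up orders; the helper name extra_status is hypothetical. Oracle can subtract DATE values directly, while SQLite needs julianday() to get a difference in days. The sketch reproduces the answer's conditions as written, including the status check on the older row.

```python
import sqlite3

# Hypothetical orders: (pos_order_id, status, order_date, error_description)
ROWS = [
    (100, "Canceled", "2021-01-01", None),
    (100, "Active", "2021-01-20", None),
    (200, "Discontinued", "2021-01-01", None),
    (200, "Active", "2021-03-01", None),      # 59 days later: outside the window
    (300, "Active", "2021-02-01", None),
]

QUERY = """
WITH t AS (
  SELECT tab.*,
         ROW_NUMBER() OVER (PARTITION BY pos_order_id ORDER BY order_date DESC) AS rn,
         MAX(order_date) OVER (PARTITION BY pos_order_id) AS mx
  FROM tab
)
SELECT pos_order_id, status, order_date,
       CASE WHEN rn > 1                                  -- not the newest record
             AND status IN ('Canceled', 'Discontinued')
             AND julianday(mx) - julianday(order_date) <= 30
            THEN 'Not eligible'
       END AS extra_status
FROM t
ORDER BY pos_order_id, order_date
"""


def extra_status(rows):
    """Return (pos_order_id, status, order_date, extra_status) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab (pos_order_id INTEGER, status TEXT, "
                "order_date TEXT, error_description TEXT)")
    con.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in extra_status(ROWS):
        print(row)
```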
Please use the queries below.
Select and validate:
select POS_ORDER_ID, Status, Order_date, Error_description, row_number()
over(partition by POS_ORDER_ID order by Order_date desc)
from table_name;
Update query:
merge into table_name t1
using
(select row_id, POS_ORDER_ID, Status, Order_date, Error_description,
row_number() over(partition by POS_ORDER_ID order by Order_date desc) as rnk
from table_name) t2
on (t1.POS_ORDER_ID = t2.POS_ORDER_ID and t1.row_id = t2.row_id)
when matched then
update
set
t1.Status = case when t2.rnk = 1 then 'Canceled' else 'Not Eligible' end;

How do I ORDER BY multiple columns?

I want to sort by chart_num and DATE. However, when sorted, the results come out like this:
This is my code:
SELECT *
FROM (
SELECT id, chart_num, chart_name, MIN(DATE) AS DATE, amount, (COUNT(*) = 2) AS result, card_check
FROM (
(
SELECT id, hpd.chart_num AS chart_num, hpd.chart_name AS chart_name, hpd.visit AS DATE, card_amount_received AS amount, card_check_modify AS card_check
,row_number() over (PARTITION BY card_amount_received ORDER BY id) AS seqnum
FROM hospital_payment_data hpd
WHERE store_mbrno = '135790' AND card_amount_received > 0
)
UNION ALL (
SELECT id, ncd.chart_num AS chart_num, ncd.chart_name AS chart_name, DATE_FORMAT(ncd.tranDate,'%Y-%m-%d') AS DATE, amount, card_check_result AS card_check
,row_number() over (PARTITION BY amount ORDER BY id) AS seqnum
FROM noti_card_data ncd
WHERE (mbrNo = '135790' OR mbrNo = '135791') AND cmd ='승인'
)
) X
GROUP BY amount, seqnum
ORDER BY result DESC
) a
ORDER BY a.DATE DESC
What I want is for the rows with a NULL chart_num to go to the back, ordered by latest DATE, and for rows that have a chart_num to be sorted by chart_num and then DATE.
It feels like I'm missing something else with this question, but you can separate columns in the ORDER BY with a comma. It's not clear from your text whether you want dates grouped within the same chart_num or charts grouped within the same date, but if I guessed wrong, you can just swap them.
Also, the ORDER BY result DESC is completely extra. It adds nothing to the results, and by removing it we can get rid of a whole level of nesting.
SELECT id, chart_num, chart_name, MIN(DATE) AS DATE, amount, (COUNT(*) = 2) AS result, card_check
FROM (
(
SELECT id, hpd.chart_num AS chart_num, hpd.chart_name AS chart_name, hpd.visit AS DATE, card_amount_received AS amount, card_check_modify AS card_check
,row_number() over (PARTITION BY card_amount_received ORDER BY id) AS seqnum
FROM hospital_payment_data hpd
WHERE store_mbrno = '135790' AND card_amount_received > 0
)
UNION ALL (
SELECT id, ncd.chart_num, ncd.chart_name, DATE_FORMAT(ncd.tranDate,'%Y-%m-%d'), amount, card_check_result
,row_number() over (PARTITION BY amount ORDER BY id) AS seqnum
FROM noti_card_data ncd
WHERE mbrNo IN ('135790', '135791') AND cmd ='승인'
)
) X
GROUP BY amount, seqnum
ORDER BY MIN(DATE), coalesce(chart_num,-1), result DESC
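If the goal is for NULL chart numbers to go last, a common portable trick is to sort on "chart_num IS NULL" first: it evaluates to 0 for real chart numbers and 1 for NULL (MySQL and SQLite return 0/1, and in PostgreSQL false sorts before true). A minimal sketch, assuming SQLite in place of MySQL and a hypothetical payments table with made-up columns chart_num and pay_date; the helper name sorted_payments is hypothetical.

```python
import sqlite3

# Hypothetical rows: (chart_num, pay_date)
ROWS = [
    (2, "2021-01-05"),
    (None, "2021-01-07"),
    (1, "2021-01-03"),
    (1, "2021-01-09"),
    (None, "2021-01-02"),
]

QUERY = """
SELECT chart_num, pay_date
FROM payments
ORDER BY chart_num IS NULL,  -- 0 for real chart numbers, 1 for NULL: NULLs go last
         chart_num,          -- then by chart number
         pay_date DESC       -- newest first within each chart_num
"""


def sorted_payments(rows):
    """Return the rows in the multi-column order defined by QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE payments (chart_num INTEGER, pay_date TEXT)")
    con.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in sorted_payments(ROWS):
        print(row)
```

Swapping the second and third sort keys would instead group charts within the same date.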
Don't order by result in the inner UNION ALL query.
Sort by chart_num and date in place of result.
So in place of
Order by result desc
use this:
Order by chart_num desc, DATE desc
Or, in the outer main query,
in place of
Order by a.DATE DESC
use
Order by a.chart_num desc, a.DATE desc
Hope it helps!