Aging bucket in SQL query - sql

I have a table with a member ID along with LM_Conversion_date and Retired_date. I have managed to get the difference between the two dates, but now I would like to define aging buckets and list the membership numbers that fall into each bucket. Here is an example of my table and how I want to see the data:
Member_no LM_Conversion_date Retired_date Date_difference
100026 08/12/2017 31/12/2017 23
100114 31/08/2017 31/08/2017 0
100620 15/09/2017 30/09/2017 15
100726 10/01/2017 31/12/2016 -10
I want the output to be
All negative  0-15    15-30   >30
100726        100114  100026
              100620
Any Help will be much appreciated

You can do this using conditional aggregation:
select max(case when grp = '<0' then member_no end) as all_negative,
       max(case when grp = '<=15' then member_no end) as [0-15],
       max(case when grp = '<=30' then member_no end) as [15-30],
       max(case when grp = '>30' then member_no end) as [>30]
from (select t.*, v.grp,
             row_number() over (partition by v.grp order by member_no) as seqnum
      from t cross apply
           -- strictly negative only, so that a difference of 0 lands in the 0-15 bucket
           (values (case when date_difference < 0 then '<0'
                         when date_difference <= 15 then '<=15'
                         when date_difference <= 30 then '<=30'
                         else '>30'
                    end)
           ) v(grp)
     ) t
group by seqnum
order by seqnum;
The subquery numbers the members within each bucket (seqnum = 1, 2, ... per grp). Grouping by seqnum then puts the n-th member of every bucket on the same output row, with one column per bucket.
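To see the row layout this produces, here is a minimal sketch that runs the same idea against the sample data with Python's built-in sqlite3 module. It is a translation under assumptions, not the SQL Server query itself: SQLite has no CROSS APPLY, so the bucket label is computed in a nested subquery; the table name t and the bucket labels are illustrative; and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (member_no INTEGER, date_difference INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(100026, 23), (100114, 0), (100620, 15), (100726, -10)],
)

query = """
SELECT MAX(CASE WHEN grp = 'neg'   THEN member_no END) AS all_negative,
       MAX(CASE WHEN grp = '0-15'  THEN member_no END) AS b_0_15,
       MAX(CASE WHEN grp = '15-30' THEN member_no END) AS b_15_30,
       MAX(CASE WHEN grp = '>30'   THEN member_no END) AS b_over_30
FROM (
    -- number the members inside each bucket: 1, 2, 3, ...
    SELECT member_no, grp,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY member_no) AS seqnum
    FROM (
        -- assign each member to exactly one bucket
        SELECT member_no,
               CASE WHEN date_difference < 0   THEN 'neg'
                    WHEN date_difference <= 15 THEN '0-15'
                    WHEN date_difference <= 30 THEN '15-30'
                    ELSE '>30'
               END AS grp
        FROM t
    )
)
GROUP BY seqnum   -- the n-th member of every bucket shares one output row
ORDER BY seqnum
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Empty cells come back as NULL (None in Python), because a CASE without ELSE yields NULL and MAX ignores NULLs.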

Related

How to get difference in value over a sliding time window?

I'm attempting to write a SQL query which returns every product where the most recent price on an order within the last 30 days is different than the most recent price in the previous 30 days, and that calculated variance. I'm currently using PostgreSQL 11.
Data Model
Right now, the data is structured into three tables: orders, products, and a pivot table, order_product. Here is the simplified version of the table structure:
Orders
id  order_date
1   2022-01-15
2   2022-02-15
3   2022-03-08
Products
id  name
1   Some product
2   Another product
3   Yet another product
Order_Product
order_id  product_id  unit_price
1         1           10
1         2           20
1         3           10
2         1           12
2         2           20
2         3           5
3         1           15
Desired Output
The desired output would be something like the following:
id  name                 order_date  latest_unit_price  previous_unit_price  variance
1   Some product         2022-03-08  15                 10                   5
3   Yet another product  2022-02-15  5                  10                   -5
What I've done so far
I've been able to write a join that combines the Orders and Products via the order_product table, within the 60-day window, which is seemingly the easy part:
SELECT
"products"."id",
"products"."name",
"order_product"."unit_price",
"orders"."order_date"
FROM
products
JOIN order_product ON products.id = order_product.product_id
JOIN orders ON order_product.order_id = orders.id
WHERE
order_date BETWEEN now() - INTERVAL '60 days'
AND now()
I've been trying to work with RANK() and LAG(); however, where I'm getting stuck is ranking the rows within each 30-day window and then calculating the variance between the two windows.
Any help would be much appreciated!
Update: Added solution
Building off of the answer by D-Shih, I had to tweak this to work based on the time window starting from the current date:
WITH CTE AS (
SELECT
"products"."id",
"products"."name",
"order_product"."unit_price",
"orders"."order_date"
FROM
products
JOIN order_product ON products.id = order_product.product_id
JOIN orders ON order_product.order_id = orders.id
WHERE
order_date BETWEEN now() - INTERVAL '60 days' AND now()
),
CTE2 AS (
SELECT
*,
EXTRACT(DAYS FROM now() - order_date :: timestamp) gap_days
FROM
CTE
),
CTE3 AS (
SELECT
*,
(CASE WHEN gap_days < 30 THEN 1 ELSE 0 END) grp
FROM
CTE2
)
SELECT
id,
name,
MAX(CASE WHEN grp = 1 THEN order_date END) order_date,
MAX(CASE WHEN grp = 1 THEN unit_price END) latest_unit_price,
MAX(CASE WHEN grp = 0 THEN unit_price END) previous_unit_price,
SUM(CASE WHEN grp = 1 THEN unit_price ELSE - unit_price END) variance
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ID, grp ORDER BY order_date DESC) rn
FROM
CTE3
) t1
WHERE
rn = 1
GROUP BY
id,
name
HAVING
MAX(CASE WHEN grp = 1 THEN unit_price END) <> MAX(CASE WHEN grp = 0 THEN unit_price END)
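For anyone who wants to try the logic without a PostgreSQL instance, here is a rough sketch of the same two-window comparison using Python's built-in sqlite3 module and the sample data from the question. Several things are assumptions made for reproducibility: a fixed reference date of 2022-03-10 stands in for now(), SQLite's julianday() replaces EXTRACT, and the variance is written as a plain difference of the two prices instead of the SUM trick. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, order_date TEXT);
CREATE TABLE products (id INTEGER, name TEXT);
CREATE TABLE order_product (order_id INTEGER, product_id INTEGER, unit_price INTEGER);
INSERT INTO orders VALUES (1, '2022-01-15'), (2, '2022-02-15'), (3, '2022-03-08');
INSERT INTO products VALUES (1, 'Some product'), (2, 'Another product'),
                            (3, 'Yet another product');
INSERT INTO order_product VALUES (1, 1, 10), (1, 2, 20), (1, 3, 10),
                                 (2, 1, 12), (2, 2, 20), (2, 3, 5), (3, 1, 15);
""")

query = """
WITH joined AS (
    SELECT p.id, p.name, op.unit_price, o.order_date,
           -- grp = 1: last 30 days, grp = 0: the 30 days before that
           CASE WHEN julianday(:ref) - julianday(o.order_date) < 30
                THEN 1 ELSE 0 END AS grp
    FROM products p
    JOIN order_product op ON p.id = op.product_id
    JOIN orders o ON op.order_id = o.id
    WHERE julianday(:ref) - julianday(o.order_date) BETWEEN 0 AND 60
),
ranked AS (
    -- rn = 1 marks the most recent order of each product in each window
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id, grp ORDER BY order_date DESC) AS rn
    FROM joined
)
SELECT id, name,
       MAX(CASE WHEN grp = 1 THEN order_date END) AS order_date,
       MAX(CASE WHEN grp = 1 THEN unit_price END) AS latest_unit_price,
       MAX(CASE WHEN grp = 0 THEN unit_price END) AS previous_unit_price,
       MAX(CASE WHEN grp = 1 THEN unit_price END)
         - MAX(CASE WHEN grp = 0 THEN unit_price END) AS variance
FROM ranked
WHERE rn = 1
GROUP BY id, name
HAVING MAX(CASE WHEN grp = 1 THEN unit_price END)
    <> MAX(CASE WHEN grp = 0 THEN unit_price END)
ORDER BY id
"""

# hypothetical fixed "today" so that the result does not depend on the clock
rows = conn.execute(query, {"ref": "2022-03-10"}).fetchall()
for row in rows:
    print(row)
```

With that reference date the result matches the desired output above: product 2 drops out because its price is 20 in both windows, and a product that appears in only one window is also excluded, since comparing against NULL never evaluates to true.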
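You can use EXTRACT with the LAG window function to get the number of days between each order_date and the previous order_date of the same product.
Then use a cumulative SUM window function over those gaps, which gives the days elapsed since the product's first order in the 60-day window, to assign each row to a group:
grp = 0: orders at most 30 days after that first order (the earlier period)
grp = 1: orders more than 30 days after that first order (the most recent period)
The query would look like this: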
WITH CTE AS (
    SELECT "products"."id",
           "products"."name",
           "order_product"."unit_price",
           "orders"."order_date"
    FROM products
    JOIN order_product ON products.id = order_product.product_id
    JOIN orders ON order_product.order_id = orders.id
    WHERE order_date BETWEEN now() - INTERVAL '60 days' AND now()
), CTE2 AS (
    -- days since the previous order of the same product (0 for the first order)
    SELECT *,
           EXTRACT(DAYS FROM order_date - LAG(order_date, 1, order_date) OVER (PARTITION BY id ORDER BY order_date)) gap_days
    FROM CTE
), CTE3 AS (
    -- running total of the gaps = days since the product's first order in the window
    SELECT *,
           (CASE WHEN SUM(gap_days) OVER (PARTITION BY id ORDER BY order_date) > 30 THEN 1 ELSE 0 END) grp
    FROM CTE2
)
SELECT id,
       name,
       MAX(CASE WHEN grp = 1 THEN order_date END) order_date,
       MAX(CASE WHEN grp = 1 THEN unit_price END) latest_unit_price,
       MAX(CASE WHEN grp = 0 THEN unit_price END) previous_unit_price,
       SUM(CASE WHEN grp = 1 THEN unit_price ELSE - unit_price END) variance
FROM (
    -- rn = 1 keeps only the most recent order per product and group
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id, grp ORDER BY order_date DESC) rn
    FROM CTE3
) t1
WHERE rn = 1
GROUP BY id,
         name
HAVING MAX(CASE WHEN grp = 1 THEN unit_price END) <> MAX(CASE WHEN grp = 0 THEN unit_price END)

Is there a way that I can group by records that have a "cluster" of a time difference?

For example, I have a set of log records and I want to group records together if they are less than 1 minute apart. If there is a gap of more than 1 minute between the log records in the sequence, then I want to have that be in its own separate group.
Example:
Time      RecordType
00:00:01  A
00:00:02  A
00:00:03  B
00:01:02  A
00:02:05  A
00:02:06  B
Then I want to create something like:
Group #  Total Count  A Count  B Count
1        4            3        1
2        2            1        1
Yes. Use lag() to determine the first record in each group. Then do a cumulative sum to assign the groups. And finally aggregate:
select min(time), max(time), count(*),
       sum(case when recordtype = 'A' then 1 else 0 end) as num_a,
       sum(case when recordtype = 'B' then 1 else 0 end) as num_b
from (select t.*,
             -- running count of group starts: a row starts a new group when it has no
             -- predecessor or the previous row is a minute or more older
             sum(case when prev_time > dateadd(minute, -1, time) then 0 else 1 end) over (order by time) as grp
      from (select t.*,
                   lag(time) over (order by time) as prev_time
            from t
           ) t
     ) t
group by grp;
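As a quick check of the lag-plus-cumulative-sum pattern, here is a small sketch that runs it on the sample log with Python's built-in sqlite3 module. The table name log, the column names, and storing the time as seconds since midnight are assumptions made to keep the example portable (dateadd is not available in SQLite); window functions need SQLite 3.25 or later.

```python
import sqlite3


def to_seconds(hhmmss: str) -> int:
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


sample = [
    ("00:00:01", "A"), ("00:00:02", "A"), ("00:00:03", "B"),
    ("00:01:02", "A"), ("00:02:05", "A"), ("00:02:06", "B"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (ts INTEGER, record_type TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [(to_seconds(t), r) for t, r in sample],
)

query = """
SELECT grp,
       COUNT(*) AS total_count,
       SUM(CASE WHEN record_type = 'A' THEN 1 ELSE 0 END) AS a_count,
       SUM(CASE WHEN record_type = 'B' THEN 1 ELSE 0 END) AS b_count
FROM (
    -- cumulative sum of "starts a new group" flags gives the group number;
    -- the first row has prev_ts = NULL, so the comparison is NULL and it falls to ELSE 1
    SELECT record_type,
           SUM(CASE WHEN ts - prev_ts < 60 THEN 0 ELSE 1 END)
               OVER (ORDER BY ts) AS grp
    FROM (
        SELECT ts, record_type,
               LAG(ts) OVER (ORDER BY ts) AS prev_ts
        FROM log
    )
)
GROUP BY grp
ORDER BY grp
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The 00:01:02 record is 59 seconds after 00:00:03, so it stays in group 1, while 00:02:05 is 63 seconds later and opens group 2, which reproduces the counts asked for in the question.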

I am looking to find customers repurchase frequency in SQL from their first purchase date

I am trying to find customers' repurchase rates measured from their first order date. For example, for 2016, how many customers purchased once within days 1-365 of their initial purchase, how many purchased twice, and so on.
I have a transaction_detail table which looks like below:
txn_date Customer_ID Transaction_Number Sales
1/2/2019 1 12345 $10
4/3/2018 1 65890 $20
3/22/2019 3 64453 $30
4/3/2019 4 88567 $20
5/21/2019 4 85446 $15
1/23/2018 5 89464 $40
4/3/2019 5 99674 $30
4/3/2019 6 32224 $20
1/23/2018 6 46466 $30
1/20/2018 7 56558 $30
I am able to find the customers who shopped in 2016 and how many times they repurchased in 2016, but I need to find the customers who shopped in 2016 and how many times they came back, counted from their first purchase date.
I need a starting point for the query; I am not sure how to build this logic in my SQL code.
Any help would be appreciated.
I am using the below query:
WITH by_year
AS (SELECT
Customer_ID,
to_char(txn_date, 'YYYY') AS visit_year
FROM table
GROUP BY Customer_ID, to_char(txn_date, 'YYYY')),
with_first_year
AS (SELECT
Customer_ID,
visit_year,
FIRST_VALUE(visit_year) OVER (PARTITION BY Customer_ID ORDER BY visit_year) AS first_year
FROM by_year),
with_year_number
AS (SELECT
Customer_ID,
visit_year,
first_year,
(visit_year - first_year) AS year_number
FROM with_first_year)
SELECT
first_year AS first_year,
SUM(CASE WHEN year_number = 0 THEN 1 ELSE 0 END) AS year_0,
SUM(CASE WHEN year_number = 1 THEN 1 ELSE 0 END) AS year_1,
SUM(CASE WHEN year_number = 2 THEN 1 ELSE 0 END) AS year_2,
SUM(CASE WHEN year_number = 3 THEN 1 ELSE 0 END) AS year_3,
SUM(CASE WHEN year_number = 4 THEN 1 ELSE 0 END) AS year_4,
SUM(CASE WHEN year_number = 5 THEN 1 ELSE 0 END) AS year_5,
SUM(CASE WHEN year_number = 6 THEN 1 ELSE 0 END) AS year_6,
SUM(CASE WHEN year_number = 7 THEN 1 ELSE 0 END) AS year_7,
SUM(CASE WHEN year_number = 8 THEN 1 ELSE 0 END) AS year_8,
SUM(CASE WHEN year_number = 9 THEN 1 ELSE 0 END) AS year_9
FROM with_year_number
GROUP BY first_year
ORDER BY first_year
Use window functions and aggregation:
select cnt, count(*), min(customer_id), max(customer_id)
from (select customer_id, count(*) as cnt
      from (select td.*,
                   min(txn_date) over (partition by Customer_ID) as min_txn_date
            from transaction_detail td
           ) td
      where txn_date >= min_txn_date and txn_date < min_txn_date + interval '365' day
      group by customer_id
     ) c
group by cnt
order by cnt;
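Here is a small sketch of that query run against the sample rows from the question, using Python's built-in sqlite3 module. The ISO date strings and the julianday() arithmetic are SQLite stand-ins for the interval syntax above, and no filter on the first-purchase year is applied because the sample data contains no 2016 rows; treat both as assumptions of this example. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transaction_detail "
    "(txn_date TEXT, customer_id INTEGER, transaction_number INTEGER)"
)
conn.executemany(
    "INSERT INTO transaction_detail VALUES (?, ?, ?)",
    [
        ("2019-01-02", 1, 12345), ("2018-04-03", 1, 65890),
        ("2019-03-22", 3, 64453), ("2019-04-03", 4, 88567),
        ("2019-05-21", 4, 85446), ("2018-01-23", 5, 89464),
        ("2019-04-03", 5, 99674), ("2019-04-03", 6, 32224),
        ("2018-01-23", 6, 46466), ("2018-01-20", 7, 56558),
    ],
)

query = """
SELECT cnt, COUNT(*) AS num_customers
FROM (
    -- purchases per customer within 365 days of that customer's first purchase
    SELECT customer_id, COUNT(*) AS cnt
    FROM (
        SELECT customer_id, txn_date,
               MIN(txn_date) OVER (PARTITION BY customer_id) AS min_txn_date
        FROM transaction_detail
    )
    WHERE julianday(txn_date) - julianday(min_txn_date) < 365
    GROUP BY customer_id
)
GROUP BY cnt
ORDER BY cnt
"""

rows = conn.execute(query).fetchall()
for cnt, num_customers in rows:
    print(cnt, num_customers)
```

In this sample, customers 1 and 4 bought twice within their first 365 days, while customers 5 and 6 came back only after more than a year, so they count as single-purchase customers alongside 3 and 7.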
As I understand it, you want to count the distinct customers whose first purchase was in 2016 and who repurchased one year or more after that first purchase.
Select * from
(
    Select customer_id,
           -- later date first, so that the result is positive
           Floor(months_between(lead_txn_date, txn_date)/12) as num_years
    From
    (
        Select customer_id,
               txn_date,
               row_number() over (partition by Customer_ID order by txn_date) as rn,
               lead(txn_date) over (partition by Customer_ID order by txn_date) as lead_txn_date
        From your_table
    )
    Where txn_date >= date '2016-01-01'
      and txn_date < date '2017-01-01'
      and rn = 1
      And months_between(lead_txn_date, txn_date) >= 12
)
Pivot
(
    Count(1) for num_years in (1,2,3,4)
)
Ultimately, we are finding the number of whole years between the first and second purchase of each customer, where the first purchase must be in 2016.
Cheers!!

How to display Max Date amount in another column?

I want to derive the columns MSP_ADULT and MSP_CHILD from the record with the LATEST date: that record's ADULT_AMT should appear in MSP_ADULT and its CHILD_AMT in MSP_CHILD on every row.
I want my output to look like this:
END_DATE ADULT_AMT CHILD_AMT MSP_ADULT MSP_CHILD
09/01/2017 100 50 180 80
10/01/2018 200 100 180 80
04/05/2019 300 90 180 80
08/20/2019 180 80 180 80
Here is the code I am running, but it is not working.
SELECT
AL1.END_DATE as BROCHURE_EFFECTIVE_END_DATE,
PORT_CODE,
AL7.PRODUCT_LEGACY_CODE as COMPONENT_CODE,
AL3.PRICE_GUEST_AGE_GROUP ,
MAX(CASE WHEN AL3.PRICE_GUEST_AGE_GROUP = 'ADULT' THEN AL3.PRICE_AMOUNT ELSE 0 END) ADULT_AMT,
MAX(CASE WHEN AL3.PRICE_GUEST_AGE_GROUP = 'CHILD' THEN AL3.PRICE_AMOUNT ELSE 0 END) CHILD_AMT,
MAX(CASE WHEN AL3.PRICE_GUEST_AGE_GROUP = 'ADULT' THEN AL3.PRICE_AMOUNT ELSE 0 END)
OVER (PARTITION BY PORT_CODE, AL7.PRODUCT_LEGACY_CODE --order by AL1.END_DATE desc
)AS MSP_ADULT,
MAX(CASE WHEN AL3.PRICE_GUEST_AGE_GROUP = 'CHILD' THEN AL3.PRICE_AMOUNT ELSE 0 END)
OVER (PARTITION BY PORT_CODE, AL7.PRODUCT_LEGACY_CODE order by AL1.END_DATE desc ) AS MSP_child
FROM RATE_PLAN AL1
inner join PRICE AL3
on (AL3.RATE_PLAN_SK=AL1.RATE_PLAN_SK and AL3.rate_plan_sk <>-1 )
Inner join PRODUCT_VARIANT AL7
ON (AL3.PRODUCT_CODE = AL7.PRODUCT_LEGACY_CODE and AL7.CATALOG_VERSION='Online')
INNER JOIN PRODUCT_OFFERING AL14
ON (AL14.PRODUCT_CODE = AL7.PRODUCT_LEGACY_CODE and AL7.CATALOG_VERSION='Online')
inner join PORT
on (AL7.FULFILLMENT_LOCATION = PORT_CODE)
where TO_CHAR(AL1.END_DATE,'YYYY') >= TO_CHAR(SYSDATE,'YYYY')
and AL3.use_for_pricing_flag is not null
and port_code='HKT' and AL7.product_LEGACY_code='PK83'
GROUP BY
PORT_CODE,
PRICE_GUEST_AGE_GROUP,
AL7.PRODUCT_LEGACY_CODE,
AL1.END_DATE
)
So you want the last values of those conditional MAX's.
Try FIRST_VALUE with a descending order.
...
FIRST_VALUE(ADULT_AMT)
OVER (PARTITION BY PORT_CODE, COMPONENT_CODE ORDER BY BROCHURE_EFFECTIVE_END_DATE DESC) AS MSP_ADULT,
FIRST_VALUE(CHILD_AMT)
OVER (PARTITION BY PORT_CODE, COMPONENT_CODE ORDER BY BROCHURE_EFFECTIVE_END_DATE DESC) AS MSP_CHILD
...
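Note that there's also the LAST_VALUE window function, but that one can be misleading: when the window has an ORDER BY and no explicit frame, the default frame ends at the current row, so LAST_VALUE returns the current row's value (or that of its last peer) rather than the last value of the partition, unless you extend the frame with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.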
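To make the difference concrete, here is a small sketch using Python's built-in sqlite3 module and the four sample rows from the question. The table name prices and its columns are assumptions for illustration, the dates are stored as ISO strings so that they sort correctly, and the PARTITION BY is left out because the sample covers a single port and component. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (end_date TEXT, adult_amt INTEGER, child_amt INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("2017-09-01", 100, 50),
        ("2018-10-01", 200, 100),
        ("2019-04-05", 300, 90),
        ("2019-08-20", 180, 80),
    ],
)

query = """
SELECT end_date, adult_amt, child_amt,
       -- value from the latest end_date, repeated on every row
       FIRST_VALUE(adult_amt) OVER (ORDER BY end_date DESC) AS msp_adult,
       FIRST_VALUE(child_amt) OVER (ORDER BY end_date DESC) AS msp_child,
       -- default frame stops at the current row: just echoes adult_amt
       LAST_VALUE(adult_amt) OVER (ORDER BY end_date) AS last_default_frame,
       -- explicit full frame: behaves like the FIRST_VALUE version
       LAST_VALUE(adult_amt) OVER (
           ORDER BY end_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_full_frame
FROM prices
ORDER BY end_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The msp_adult and msp_child columns come out as 180 and 80 on every row, while last_default_frame simply repeats each row's own adult_amt.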

How to do filter for a field generated by using MAX(CASE WHEN ... END)?

Using the query below, I have successfully retrieved the first and second purchase of each customer from data covering January through May.
SELECT
MAX(CASE WHEN row_num = 1 THEN month END) AS month,
customer_id,
1 AS row_num,
DATE_DIFF(MAX(CASE WHEN row_num = 2 THEN verified_date END),
MAX(CASE WHEN row_num = 1 THEN verified_date END), DAY) AS difference
FROM yourTable
GROUP BY
customer_id;
Now I would like to filter by month to get all users whose FIRST transaction was in Jan - Apr and whose SECOND transaction was at any time (Jan - May). I tried this query:
SELECT
MAX(CASE WHEN row_num = 1 AND month IN (1,2,3,4) THEN month END) AS month,
customer_id,
1 AS row_num,
DATE_DIFF(MAX(CASE WHEN row_num = 2 THEN verified_date END),
MAX(CASE WHEN row_num = 1 THEN verified_date END), DAY) AS difference
FROM yourTable
GROUP BY
customer_id;
The query runs successfully; however, the month field contains 1, 2, 3, 4 and NULL.
Why is there a NULL in it?
Thank you
Assuming there's a row_num in yourTable that orders the transactions of each customer chronologically, MAX(CASE WHEN row_num = 1 AND month IN (1,2,3,4) THEN month END) AS month will end up null when the row with row_num = 1 has a month other than 1, 2, 3, or 4, i.e. "first transaction is in May". A CASE without an ELSE returns null when no branch matches, and a condition inside the SELECT list only changes the value that is shown; it does not remove the customer's row from the result.
To filter, use HAVING MAX(CASE WHEN row_num = 1 AND month IN (1,2,3,4) THEN month END) is not null.
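Here is a minimal sketch of the difference, run with Python's built-in sqlite3 module. The three customers and their months are made-up sample data, and the DATE_DIFF column is left out to keep the focus on the month filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (customer_id INTEGER, row_num INTEGER, month INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, 1, 1), (1, 2, 3),   # first purchase in January
        (2, 1, 5), (2, 2, 5),   # first purchase in May
        (3, 1, 4), (3, 2, 5),   # first purchase in April
    ],
)

month_expr = "MAX(CASE WHEN row_num = 1 AND month IN (1,2,3,4) THEN month END)"

# Condition only in the SELECT list: customer 2 is still returned, with a NULL month.
without_having = conn.execute(
    f"SELECT customer_id, {month_expr} AS month "
    "FROM yourTable GROUP BY customer_id ORDER BY customer_id"
).fetchall()

# Same expression in HAVING: customer 2 is removed from the result.
with_having = conn.execute(
    f"SELECT customer_id, {month_expr} AS month "
    "FROM yourTable GROUP BY customer_id "
    f"HAVING {month_expr} IS NOT NULL ORDER BY customer_id"
).fetchall()

print(without_having)
print(with_having)
```

A WHERE clause on month would not work here, because it would filter individual rows before grouping and could drop a customer's second transaction; HAVING filters whole groups after the aggregation.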