Row number custom coded in SQL

I am using BigQuery #standardSQL to work on a table. The table should flag a conversion (1) for users who purchased something in both month 9 and month 10. Users who did not purchase in month 10 should have only 0 in their rows.
So far, this is the query for custom_coded:
(case when row_number()
over (partition by customer_id order by purchase_date asc) =
count(*) over (partition by customer_id)
then 1 else 0 END) AS custom_coded
and this is the result so far
What I expect is that customer_id = 288 has only 0 in custom_coded, since he did not purchase in the next month (month 10), and that customer_id = 879 has 1 on his latest purchase_date, since he has a purchase record in month 10.
This is the expected result
I previously asked about this in another thread (Decode maximum number in rows for sql); however, the dataset there didn't fit the analysis that I'm going to run.

Below is for BigQuery Standard SQL
#standardSQL
SELECT customer_id, item_purchased, purchase_date,
(CASE WHEN
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date ASC) =
COUNT(*) OVER (PARTITION BY customer_id)
AND SUM(DISTINCT (CASE FORMAT_DATE('%Y%m', purchase_date)
WHEN '201709' THEN 1 WHEN '201710' THEN 2 ELSE 0 END))
OVER(PARTITION BY customer_id) = 3
THEN 1 ELSE 0
END) AS custom_coded
FROM `project.dataset.table`
You can test / play with the above using the dummy data from your question:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 288 customer_id, 'Rice' item_purchased, DATE '2017-09-02' purchase_date UNION ALL
SELECT 288, 'Rice', DATE '2017-09-02' UNION ALL
SELECT 288, 'Rice', DATE '2017-09-06' UNION ALL
SELECT 879, 'Plate', DATE '2017-09-01' UNION ALL
SELECT 879, 'Plate', DATE '2017-09-25' UNION ALL
SELECT 879, 'Plate', DATE '2017-10-25' UNION ALL
SELECT 879, 'Plate', DATE '2017-10-27'
)
SELECT customer_id, item_purchased, purchase_date,
(CASE WHEN
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date ASC) =
COUNT(*) OVER (PARTITION BY customer_id)
AND SUM(DISTINCT (CASE FORMAT_DATE('%Y%m', purchase_date)
WHEN '201709' THEN 1 WHEN '201710' THEN 2 ELSE 0 END))
OVER(PARTITION BY customer_id) = 3
THEN 1 ELSE 0
END) AS custom_coded
FROM `project.dataset.table`
ORDER BY customer_id, purchase_date
The result is:
customer_id item_purchased purchase_date custom_coded
288 Rice 2017-09-02 0
288 Rice 2017-09-02 0
288 Rice 2017-09-06 0
879 Plate 2017-09-01 0
879 Plate 2017-09-25 0
879 Plate 2017-10-25 0
879 Plate 2017-10-27 1
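The SUM(DISTINCT ...) = 3 condition in the query above works like a small bitmask: September is coded 1, October is coded 2, and the distinct codes add up to 3 only when both months occur for a customer. The following is a minimal Python sketch of the same logic, not BigQuery code; the helper names month_code and custom_coded are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def month_code(d):
    # Same mapping as the CASE expression: 2017-09 -> 1, 2017-10 -> 2, else 0.
    return {"201709": 1, "201710": 2}.get(d.strftime("%Y%m"), 0)


def custom_coded(rows):
    """rows: list of (customer_id, purchase_date). Returns sorted rows with a 0/1 flag."""
    by_customer = defaultdict(list)
    for customer_id, purchase_date in rows:
        by_customer[customer_id].append(purchase_date)

    result = []
    for customer_id in sorted(by_customer):
        dates = sorted(by_customer[customer_id])
        # Distinct codes sum to 3 only when both code 1 and code 2 are present.
        both_months = sum({month_code(d) for d in dates}) == 3
        for i, d in enumerate(dates):
            is_last = i == len(dates) - 1  # mirrors ROW_NUMBER() = COUNT(*)
            result.append((customer_id, d, 1 if both_months and is_last else 0))
    return result


sample = [
    (288, date(2017, 9, 2)), (288, date(2017, 9, 2)), (288, date(2017, 9, 6)),
    (879, date(2017, 9, 1)), (879, date(2017, 9, 25)),
    (879, date(2017, 10, 25)), (879, date(2017, 10, 27)),
]
flags = [flag for _, _, flag in custom_coded(sample)]
```

On the dummy data, only the last row of customer 879 is flagged, which matches the result table above.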

Related

How to get difference in value over a sliding time window?

I'm attempting to write a SQL query which returns every product where the most recent price on an order within the last 30 days is different than the most recent price in the previous 30 days, and that calculated variance. I'm currently using PostgreSQL 11.
Data Model
Right now, the data is structured into three tables: orders, products, and a pivot table, order_product. Here is the simplified version of the table structure:
Orders
id  order_date
1   2022-01-15
2   2022-02-15
3   2022-03-08
Products
id  name
1   Some product
2   Another product
3   Yet another product
Order_Product
order_id  product_id  unit_price
1         1           10
1         2           20
1         3           10
2         1           12
2         2           20
2         3           5
3         1           15
Desired Output
The desired output would be something like the following:
id  name                 order_date  latest_unit_price  previous_unit_price  variance
1   Some product         2022-03-08  15                 10                   5
3   Yet another product  2022-02-15  5                  10                   -5
What I've done so far
I've been able to write a join that combines the Orders and Products via the order_product table, within the 60-day window, which is seemingly the easy part:
SELECT
"products"."id",
"products"."name",
"order_product"."unit_price",
"orders"."order_date"
FROM
products
JOIN order_product ON products.id = order_product.product_id
JOIN orders ON order_product.order_id = orders.id
WHERE
order_date BETWEEN now() - INTERVAL '60 days'
AND now()
I've been trying to work with RANK() and LAG(); however, where I'm getting stuck is ranking the rows within each 30-day time window and then calculating the variance between the two windows.
Any help would be much appreciated!
Update: Added solution
Building off of the answer by D-Shih, I had to tweak this to work based on the time window starting from the current date:
WITH CTE AS (
SELECT
"products"."id",
"products"."name",
"order_product"."unit_price",
"orders"."order_date"
FROM
products
JOIN order_product ON products.id = order_product.product_id
JOIN orders ON order_product.order_id = orders.id
WHERE
order_date BETWEEN now() - INTERVAL '60 days' AND now()
),
CTE2 AS (
SELECT
*,
EXTRACT(DAYS FROM now() - order_date :: timestamp) gap_days
FROM
CTE
),
CTE3 AS (
SELECT
*,
(CASE WHEN gap_days < 30 THEN 1 ELSE 0 END) grp
FROM
CTE2
)
SELECT
id,
name,
MAX(CASE WHEN grp = 1 THEN order_date END) order_date,
MAX(CASE WHEN grp = 1 THEN unit_price END) latest_unit_price,
MAX(CASE WHEN grp = 0 THEN unit_price END) previous_unit_price,
SUM(CASE WHEN grp = 1 THEN unit_price ELSE - unit_price END) variance
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ID, grp ORDER BY order_date DESC) rn
FROM
CTE3
) t1
WHERE
rn = 1
GROUP BY
id,
name
HAVING
MAX(CASE WHEN grp = 1 THEN unit_price END) <> MAX(CASE WHEN grp = 0 THEN unit_price END)
sqlfiddle
You can use EXTRACT with the LAG window function to get the difference in days between each order_date and the previous order_date for each product id.
Then use a running SUM window function inside a CASE expression to assign each row to a group:
grp = 0: rows within the first 30 days of the product's orders in the range (the previous price)
grp = 1: rows more than 30 days after the product's first order in the range (the latest price)
The query would look like this:
WITH CTE AS (
SELECT "products"."id",
"products"."name",
"order_product"."unit_price",
"orders"."order_date"
FROM
products
JOIN order_product ON products.id = order_product.product_id
JOIN orders ON order_product.order_id = orders.id
WHERE
order_date BETWEEN now() - INTERVAL '60 days'
AND now()
), CTE2 AS (
SELECT *,EXTRACT(DAYS FROM order_date - LAG(order_date,1,order_date) OVER(PARTITION BY id ORDER BY order_date)) gap_days
FROM CTE
), CTE3 AS (
SELECT *,(CASE WHEN SUM(gap_days) OVER(PARTITION BY id ORDER BY order_date) > 30 THEN 1 ELSE 0 END) grp
FROM CTE2
)
SELECT id,
name,
MAX(CASE WHEN grp = 1 THEN order_date END) order_date,
MAX(CASE WHEN grp = 1 THEN unit_price END) latest_unit_price,
MAX(CASE WHEN grp = 0 THEN unit_price END) previous_unit_price,
SUM(CASE WHEN grp = 1 THEN unit_price ELSE - unit_price END) variance
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY ID,grp ORDER BY order_date DESC) rn
FROM CTE3
) t1
WHERE rn = 1
GROUP BY id,
name
HAVING MAX(CASE WHEN grp = 1 THEN unit_price END) <> MAX(CASE WHEN grp = 0 THEN unit_price END)
sqlfiddle
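To try the bucketing idea from the updated solution on the sample data, here is a hedged sketch that runs an equivalent query in SQLite through Python's sqlite3 module. It is not the PostgreSQL query itself: julianday() replaces the interval arithmetic, and the reference date 2022-03-10 is an assumption chosen so that the three sample orders fall inside the 60-day range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, order_date TEXT);
CREATE TABLE products (id INTEGER, name TEXT);
CREATE TABLE order_product (order_id INTEGER, product_id INTEGER, unit_price INTEGER);
INSERT INTO orders VALUES (1, '2022-01-15'), (2, '2022-02-15'), (3, '2022-03-08');
INSERT INTO products VALUES
  (1, 'Some product'), (2, 'Another product'), (3, 'Yet another product');
INSERT INTO order_product VALUES
  (1, 1, 10), (1, 2, 20), (1, 3, 10),
  (2, 1, 12), (2, 2, 20), (2, 3, 5),
  (3, 1, 15);
""")

QUERY = """
WITH windowed AS (
  SELECT p.id, p.name, op.unit_price, o.order_date,
         -- grp = 1: last 30 days; grp = 0: the 30 days before that
         CASE WHEN julianday(:today) - julianday(o.order_date) < 30
              THEN 1 ELSE 0 END AS grp
  FROM products p
  JOIN order_product op ON p.id = op.product_id
  JOIN orders o ON op.order_id = o.id
  WHERE julianday(:today) - julianday(o.order_date) BETWEEN 0 AND 60
),
ranked AS (
  -- rn = 1 marks the most recent order per product within each window
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id, grp ORDER BY order_date DESC) AS rn
  FROM windowed
)
SELECT id, name,
       MAX(CASE WHEN grp = 1 THEN order_date END) AS order_date,
       MAX(CASE WHEN grp = 1 THEN unit_price END) AS latest_unit_price,
       MAX(CASE WHEN grp = 0 THEN unit_price END) AS previous_unit_price,
       SUM(CASE WHEN grp = 1 THEN unit_price ELSE -unit_price END) AS variance
FROM ranked
WHERE rn = 1
GROUP BY id, name
HAVING MAX(CASE WHEN grp = 1 THEN unit_price END)
    <> MAX(CASE WHEN grp = 0 THEN unit_price END)
ORDER BY id
"""

# The reference date is passed in explicitly instead of calling now().
price_changes = conn.execute(QUERY, {"today": "2022-03-10"}).fetchall()
```

Passing the reference date as a parameter keeps the example reproducible; in production the original now() would take its place.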

I am looking to find customers repurchase frequency in SQL from their first purchase date

I am trying to find customers' repurchase rates from their first order date. For example, for 2016, how many customers purchased once in days 1-365 from their initial purchase, how many purchased twice, etc.
I have a transaction_detail table which looks like below:
txn_date Customer_ID Transaction_Number Sales
1/2/2019 1 12345 $10
4/3/2018 1 65890 $20
3/22/2019 3 64453 $30
4/3/2019 4 88567 $20
5/21/2019 4 85446 $15
1/23/2018 5 89464 $40
4/3/2019 5 99674 $30
4/3/2019 6 32224 $20
1/23/2018 6 46466 $30
1/20/2018 7 56558 $30
I am able to find the customers who shopped in 2016 and how many times they repurchased in 2016, but I need to find the customers who shopped in 2016 and how many times they came back, counted from their first purchase date.
I need a starting point for the query; I am not sure how to build this logic in my SQL code.
Any help would be appreciated.
I am using the below query:
WITH by_year
AS (SELECT
Customer_ID,
to_char(txn_date, 'YYYY') AS visit_year
FROM table
GROUP BY Customer_ID, to_char(txn_date, 'YYYY')),
with_first_year
AS (SELECT
Customer_ID,
visit_year,
FIRST_VALUE(visit_year) OVER (PARTITION BY Customer_ID ORDER BY visit_year) AS first_year
FROM by_year),
with_year_number
AS (SELECT
Customer_ID,
visit_year,
first_year,
(visit_year - first_year) AS year_number
FROM with_first_year)
SELECT
first_year AS first_year,
SUM(CASE WHEN year_number = 0 THEN 1 ELSE 0 END) AS year_0,
SUM(CASE WHEN year_number = 1 THEN 1 ELSE 0 END) AS year_1,
SUM(CASE WHEN year_number = 2 THEN 1 ELSE 0 END) AS year_2,
SUM(CASE WHEN year_number = 3 THEN 1 ELSE 0 END) AS year_3,
SUM(CASE WHEN year_number = 4 THEN 1 ELSE 0 END) AS year_4,
SUM(CASE WHEN year_number = 5 THEN 1 ELSE 0 END) AS year_5,
SUM(CASE WHEN year_number = 6 THEN 1 ELSE 0 END) AS year_6,
SUM(CASE WHEN year_number = 7 THEN 1 ELSE 0 END) AS year_7,
SUM(CASE WHEN year_number = 8 THEN 1 ELSE 0 END) AS year_8,
SUM(CASE WHEN year_number = 9 THEN 1 ELSE 0 END) AS year_9
FROM with_year_number
GROUP BY first_year
ORDER BY first_year
Use window functions and aggregation:
select cnt, count(*), min(customer_id), max(customer_id)
from (select customer_id, count(*) as cnt
from (select td.*,
min(txn_date) over (partition by Customer_ID) as min_txn_date
from transaction_detail td
) td
where txn_date >= min_txn_date and txn_date < min_txn_date + interval '365' day
group by customer_id
) c
group by cnt
order by cnt;
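As a rough check of the query above, the sketch below runs an equivalent statement in SQLite via Python's sqlite3 module on the sample rows from the question. Two assumptions: the sample dates are read as month/day/year and stored as ISO strings, and julianday() stands in for the interval arithmetic, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_detail (
  txn_date TEXT, customer_id INTEGER, transaction_number INTEGER, sales INTEGER
);
INSERT INTO transaction_detail VALUES
  ('2019-01-02', 1, 12345, 10), ('2018-04-03', 1, 65890, 20),
  ('2019-03-22', 3, 64453, 30), ('2019-04-03', 4, 88567, 20),
  ('2019-05-21', 4, 85446, 15), ('2018-01-23', 5, 89464, 40),
  ('2019-04-03', 5, 99674, 30), ('2019-04-03', 6, 32224, 20),
  ('2018-01-23', 6, 46466, 30), ('2018-01-20', 7, 56558, 30);
""")

QUERY = """
SELECT cnt, COUNT(*) AS customers
FROM (
  SELECT customer_id, COUNT(*) AS cnt
  FROM (
    -- attach each customer's first purchase date to every row
    SELECT td.*, MIN(txn_date) OVER (PARTITION BY customer_id) AS min_txn_date
    FROM transaction_detail td
  ) AS first_dated
  -- keep only purchases within 365 days of the first purchase
  WHERE julianday(txn_date) - julianday(min_txn_date) < 365
  GROUP BY customer_id
) AS per_customer
GROUP BY cnt
ORDER BY cnt
"""

# Each row is (number of purchases in the first 365 days, number of customers).
frequency = conn.execute(QUERY).fetchall()
```

On this sample, customers 1 and 4 bought twice within 365 days of their first purchase; the other four customers bought once in that span.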
As I understand it, you want the count of distinct customers who first purchased in 2016 and repurchased one year or more after that first purchase.
Select * from
(
Select customer_id,
-- months_between(later, earlier) is positive, so the next purchase goes first
Floor(months_between(lead_txn_date, txn_date)/12) as num_years
From
(
Select customer_id,
txn_date,
row_number() over (partition by Customer_ID order by txn_date) as rn,
lead(txn_date) over (partition by Customer_ID order by txn_date) as lead_txn_date
From your_table
)
Where txn_date >= date '2016-01-01'
and txn_date < date '2017-01-01'
and rn = 1
And months_between(lead_txn_date, txn_date) >= 12
)
Pivot
(
Count(1) for num_years in (1,2,3,4)
)
Ultimately, we are finding the number of years between first and second purchase of the customer. And first purchase must be in 2016.
Cheers!!

How to do a Min and Max of date but following the changes in price points

I'm not really sure how to word this question better so I'll provide the data that I have and the result that I'm after.
This is the data that I have
sku sales qty date
A 100 1 1-Jan-19
A 200 2 2-Jan-19
A 100 1 3-Jan-19
A 240 2 4-Jan-19
A 360 3 5-Jan-19
A 360 4 6-Jan-19
A 200 2 7-Jan-19
A 90 1 8-Jan-19
B 100 1 9-Jan-19
B 200 2 10-Jan-19
And this is the result that I'm after
sku price sum(qty) sum(sales) min(date) max(date)
A 100 4 400 1-Jan-19 3-Jan-19
A 120 5 600 4-Jan-19 5-Jan-19
A 90 4 360 6-Jan-19 6-Jan-19
A 100 2 200 7-Jan-19 7-Jan-19
A 90 1 90 8-Jan-19 8-Jan-19
B 100 3 300 9-Jan-19 10-Jan-19
As you can see, I'm trying to get the min and max date of each price point, where price = sales/qty. At this point, I can get the min and max date for the same price, but I can't separate the ranges when there's another price in between. I think I have to use some sort of min(date) over (partition by sales/qty order by date), but I can't figure it out yet.
I'm using Redshift SQL
This is a gaps-and-islands query. You can do this by generating a sequence and subtracting that from the date, which works here because the dates are consecutive. Then aggregate:
select sku, price, sum(qty), sum(sales),
min(date), max(date)
from (select t.*,
sales / qty as price,
row_number() over (partition by sku, sales / qty order by date) as seqnum
from t
) t
group by sku, price, (date - seqnum * interval '1 day')
order by sku, price, min(date);
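A variant of the gaps-and-islands idea that does not rely on consecutive dates is to flag each price change with LAG and take a running sum of the flags as an island id. The sketch below is only an illustration run in SQLite through Python's sqlite3 module, not Redshift; the column name sale_date and the ISO date strings are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (sku TEXT, sales INTEGER, qty INTEGER, sale_date TEXT);
INSERT INTO t VALUES
  ('A', 100, 1, '2019-01-01'), ('A', 200, 2, '2019-01-02'),
  ('A', 100, 1, '2019-01-03'), ('A', 240, 2, '2019-01-04'),
  ('A', 360, 3, '2019-01-05'), ('A', 360, 4, '2019-01-06'),
  ('A', 200, 2, '2019-01-07'), ('A',  90, 1, '2019-01-08'),
  ('B', 100, 1, '2019-01-09'), ('B', 200, 2, '2019-01-10');
""")

QUERY = """
WITH priced AS (
  SELECT sku, sales, qty, sale_date, sales / qty AS price FROM t
),
marked AS (
  -- 1 when the price differs from the previous row of the same sku
  -- (the first row of each sku compares against NULL and is also marked 1)
  SELECT *, CASE WHEN price = LAG(price) OVER (PARTITION BY sku ORDER BY sale_date)
                 THEN 0 ELSE 1 END AS is_change
  FROM priced
),
islands AS (
  -- the running sum of change flags gives every run of equal prices its own id
  SELECT *, SUM(is_change) OVER (PARTITION BY sku ORDER BY sale_date) AS island_id
  FROM marked
)
SELECT sku, price, SUM(qty), SUM(sales), MIN(sale_date), MAX(sale_date)
FROM islands
GROUP BY sku, island_id, price
ORDER BY sku, MIN(sale_date)
"""

price_runs = conn.execute(QUERY).fetchall()
```

On the sample data this yields the six rows of the expected result from the question, with price 100 and price 90 each appearing twice for sku A because other prices occur in between.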
You can do this with a subquery and LAG:
FIDDLE DEMO
SELECT SKU, Price, SUM(Qty) SumQty, SUM(Sales) SumSales, MIN(date) MinDate, MAX(date) MaxDate
FROM (
SELECT SKU,Price,SUM(is_change) OVER(order by SKU, date) is_change,Sales, Qty,date
FROM (SELECT SKU, Sales/Qty AS Price, Sales, Qty,date,
CASE WHEN Sales/Qty = lag(Sales/Qty) over (order by SKU, date)
and SKU = lag(SKU) OVER (order by SKU, date) then 0 ELSE 1 END AS is_change
FROM Tbl
)InnerSelect
) X GROUP BY sku, price,is_change
ORDER BY SKU,MIN(date)

Additional condition within partition over

https://www.db-fiddle.com/f/rgLXTu3VysD3kRwBAQK3a4/3
My problem here is that I want ROW_NUMBER() OVER (PARTITION BY ...) to start counting rows only from a certain time range.
In this example, if I add rn = 1 at the end, order_id = 5 is excluded from the results (because the partition is ordered by paid_date and order_id = 6 has an earlier date). It shouldn't be, because I want the time range for the partition to start from '2019-01-10'.
After adding the condition rn = 1, the expected output is order_id 3, 5, 11, 15; right now it is only 3, 11, 15.
It should include only orders with is_paid = 0 that are the first within the given time range (if there's a preceding order with is_paid = 1, it shouldn't be counted).
Use a correlated subquery with NOT EXISTS:
DEMO
SELECT order_id, customer_id, amount, is_paid, paid_date, rn FROM (
SELECT o.*,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY paid_date,order_id) rn
FROM orders o
WHERE paid_date between '2019-01-10'
and '2019-01-15'
) x where rn=1 and not exists (select 1 from orders o1 where x.order_id=o1.order_id
and is_paid=1)
OUTPUT:
order_id customer_id amount is_paid paid_date rn
3 101 30 0 10/01/2019 00:00:00 1
5 102 15 0 10/01/2019 00:00:00 1
11 104 31 0 10/01/2019 00:00:00 1
15 105 11 0 10/01/2019 00:00:00 1
If priority should be given to order_id, put it before paid_date in the ORDER BY clause of the window function; this will solve your issue.
SELECT order_id, customer_id, amount, is_paid, paid_date, rn FROM (
SELECT o.*,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_id,paid_date) rn
FROM orders o
) x WHERE is_paid = 0 and paid_date between
'2019-01-10' and '2019-01-15' and rn=1
Since you need paid_date to be ordered first, apply the WHERE condition inside the subquery that does the numbering, so that dates outside the range do not interfere with the row numbers.
SELECT order_id, customer_id, amount, is_paid, paid_date, rn FROM (
SELECT o.*,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY paid_date, order_id) rn
FROM orders o
where paid_date between '2019-01-10' and '2019-01-15'
) x WHERE is_paid = 0 and rn=1
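The difference between filtering before and after the numbering can be seen on a few rows. The sketch below uses SQLite via Python's sqlite3 module; the four orders are hypothetical and only mimic the situation described in the question (an earlier order for customer 102 that falls outside the date range), since the original fiddle data is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
  order_id INTEGER, customer_id INTEGER, amount INTEGER,
  is_paid INTEGER, paid_date TEXT
);
-- hypothetical rows: order 6 is dated before the range for customer 102
INSERT INTO orders VALUES
  (3, 101, 30, 0, '2019-01-10'),
  (5, 102, 15, 0, '2019-01-10'),
  (6, 102, 20, 0, '2019-01-05'),
  (7, 102, 12, 0, '2019-01-12');
""")

# Numbering over ALL rows, then filtering: order 6 takes rn = 1 for customer 102.
FILTER_AFTER = """
SELECT order_id FROM (
  SELECT o.*, ROW_NUMBER() OVER (PARTITION BY customer_id
                                 ORDER BY paid_date, order_id) AS rn
  FROM orders o
) AS x
WHERE paid_date BETWEEN '2019-01-10' AND '2019-01-15' AND is_paid = 0 AND rn = 1
ORDER BY order_id
"""

# Filtering inside the subquery: numbering only sees rows within the range.
FILTER_BEFORE = """
SELECT order_id FROM (
  SELECT o.*, ROW_NUMBER() OVER (PARTITION BY customer_id
                                 ORDER BY paid_date, order_id) AS rn
  FROM orders o
  WHERE paid_date BETWEEN '2019-01-10' AND '2019-01-15'
) AS x
WHERE is_paid = 0 AND rn = 1
ORDER BY order_id
"""

ids_filter_after = [row[0] for row in conn.execute(FILTER_AFTER)]
ids_filter_before = [row[0] for row in conn.execute(FILTER_BEFORE)]
```

With the filter outside, customer 102 drops out because its rn = 1 row lies before the range; with the filter inside the subquery, order 5 is numbered first and is kept.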

Incremental count

I have a table with a list of Customer Numbers and Order Dates and want to add a count against each Customer Number, restarting from 1 each time the Customer Number changes. I've sorted the table into Customer then Date order, and need to add an order count column.
CASE WHEN 'Customer Number' on This row = 'Customer Number' on Previous Row then ( Count = Count on Previous Row + 1 )
Else Count = 1
What is the best way to approach this?
Customer and Dates in Customer then Date order:
Customer Date Count
0001 01/05/18 1
0001 02/05/18 2
0001 03/05/18 3
0002 03/05/18 1 <- back to one here as Customer changed
0002 04/05/18 2
0003 05/05/18 1 <- back to one again
I've just tried COUNT(*) OVER (PARTITION BY Customer ) as COUNT, but for some reason it doesn't seem to start from 1 when the Customer changes.
It's hard to tell what you want, but "to add a count against each Customer number, restarting from 1 each time the customer number changes" sounds as if you simply want:
count(*) over (partition by customer_number)
or maybe that should be the count "up-to" the date of the row:
count(*) over (partition by customer_number order by order_date)
It sounds like you just want an analytic row_number() call:
select customer_number,
order_date,
row_number() over (partition by customer_number order by order_date) as num
from your_table
order by customer_number,
order_date
Using an analytic count also works, as @horse_with_no_name suggested:
count(*) over (partition by customer_number order by order_date) as num
Quick demo showing both, with your sample data in a CTE:
with your_table (customer_number, order_date) as (
select '0001', date '2018-05-01' from dual
union all select '0001', date '2018-05-03' from dual
union all select '0001', date '2018-05-02' from dual
union all select '0002', date '2018-05-03' from dual
union all select '0002', date '2018-05-04' from dual
union all select '0003', date '2018-05-05' from dual
)
select customer_number,
order_date,
row_number() over (partition by customer_number order by order_date) as num1,
count(*) over (partition by customer_number order by order_date) as num2
from your_table
order by customer_number,
order_date
/
CUST ORDER_DATE NUM1 NUM2
---- ---------- ---------- ----------
0001 2018-05-01 1 1
0001 2018-05-02 2 2
0001 2018-05-03 3 3
0002 2018-05-03 1 1
0002 2018-05-04 2 2
0003 2018-05-05 1 1
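One case where the two expressions differ is when a customer has two orders on the same date. With the default window frame, count(*) ... order by counts all rows tied on the ordering value, while row_number() still gives each row its own number. The sketch below shows this in SQLite through Python's sqlite3 module; the duplicated 2018-05-01 row is an assumption added for illustration and is not in the original sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

QUERY = """
WITH your_table (customer_number, order_date) AS (
  VALUES ('0001', '2018-05-01'),
         ('0001', '2018-05-01'),  -- hypothetical second order on the same day
         ('0001', '2018-05-02'),
         ('0002', '2018-05-03')
)
SELECT customer_number,
       order_date,
       ROW_NUMBER() OVER (PARTITION BY customer_number ORDER BY order_date) AS num1,
       COUNT(*)     OVER (PARTITION BY customer_number ORDER BY order_date) AS num2
FROM your_table
ORDER BY customer_number, order_date, num1
"""

numbered = conn.execute(QUERY).fetchall()
```

Here the two tied rows get num1 values 1 and 2 but both get num2 = 2, so row_number() is the safer choice when a strictly incrementing count is needed.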