Calculating average time between customer orders and average order value in Postgres

In PostgreSQL I have an orders table that represents orders made by customers of a store:
SELECT * FROM orders
order_id customer_id value created_at
1 1 188.01 2020-11-24
2 2 25.74 2022-10-13
3 1 159.64 2022-09-23
4 1 201.41 2022-04-01
5 3 357.80 2022-09-05
6 2 386.72 2022-02-16
7 1 200.00 2022-01-16
8 1 19.99 2020-02-20
For a specified time range (e.g. 2022-01-01 to 2022-12-31), I need to find the following:
Average 1st order value
Average 2nd order value
Average 3rd order value
Average 4th order value
E.g. the 1st purchases for each customer are:
for customer_id 1, order_id 8 is their first purchase
customer 2, order 6
customer 3, order 5
So, the 1st-purchase average order value is (19.99 + 386.72 + 357.80) / 3 = $254.84
This needs to be found for the 2nd, 3rd and 4th purchases also.
I also need to find the average time between purchases:
order 1 to order 2
order 2 to order 3
order 3 to order 4
The final result would ideally look something like this:
order_number AOV av_days_since_last_order
1 254.84 0
2 300.00 28
3 322.22 21
4 350.00 20
Note that average days since last order for order 1 would always be 0 as it's the 1st purchase.
Thanks.

select order_number
,round(avg(value),2) as AOV
,coalesce(round(avg(days_between_orders),0),0) as av_days_since_last_order
from
(
select *
,row_number() over(partition by customer_id order by created_at) as order_number
,created_at - lag(created_at) over(partition by customer_id order by created_at) as days_between_orders
from orders
) t
where created_at between '2022-01-01' and '2022-12-31'
group by order_number
order by order_number
order_number aov av_days_since_last_order
1 372.26 0
2 25.74 239
3 200.00 418
4 201.41 75
5 159.64 175
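To make the result above reproducible without the fiddle, here is a self-contained sketch of the same query. It assumes PostgreSQL; the sample rows are copied from the question, and the orders CTE and the numbered CTE name are stand-ins introduced for illustration only.

```sql
-- Sample rows copied from the question; this CTE stands in for the real orders table.
with orders (order_id, customer_id, value, created_at) as (
  values (1, 1, 188.01, date '2020-11-24'),
         (2, 2,  25.74, date '2022-10-13'),
         (3, 1, 159.64, date '2022-09-23'),
         (4, 1, 201.41, date '2022-04-01'),
         (5, 3, 357.80, date '2022-09-05'),
         (6, 2, 386.72, date '2022-02-16'),
         (7, 1, 200.00, date '2022-01-16'),
         (8, 1,  19.99, date '2020-02-20')
),
numbered as (
  -- Number each customer's orders over their full history, before any date filter.
  select o.*,
         row_number() over (partition by customer_id order by created_at) as order_number,
         -- date - date yields an integer number of days in PostgreSQL
         created_at - lag(created_at) over (partition by customer_id order by created_at) as days_between_orders
  from orders o
)
select order_number,
       round(avg(value), 2) as aov,
       coalesce(round(avg(days_between_orders), 0), 0) as av_days_since_last_order
from numbered
where created_at between date '2022-01-01' and date '2022-12-31'
group by order_number
order by order_number;
```

Because the date filter is applied after the numbering, order_number counts a customer's orders across all time. Moving the filter inside the numbered CTE would instead number only the orders placed within the range, so customer 1's order from 2022-01-16 would become order number 1.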

I suppose it should be something like this:
WITH prep_data AS (
-- number each customer's purchases inside the requested date range
SELECT order_id,
customer_id,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY created_at) AS purchase_num,
created_at,
value
FROM orders
WHERE created_at BETWEEN :date_from AND :date_to
), prep_data2 AS (
-- pd2 is the same customer's previous purchase (NULL for the first one)
SELECT pd1.order_id,
pd1.customer_id,
pd1.purchase_num,
pd1.created_at - pd2.created_at AS date_diff,
pd1.value
FROM prep_data pd1
LEFT JOIN prep_data pd2 ON (pd1.customer_id = pd2.customer_id AND pd1.purchase_num = pd2.purchase_num + 1)
)
SELECT purchase_num,
avg(value) AS avg_val,
avg(date_diff) AS avg_date_diff
FROM prep_data2
GROUP BY purchase_num
ORDER BY purchase_num

Related

Partition by weeknumber and next weeknumber

I would like to count, for each customer, how many times a specific product was purchased in the past. I want to mark repeat purchases of the same product (where the second order date is close to the first order date) with rn = 2, so that I can count only the rows with rn = 1.
I created the following query and also included the current output. It contains a partition by week number, to filter out purchases of the same product in the same week. It works quite well, but the behaviour is not exactly what I was hoping for.
create table sandbox.hm_orders as
select o.customer_id
,o.product_id
,o.order_date
,ROW_NUMBER() over (partition by o.customer_id, o.product_id,concat(EXTRACT(year FROM order_date),EXTRACT(week FROM order_date)) order by o.order_date asc) as rn
,concat(EXTRACT(year FROM o.order_date),'_',EXTRACT(week FROM o.order_date)) as weeknr
from datamarts.orders o
where o.label_id = 1
and o.order_date > '2020-01-01'
and o.payment_status = 'PAID'
Current output:
customer_id product_ID order_date rn weeknr
4708818 128703 2020-05-11 20:19:25 1 2020_20
4708818 128703 2020-05-12 22:13:09 2 2020_20
4708818 128703 2020-06-06 21:45:04 1 2020_23
4708818 274578 2020-07-02 22:02:10 1 2020_27
4753958 137482 2021-03-14 18:13:04 1 2021_10
4753958 137482 2021-03-15 17:29:03 1 2021_11
As you can see in the first two rows, the orders are 1 day apart, so the second row is marked with row number 2. The last two rows are also 1 day apart, but because their week numbers differ, the second of them does not get row number 2.
Therefore I would like to find a way to also include the next week number in the partition. In this case, the order placed in week 2021_11 should get row number 2, and the order in week 2021_10 row number 1.
Desired output:
customer_id product_ID order_date rn weeknr
4708818 128703 2020-05-11 20:19:25 1 2020_20
4708818 128703 2020-05-12 22:13:09 2 2020_20
4708818 128703 2020-06-06 21:45:04 1 2020_23
4708818 274578 2020-07-02 22:02:10 1 2020_27
4753958 137482 2021-03-14 18:13:04 1 2021_10
4753958 137482 2021-03-15 17:29:03 2 2021_11
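A slightly more involved way is to calculate a group number as a running sum over a calculated flag, where the flag is 1 whenever an order is more than a day after the previous order for the same customer and product.
Then use that group number in the partition of the row_number.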
SELECT *
, ROW_NUMBER() over (partition by customer_id, product_id, rnk order by order_date asc) as rn
, TO_CHAR(order_date, 'YYYY_WW') as weeknr
FROM
(
SELECT *
, SUM(flag) over (partition by customer_id, product_id order by order_date asc) as rnk
FROM
(
SELECT
o.customer_id
, o.product_id
, o.order_date
, CASE WHEN 1 >= DATE_PART('day', o.order_date - LAG(o.order_date) over (partition by o.customer_id, o.product_id order by o.order_date asc))
THEN 0 ELSE 1 END AS flag
FROM orders o
WHERE o.label_id = 1
AND o.order_date > '2020-01-01'
AND o.payment_status = 'PAID'
) q1
) q2
customer_id product_id order_date flag rnk rn weeknr
4708818 128703 2020-05-11 20:19:25 1 1 1 2020_19
4708818 128703 2020-05-12 22:13:09 0 1 2 2020_19
4708818 128703 2020-06-06 21:45:04 1 2 1 2020_23
4708818 274578 2020-07-02 22:02:10 1 1 1 2020_27
4753958 137482 2021-03-14 18:13:04 1 1 1 2021_11
4753958 137482 2021-03-15 17:29:03 0 1 2 2021_11

Add a column with customers orders count at the time they passed the order

I have the following table
order_id created_at customer_id
1 2020-01-02 11
2 2020-02-03 12
3 2020-02-03 11
I would like to add a column "customer_orders_count" that gives, for each transaction, the number of orders the customer had placed up to and including that order, i.e. obtain this table:
order_id created_at customer_id customer_orders_count
1 2020-01-02 11 1
2 2020-02-03 12 1
3 2020-02-03 11 2
My problem is that I can't work out how to calculate a running "customer_orders_count" for each order. I only managed to add a column with the global "customer_orders_count"; for example, for the first row (order_id=1) I get customer_orders_count=2 whereas I would like it to be 1.
Does anyone have an idea?
Use cumulative count:
with mytable as (
select 1 as order_id, date '2020-01-02' as created_at, 11 as customer_id union all
select 2, '2020-02-03', 12 union all
select 3 , '2020-02-03', 11
)
select *, count(*) over (partition by customer_id order by created_at) as customer_orders_count
from mytable
order by order_id
Use row_number():
select t.*,
row_number() over (partition by customer_id order by created_at) as customer_order_count
from t;
This is subtly different from using a cumulative count(). This version guarantees that the numbers for a given customer are never duplicated, even when the dates are the same. A cumulative count has no such guarantee: with the default window frame, rows that tie on created_at are peers and all receive the same count.
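The difference only shows up when a customer has two orders with the same created_at. A minimal sketch, assuming PostgreSQL and a made-up mytable in which customer 11 places two orders on the same day (these rows are hypothetical, not the question's data):

```sql
-- Hypothetical rows: orders 1 and 2 share the same date for customer 11.
with mytable (order_id, created_at, customer_id) as (
  values (1, date '2020-01-02', 11),
         (2, date '2020-01-02', 11),
         (3, date '2020-02-03', 11)
)
select order_id,
       -- default frame is RANGE ... CURRENT ROW, so tied rows are counted together
       count(*) over (partition by customer_id order by created_at) as cumulative_count,
       -- order_id is added as a tie-breaker to make the numbering deterministic
       row_number() over (partition by customer_id order by created_at, order_id) as rn
from mytable
order by order_id;
-- cumulative_count: 2, 2, 3
-- rn:               1, 2, 3
```

Adding a unique tie-breaker such as order_id to the ORDER BY of the cumulative count would also remove the duplicates.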

How to use SQL to get column count for a previous date?

I have the following table,
id status price date
2 complete 10 2020-01-01 10:10:10
2 complete 20 2020-02-02 10:10:10
2 complete 10 2020-03-03 10:10:10
3 complete 10 2020-04-04 10:10:10
4 complete 10 2020-05-05 10:10:10
Required output,
id status_count price ratio
2 0 0 0
2 1 10 0
2 2 30 0.33
I am looking to add up the prices of the previous rows. Row 1 is 0 because it has no previous row.
Then find the ratio, i.e. 10/30 = 0.33.
You can use analytical function ROW_NUMBER and SUM as follows:
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY date), 0) - price as price
FROM yourTable;
I think you want something like this:
SELECT
id,
COUNT(*) OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id
ORDER BY date ROWS BETWEEN
UNBOUNDED PRECEDING AND 1 PRECEDING), 0) price
FROM yourTable;
Please also check another method:
with cte
as (select *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
SUM(price) OVER (PARTITION BY id ORDER BY date) ss from yourTable)
select id, status_count, coalesce(ss, 0) - price as price
from cte
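For a quick check of the windowed-sum approach against the question's rows, here is a self-contained sketch. It assumes PostgreSQL; the yourTable CTE stands in for the real table, and the named window w is introduced only for this illustration. The ratio column is left out because the question does not fully define it.

```sql
-- Sample rows copied from the question.
with yourTable (id, status, price, date) as (
  values (2, 'complete', 10, timestamp '2020-01-01 10:10:10'),
         (2, 'complete', 20, timestamp '2020-02-02 10:10:10'),
         (2, 'complete', 10, timestamp '2020-03-03 10:10:10'),
         (3, 'complete', 10, timestamp '2020-04-04 10:10:10'),
         (4, 'complete', 10, timestamp '2020-05-05 10:10:10')
)
select id,
       -- the frame covers only earlier rows, so this counts previous rows directly
       count(*) over w as status_count,
       -- the sum over an empty frame is NULL, hence the coalesce for the first row
       coalesce(sum(price) over w, 0) as price
from yourTable
window w as (partition by id order by date
             rows between unbounded preceding and 1 preceding)
order by id, date;
-- id = 2 gives (0, 0), (1, 10), (2, 30); ids 3 and 4 each give (0, 0)
```

Using an explicit ROWS frame that ends at 1 PRECEDING avoids having to subtract the current row afterwards, and it behaves predictably even if two rows share the same date.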

SQL query to find continuous local max, min of date based on category column

I have the following data set
Customer_ID Category FROM_DATE TO_DATE
1 5 1/1/2000 12/31/2001
1 6 1/1/2002 12/31/2003
1 5 1/1/2004 12/31/2005
2 7 1/1/2010 12/31/2011
2 7 1/1/2012 12/31/2013
2 5 1/1/2014 12/31/2015
3 7 1/1/2010 12/31/2011
3 7 1/5/2012 12/31/2013
3 5 1/1/2014 12/31/2015
The result I want to achieve is to find continuous local min/max date for Customers with the same category and identify any gap in dates:
Customer_ID FROM_Date TO_Date Category
1 1/1/2000 12/31/2001 5
1 1/1/2002 12/31/2003 6
1 1/1/2004 12/31/2005 5
2 1/1/2010 12/31/2013 7
2 1/1/2014 12/31/2015 5
3 1/1/2010 12/31/2011 7
3 1/5/2012 12/31/2013 7
3 1/1/2014 12/31/2015 5
My code works fine for customer 1 (returns all 3 rows) and customer 2 (returns 2 rows with the min and max date for each category), but for customer 3 it cannot identify the gap between 12/31/2011 and 1/5/2012 for category 7:
Customer_ID FROM_Date TO_Date Category
3 1/1/2010 12/31/2013 7
3 1/1/2014 12/31/2015 5
Here is my code:
SELECT Customer_ID, Category, min(From_Date), max(To_Date) FROM
(
SELECT Customer_ID, Category, From_Date,To_Date
,row_number() over (order by member_id, To_Date) - row_number() over (partition by Customer_ID order by Category) as p
FROM FFS_SAMP
) X
group by Customer_ID,Category,p
order by Customer_ID,min(From_Date),Max(To_Date)
This is a type of gaps and islands problem. Probably the safest method is to use a cumulative max() to look for overlaps with previous records. Where there is no overlap, then an "island" of records starts. So:
select customer_id, min(from_date), max(to_date), category
from (select t.*,
sum(case when prev_to_date >= from_date then 0 else 1 end) over
(partition by customer_id, category
order by from_date
) as grp
from (select t.*,
max(to_date) over (partition by customer_id, category
order by from_date
rows between unbounded preceding and 1 preceding
) as prev_to_date
from t
) t
) t
group by customer_id, category, grp;
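One thing to watch with the overlap test: a period that starts the day after the previous one ended (customer 2: 12/31/2011 followed by 1/1/2012) does not satisfy prev_to_date >= from_date, so it would start a new island. A sketch of the same idea that treats adjacent days as continuous, assuming PostgreSQL date columns (where date - 1 is the previous day); the CTE names marked and grouped are made up for this example:

```sql
-- Rows for customers 2 and 3 from the question, written as ISO dates.
with t (customer_id, category, from_date, to_date) as (
  values (2, 7, date '2010-01-01', date '2011-12-31'),
         (2, 7, date '2012-01-01', date '2013-12-31'),
         (2, 5, date '2014-01-01', date '2015-12-31'),
         (3, 7, date '2010-01-01', date '2011-12-31'),
         (3, 7, date '2012-01-05', date '2013-12-31'),
         (3, 5, date '2014-01-01', date '2015-12-31')
),
marked as (
  -- latest end date among all earlier periods of the same customer and category
  select t.*,
         max(to_date) over (partition by customer_id, category
                            order by from_date
                            rows between unbounded preceding and 1 preceding) as prev_to_date
  from t
),
grouped as (
  -- a new island starts unless the period overlaps or directly follows the earlier ones
  select marked.*,
         sum(case when prev_to_date >= from_date - 1 then 0 else 1 end)
           over (partition by customer_id, category order by from_date) as grp
  from marked
)
select customer_id, min(from_date) as from_date, max(to_date) as to_date, category
from grouped
group by customer_id, category, grp
order by customer_id, min(from_date);
```

With these rows, customer 2's two category 7 periods merge into 2010-01-01 to 2013-12-31, while customer 3's stay separate because of the four-day gap.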
Your attempt is quite close. You just need to fix the over() clause of the window functions:
select customer_id, category, min(from_date), max(to_date)
from (
select
fs.*,
row_number() over (partition by customer_id order by from_date)
- row_number() over (partition by customer_id, category order by from_date) as grp
from ffs_samp fs
) x
group by customer_id, category, grp
order by customer_id, min(from_date)
Note that this method assumes no gaps or overlaps in the periods of a given customer, as shown in your sample data.

Compare between values from the same table in postgresql

I have the following table:
id partid orderdate qty price
1 10 01/01/2017 10 3
2 10 02/01/2017 5 9
3 11 01/01/2017 0.5 0.001
4 145 02/01/2017 5 18
5 10 12/12/2016 8 7
6 10 05/07/2010 81 7.5
Basically I want to compare the most recent purchasing of parts to the other purchasing of the same part in a period of 24 months. For that matter compare id=2 to id = 1,5.
I want to check if the price of the latest orderdate (per part) is larger than the average price of that part in the last 24 months.
So first I need to calculate the avg price:
partid avgprice
10 (3+9+7)/3=6.33 (7.5 is out of range)
11 0.001
145 18
I also need to know the latest orderdate of each part:
id partid
2 10
3 11
4 145
and then I need to check if the prices of id=2, id=3, id=4 (latest purchases) are bigger than the average. If they are, I need to return their partid.
So I should have something like this:
id partid avgprice lastprice
2 10 6.33 9
3 11 0.001 0.001
4 145 18 18
Finally I need to return partid=10 since 9>6.33
Now to my questions...
I'm not sure how I can find the latest order in PostgreSQL.
I tried:
select id, distinct partid,orderdate
from table
where orderdate> current_date - interval '24 months'
order by orderdate desc
This gives :
ERROR: syntax error at or near "distinct".
I'm a bit lost here. I know what I want to do but I can't translate it to SQL. Can anyone help?
Get the average per part and the last order per part, and join these:
select
lastorder.id,
lastorder.partid,
lastorder.orderdate,
lastorder.price as lastprice,
avgorder.price as avgprice
from
(
select
partid,
avg(price) as price
from mytable
where orderdate >= current_date - interval '24 months'
group by partid
) avgorder
join
(
select distinct on (partid)
id,
partid,
orderdate,
price
from mytable
order by partid, orderdate desc
) lastorder on lastorder.partid = avgorder.partid
and lastorder.price > avgorder.price;
This can be solved without distinct (which can be heavy on the DB anyway):
with avg_price as (
select partid, avg(price) as price
from mytable
where orderdate > current_date - interval '24 months'
group by partid
)
select f.id, f.partid, av.price as avgprice, f.price as lastprice
from (
select id, partid, orderdate, price, rank() over (partition by partid order by orderdate desc) as rnk
from mytable
) as f
join avg_price av on f.partid = av.partid
where f.rnk = 1
and av.price < f.price