Number of Customer Purchases in Their First Month - sql

I have a list of customer orders. I can easily calculate the month and year of first purchase for each customer (e.g. customer 1 had their first purchase in Sept 2021, customer 2 had their first purchase in Oct 2021, etc.). What I want to add is an additional column that counts the number of purchases a customer made in their first month.
Existing data table (Orders):
OrderId  CustomerId  OrderDate
1        1           9/15/2021
2        1           10/15/2021
3        1           11/1/2021
4        2           10/1/2021
5        2           10/6/2021
6        2           10/7/2021
7        2           11/9/2021
8        3           11/15/2021
Desired output:
CustomerId  FirstOrderMonth  FirstOrderYear  FirstMonthPurchaseCount
1           9                2021            1
2           10               2021            3
3           11               2021            1
I was thinking something like this for the first three columns:
SELECT o.CustomerId,
MONTH(MIN(o.OrderDate)) as FirstOrderMonth,
YEAR(MIN(o.OrderDate)) as FirstOrderYear
FROM Orders o
GROUP BY o.CustomerId
I am not sure how to approach the final column and was hoping for some help.

Aggregate by the customer's id, the year and the month of the order and use window functions to get the year and month of the 1st order and the count of that 1st month:
SELECT DISTINCT CustomerId,
FIRST_VALUE(MONTH(OrderDate)) OVER (PARTITION BY CustomerId ORDER BY YEAR(OrderDate), MONTH(OrderDate)) FirstOrderMonth,
MIN(YEAR(OrderDate)) OVER (PARTITION BY CustomerId) FirstOrderYear,
FIRST_VALUE(COUNT(*)) OVER (PARTITION BY CustomerId ORDER BY YEAR(OrderDate), MONTH(OrderDate)) FirstMonthPurchaseCount
FROM Orders
GROUP BY CustomerId, YEAR(OrderDate), MONTH(OrderDate);

You can use the RANK() function to identify each customer's first-month purchases, as follows:
Select D.CustomerId, MONTH(OrderDate) FirstOrderMonth,
YEAR(OrderDate) FirstOrderYear, COUNT(*) FirstMonthPurchaseCount
From
(
Select *, RANK() Over (Partition By CustomerId Order By YEAR(OrderDate), MONTH(OrderDate)) rnk
From table_name
) D
Where D.rnk = 1
Group By D.CustomerId, MONTH(OrderDate), YEAR(OrderDate)
If you want to find second, third ... month purchases, you may use the DENSE_RANK() function instead of RANK() and change the value in the where clause to the required month order.
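The switch to DENSE_RANK() matters because RANK() leaves gaps after ties: a customer with three orders in their first month gets ranks 1, 1, 1 and then 4, so filtering on rank 2 would match nothing. Below is a minimal sketch of the DENSE_RANK() variant, run against SQLite from Python only so that it is self-contained. The function name purchases_in_nth_month and the ISO-formatted dates are assumptions for illustration, and strftime stands in for MONTH()/YEAR(), which SQLite does not have.

```python
import sqlite3

# Sample rows copied from the Orders table in the question (dates rewritten as ISO text).
ORDERS = [
    (1, 1, "2021-09-15"), (2, 1, "2021-10-15"), (3, 1, "2021-11-01"),
    (4, 2, "2021-10-01"), (5, 2, "2021-10-06"), (6, 2, "2021-10-07"),
    (7, 2, "2021-11-09"), (8, 3, "2021-11-15"),
]

QUERY = """
SELECT CustomerId,
       CAST(strftime('%m', OrderDate) AS INTEGER) AS OrderMonth,
       CAST(strftime('%Y', OrderDate) AS INTEGER) AS OrderYear,
       COUNT(*) AS PurchaseCount
FROM (
    SELECT *,
           -- DENSE_RANK numbers a customer's distinct year-months 1, 2, 3, ... without gaps
           DENSE_RANK() OVER (PARTITION BY CustomerId
                              ORDER BY strftime('%Y-%m', OrderDate)) AS month_rank
    FROM Orders
) AS ranked
WHERE month_rank = ?
GROUP BY CustomerId, OrderYear, OrderMonth
ORDER BY CustomerId
"""


def purchases_in_nth_month(n):
    """Return (CustomerId, month, year, count) for each customer's n-th month with orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (OrderId INTEGER, CustomerId INTEGER, OrderDate TEXT)")
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", ORDERS)
    rows = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in purchases_in_nth_month(1):
        print(row)
```

With n = 1 this reproduces the desired output from the question. Customers who have no n-th month simply do not appear in the result.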

select CustomerId
,month(min(OrderDate)) as FirstOrderMonth
,year(min(OrderDate)) as FirstOrderYear
,count(first_month_flag) as FirstMonthPurchaseCount
from (select *
-- flag orders in the same year AND month as the customer's first order
,case when year(OrderDate) = year(min(OrderDate) over(partition by CustomerId))
and month(OrderDate) = month(min(OrderDate) over(partition by CustomerId)) then 1 end as first_month_flag
from Orders) Orders
group by CustomerId
CustomerId  FirstOrderMonth  FirstOrderYear  FirstMonthPurchaseCount
1           9                2021            1
2           10               2021            3
3           11               2021            1

Related

Calculating average time between customer orders and average order value in Postgres

In PostgreSQL I have an orders table that represents orders made by customers of a store:
SELECT * FROM orders
order_id  customer_id  value   created_at
1         1            188.01  2020-11-24
2         2            25.74   2022-10-13
3         1            159.64  2022-09-23
4         1            201.41  2022-04-01
5         3            357.80  2022-09-05
6         2            386.72  2022-02-16
7         1            200.00  2022-01-16
8         1            19.99   2020-02-20
For a specified time range (e.g. 2022-01-01 to 2022-12-31), I need to find the following:
Average 1st order value
Average 2nd order value
Average 3rd order value
Average 4th order value
E.g. the 1st purchases for each customer are:
for customer_id 1, order_id 8 is their first purchase
customer 2, order 6
customer 3, order 5
So, the 1st-purchase average order value is (19.99 + 386.72 + 357.80) / 3 = $254.84
This needs to be found for the 2nd, 3rd and 4th purchases also.
I also need to find the average time between purchases:
order 1 to order 2
order 2 to order 3
order 3 to order 4
The final result would ideally look something like this:
order_number  AOV     av_days_since_last_order
1             254.84  0
2             300.00  28
3             322.22  21
4             350.00  20
Note that average days since last order for order 1 would always be 0 as it's the 1st purchase.
Thanks.
select order_number
,round(avg(value),2) as AOV
,coalesce(round(avg(days_between_orders),0),0) as av_days_since_last_order
from
(
select *
,row_number() over(partition by customer_id order by created_at) as order_number
,created_at - lag(created_at) over(partition by customer_id order by created_at) as days_between_orders
from t
) t
where created_at between '2022-01-01' and '2022-12-31'
group by order_number
order by order_number
order_number  aov     av_days_since_last_order
1             372.26  0
2             25.74   239
3             200.00  418
4             201.41  75
5             159.64  175
I suppose it should be something like this:
WITH prep_data AS (
SELECT order_id,
customer_id,
-- number each customer's orders chronologically
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY created_at) AS purchase_num,
created_at,
value
FROM orders
WHERE created_at BETWEEN :date_from AND :date_to
), prep_data2 AS (
SELECT pd1.customer_id,
pd1.purchase_num,
-- days since the same customer's previous order (NULL for the first one)
pd1.created_at - pd2.created_at AS date_diff,
pd1.value
FROM prep_data pd1
LEFT JOIN prep_data pd2 ON (pd1.customer_id = pd2.customer_id AND pd1.purchase_num = pd2.purchase_num + 1)
)
SELECT purchase_num,
avg(value) AS avg_val,
avg(date_diff) AS avg_date_diff
FROM prep_data2
GROUP BY purchase_num
ORDER BY purchase_num
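Both answers rest on the same idea: number each customer's orders chronologically, then average the value and the gap to the previous order per order number. Here is a minimal sketch of that idea, run against SQLite from Python so it is self-contained. The function name order_stats is hypothetical, julianday differences stand in for PostgreSQL date subtraction, and the orders are numbered inside the requested date range (an assumption; numbering over the full history and filtering afterwards gives different numbers).

```python
import sqlite3

# Rows copied from the orders table in the question.
ORDERS = [
    (1, 1, 188.01, "2020-11-24"), (2, 2, 25.74, "2022-10-13"),
    (3, 1, 159.64, "2022-09-23"), (4, 1, 201.41, "2022-04-01"),
    (5, 3, 357.80, "2022-09-05"), (6, 2, 386.72, "2022-02-16"),
    (7, 1, 200.00, "2022-01-16"), (8, 1, 19.99, "2020-02-20"),
]

QUERY = """
WITH numbered AS (
    SELECT value,
           -- 1 for a customer's first order in the range, 2 for the second, ...
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at) AS order_number,
           -- days since the same customer's previous order; NULL for the first one
           julianday(created_at)
             - julianday(LAG(created_at) OVER (PARTITION BY customer_id ORDER BY created_at))
             AS days_since_prev
    FROM orders
    WHERE created_at BETWEEN ? AND ?
)
SELECT order_number,
       AVG(value) AS aov,
       COALESCE(AVG(days_since_prev), 0) AS av_days_since_last_order
FROM numbered
GROUP BY order_number
ORDER BY order_number
"""


def order_stats(date_from, date_to):
    """Return (order_number, average value, average days since previous order)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, value REAL, created_at TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)
    rows = conn.execute(QUERY, (date_from, date_to)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in order_stats("2022-01-01", "2022-12-31"):
        print(row)
```

Because the WHERE clause is applied before the window functions, only orders inside the range are numbered, so the first 2022 order of every customer counts as order 1.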

MS SQL | How to query a filtered column (WHERE) with non filtered data

I have a problem solving an MS SQL query.
In summary, the query should split the date column into two columns (year and month) and return the counts of several other columns, the total of the Income column, and a filtered sum of that column.
What I struggled with was adding the filtered sum column.
Sample data, table Test:
customerID, 1,2,3,4...
InvoiceID, 1234551, 1234552...
ProductID, A, B, C...
Date, Datetime
Income, int
customerID  InvoiceID  ProductID  Date        Income
1           1234551    A          01/01/2015  300
2           1234552    B          02/01/2016  300
I have a solution, but I am sure there is a simpler one.
WITH CTE_1 AS
(
SELECT Date,
COUNT(DISTINCT Test.customerID) AS customers,
COUNT(Test.InvoiceID) AS Invoices,
COUNT(Test.ProductID) AS Products,
Sum(Income) AS Total_Income,
ISNULL((SELECT Sum(Income) AS Income_A FROM Test ts WHERE ProductID = 'A' AND ts.Date = Test.Date),0) AS Total_Income_A
FROM Test
GROUP BY Test.Date
)
SELECT YEAR(Date) AS Year,
MONTH(Date) AS Month,
Sum(customers) AS customers,
Sum(Invoices) AS Invoices,
Sum(Products) AS Products,
Sum(Total_Income) AS Total_Income,
Sum(Total_Income_A) AS Total_Income_A
FROM CTE_1
GROUP BY YEAR(Date), MONTH(Date)
ORDER BY YEAR(Date), MONTH(Date)
to produce:
Year, 2015, 2016...
Month, 1, 2, ...
customers, int
Invoices, int
Products, int
Total_Income, int
Total_Income_A, int
Year  Month  customers  Invoices  Products  Total_Income  Total_Income_A
2015  1      3          4         4         1600          600
2015  2      1          1         1         1200          0
Thanks!
Nir
You can directly apply a Conditional Aggregation such as
SELECT YEAR(Date) AS Year,
MONTH(Date) AS Month,
COUNT(DISTINCT customerID) AS customers,
COUNT(DISTINCT InvoiceID) AS Invoices,
COUNT(ProductID) AS Products,
SUM(Income) AS Total_Income,
ISNULL(SUM(CASE WHEN ProductID = 'A' THEN Income END),0) AS Total_Income_A
FROM Test
GROUP BY YEAR(Date), MONTH(Date)
ORDER BY YEAR(Date), MONTH(Date)
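Conditional aggregation works because a CASE expression without an ELSE branch yields NULL for non-matching rows, and SUM ignores NULLs. A minimal sketch of this, run against SQLite from Python so it is self-contained: only the first row comes from the question, the other rows and the function name monthly_summary are made up for illustration, and strftime/COALESCE stand in for YEAR()/MONTH()/ISNULL.

```python
import sqlite3

# First row is from the question; the remaining rows are hypothetical test data.
TEST_ROWS = [
    (1, 1234551, "A", "2015-01-01", 300),
    (2, 1234552, "B", "2015-01-05", 200),
    (1, 1234553, "A", "2015-01-20", 100),
    (3, 1234554, "C", "2015-02-02", 500),
]

QUERY = """
SELECT CAST(strftime('%Y', Date) AS INTEGER) AS Year,
       CAST(strftime('%m', Date) AS INTEGER) AS Month,
       COUNT(DISTINCT customerID) AS customers,
       COUNT(DISTINCT InvoiceID)  AS Invoices,
       SUM(Income)                AS Total_Income,
       -- rows for other products contribute NULL, which SUM skips
       COALESCE(SUM(CASE WHEN ProductID = 'A' THEN Income END), 0) AS Total_Income_A
FROM Test
GROUP BY Year, Month
ORDER BY Year, Month
"""


def monthly_summary():
    """Return one summary row per (year, month) found in the Test table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Test (customerID INTEGER, InvoiceID INTEGER, "
        "ProductID TEXT, Date TEXT, Income INTEGER)"
    )
    conn.executemany("INSERT INTO Test VALUES (?, ?, ?, ?, ?)", TEST_ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in monthly_summary():
        print(row)
```

The COALESCE wrapper only matters for months with no product A rows at all, where the inner SUM would otherwise be NULL.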

SQL to identify customers who placed more than X orders in a given year

Here's the table I am working with:
customer_id order_id order_date
101 1 2016-12-11
102 2 2016-12-13
101 3 2017-12-14
103 4 2017-12-15
... ... ...
I need a SQL query to find out which customers made more than X purchases in both 2016 and 2017.
I've gotten the proper answer for it being customer 101, with this code:
select
customer_id
from
(
select
year(order_date) as order_date_year,
customer_id,
count(*) as number_of_orders
from
cust_orders
group by
year(order_date),
customer_id
having
count(*) >= 3
) as t
group by
order_date_year,
customer_id
But this doesn't solve for specific years being more than X.
You need 2 levels of aggregation:
select c.customer_id
from (
select customer_id
from cust_orders
where year(order_date) in (2016, 2017)
group by customer_id, year(order_date)
having count(*) >= 10
) c
group by c.customer_id
having count(*) = 2;
Replace 10 with the number of purchases.
Change 2 to the number of years that you want to search for.
You can use aggregation with a having clause to get the counts by year and customer. Then aggregate again and count the years:
select customer_id
from (select customer_id, year(order_date) as year
from cust_orders co
group by customer_id, year(order_date)
having count(*) >= X
) x
where year in (2016, 2017)
group by customer_id
having count(*) = 2; -- 2 = number of years that must each reach the threshold
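The two levels of aggregation can be checked on a small data set. The sketch below runs against SQLite from Python so it is self-contained; the rows, the function name customers_meeting_threshold, and its parameters are hypothetical, and strftime stands in for YEAR().

```python
import sqlite3

# Hypothetical rows in the shape of the cust_orders table: (customer_id, order_id, order_date).
CUST_ORDERS = [
    (101, 1, "2016-12-11"), (101, 2, "2016-12-20"),
    (101, 3, "2017-01-05"), (101, 4, "2017-12-14"),
    (102, 5, "2016-12-13"), (102, 6, "2016-12-14"), (102, 7, "2017-03-01"),
    (103, 8, "2017-12-15"), (103, 9, "2017-12-16"),
]


def customers_meeting_threshold(min_orders, years):
    """Return customers with at least min_orders orders in EVERY year listed."""
    placeholders = ", ".join("?" for _ in years)
    query = f"""
    SELECT customer_id
    FROM (
        -- level 1: one row per (customer, year) that reaches the threshold
        SELECT customer_id,
               CAST(strftime('%Y', order_date) AS INTEGER) AS order_year
        FROM cust_orders
        WHERE CAST(strftime('%Y', order_date) AS INTEGER) IN ({placeholders})
        GROUP BY customer_id, order_year
        HAVING COUNT(*) >= ?
    ) AS per_year
    -- level 2: keep customers that qualified in all requested years
    GROUP BY customer_id
    HAVING COUNT(*) = ?
    ORDER BY customer_id
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cust_orders (customer_id INTEGER, order_id INTEGER, order_date TEXT)")
    conn.executemany("INSERT INTO cust_orders VALUES (?, ?, ?)", CUST_ORDERS)
    params = list(years) + [min_orders, len(years)]
    rows = conn.execute(query, params).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(customers_meeting_threshold(2, [2016, 2017]))
```

Passing len(years) as the final HAVING value keeps the "number of years" in step with the IN list, so the two cannot drift apart.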

Unable to resolve Rank Over Partition with multiple variables

I am trying to analyse a bunch of transaction data and have set up a series of different ranks to help me. The one I can't get right is the beneficiary rank. I want it to partition where there is a change in beneficiary chronologically rather than alphabetically.
Where the same beneficiary is paid from January to March and then again in June I would like the June to be classed a separate 'session'.
I am using Teradata SQL if that makes a difference.
I thought the solution was going to be a DENSE_RANK but if I PARTITION BY (CustomerID, Beneficiary) ORDER BY SystemDate it counts up the number of months. If I PARTITION BY (CustomerID) ORDER BY Beneficiary then it is not chronological, I need the highest rank to be the latest Beneficiary.
SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
,RANK() OVER(PARTITION BY CustomerID ORDER BY PaymentMonth ASC) AS MonthRank
,RANK() OVER(PARTITION BY CustomerID , Beneficiary ORDER BY SystemDate ASC) AS BeneficiaryRank
,RANK() OVER(PARTITION BY CustomerID , Beneficiary, ROUND(TRNSCN_AMOUNT, 0) ORDER BY SYSTEM_DATE ASC) AS TransRank
FROM table ORDER BY CustomerID, PaymentRank
CustomerID Beneficiary Amount DateStamp Month PaymentRank MonthRank BeneficiaryRank TransactionRank
a aa 10 Jan 1 1 1 1
a aa 20 Feb 2 2 2 1
a aa 20 Mar 3 3 3 2
a aa 20 Apr 4 4 4 3
a bb 20 May 5 5 1 1
a bb 30 Jun 6 6 2 1
a aa 30 Jul 7 7 5 2
a aa 30 Aug 8 8 6 1
a cc 5 Sep 9 9 1 1
a cc 5 Oct 10 10 2 2
a cc 5 Nov 11 11 3 3
b cc 5 Dec 1 1 1 1
This is what I have so far, I want a column alongside this which will look like the below
CustomerID Beneficiary Amount DateStamp Month NewRank
a aa 10 Jan 1
a aa 20 Feb 1
a aa 20 Mar 1
a aa 20 Apr 1
a bb 20 May 2
a bb 30 Jun 2
a aa 30 Jul 3
a aa 30 Aug 3
a cc 5 Sep 4
a cc 5 Oct 4
a cc 5 Nov 4
b cc 5 Dec 1
This is a type of gaps-and-islands problem. I would recommend lag() and a cumulative sum:
select t.*,
sum(case when prev_systemdate > systemdate - interval '1' month then 0 else 1 end) over (partition by customerid, beneficiary order by systemdate)
from (select t.*,
lag(systemdate) over (partition by customerid, beneficiary order by systemdate) as prev_systemdate
from t
) t
Credits to @Gordon and @dnoeth for providing the ideas and code to get me on the right track.
The below is mostly taken from dnoeth's answer, but I needed to add ROWS UNBOUNDED PRECEDING to get the aggregation correct. Without it, the query just showed the total for the partition. I also changed SystemDate to PaymentRank because I had to work around duplicate entries on the same day.
SELECT dt.*,
-- now do a Cumulative Sum over those 0/1
SUM(flag) OVER(PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
(
SELECT CustomerID, Beneficiary, Amount, SystemDate, Month, PaymentRank
-- assign a 0 if current & previous Beneficiary are the same, otherwise 1
,CASE WHEN Beneficiary = MIN(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) THEN 0 ELSE 1 END AS Flag
FROM table -- assumed: a source that already exposes PaymentRank
) AS dt
ORDER BY CustomerID, PaymentRank
The inner query sets a flag whenever the beneficiary changes. The outer query then does a cumulative sum on those.
I was unsure what UNBOUNDED PRECEDING was doing, and @dnoeth has a great explanation of it. The below is taken from that explanation.
• UNBOUNDED PRECEDING, all rows before the current row -> fixed
• UNBOUNDED FOLLOWING, all rows after the current row -> fixed
• x PRECEDING, x rows before the current row -> relative
• y FOLLOWING, y rows after the current row -> relative
SELECT dt.*,
-- now do a Cumulative Sum over those 0/1
SUM(flag)
OVER(PARTITION BY CustomerID
ORDER BY SystemDate ASC
,flag DESC -- needed if the order by columns are not unique
ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
(
SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
,RANK() OVER(PARTITION BY CustomerID ORDER BY PaymentMonth ASC) AS MonthRank
,RANK() OVER(PARTITION BY CustomerID , Beneficiary ORDER BY SystemDate ASC) AS BeneficiaryRank
,RANK() OVER(PARTITION BY CustomerID , Beneficiary, ROUND(TRNSCN_AMOUNT, 0) ORDER BY SYSTEM_DATE ASC) AS TransRank
-- assign a 0 if current & previous Beneficiary are the same, otherwise 1
,CASE WHEN Beneficiary = LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate) THEN 0 ELSE 1 END AS flag
FROM table
) AS dt
ORDER BY CustomerID, PaymentRank
Your problem with Gordon's query is probably caused by your Teradata release, LAG is only supported in 16.10+. But there's a simple workaround:
LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate)
--is equivalent to
MIN(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
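The flag-and-cumulative-sum technique, including the MIN(...) ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING substitute for LAG, can be tried on the sample data from the question. This is a minimal sketch run against SQLite from Python rather than Teradata; the table name payments, the integer PayMonth column used for ordering, and the function name beneficiary_sessions are assumptions for illustration.

```python
import sqlite3

# (CustomerID, Beneficiary, PayMonth) taken from the question's sample, months as 1..12.
PAYMENTS = [
    ("a", "aa", 1), ("a", "aa", 2), ("a", "aa", 3), ("a", "aa", 4),
    ("a", "bb", 5), ("a", "bb", 6),
    ("a", "aa", 7), ("a", "aa", 8),
    ("a", "cc", 9), ("a", "cc", 10), ("a", "cc", 11),
    ("b", "cc", 12),
]

QUERY = """
SELECT CustomerID, Beneficiary, PayMonth,
       -- running total of the change flags = session number
       SUM(flag) OVER (PARTITION BY CustomerID
                       ORDER BY PayMonth
                       ROWS UNBOUNDED PRECEDING) AS NewRank
FROM (
    SELECT CustomerID, Beneficiary, PayMonth,
           -- the one-row frame holds only the previous row, so MIN returns its value
           -- (NULL on a customer's first row, which falls through to ELSE 1)
           CASE WHEN Beneficiary = MIN(Beneficiary) OVER (
                         PARTITION BY CustomerID
                         ORDER BY PayMonth
                         ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS flag
    FROM payments
) AS dt
ORDER BY CustomerID, PayMonth
"""


def beneficiary_sessions():
    """Return (CustomerID, Beneficiary, PayMonth, NewRank) rows in chronological order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (CustomerID TEXT, Beneficiary TEXT, PayMonth INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", PAYMENTS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in beneficiary_sessions():
        print(row)
```

The second run of beneficiary aa (months 7 and 8) gets its own number because the flag is driven by a change from the previous row, not by the beneficiary's name.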

SQL Query: Find highest revenue month/year for a customer

I'm looking to query the database to find highest revenue month for all the customers in the system. I have got the query working to pull customers monthly revenue from all the years for which the data is present. But I'm struggling to figure out how to get highest revenue month-year from this data.
The database is SQL Server 2008 R2.
The columns are: Customer name, Year, Month, and Revenue.
I even tried using Row_Number() and tried partitioning by customer name/year and ordering by revenue. But it didn't work. Maybe I'm making some mistake there.
Here's how I tried to build the base query.
Select Customer, Year(orderdatetime) as Year, Month(orderdatetime) as Month, SUM(Revenue)
From Orders
Group By Customer, Year(orderdatetime), Month(orderdatetime)
This is how I tried to use Row_Number()
WITH Max_Revenue AS
(
Select Customer, Year(orderdatetime) as Year, Month(orderdatetime) as Month, SUM(Revenue), RowNumber = ROW_NUMBER() OVER(PARTITION By Year Order By Revenue DESC)
From Orders
Group By Customer, Year(orderdatetime), Month(orderdatetime)
)
Select Max_Revenue.Customer, Max_Revenue.Year, Max_Revenue.Month, Max_Revenue.Revenue
From Max_Revenue
Where Max_Revenue.RowNumber = 1
Order By Max_Revenue.Customer asc
The data I get back is like:
Customer Month Year Revenue
ABC 2 2012 100
ABC 3 2013 150
ABC 5 2012 200
XYZ 4 2011 500
XYZ 6 2012 650
XYZ 7 2012 800
What I want as the output is
Customer Month Year Revenue
ABC 5 2012 200
XYZ 7 2012 800
So every customer's best month and respective year in terms of revenue.
SELECT Customer,
Year,
Revenue,
Month
FROM (
SELECT Customer,
Year,
-- rank each customer's months from highest to lowest revenue
ROW_NUMBER() OVER(PARTITION By Customer Order By Revenue DESC) as rank,
Revenue,
Month
FROM (
Select Customer,
Year(orderdatetime) as Year,
Month(orderdatetime) as Month,
SUM(Revenue) as Revenue
From Orders
Group By
Customer,
Year(orderdatetime),
Month(orderdatetime)
) BS
) BS2
WHERE BS2.rank = 1
Or, in your own query, change
ROW_NUMBER() OVER(PARTITION By Year Order By Revenue DESC)
to
ROW_NUMBER() OVER(PARTITION By Customer Order By Revenue DESC)
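The key point is that the partition decides what "best" is relative to: partitioning by Customer yields one top row per customer, while partitioning by Year yields one per year. A minimal sketch of the per-customer version, run against SQLite from Python so it is self-contained: the individual order rows are hypothetical but add up to the monthly totals shown in the question, and the function name best_month_per_customer is made up for illustration.

```python
import sqlite3

# Hypothetical orders whose monthly sums match the totals listed in the question.
ORDERS = [
    ("ABC", "2012-02-10", 100), ("ABC", "2013-03-05", 150),
    ("ABC", "2012-05-01", 120), ("ABC", "2012-05-20", 80),
    ("XYZ", "2011-04-02", 500), ("XYZ", "2012-06-11", 650),
    ("XYZ", "2012-07-07", 300), ("XYZ", "2012-07-21", 500),
]

QUERY = """
WITH monthly AS (
    -- step 1: total revenue per customer, year and month
    SELECT Customer,
           CAST(strftime('%Y', orderdatetime) AS INTEGER) AS Year,
           CAST(strftime('%m', orderdatetime) AS INTEGER) AS Month,
           SUM(Revenue) AS MonthRevenue
    FROM Orders
    GROUP BY Customer, Year, Month
), ranked AS (
    -- step 2: number each customer's months from highest to lowest revenue
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY MonthRevenue DESC) AS rn
    FROM monthly
)
SELECT Customer, Year, Month, MonthRevenue
FROM ranked
WHERE rn = 1
ORDER BY Customer
"""


def best_month_per_customer():
    """Return (Customer, Year, Month, revenue) for each customer's highest-revenue month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (Customer TEXT, orderdatetime TEXT, Revenue INTEGER)")
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", ORDERS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in best_month_per_customer():
        print(row)
```

ROW_NUMBER() picks an arbitrary winner when two months tie on revenue; RANK() would return both tied months instead.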