Retrieve Customers with a Monthly Order Frequency greater than 4 - sql

I am trying to rework the query below so that it fetches all customers who placed 4 or more orders in each of the past three months.
Customer ID | Feb | Mar | Apr
0001        | 4   | 5   | 6
0002        | 3   | 2   | 4
0003        | 4   | 2   | 3
In the above table, only the customer with Customer ID 0001 should be picked, as they have 4 or more orders in every month.
Below is the query I have written. It pulls all customers with an average purchase frequency of 4 or more over the last 90 days, but it does not check that the customer placed 4 or more orders in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, but I am not sure whether they are needed in this case.

I would count the orders per month instead of computing the average, and then keep the customers whose count is 4 or more in each of the last three months.
Also, I think you should select your interval using "month(CURRENT_DATE()) - 3" rather than a 90-day window. If needed, handle the case where the current date falls in January, February, or March, because you then have to go back to October, November, or December of the previous year.
I'm not familiar with Google BigQuery so I can't write your query but I hope this helps.
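To illustrate the year wrap-around mentioned above, here is a minimal Python sketch (not BigQuery SQL). The function name last_three_full_months is made up for this example. It assumes "last three months" means the three full calendar months before today.

```python
from datetime import date

def last_three_full_months(today: date) -> list:
    """Return (year, month) pairs for the three full months before `today`, oldest first."""
    # Work with a running month index so January needs no special case.
    index = today.year * 12 + (today.month - 1)
    months = []
    for back in (3, 2, 1):
        year, month0 = divmod(index - back, 12)
        months.append((year, month0 + 1))
    return months

print(last_three_full_months(date(2022, 5, 10)))  # [(2022, 2), (2022, 3), (2022, 4)]
print(last_three_full_months(date(2023, 2, 1)))   # [(2022, 11), (2022, 12), (2023, 1)]
```

The same idea carries over to SQL: compare on year and month together (or on a truncated month date) rather than on the month number alone.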

So I've found a solution to this using a WITH clause (a common table expression), as below:
WITH filtered_orders AS (
  -- one row per customer and month, kept only if that month has 4 or more orders
  select
    customer_id ID,
    extract(MONTH from date) Order_Month,
    count(order_id) CountofOrders
  from `customer_order_lines` lines
  where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
  group by ID, Order_Month
  having CountofOrders >= 4)
-- keep customers for whom all three months survived the filter
select ID
from filtered_orders
group by ID
having count(Order_Month) = 3;
Hope this helps!

One option is to first count the orders per month and then keep the customers whose count meets your threshold in every month:
WITH ORDERS_BY_MONTH AS (
  SELECT
    DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
    lines.customer_id Customer_ID,
    COUNT(lines.order_id) PurchaseFrequency
  FROM fct_customer_order_lines lines
  LEFT JOIN product_table product
    ON lines.entity_id= product.entity_id
    AND lines.vendor_id= product.vendor_id
  -- UPPER, not LOWER: a lower-cased code can never equal "IN"
  WHERE UPPER(product.country_code)= "IN"
    AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
    AND lines.date < CURRENT_DATE()
  GROUP BY PurchaseMonth, Customer_ID
)
SELECT
  Customer_ID,
  AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
-- every month the customer appears in must have 4 or more orders
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)
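As a rough check of the count-per-month-then-filter idea, here is a small sketch that runs against SQLite through Python's sqlite3 module, not BigQuery. The orders table, its columns, and customer 0004 are invented for the example. The monthly counts for 0001 to 0003 come from the sample table in the question. One difference from the query above: this sketch also requires all three months to be present, so a customer with orders in only one month is not returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, order_id INTEGER, order_date TEXT)")

# Monthly order counts for Feb, Mar, Apr 2022; 0004 is a made-up single-month customer.
sample = {"0001": (4, 5, 6), "0002": (3, 2, 4), "0003": (4, 2, 3), "0004": (0, 0, 5)}
rows, order_id = [], 0
for customer, counts in sample.items():
    for month, n in zip((2, 3, 4), counts):
        for _ in range(n):
            order_id += 1
            rows.append((customer, order_id, f"2022-{month:02d}-15"))
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

query = """
WITH orders_by_month AS (
    SELECT customer_id,
           strftime('%Y-%m', order_date) AS purchase_month,
           COUNT(order_id) AS purchase_frequency
    FROM orders
    WHERE order_date >= '2022-02-01' AND order_date < '2022-05-01'
    GROUP BY customer_id, purchase_month
)
SELECT customer_id
FROM orders_by_month
GROUP BY customer_id
-- all three months present, and the weakest month still meets the threshold
HAVING COUNT(*) = 3 AND MIN(purchase_frequency) >= 4
ORDER BY customer_id
"""
consistent_customers = [row[0] for row in conn.execute(query)]
print(consistent_customers)  # ['0001']
```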

Related

How to join partitioned table with another one

Sorry for the newbie question, but I'm really having trouble with the following issue:
Say, I have this code in place:
WITH active_pass AS (
  SELECT DATE_TRUNC(fr.day, MONTH) AS month, id,
    CASE
      WHEN SUM(fr.imps) > 100 THEN 1
      WHEN SUM(fr.imps) < 100 THEN 0
    END AS active_or_passive
  FROM table1 AS fr
  WHERE day between (CURRENT_DATE() - 730) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
  GROUP BY month, id
  ORDER BY month desc),
# summing the score for each customer (sum for the whole year)
active_pass_assigned AS (
  SELECT id, month,
    SUM(SUM(active_or_passive)) OVER (PARTITION BY id ORDER BY month rows BETWEEN 3 PRECEDING AND 1 PRECEDING) AS trailing_act
  FROM active_pass AS a
  GROUP BY month, id
  ORDER BY MONTH desc)
It creates a trailing total over the previous 3 months to show in how many of those months the customer was active. However, I have no idea how to join it with the next table to get the sum of revenue that the client generated. What I tried is this:
SELECT c.id, DATE_TRUNC(day, MONTH) AS month, SUM(revenue) AS Rev, name
FROM table2 AS c
JOIN active_pass_assigned AS a
ON c.id = a.id
WHERE day between (CURRENT_DATE() - 365) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
GROUP BY month, id, name
ORDER BY month DESC
However, it returns waaay higher values for Revenue than the actual ones and I have no idea why. Furthermore, could you please tell me how to join those two tables together so that I only get the customer's revenue on the months his activity was equal to 3?
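One likely cause of the inflated revenue is that the join matches on id only, so every revenue row is repeated once for each month row of that customer in active_pass_assigned. The sketch below reproduces that effect in SQLite through Python's sqlite3 and then joins on id and month together. The activity and revenue tables and their values are invented stand-ins for active_pass_assigned and table2, and month is stored as 'YYYY-MM' text here, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (id INTEGER, month TEXT, trailing_act INTEGER);  -- stand-in for active_pass_assigned
CREATE TABLE revenue (id INTEGER, day TEXT, revenue REAL);             -- stand-in for table2
INSERT INTO activity VALUES (1, '2022-01', 2), (1, '2022-02', 3), (1, '2022-03', 3);
INSERT INTO revenue VALUES (1, '2022-01-10', 100), (1, '2022-02-10', 200), (1, '2022-03-10', 300);
""")

# Join on id only: each revenue row matches all three activity rows of that id.
inflated = conn.execute(
    "SELECT SUM(r.revenue) FROM revenue r JOIN activity a ON r.id = a.id"
).fetchone()[0]

# Join on id AND month, keeping only the months where trailing activity equals 3.
per_month = conn.execute("""
    SELECT r.id, strftime('%Y-%m', r.day) AS rev_month, SUM(r.revenue) AS rev
    FROM revenue r
    JOIN activity a
      ON r.id = a.id AND strftime('%Y-%m', r.day) = a.month
    WHERE a.trailing_act = 3
    GROUP BY r.id, rev_month
    ORDER BY rev_month
""").fetchall()

print(inflated)   # 1800.0, three times the true total of 600.0
print(per_month)  # [(1, '2022-02', 200.0), (1, '2022-03', 300.0)]
```

In the original query the equivalent change would be to add the truncated month to the ON clause and filter on trailing_act = 3.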

Create a two-weeks window frame

I have a dataset that's just a list of orders made by customers each day.
order_date | month | week | customer
2022-10-06 | 10    | 40   | Paul
2022-10-06 | 10    | 40   | Edward
2022-10-01 | 10    | 39   | Erick
2022-09-26 | 9     | 39   | Divine
2022-09-23 | 9     | 38   | Alice
2022-09-21 | 9     | 38   | Evelyn
My goal is to calculate the total number of unique customers within a two-week period. I can count the number of customers within a month or week period but not two weeks. Also, the two weeks are in a rolling order such that weeks 40 and 39 (as in the sample above) is one window period while weeks 39 and 38 is the next frame.
So far, this is how I am getting the monthly and weekly numbers. Assume that the customer names are distinct per day.
select order_date,
month,
week,
COUNT(DISTINCT customer) over (partition by month) month_active_outlets,
COUNT(DISTINCT customer) OVER (partition by week) week_active_outlets
from table
Again, I am unable to calculate the unique customer names within a two-week period.
I think the easiest approach is to build your own grouping key in a subquery and then count over that key. Currently, COUNT(DISTINCT ...) combined with ORDER BY in a window is not supported, so a rolling window frame wouldn't work here.
A possible query could be:
WITH
week_before AS (
SELECT
EXTRACT(WEEK from order_date) as week, --to be sure this is the same week format
month,
CONCAT(week,'-', EXTRACT(WEEK FROM DATE_SUB(order_date, INTERVAL 7 DAY))) AS two_weeks,
customer
FROM
`test`.`Basic`)
SELECT
two_weeks,
COUNT(DISTINCT customer) AS unique_customer
FROM
week_before
GROUP BY
two_weeks
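Note that a label built per row places each order in exactly one group, so, as far as I can tell, a group such as "40-39" only ever collects the rows of week 40. For a truly rolling count, each order has to contribute to two windows: the one ending in its own week and the one ending in the following week. Below is a sketch of that idea in SQLite through Python's sqlite3, using the sample rows from the question. The orders table name and the offsets helper are invented here, and the sketch assumes the week number does not reset inside the data (no year boundary).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, week INTEGER, customer TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2022-10-06", 40, "Paul"),
        ("2022-10-06", 40, "Edward"),
        ("2022-10-01", 39, "Erick"),
        ("2022-09-26", 39, "Divine"),
        ("2022-09-23", 38, "Alice"),
        ("2022-09-21", 38, "Evelyn"),
    ],
)

query = """
WITH offsets(k) AS (VALUES (0), (1)),
expanded AS (
    -- each order counts towards the window ending in its own week (k = 0)
    -- and towards the window ending in the following week (k = 1)
    SELECT week + k AS window_end_week, customer
    FROM orders CROSS JOIN offsets
)
SELECT window_end_week - 1 AS first_week,
       window_end_week     AS last_week,
       COUNT(DISTINCT customer) AS unique_customers
FROM expanded
WHERE window_end_week IN (SELECT week FROM orders)  -- drop the window that runs past the data
GROUP BY window_end_week
ORDER BY window_end_week DESC
"""
two_week_counts = conn.execute(query).fetchall()
print(two_week_counts)  # [(39, 40, 4), (38, 39, 4), (37, 38, 2)]
```

The last row is a partial window, because the sample has no orders in week 37.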
The window function is the right tool. To obtain the 2 week date, we first extract the week number of the year:
mod(extract(week from order_date),2)
If the week number is odd (remainder 1 modulo 2), we add one week. Then we truncate to the start of that (even) week:
date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week )
with tbl as
(Select date("2022-10-06") as order_date, "Paul" as customer
union all select date("2022-10-06"),"Edward"
union all select date("2022-10-01"),"Erick"
union all select date("2022-09-26"),"Divine"
union all select date("2022-09-23"),"Alice"
union all select date("2022-09-21"),"Evelyn"
)
select *,
date_trunc(order_date,month) as month,
date_trunc(order_date,week) as week,
COUNT(DISTINCT customer) OVER week2 as customer_2weeks,
string_agg(cast(order_date as string)) over week2 as list_2weeks,
from tbl
window week2 as (partition by date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week ))
Note that the first days of a year are assigned to the last week of the previous year:
select order_date,
extract(isoweek from order_date),
date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week)
from
unnest(generate_date_array(date("2021-12-01"),date("2023-01-14"))) order_date
order by 1

How to get number of billable customers per month SQL

This is what my table looks like:
NOTE: Don't worry about the BMI field being empty in some rows. We assume that each row is a reading. I have omitted some columns for privacy reasons.
I want to get a count of the number of active customers per month. A customer is active if they have at least 18 readings in total (1 reading per day for 18 days in a given month). How do I write this SQL query? Assume the table name is 'cust'. I'm using SQL Server. Any help is appreciated.
Presumably a patient is a customer in your world. If so, you can use two levels of aggregation:
select yyyy, mm, count(*) as num_active
from (select year(createdat) as yyyy, month(createdat) as mm,
             patient_id,
             count(distinct convert(date, createdat)) as num_days
      from cust
      group by year(createdat), month(createdat), patient_id
     ) ymp
where num_days >= 18
group by yyyy, mm;
You need to group by patient and month first, then group again by month alone:
SELECT
mth,
COUNT(*) NumPatients
FROM (
SELECT
EOMONTH(c.createdat) mth
FROM cust c
GROUP BY EOMONTH(c.createdat), c.patient_id
HAVING COUNT(*) >= 18
-- for distinct days you could change it to:
-- HAVING COUNT(DISTINCT CAST(c.createdat AS date)) >= 18
) c
GROUP BY mth;
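To see why the distinct-day variant matters, here is a small sketch in SQLite through Python's sqlite3 rather than SQL Server. The rows are invented: patient 1 has one reading on each of 18 days, while patient 2 has 20 readings spread over only 10 days. The column names patient_id and createdat follow the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (patient_id INTEGER, createdat TEXT)")
rows = []
# Patient 1: one reading on each of 18 different days in March 2022.
rows += [(1, f"2022-03-{d:02d} 08:00:00") for d in range(1, 19)]
# Patient 2: 20 readings, but on only 10 different days.
rows += [(2, f"2022-03-{d:02d} {h:02d}:00:00") for d in range(1, 11) for h in (8, 20)]
conn.executemany("INSERT INTO cust VALUES (?, ?)", rows)

query = """
SELECT mth, COUNT(*) AS num_patients
FROM (
    -- first level: one row per patient and month with 18 or more distinct reading days
    SELECT strftime('%Y-%m', createdat) AS mth, patient_id
    FROM cust
    GROUP BY mth, patient_id
    HAVING COUNT(DISTINCT date(createdat)) >= 18
) AS per_patient
GROUP BY mth  -- second level: count the qualifying patients per month
"""
active_per_month = conn.execute(query).fetchall()
print(active_per_month)  # [('2022-03', 1)]
```

With a plain COUNT(*) >= 18 in the HAVING clause, patient 2 would also be counted.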

I want to find customers transacting for any 3 consecutive months from 2017 to 2018

I want to know how to find the list of customers who transacted in 3 consecutive months. These can be any 3 consecutive months, with any number of transactions in each.
Example: suppose a customer transacts in January, keeps transacting until March, and then stops. I want the list of such customers from my database.
I am working on AWS Athena.
One method uses aggregation and window functions:
select customer_id, yyyymm
from (select customer_id,
             date_trunc('month', transactdate) as yyyymm,
             lag(date_trunc('month', transactdate), 2) over (partition by customer_id order by date_trunc('month', transactdate)) as prev_yyyymm_2
      from t
      where transactdate >= '2017-01-01' and
            transactdate < '2019-01-01'
      -- one row per customer and month, so that "two rows earlier" means two active months earlier
      group by customer_id, date_trunc('month', transactdate)
     ) m
where prev_yyyymm_2 = yyyymm - interval '2' month;
This aggregates transactions by month and looks at the transaction date two rows earlier. The outer filter checks that that date is exactly 2 months earlier.
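The same lag-by-two idea can be tried out in SQLite through Python's sqlite3 (window functions need SQLite 3.25 or newer). The sample rows are invented, and the month_no column is a helper introduced here because SQLite has no date_trunc or month interval. Customer A is active January to March, B skips February, and C spans a year boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (customer_id TEXT, transactdate TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("A", "2017-01-05"), ("A", "2017-02-11"), ("A", "2017-02-20"), ("A", "2017-03-02"),
        ("B", "2017-01-05"), ("B", "2017-03-09"), ("B", "2017-04-01"),
        ("C", "2017-11-30"), ("C", "2017-12-01"), ("C", "2018-01-15"),
    ],
)

query = """
WITH months AS (
    -- one row per customer and month; month_no is a running month count,
    -- so December 2017 and January 2018 are exactly one apart
    SELECT DISTINCT customer_id,
           CAST(strftime('%Y', transactdate) AS INTEGER) * 12
             + CAST(strftime('%m', transactdate) AS INTEGER) AS month_no
    FROM t
    WHERE transactdate >= '2017-01-01' AND transactdate < '2019-01-01'
),
lagged AS (
    SELECT customer_id, month_no,
           LAG(month_no, 2) OVER (PARTITION BY customer_id ORDER BY month_no) AS prev_month_no_2
    FROM months
)
SELECT DISTINCT customer_id
FROM lagged
WHERE prev_month_no_2 = month_no - 2  -- the active month two rows back is exactly two months earlier
ORDER BY customer_id
"""
three_month_customers = [row[0] for row in conn.execute(query)]
print(three_month_customers)  # ['A', 'C']
```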

Count and sum per day multiple tables without join

I want to count the number of orders, invoices, and deliveries, and sum their amounts, per day.
Like this:
date nb orders orders$ nb delivery
day1 5 1234,56 3
day2 6 665,88 7
..
The first time I tried this, it was ok for one day but not for a week, for example:
SELECT
(SELECT COUNT(OPP.OPPNUM_0) FROM OPPOR OPP WHERE OPP.CREDAT_0=%1%),
(SELECT SUM(OPP.OPPAMT_0) FROM OPPOR OPP WHERE OPP.CREDAT_0=%1%),
(SELECT COUNT(SQH.SQHNUM_0) FROM SQUOTE SQH WHERE SQH.CREDAT_0=%1%),
(SELECT SUM(SQH.YCUMHTSEL_0) FROM SQUOTE SQH WHERE SQH.CREDAT_0=%1%),
(SELECT COUNT(SOH.SOHNUM_0) FROM SORDER SOH WHERE SOH.CREDAT_0=%1%),
(SELECT SUM(SOH.ORDNOT_0) FROM SORDER SOH WHERE SOH.CREDAT_0=%1%)
FROM dual
MySQL lets you filter on a relative date range directly, for example the last day:
SELECT COUNT(*) FROM table_name WHERE anydatefield >= NOW() - INTERVAL 1 DAY
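For one row per day over a whole week, one possible approach without joins is to stack the tables with UNION ALL and then aggregate conditionally per day. The sketch below uses SQLite through Python's sqlite3 with two of the tables from the question. The sample rows, the kind label, and the fixed date range are invented, and CREDAT_0 is assumed to hold a plain date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OPPOR  (OPPNUM_0 TEXT, CREDAT_0 TEXT, OPPAMT_0 REAL);
CREATE TABLE SORDER (SOHNUM_0 TEXT, CREDAT_0 TEXT, ORDNOT_0 REAL);
INSERT INTO OPPOR  VALUES ('OP1', '2023-01-02', 100.0), ('OP2', '2023-01-02', 50.0), ('OP3', '2023-01-03', 20.0);
INSERT INTO SORDER VALUES ('SO1', '2023-01-02', 75.5), ('SO2', '2023-01-04', 10.0);
""")

query = """
SELECT day,
       COUNT(CASE WHEN kind = 'opp'   THEN 1 END)                AS nb_opps,
       SUM(CASE WHEN kind = 'opp'   THEN amount ELSE 0.0 END)    AS opp_amount,
       COUNT(CASE WHEN kind = 'order' THEN 1 END)                AS nb_orders,
       SUM(CASE WHEN kind = 'order' THEN amount ELSE 0.0 END)    AS order_amount
FROM (
    -- stack both tables into one list of (day, kind, amount) events
    SELECT CREDAT_0 AS day, 'opp' AS kind, OPPAMT_0 AS amount FROM OPPOR
    UNION ALL
    SELECT CREDAT_0, 'order', ORDNOT_0 FROM SORDER
) AS events
WHERE day BETWEEN '2023-01-02' AND '2023-01-08'
GROUP BY day
ORDER BY day
"""
per_day = conn.execute(query).fetchall()
for row in per_day:
    print(row)
```

Days on which none of the tables has a row do not appear in the result. Filling those gaps would need a calendar table or a generated date series.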