Weekly active or lapsing status in BigQuery - sql

I want to see the status of a customer each week based on their activity.
If a customer has transacted in the last 7 days it should appear as active and if the customer has not transacted in 8-21 days it should appear as "lapsing".
I have these values in my table:
[image of the table values not included]
Desired output reference:
Week# Customer_id Status

If you want a row for every combination of week and customer_id, you can cross join the distinct customers from your orders table with a generated list of week dates, then join the orders back onto that superset, keeping each customer's latest order on or before each week date.
with base_table as (
select distinct customer_id, week_date
from orders
cross join (SELECT week_date FROM UNNEST(GENERATE_DATE_ARRAY((select min(order_date) from orders), CURRENT_DATE(), INTERVAL 7 DAY)) AS week_date)
)
select base_table.customer_id, base_table.week_date, max(order_date) as latest_order,
case
when DATE_DIFF(week_date,max(order_date),DAY) <= 7 then 'active'
when DATE_DIFF(week_date,max(order_date),DAY) >= 8 and DATE_DIFF(week_date,max(order_date),DAY) <= 21 then 'lapsing'
else 'not active'
end as status
from base_table
cross join orders
where orders.customer_id = base_table.customer_id
and order_date <= week_date
group by 1, 2
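To check the thresholds outside BigQuery, the same logic can be sketched in Python for a single customer. This is only an illustration: the function names are made up, and the weeks here start at that customer's first order rather than at the earliest order in the whole table.

```python
from datetime import date, timedelta

def status_for(days_since_last_order):
    """Map the age of the latest order to the status used in the query."""
    if days_since_last_order <= 7:
        return "active"
    if days_since_last_order <= 21:
        return "lapsing"
    return "not active"

def weekly_statuses(order_dates, today):
    """Return (week_date, status) rows for one customer with at least one order.

    Steps 7 days at a time from the first order up to today and, for each
    week date, looks at the latest order on or before that date.
    """
    orders = sorted(order_dates)
    rows = []
    week_date = orders[0]
    while week_date <= today:
        latest = max(d for d in orders if d <= week_date)
        rows.append((week_date, status_for((week_date - latest).days)))
        week_date += timedelta(days=7)
    return rows

for week, status in weekly_statuses([date(2022, 1, 1), date(2022, 1, 20)], date(2022, 2, 12)):
    print(week, status)
```

A customer whose latest order is exactly 7 days old still counts as active, and one at exactly 21 days still counts as lapsing, which matches the CASE expression above.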

Related

How to join partitioned table with another one

Sorry for the newbie question, but I'm really having trouble with the following issue:
Say, I have this code in place:
WITH active_pass AS (SELECT DATE_TRUNC(fr.day, MONTH) AS month, id,
CASE
WHEN SUM(fr.imps) > 100 THEN 1
WHEN SUM(fr.imps) < 100 THEN 0
END AS active_or_passive
FROM table1 AS fr
WHERE day between (CURRENT_DATE() - 730) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
GROUP BY month, id
ORDER BY month desc),
# summing the score for each customer (sum for the whole year)
active_pass_assigned AS (SELECT id, month,
SUM(SUM(active_or_passive)) OVER (PARTITION BY id ORDER BY month rows BETWEEN 3 PRECEDING AND 1 PRECEDING) AS trailing_act
FROM active_pass AS a
GROUP BY month, id
ORDER BY MONTH desc)
It creates a trailing total over the previous 3 months to see in how many of those months the customer was active. However, I have no idea how to join this with the next table to get the sum of revenue that the client generated. What I tried is this:
SELECT c.id, DATE_TRUNC(day, MONTH) AS month, SUM(revenue) AS Rev, name
FROM table2 AS c
JOIN active_pass_assigned AS a
ON c.id = a.id
WHERE day between (CURRENT_DATE() - 365) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
GROUP BY month, id, name
ORDER BY month DESC
However, it returns far higher values for revenue than the actual ones, and I have no idea why. Furthermore, could you please tell me how to join those two tables so that I only get the customer's revenue for the months in which their activity was equal to 3?
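One likely cause of the inflated revenue, assuming active_pass_assigned holds one row per id and month, is that a join on id alone pairs every revenue row with every month row of that customer, so each revenue value is summed once per month row. The sketch below uses made-up rows and a hypothetical helper, join_sum, to show the effect in plain Python and how matching on both id and month avoids it.

```python
def join_sum(revenue_rows, activity_rows, match):
    """Sum revenue over every (revenue, activity) pair accepted by match,
    which is what SUM(revenue) sees after an inner join."""
    return sum(
        r["revenue"]
        for r in revenue_rows
        for a in activity_rows
        if match(r, a)
    )

# Hypothetical data: one customer, revenue in two months, activity in three.
revenue_rows = [
    {"id": 1, "month": "2022-01", "revenue": 100},
    {"id": 1, "month": "2022-02", "revenue": 50},
]
activity_rows = [
    {"id": 1, "month": "2022-01", "trailing_act": 3},
    {"id": 1, "month": "2022-02", "trailing_act": 2},
    {"id": 1, "month": "2022-03", "trailing_act": 3},
]

# Join on id only: each revenue row is counted three times.
by_id = join_sum(revenue_rows, activity_rows, lambda r, a: r["id"] == a["id"])

# Join on id and month: each revenue row is counted once.
by_id_month = join_sum(
    revenue_rows, activity_rows,
    lambda r, a: r["id"] == a["id"] and r["month"] == a["month"],
)

# Same join, restricted to months where the trailing activity equals 3.
active_only = join_sum(
    revenue_rows, activity_rows,
    lambda r, a: r["id"] == a["id"] and r["month"] == a["month"] and a["trailing_act"] == 3,
)

print(by_id, by_id_month, active_only)
```

In the query itself this would probably mean adding the truncated month to the join condition and filtering on trailing_act = 3, but that depends on how the two tables are actually keyed.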

Retrieve Customers with a Monthly Order Frequency greater than 4

I am trying to optimize the query below to fetch all customers who have placed 4 or more orders in each of the past three months.
Customer ID  Feb  Mar  Apr
-----------  ---  ---  ---
0001         4    5    6
0002         3    2    4
0003         4    2    3
In the above table, only the customer with Customer ID 0001 should be picked, as they consistently have 4 or more orders in each month.
Below is a query I have written, which pulls all customers with an average purchase frequency of 4 in the last 90 days, but it does not check whether there are consistently 4 or more purchases in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, but I'm not sure they are needed in this case.
I would count the orders per month instead of computing the average, and then retrieve the customers whose count is 4 or more in each of the last three months.
Also, I think you should select your interval using "month(CURRENT_DATE()) - 3" instead of a 90-day window. If needed, handle the case where the current date falls in January, February, or March, in which case you need to go back to October, November, and December of the previous year.
I'm not familiar with Google BigQuery so I can't write your query but I hope this helps.
So I've found a solution to this using a WITH clause, as below:
WITH filtered_orders AS (
-- one row per customer and month, kept only when the month has 4 or more orders
select
customer_id ID,
extract(MONTH from date) Order_Month,
count(order_id) CountofOrders
from customer_order_lines lines
where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
group by ID, Order_Month
having CountofOrders>=4)
-- keep the customers that qualified in all three months
select ID
from filtered_orders
group by ID
having count(Order_Month) =3;
Hope this helps!
One option is to first count the orders per month and then keep only the customers whose count meets your threshold in every month:
WITH ORDERS_BY_MONTH AS (
SELECT
DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
lines.customer_id Customer_ID,
COUNT(lines.order_id) PurchaseFrequency
FROM fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY PurchaseMonth, Customer_ID
)
SELECT
Customer_ID,
AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)
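The "count per month, then require every month to pass" idea can be checked against the sample table from the question. The following is a minimal Python sketch, not BigQuery code; consistent_customers is a hypothetical helper, and the orders list is rebuilt from the counts in the table.

```python
from collections import Counter

def consistent_customers(orders, months, threshold=4):
    """Return customers with at least `threshold` orders in every month of `months`.

    orders is an iterable of (customer_id, month) pairs, one pair per order.
    """
    counts = Counter(orders)  # (customer_id, month) -> number of orders
    customers = {customer for customer, _ in counts}
    return sorted(
        customer for customer in customers
        if all(counts[(customer, month)] >= threshold for month in months)
    )

# Monthly order counts taken from the table in the question.
table = {
    "0001": {"Feb": 4, "Mar": 5, "Apr": 6},
    "0002": {"Feb": 3, "Mar": 2, "Apr": 4},
    "0003": {"Feb": 4, "Mar": 2, "Apr": 3},
}
orders = [
    (customer, month)
    for customer, per_month in table.items()
    for month, n in per_month.items()
    for _ in range(n)
]

print(consistent_customers(orders, ["Feb", "Mar", "Apr"]))  # ['0001']
```

Because the list of months is passed in explicitly, a customer with no orders at all in one of the months is rejected as well, since a missing month counts as zero.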

Filter customers with at least 3 transactions a year for the past 2 years Presto/SQL

I have a table of customer transactions called cust_trans where each transaction made by a customer is stored as one row. It has a column called visit_date that contains the transaction date. I would like to filter for the customers who transacted at least 3 times a year in each of the past 2 years.
The data looks like this:
Id visit_date
---- ------
1 01/01/2019
1 01/02/2019
1 01/01/2019
1 02/01/2020
1 02/01/2020
1 03/01/2020
1 03/01/2020
2 01/02/2019
3 02/04/2019
I would like to know the customers who visited at least 3 times in every year for the past two years,
i.e. I want the output below.
id
---
1
In the sample data, only one customer visited at least 3 times in each of the 2 years.
I tried the query below, but it only checks whether the total number of visits is greater than or equal to 3:
select id
from
cust_scan
GROUP by
id
having count(visit_date) >= 3
and year(date(max(visit_date)))-year(date(min(visit_date))) >=2
I would appreciate any help, guidance or suggestions
One option would be to generate a list of distinct ids, cross join it with the last two years, and then bring in the original table with a left join. You can then aggregate to count how many visits each id had in each year. The final step is to aggregate again and filter with a having clause:
select t.id
from (
select i.id, y.yr, count(c.id) cnt
from (select distinct id from cust_scan) i
cross join (values
(date_trunc('year', current_date)),
(date_trunc('year', current_date) - interval '1' year)
) as y(yr)
left join cust_scan c
on i.id = c.id
and c.visit_date >= y.yr
and c.visit_date < y.yr + interval '1' year
group by i.id, y.yr
) t
group by t.id
having min(cnt) >= 3
Another option would be to use two correlated subqueries:
select distinct id
from cust_scan c
where
(
select count(*)
from cust_scan c1
where
c1.id = c.id
and c1.visit_date >= date_trunc('year', current_date)
and c1.visit_date < date_trunc('year', current_date) + interval '1' year
) >= 3
and (
select count(*)
from cust_scan c1
where
c1.id = c.id
and c1.visit_date >= date_trunc('year', current_date) - interval '1' year
and c1.visit_date < date_trunc('year', current_date)
) >= 3
I assume you mean calendar years. I think I would use two levels of aggregation:
select ct.id
from (select ct.id, year(visit_date) as yyyy, count(*) as cnt
from cust_trans ct
where ct.visit_date >= '2019-01-01' -- or whatever
group by ct.id, year(visit_date)
) ct
group by ct.id
having count(*) = 2 and -- both years
min(cnt) >= 3; -- at least three transactions
If you want the last two complete years, just change the where clause in the subquery.
You can use a similar idea -- of two aggregations -- if you want the last two years relative to the current date. That would be two full years, rather than 1 and some fraction of the current year.
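The two-level aggregation can be tried on the sample rows from the question. This is a minimal Python sketch rather than Presto SQL: regular_visitors is a hypothetical helper, and the day/month/year date format is an assumption, since the sample data does not say which format it uses (only the year matters here).

```python
from collections import Counter
from datetime import datetime

def regular_visitors(visits, years, min_visits=3):
    """Return ids with at least min_visits visits in every year of years.

    visits is an iterable of (id, visit_date_string) pairs.
    """
    # First level: count visits per (id, year).
    per_year = Counter(
        (cust_id, datetime.strptime(day, "%d/%m/%Y").year)  # assumed format
        for cust_id, day in visits
    )
    ids = {cust_id for cust_id, _ in per_year}
    # Second level: the weakest year must still reach the minimum.
    return sorted(
        cust_id for cust_id in ids
        if min(per_year[(cust_id, year)] for year in years) >= min_visits
    )

# Sample rows from the question.
visits = [
    (1, "01/01/2019"), (1, "01/02/2019"), (1, "01/01/2019"),
    (1, "02/01/2020"), (1, "02/01/2020"), (1, "03/01/2020"), (1, "03/01/2020"),
    (2, "01/02/2019"), (3, "02/04/2019"),
]

print(regular_visitors(visits, [2019, 2020]))  # [1]
```

Taking the minimum over an explicit list of years plays the same role as the "having count(*) = 2 and min(cnt) >= 3" filter: a year with no visits counts as zero and disqualifies the id.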

SQL Get records past a certain date, higher than a certain value, with a minimum amount

I'm having a hard time with an SQL query at the moment. I have a list of customer orders, and I want to remove a set of them based on certain criteria:
We need to keep at least 6 of each customer's past orders on hand.
We need to keep all of the customer's orders that occurred within the past 90 days.
We need to keep AT LEAST 1 of each customer's orders that is older than 90 days (if the customer had 4 orders in the past 90 days, we'll need to keep 2 from an earlier time to hit the 6-order requirement).
So, for example, if a customer had 6 orders in the past 90 days, we would keep 7 of their orders (because we include the 1 order from older than 90 days).
If a customer had 21 orders in the past 90 days, we would keep 22 of their orders.
If a customer had 5 orders in the past 90 days, we would keep 6 of their orders.
Here is the query I am using to build a table of their orders:
INSERT INTO #OrdersToDelete
SELECT TempOrders.Site, TempOrders.Number, TempOrders.RowNumber, TempOrders.CustomerNumber
FROM (SELECT
ROW_NUMBER() OVER ( PARTITION BY CustomerNumber ORDER BY OrderDate DESC) AS 'RowNumber',
Number,
OrderDate,
CustomerNumber
FROM Orders
) TempOrders
LEFT OUTER JOIN (SELECT
ROW_NUMBER() OVER ( PARTITION BY CustomerNumber ORDER BY OrderDate DESC) AS 'RowNumber',
Number,
CustomerNumber
FROM SmartOrders
) SmartOrderOrders
ON TempOrders.Site = SmartOrderOrders.Site
AND TempOrders.Number = SmartOrderOrders.Number
WHERE
DATEDIFF(dd, OrderDate, GETDATE()) > 90
This query returns a list of orders that are up for deletion (older than 90 days). In the WHERE clause, I can also check the order number, but I'm having difficulty figuring out how to exclude the customer's most recent order that is older than 90 days.
Any help would be appreciated.
--Get the rownumbers using a case expression in order by
--so all the orders within the last 90 days come first
WITH ROWNUMS AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY CustomerNumber
ORDER BY
CASE WHEN DATEDIFF(dd, OrderDate, GETDATE()) <= 90 THEN 1 ELSE 0 END DESC,
OrderDate DESC) AS rn,
DATEDIFF(dd, OrderDate, GETDATE()) AS diff,
Number,
OrderDate,
CustomerNumber
FROM Orders)
--Get the maximum rownumber per customer in the last 90 days
,MAXROWNUM AS (select CustomerNumber, MAX(rn) maxrn from ROWNUMS
where diff<=90
group by CustomerNumber)
--Join the previous cte's and get all the orders for a customer in the last 90 days
-- + one more row which is the latest before 90 days
--(left join so customers with no orders in the last 90 days still keep their latest order)
SELECT r.*
FROM ROWNUMS r
LEFT JOIN MAXROWNUM c ON c.CustomerNumber=r.CustomerNumber
WHERE r.rn <= COALESCE(c.maxrn, 0)+1
--use r.rn <= case when COALESCE(c.maxrn, 0) <=5 then 5 else c.maxrn end + 1 to get at least 6 orders per customer
Give this a shot.
Start off by creating 3 Common Table Expressions (CTEs). You can do them as nested subqueries but I find CTEs easier to read and manage, plus they're easier to explain.
WITH ninety_day_cte
AS
(SELECT temporders.site, temporders.number, temporders.customernumber, temporders.orderdate
FROM orders AS temporders
WHERE
temporders.orderdate >= DATEADD(DAY,-90,GETDATE())),
ninety_day_count_cte
AS
(SELECT temporders.customernumber, COUNT(*) AS Order_Count
FROM orders AS temporders
WHERE
temporders.orderdate >= DATEADD(DAY,-90,GETDATE())
GROUP BY
temporders.customernumber),
greater_ninety_day_cte
AS
(SELECT temporders.site, temporders.number, temporders.customernumber, temporders.orderdate,
ROW_NUMBER() OVER(PARTITION BY temporders.customernumber ORDER BY temporders.orderdate DESC) AS Row_Number
FROM orders AS temporders
WHERE
temporders.orderdate < DATEADD(DAY,-90,GETDATE()))
The first CTE, ninety_day_cte will grab all the orders within the past 90 days - we need this for all customers and we need all orders. Simple, we can set this one aside.
The second CTE, ninety_day_count_cte is used to determine the total count of orders per customer within the last 90 days. We need to know this number to determine how many orders older than 90 days we need to grab.
The third CTE, greater_ninety_day_cte will grab all orders older than 90 days. We add the ROW_NUMBER() to rank the orders per customer by order date, newest first - this will help us grab the orders we need from before the past 90 days.
Now we need to add the query that will grab the required orders older than 90 days:
SELECT g.site, g.number, g.customernumber, g.orderdate
FROM greater_ninety_day_cte AS g
LEFT JOIN ninety_day_count_cte AS c
ON g.customernumber = c.customernumber
WHERE
g.Row_Number <= CASE
WHEN COALESCE(c.Order_Count, 0) >= 6 THEN 1
ELSE (6 - COALESCE(c.Order_Count, 0))
END
This uses the 2nd and 3rd CTEs. We use a LEFT JOIN so that we also grab data for customers who only have orders older than 90 days. The WHERE clause compares the Row_Number from the 3rd CTE with the Order_Count from the 2nd CTE. The CASE expression states that if the Order_Count (the count of orders in the past 90 days) is 6 or more we only want to pull the row with Row_Number = 1, and otherwise we want to pull the difference (6 - Order_Count) rows. This should get all the orders older than 90 days that meet the requirements.
Now we only need to get the orders that are less than 90 days old. This is easily done with a UNION ALL using the 1st CTE:
UNION ALL
SELECT site, number, customernumber, orderdate
FROM ninety_day_cte
That should get you all the results you need. At least 6 orders and at least 1 order older than 90 days.
Here's the full query altogether:
WITH ninety_day_cte
AS
(SELECT temporders.site, temporders.number, temporders.customernumber, temporders.orderdate
FROM orders AS temporders
WHERE
temporders.orderdate >= DATEADD(DAY,-90,GETDATE())),
ninety_day_count_cte
AS
(SELECT temporders.customernumber, COUNT(*) AS Order_Count
FROM orders AS temporders
WHERE
temporders.orderdate >= DATEADD(DAY,-90,GETDATE())
GROUP BY
temporders.customernumber),
greater_ninety_day_cte
AS
(SELECT temporders.site, temporders.number, temporders.customernumber, temporders.orderdate,
ROW_NUMBER() OVER(PARTITION BY temporders.customernumber ORDER BY temporders.orderdate DESC) AS Row_Number
FROM orders AS temporders
WHERE
temporders.orderdate < DATEADD(DAY,-90,GETDATE()))
SELECT g.site, g.number, g.customernumber, g.orderdate
FROM greater_ninety_day_cte AS g
LEFT JOIN ninety_day_count_cte AS c
ON g.customernumber = c.customernumber
WHERE
g.Row_Number <= CASE
WHEN COALESCE(c.Order_Count, 0) >= 6 THEN 1
ELSE (6 - COALESCE(c.Order_Count, 0))
END
UNION ALL
SELECT site, number, customernumber, orderdate
FROM ninety_day_cte
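The retention rule from the question can also be checked on one customer's orders outside the database. The following is a minimal Python sketch under that reading of the rule; orders_to_keep and its parameters are hypothetical names, not part of either answer.

```python
from datetime import date, timedelta

def orders_to_keep(order_dates, today, window_days=90, min_total=6):
    """Return the order dates to keep for one customer, newest first.

    Keeps every order inside the window, plus the most recent older orders:
    always at least one, and more if needed to reach min_total orders.
    """
    cutoff = today - timedelta(days=window_days)
    ordered = sorted(order_dates, reverse=True)
    recent = [d for d in ordered if d >= cutoff]
    older = [d for d in ordered if d < cutoff]
    extra = max(1, min_total - len(recent))
    return recent + older[:extra]

def sample_orders(today, recent_count, older_count=10):
    """Build test data: recent_count orders inside 90 days, older_count outside."""
    recent = [today - timedelta(days=i) for i in range(recent_count)]
    older = [today - timedelta(days=100 + i) for i in range(older_count)]
    return recent + older

today = date(2022, 6, 1)
for recent_count in (4, 5, 6, 21):
    kept = orders_to_keep(sample_orders(today, recent_count), today)
    print(recent_count, "recent orders ->", len(kept), "kept")
```

The max(1, 6 - recent) term corresponds to the CASE expression in the query: one older order once there are at least 5 recent ones, otherwise as many as are needed to reach 6.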

Count and sum per day multiple tables without join

I want to count the number of orders, invoices, and deliveries, and sum the amounts of the orders, invoices, and deliveries, per day.
Like this:
date  nb orders  orders$  nb delivery
day1  5          1234,56  3
day2  6          665,88   7
..
The first time I tried this, it was ok for one day but not for a week, for example:
SELECT
(SELECT COUNT(OPP.OPPNUM_0) FROM OPPOR OPP WHERE OPP.CREDAT_0=%1%),
(SELECT SUM(OPP.OPPAMT_0) FROM OPPOR OPP WHERE OPP.CREDAT_0=%1%),
(SELECT COUNT(SQH.SQHNUM_0) FROM SQUOTE SQH WHERE SQH.CREDAT_0=%1%),
(SELECT SUM(SQH.YCUMHTSEL_0) FROM SQUOTE SQH WHERE SQH.CREDAT_0=%1%),
(SELECT COUNT(SOH.SOHNUM_0) FROM SORDER SOH WHERE SOH.CREDAT_0=%1%),
(SELECT SUM(SOH.ORDNOT_0) FROM SORDER SOH WHERE SOH.CREDAT_0=%1%)
FROM dual
MySQL has built-in date arithmetic for filtering by day:
SELECT COUNT(*) FROM table_name WHERE anydatefield >= NOW() - INTERVAL 1 DAY
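For one row per day across several tables, one possible approach (not taken from the answer above) is to aggregate each table by day on its own and then merge the results by date, so that rows from one table never multiply rows from another. The Python sketch below only illustrates that idea; daily_summary, its input shapes, and the sample rows are hypothetical.

```python
from collections import defaultdict

def daily_summary(orders, deliveries):
    """Aggregate each source per day separately, then merge the results by day.

    orders is an iterable of (day, amount) pairs; deliveries is an iterable of days.
    """
    summary = defaultdict(
        lambda: {"nb_orders": 0, "orders_amount": 0.0, "nb_delivery": 0}
    )
    for day, amount in orders:
        summary[day]["nb_orders"] += 1
        summary[day]["orders_amount"] += amount
    for day in deliveries:
        summary[day]["nb_delivery"] += 1
    # Sort by day so the output reads like the desired report.
    return dict(sorted(summary.items()))

orders = [("day1", 10.0), ("day1", 5.5), ("day2", 1.0)]
deliveries = ["day1", "day3"]

for day, row in daily_summary(orders, deliveries).items():
    print(day, row["nb_orders"], row["orders_amount"], row["nb_delivery"])
```

A day that appears in only one source still gets a row, with zeros for the other columns. In SQL the equivalent would typically be one GROUP BY per table, combined on the date afterwards.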