Filter customers with at least 3 transactions a year for the past 2 years (Presto/SQL)

I have a table of customer transactions called cust_trans, where each transaction made by a customer is stored as one row. Another column, visit_date, contains the transaction date. I would like to filter the customers who transact at least 3 times a year for the past 2 years.
The data looks like this:
Id visit_date
---- ------
1 01/01/2019
1 01/02/2019
1 01/01/2019
1 02/01/2020
1 02/01/2020
1 03/01/2020
1 03/01/2020
2 01/02/2019
3 02/04/2019
I would like to know the customers who visited at least 3 times in each of the past two years,
i.e. I want the output below:
id
---
1
In this data, only customer 1 visited at least 3 times in each of the 2 years.
I tried the query below, but it only checks whether the total number of visits is at least 3:
select id
from
cust_scan
GROUP by
id
having count(visit_date) >= 3
and year(date(max(visit_date)))-year(date(min(visit_date))) >=2
I would appreciate any help, guidance or suggestions.

One option is to generate a list of distinct ids, cross join it with the last two years, and then bring in the original table with a left join. You can then aggregate to count how many visits each id had each year. The final step is to aggregate again and filter with a having clause:
select t.id
from (
select i.id, y.yr, count(c.id) cnt
from (select distinct id from cust_scan) i
cross join (values
(date_trunc('year', current_date)),
(date_trunc('year', current_date) - interval '1' year)
) as y(yr)
left join cust_scan c
on i.id = c.id
and c.visit_date >= y.yr
and c.visit_date < y.yr + interval '1' year
group by i.id, y.yr
) t
group by t.id
having min(cnt) >= 3
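To see the cross join / left join / min(cnt) logic run end to end, here is a minimal sketch using Python's built-in sqlite3 module on the sample data. It rests on four assumptions:

- SQLite stands in for Presto.
- Dates are stored as ISO strings. Only the year matters here, so the MM/DD vs DD/MM ambiguity does not affect the result.
- The two years are pinned to 2019 and 2020 instead of being derived from current_date, so the result is reproducible.
- The years table and the function name are hypothetical.

```python
import sqlite3

def customers_with_min_visits(rows, years, min_visits=3):
    """Return ids with at least `min_visits` visits in every year of `years`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cust_scan (id INTEGER, visit_date TEXT)")
    con.executemany("INSERT INTO cust_scan VALUES (?, ?)", rows)
    # Hypothetical helper table playing the role of the VALUES list of years
    con.execute("CREATE TABLE years (yr INTEGER)")
    con.executemany("INSERT INTO years VALUES (?)", [(y,) for y in years])
    query = """
        SELECT t.id
        FROM (
            -- every (id, year) pair, with 0 when the id has no visit that year
            SELECT i.id, y.yr, COUNT(c.id) AS cnt
            FROM (SELECT DISTINCT id FROM cust_scan) AS i
            CROSS JOIN years AS y
            LEFT JOIN cust_scan AS c
              ON c.id = i.id
             AND CAST(strftime('%Y', c.visit_date) AS INTEGER) = y.yr
            GROUP BY i.id, y.yr
        ) AS t
        GROUP BY t.id
        HAVING MIN(t.cnt) >= ?   -- the weakest year must still reach the threshold
        ORDER BY t.id
    """
    result = [row[0] for row in con.execute(query, (min_visits,))]
    con.close()
    return result

visits = [(1, "2019-01-01"), (1, "2019-01-02"), (1, "2019-01-01"),
          (1, "2020-02-01"), (1, "2020-02-01"), (1, "2020-03-01"),
          (1, "2020-03-01"), (2, "2019-01-02"), (3, "2019-02-04")]
print(customers_with_min_visits(visits, [2019, 2020]))  # [1]
```

The left join is what makes a year with no visits count as 0. Without it, customers 2 and 3 would simply have no 2020 row, and MIN() would never see the missing year.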
Another option would be to use two correlated subqueries:
select distinct id
from cust_scan c
where
(
select count(*)
from cust_scan c1
where
c1.id = c.id
and c1.visit_date >= date_trunc('year', current_date)
and c1.visit_date < date_trunc('year', current_date) + interval '1' year
) >= 3
and (
select count(*)
from cust_scan c1
where
c1.id = c.id
and c1.visit_date >= date_trunc('year', current_date) - interval '1' year
and c1.visit_date < date_trunc('year', current_date)
) >= 3

I assume you mean calendar years. I think I would use two levels of aggregation:
select ct.id
from (select ct.id, year(ct.visit_date) as yyyy, count(*) as cnt
from cust_trans ct
where ct.visit_date >= date '2019-01-01' -- or whatever
group by ct.id, year(ct.visit_date)
) ct
group by ct.id
having count(*) = 2 and -- both years
min(cnt) >= 3; -- at least three transactions
If you want the last two complete years, just change the where clause in the subquery.
You can use the same two-level aggregation if you want the last two years relative to the current date. That would cover two full years, rather than one full year plus part of the current one.
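The same two-level aggregation can be written in plain Python, which makes both HAVING conditions explicit. The first pass counts per (id, year). The second pass requires that the customer appears in every target year and that the smallest yearly count is at least 3. This is a minimal sketch; the function name and the ISO date strings are assumptions for illustration.

```python
from collections import Counter, defaultdict

def frequent_customers(visits, years, min_visits=3):
    years = set(years)
    # First aggregation: GROUP BY id, year(visit_date), limited to the target years
    per_year = Counter(
        (cid, int(day[:4])) for cid, day in visits if int(day[:4]) in years
    )
    # Second aggregation: GROUP BY id, collecting the yearly counts
    counts_by_id = defaultdict(list)
    for (cid, _year), cnt in per_year.items():
        counts_by_id[cid].append(cnt)
    # HAVING count(*) = number of years AND min(cnt) >= min_visits
    return sorted(
        cid for cid, cnts in counts_by_id.items()
        if len(cnts) == len(years) and min(cnts) >= min_visits
    )

visits = [(1, "2019-01-01"), (1, "2019-01-02"), (1, "2019-01-01"),
          (1, "2020-02-01"), (1, "2020-02-01"), (1, "2020-03-01"),
          (1, "2020-03-01"), (2, "2019-01-02"), (3, "2019-02-04")]
print(frequent_customers(visits, {2019, 2020}))  # [1]
```

The `len(cnts) == len(years)` check plays the role of `count(*) = 2`. Without a left join, a year with no visits produces no row, so the year count must be checked explicitly.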

Related

Retrieve Customers with a Monthly Order Frequency greater than 4

I am trying to optimize the query below to fetch all customers who have had a monthly order frequency of 4 or more in each of the past three months.
Customer ID  Feb  Mar  Apr
0001         4    5    6
0002         3    2    4
0003         4    2    3
In the table above, only the customer with Customer ID 0001 should be picked, as they consistently have 4 or more orders per month.
Below is the query I have written. It pulls all customers with an average purchase frequency of 4 over the last 90 days, but it does not check that they purchased 4 or more times in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE LOWER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried using window functions, but I am not sure whether they are needed in this case.
I would count the orders per month instead of computing the average, and then retrieve the customers whose monthly count is at least 4 in each of the last three months.
Also, I think you should select your interval by calendar month (e.g. with "month(CURRENT_DATE()) - 3") instead of using a 90-day window. If needed, handle the case where the current date falls in January, February or March; in that case you have to go back to October, November or December of the previous year.
I'm not familiar with Google BigQuery, so I can't write your query, but I hope this helps.
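The year wrap-around mentioned above is easy to get wrong with plain "month - 3" arithmetic. Here is a minimal sketch, assuming "the last three months" means the three complete calendar months before the current one. The function name is hypothetical.

```python
from datetime import date

def previous_months(today, n=3):
    """Return the n complete calendar months before `today` as (year, month),
    oldest first, rolling back into the previous year when needed."""
    months = []
    year, month = today.year, today.month
    for _ in range(n):
        month -= 1
        if month == 0:  # stepping back from January lands in December of last year
            year, month = year - 1, 12
        months.append((year, month))
    return list(reversed(months))

print(previous_months(date(2022, 5, 10)))  # [(2022, 2), (2022, 3), (2022, 4)]
print(previous_months(date(2022, 2, 15)))  # [(2021, 11), (2021, 12), (2022, 1)]
```

The resulting (year, month) pairs can then be used as the filter instead of a 90-day window. A 90-day window can touch four different calendar months.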
I found a solution using a WITH clause, as below:
WITH filtered_orders AS (
select
customer_id ID,
extract(MONTH from date) Order_Month,
count(order_id) CountofOrders
from customer_order_lines lines
where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
group by ID, Order_Month
having CountofOrders>=4)
select ID
from filtered_orders
group by ID
having count(Order_Month) =3;
Hope this helps!
One option is to first count the orders by month, and then keep only the customers whose count is at or above your threshold in every month:
WITH ORDERS_BY_MONTH AS (
SELECT
DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
lines.customer_id Customer_ID,
COUNT(lines.order_id) PurchaseFrequency
FROM fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE LOWER(product.country_code) = "in"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY PurchaseMonth, Customer_ID
)
SELECT
Customer_ID,
AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)
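One caveat with the HAVING clause above is that it only looks at months that have at least one order row. A customer with no orders at all in one of the months has no row for that month, so `COUNT(1) = COUNTIF(...)` can still hold.

Here is a minimal sketch in SQLite via Python's sqlite3, assuming the fixed February to April 2022 range from the WITH-clause answer. Customers 0001-0003 follow the question's table. Customer 0004 is hypothetical and has no April orders.

```python
import sqlite3

# (customer_id, month, number of orders) in 2022; 0004 is a hypothetical extra customer
MONTHLY = [("0001", 2, 4), ("0001", 3, 5), ("0001", 4, 6),
           ("0002", 2, 3), ("0002", 3, 2), ("0002", 4, 4),
           ("0003", 2, 4), ("0003", 3, 2), ("0003", 4, 3),
           ("0004", 2, 5), ("0004", 3, 5)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_id TEXT, order_date TEXT)")
for cid, month, n in MONTHLY:
    con.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(cid, f"2022-{month:02d}-{day:02d}") for day in range(1, n + 1)],
    )

# One row per (customer, month) that actually has orders
MONTHLY_COUNTS = """
    SELECT customer_id, strftime('%Y-%m', order_date) AS ym, COUNT(*) AS cnt
    FROM orders
    WHERE order_date >= '2022-02-01' AND order_date < '2022-05-01'
    GROUP BY customer_id, ym
"""

def run(having):
    sql = (f"SELECT customer_id FROM ({MONTHLY_COUNTS}) AS m "
           f"GROUP BY customer_id HAVING {having} ORDER BY customer_id")
    return [r[0] for r in con.execute(sql)]

# Same idea as COUNT(1) = COUNTIF(...): only months with rows are checked
every_present_month = run("COUNT(*) = SUM(cnt >= 4)")
# Additionally require a row for each of the three months
every_month = run("COUNT(*) = 3 AND MIN(cnt) >= 4")
print(every_present_month)  # ['0001', '0004']
print(every_month)          # ['0001']
```

Requiring the expected number of months (here 3) closes that gap. It works when the date filter covers exactly those calendar months.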

Weekly active or lapsing status in BigQuery

I want to see the status of a customer each week based on their activity.
If a customer has transacted in the last 7 days, they should appear as "active"; if their most recent transaction was 8-21 days ago, they should appear as "lapsing".
I have these values in my table (screenshot not included).
Desired output (for reference):
Week#  Customer_id  Status
If you want a row for every combination of week and customer_id, you could build a large cross join of the distinct customer_ids in your orders table and a generated list of weekly dates. Then match all orders back onto that superset, keeping the latest order on or before each week's date.
with base_table as (
select distinct customer_id, week_date
from orders
cross join (SELECT week_date FROM UNNEST(GENERATE_DATE_ARRAY((select min(order_date) from orders), CURRENT_DATE(), INTERVAL 7 DAY)) AS week_date)
)
select base_table.customer_id, base_table.week_date, max(order_date) as latest_order,
case
when DATE_DIFF(week_date,max(order_date),DAY) <= 7 then 'active'
when DATE_DIFF(week_date,max(order_date),DAY) >= 8 and DATE_DIFF(week_date,max(order_date),DAY) <= 21 then 'lapsing'
else 'not active'
end as status
from base_table
join orders
on orders.customer_id = base_table.customer_id
and order_date <= week_date
group by 1, 2
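The CASE logic above can be checked in isolation with a small Python sketch. For each weekly snapshot date, it takes the latest order on or before that date and classifies the gap in days with the same thresholds (up to 7 is active, 8-21 is lapsing, otherwise not active). The function name and sample dates are hypothetical. As with the inner join in the query, weeks before the customer's first order produce no row.

```python
from datetime import date, timedelta

def weekly_status(order_dates, start, end):
    """Classify one customer at each weekly snapshot date from start to end."""
    result = []
    week = start
    while week <= end:
        past = [d for d in order_dates if d <= week]
        if past:  # mirrors the join condition order_date <= week_date
            days = (week - max(past)).days  # like DATE_DIFF(week_date, max(order_date), DAY)
            if days <= 7:
                status = "active"
            elif days <= 21:
                status = "lapsing"
            else:
                status = "not active"
            result.append((week, status))
        week += timedelta(days=7)
    return result

orders = [date(2022, 1, 1), date(2022, 1, 5)]
for week, status in weekly_status(orders, date(2022, 1, 1), date(2022, 2, 5)):
    print(week, status)
```

For these two hypothetical orders, the six weekly snapshots come out as active, active, lapsing, lapsing, not active, not active.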

How to count only the working days between two dates?

I have the following table called vacations, where the employee number is displayed along with the start and end date of their vacations:
id_employee  start       end
1001         2020-12-24  2021-01-04
What I am looking for is the number of vacation days each employee took, broken down by employee number, month and year, without counting non-business days (Saturdays, Sundays and holidays).
I have the following query, which already omits Saturdays and Sundays:
SELECT id_employee,
EXTRACT(YEAR FROM t.Date) AS year,
EXTRACT(MONTH FROM t.Date) AS month,
SUM(WEEKDAY(`Date`) < 5) AS days
FROM (SELECT v.id_employee,
DATE_ADD(v.start, interval s.seq - 1 DAY) AS Date
FROM vacations v CROSS JOIN seq_1_to_100 s
WHERE DATE_ADD(v.start, interval s.seq - 1 DAY) <= v.end
ORDER BY v.id_employee, v.start, s.seq ) t
GROUP BY id_employee, EXTRACT(YEAR_MONTH FROM t.Date);
My question is: how can I skip holidays in addition to weekends? I suppose I should create another table that stores the holiday dates, but how could my query be adapted to perform the comparison?
If employee 1001 took vacation from 2020-12-24 to 2021-01-04, and we treat Christmas and New Year's Day as holidays, we should get the following result:
id_employee  month  year  days
1001         12     2020  5
1001         1      2021  1
Once you have created a table that stores the holiday dates, you can probably do something like this:
SELECT id_employee,
EXTRACT(YEAR FROM t.Date) AS year,
EXTRACT(MONTH FROM t.Date) AS month,
SUM(CASE WHEN h.holiday_date IS NULL THEN WEEKDAY(`Date`) < 5 END) AS days
FROM (SELECT v.id_employee,
DATE_ADD(v.start, interval s.seq - 1 DAY) AS Date
FROM vacations v CROSS JOIN seq_1_to_100 s
WHERE DATE_ADD(v.start, interval s.seq - 1 DAY) <= v.end
ORDER BY v.id_employee, v.start, s.seq ) t
LEFT JOIN holidays h ON t.date=h.holiday_date
GROUP BY id_employee, EXTRACT(YEAR_MONTH FROM t.Date);
This assumes the holidays table has a structure like this:
CREATE TABLE holidays (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
holiday_date DATE,
holiday_description VARCHAR(255));
Then LEFT JOIN it to your current query and change the SUM() slightly by adding a CASE expression. If ON t.date = h.holiday_date finds a match, h.holiday_date holds that date; otherwise it is NULL. Hence only non-holiday days reach the WEEKDAY() < 5 check, while holidays yield NULL, which SUM() ignores.
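To check the LEFT JOIN plus CASE idea against the expected result, here is a minimal sketch in SQLite via Python's sqlite3. It assumes three substitutions for the MariaDB features in the query:

- A `seq` table filled with 1..100 replaces MariaDB's seq_1_to_100.
- `date(start, '+N days')` replaces DATE_ADD.
- `strftime('%w', d) NOT IN ('0', '6')` (Sunday = 0, Saturday = 6) replaces `WEEKDAY(d) < 5`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE vacations (id_employee INTEGER, start TEXT, "end" TEXT);
    CREATE TABLE holidays (holiday_date TEXT);
    CREATE TABLE seq (seq INTEGER);
    INSERT INTO vacations VALUES (1001, '2020-12-24', '2021-01-04');
    INSERT INTO holidays VALUES ('2020-12-25'), ('2021-01-01');
""")
con.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(1, 101)])

rows = con.execute("""
    SELECT t.id_employee,
           CAST(strftime('%Y', t.d) AS INTEGER) AS year,
           CAST(strftime('%m', t.d) AS INTEGER) AS month,
           -- holidays give NULL (ignored by SUM); other days give 1 on weekdays, 0 on weekends
           SUM(CASE WHEN h.holiday_date IS NULL
                    THEN strftime('%w', t.d) NOT IN ('0', '6') END) AS days
    FROM (SELECT v.id_employee, date(v.start, '+' || (s.seq - 1) || ' days') AS d
          FROM vacations v CROSS JOIN seq s
          WHERE date(v.start, '+' || (s.seq - 1) || ' days') <= v."end") AS t
    LEFT JOIN holidays h ON t.d = h.holiday_date
    GROUP BY t.id_employee, year, month
    ORDER BY year, month
""").fetchall()
print(rows)  # [(1001, 2020, 12, 5), (1001, 2021, 1, 1)]
```

The output matches the expected 5 days in December 2020 and 1 day in January 2021.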
Here is also a solution that works with both MariaDB and MySQL versions that support common table expressions:
WITH RECURSIVE cte AS
(SELECT id_employee, start, start lvdt, end FROM vacations
UNION ALL
SELECT id_employee, start, lvdt+INTERVAL 1 DAY, end FROM cte
WHERE lvdt+INTERVAL 1 DAY <=end)
SELECT id_employee,
YEAR(v.lvdt) AS year,
MONTH(v.lvdt) AS month,
SUM(CASE WHEN h.holiday_date IS NULL THEN WEEKDAY(v.lvdt) < 5 END) AS days
FROM cte v
LEFT JOIN holidays h
ON v.lvdt=h.holiday_date
GROUP BY id_employee,
YEAR(v.lvdt),
MONTH(v.lvdt);

How can I get the count to display zero for months that have no records

I am pulling transactions on an attribute (attribute ID 4205 in table 1235), using the date a change happened to that attribute (found in the History table), and counting the number of changes per month. So far I have:
SELECT TOP(100) PERCENT MONTH(H.transactiondate) AS Month, COUNT(*) AS Count
FROM hsi.rmObjectInstance1235 AS O LEFT OUTER JOIN
hsi.rmObjectHistory AS H ON H.objectID = O.objectID
WHERE H.attributeid = 4205 AND YEAR(H.transactiondate) = 2020
GROUP BY MONTH(H.transactiondate)
And I get:
Month Count
---------------
1 9
2 4
3 11
4 14
5 1
I need to display a zero for months June - December instead of excluding those months.
One option uses a recursive query to generate the dates, and then brings in the original table with a left join:
with all_dates as (
select cast('2020-01-01' as date) dt
union all
select dateadd(month, 1, dt) from all_dates where dt < '2020-12-01'
)
select
month(d.dt) as month,
count(h.objectid) as cnt
from all_dates d
left join hsi.rmobjecthistory as h
on h.attributeid = 4205
and h.transactiondate >= d.dt
and h.transactiondate < dateadd(month, 1, d.dt)
and exists (select 1 from hsi.rmObjectInstance1235 o where o.objectID = h.objectID)
group by month(d.dt)
I am quite unclear about the intent of the table hsi.rmObjectInstance1235 in the query, as none of its columns are used in the select and group by clauses. If it is meant to filter hsi.rmobjecthistory by objectID, you can rewrite it as an exists condition, as in the solution above. You might even be able to remove that part of the query altogether.
Also, note that top without order by does not really make sense, and top (100) percent is a no-op anyway. As a consequence, I removed that row-limiting clause.
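The "generate every month, then left join" pattern can be tried out in SQLite, which also supports recursive CTEs (it needs the RECURSIVE keyword, and `date(dt, '+1 month')` stands in for dateadd). This is a minimal sketch using hypothetical history rows. The exists filter on hsi.rmObjectInstance1235 is left out for brevity.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE history (objectid INTEGER, attributeid INTEGER, transactiondate TEXT)")
# Hypothetical changes: two in January and one in March for attribute 4205,
# plus one row for another attribute that must not be counted
con.executemany("INSERT INTO history VALUES (?, ?, ?)", [
    (1, 4205, "2020-01-10"), (2, 4205, "2020-01-20"),
    (1, 4205, "2020-03-05"), (3, 9999, "2020-03-06"),
])

rows = con.execute("""
    WITH RECURSIVE all_dates(dt) AS (
        SELECT '2020-01-01'
        UNION ALL
        SELECT date(dt, '+1 month') FROM all_dates WHERE dt < '2020-12-01'
    )
    SELECT CAST(strftime('%m', d.dt) AS INTEGER) AS month,
           COUNT(h.objectid) AS cnt  -- counts only matched rows, so empty months give 0
    FROM all_dates d
    LEFT JOIN history h
      ON h.attributeid = 4205
     AND h.transactiondate >= d.dt
     AND h.transactiondate < date(d.dt, '+1 month')
    GROUP BY d.dt
    ORDER BY d.dt
""").fetchall()
for month, cnt in rows:
    print(month, cnt)
```

The filters on the history table sit in the ON clause rather than in WHERE. A WHERE condition on h would discard the unmatched months and bring back the original problem.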

sql count statement with multiple date ranges

I have two tables with different appointment dates.
Table 1
id start date
1 5/1/14
2 3/2/14
3 4/5/14
4 9/6/14
5 10/7/14
Table 2
id start date
1 4/7/14
1 4/10/14
1 7/11/13
2 2/6/14
2 2/7/14
3 1/1/14
3 1/2/14
3 1/3/14
With fixed date ranges I can count the appointment dates just fine, but here the date range has to change.
For each id in table 1, I need to count the distinct appointment dates from table 2, but only those within the 6 months prior to the start date from table 1.
Example: count all distinct appointment dates for id 1 (in table 2) between 11/1/13 and 5/1/14 (6 months prior). The result is 2: 4/7/14 and 4/10/14 are within the range, and 7/11/13 is outside of it.
So my issue is that the range changes for each record, and I cannot figure out how to code this. For id 2, the date range will be 9/2/13 to 3/2/14, and so on.
Thanks everyone in advance!
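Try this out:
SELECT id,
(
SELECT COUNT(DISTINCT table2.start_date)
FROM table2
WHERE table2.id = table1.id
AND table2.start_date >= DATEADD(MM, -6, table1.start_date)
AND table2.start_date <= table1.start_date
) AS table2records
FROM table1
The DATEADD subtracts 6 months from the date in table1, and the subquery returns the number of distinct appointment dates in table2 that fall between that date and the table1 start date.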
I think what you want is a left join with the date range in the ON clause:
select t1.id, count(distinct t2.startdate) as numt2dates
from table1 t1 left outer join
table2 t2
on t1.id = t2.id and
t2.startdate between dateadd(month, -6, t1.startdate) and t1.startdate
group by t1.id;
The exact syntax for the date arithmetic depends on the database.
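As one concrete dialect, here is a minimal SQLite sketch of the same join, run through Python's sqlite3. It makes two assumptions:

- The question's dates are M/D/YY and have been converted to ISO strings.
- `date(t1.start_date, '-6 months')` plays the role of dateadd(month, -6, ...).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER, start_date TEXT);
    CREATE TABLE table2 (id INTEGER, start_date TEXT);
    INSERT INTO table1 VALUES (1, '2014-05-01'), (2, '2014-03-02'), (3, '2014-04-05'),
                              (4, '2014-09-06'), (5, '2014-10-07');
    INSERT INTO table2 VALUES (1, '2014-04-07'), (1, '2014-04-10'), (1, '2013-07-11'),
                              (2, '2014-02-06'), (2, '2014-02-07'),
                              (3, '2014-01-01'), (3, '2014-01-02'), (3, '2014-01-03');
""")

rows = con.execute("""
    SELECT t1.id, COUNT(DISTINCT t2.start_date) AS numt2dates
    FROM table1 t1
    LEFT JOIN table2 t2
      ON t2.id = t1.id
     -- the window is computed per row of table1, so it moves with each start date
     AND t2.start_date BETWEEN date(t1.start_date, '-6 months') AND t1.start_date
    GROUP BY t1.id
    ORDER BY t1.id
""").fetchall()
print(rows)  # [(1, 2), (2, 2), (3, 3), (4, 0), (5, 0)]
```

Id 1 gets 2, as in the question's example, and ids with no appointments in their window get 0 thanks to the left join.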
Thank you, this solved my issue. It may not help you, since you are not grouping by date, but the answer gave me the insight I needed to resolve the issue I was facing.
I was trying to count the total users matching a date criterion that had to be evaluated against multiple fields:
WITH data AS (
SELECT generate_series(
(date '2020-01-01')::timestamp,
NOW(),
INTERVAL '1 week'
) AS date
)
SELECT d.date, (SELECT COUNT(DISTINCT h.id)
FROM history h WHERE h.startDate < d.date AND h.endDate > d.date) AS total_records
FROM data d ORDER BY d.date DESC
Sample output:
2022-05-16, 15
2022-05-09, 13
2022-05-02, 13
...
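The weekly-snapshot pattern above can be reproduced in SQLite. There, a recursive CTE replaces PostgreSQL's generate_series, and a fixed end date replaces NOW() so the output is reproducible. This is a minimal sketch with hypothetical history rows. User 2 has two overlapping periods, which shows why COUNT(DISTINCT h.id) is used.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE history (id INTEGER, startDate TEXT, endDate TEXT)")
# Hypothetical periods; user 2 has two overlapping rows
con.executemany("INSERT INTO history VALUES (?, ?, ?)", [
    (1, "2022-01-01", "2022-01-20"),
    (2, "2022-01-05", "2022-02-01"),
    (2, "2022-01-06", "2022-01-15"),
])

rows = con.execute("""
    WITH RECURSIVE data(d) AS (
        SELECT '2022-01-03'
        UNION ALL
        SELECT date(d, '+7 days') FROM data WHERE d < '2022-01-24'
    )
    SELECT d,
           -- users whose period strictly contains the snapshot date, each counted once
           (SELECT COUNT(DISTINCT h.id) FROM history h
            WHERE h.startDate < d AND h.endDate > d) AS total_records
    FROM data
    ORDER BY d DESC
""").fetchall()
for d, total in rows:
    print(d, total)
```

On 2022-01-10 both of user 2's rows are active, but the user is still counted once, giving a total of 2 rather than 3.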