I have a dataset that's just a list of orders made by customers each day.
order_date  month  week  customer
2022-10-06  10     40    Paul
2022-10-06  10     40    Edward
2022-10-01  10     39    Erick
2022-09-26  9      39    Divine
2022-09-23  9      38    Alice
2022-09-21  9      38    Evelyn
My goal is to calculate the total number of unique customers within a two-week period. I can count the number of customers within a month or a week, but not within two weeks. Also, the two-week windows are rolling: weeks 40 and 39 (as in the sample above) form one window, while weeks 39 and 38 form the next.
So far, this is how I am getting the monthly and weekly numbers. Assume that the customer names are distinct per day.
select order_date,
month,
week,
COUNT(DISTINCT customer) over (partition by month) month_active_outlets,
COUNT(DISTINCT customer) OVER (partition by week) week_active_outlets
from table
Again, I am unable to calculate the unique customer names within a two-week period.
I think the easiest way would be to create your own grouping key in a subquery and then use it to get your count. Currently, COUNT(DISTINCT ...) combined with ORDER BY in a window is not supported, so that approach wouldn't work.
A possible query could be:
WITH
week_before AS (
SELECT
EXTRACT(WEEK from order_date) as week, --to be sure this is the same week format
month,
CONCAT(week,'-', EXTRACT(WEEK FROM DATE_SUB(order_date, INTERVAL 7 DAY))) AS two_weeks,
customer
FROM
`test`.`Basic`)
SELECT
two_weeks,
COUNT(DISTINCT customer) AS unique_customer
FROM
week_before
GROUP BY
two_weeks
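To make the target numbers concrete, here is a minimal Python sketch of the rolling count the question describes, run on the sample rows. The function name `rolling_two_week_counts` is made up for illustration, and the sketch assumes week numbers within a single year (it does not handle the year boundary). It may be handy as a reference to check a SQL query against.

```python
from collections import defaultdict

# Sample rows from the question: (order_date, week, customer).
ORDERS = [
    ("2022-10-06", 40, "Paul"),
    ("2022-10-06", 40, "Edward"),
    ("2022-10-01", 39, "Erick"),
    ("2022-09-26", 39, "Divine"),
    ("2022-09-23", 38, "Alice"),
    ("2022-09-21", 38, "Evelyn"),
]


def rolling_two_week_counts(rows):
    """Map (week, week - 1) to the number of distinct customers in those two weeks."""
    customers_by_week = defaultdict(set)
    for _order_date, week, customer in rows:
        customers_by_week[week].add(customer)
    weeks = sorted(customers_by_week, reverse=True)
    return {
        # Union of this week's customers and the previous week's (if any).
        (week, week - 1): len(
            customers_by_week[week] | customers_by_week.get(week - 1, set())
        )
        for week in weeks
    }


print(rolling_two_week_counts(ORDERS))
# {(40, 39): 4, (39, 38): 4, (38, 37): 2}
```

Each week contributes to two windows here (its own and the next one), which is what makes the windows rolling rather than fixed.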
The window function is the right tool. To obtain the start date of each two-week bucket, we first extract the week number of the year and take it modulo 2:
mod(extract(week from order_date),2)
If the week number is odd (remainder 1), we add one week. Then we truncate to the start of the (now even) week:
date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week )
with tbl as
(Select date("2022-10-06") as order_date, "Paul" as customer
union all select date("2022-10-06"),"Edward"
union all select date("2022-10-01"),"Erick"
union all select date("2022-09-26"),"Divine"
union all select date("2022-09-23"),"Alice"
union all select date("2022-09-21"),"Evelyn"
)
select *,
date_trunc(order_date,month) as month,
date_trunc(order_date,week) as week,
COUNT(DISTINCT customer) OVER week2 as customer_2weeks,
string_agg(cast(order_date as string)) over week2 as list_2weeks,
from tbl
window week2 as (partition by date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week ))
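The bucketing expression can be traced in plain Python. This is only a sketch: it assumes that `strftime("%U")` (Sunday-based week number, days before the first Sunday in week 0) mirrors `extract(week from ...)`, and that truncating to the preceding Sunday mirrors `date_trunc(..., week)`. The helper names are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (order_date, customer).
ORDERS = [
    (date(2022, 10, 6), "Paul"),
    (date(2022, 10, 6), "Edward"),
    (date(2022, 10, 1), "Erick"),
    (date(2022, 9, 26), "Divine"),
    (date(2022, 9, 23), "Alice"),
    (date(2022, 9, 21), "Evelyn"),
]


def two_week_bucket(order_date):
    """Return the Sunday that starts the two-week bucket containing order_date."""
    week = int(order_date.strftime("%U"))  # Sunday-based week number, 0-53
    shifted = order_date + timedelta(weeks=week % 2)  # odd weeks move one week forward
    days_since_sunday = (shifted.weekday() + 1) % 7  # weekday(): Monday=0 ... Sunday=6
    return shifted - timedelta(days=days_since_sunday)


def distinct_customers_per_bucket(rows):
    """Count distinct customers per two-week bucket."""
    customers = defaultdict(set)
    for order_date, customer in rows:
        customers[two_week_bucket(order_date)].add(customer)
    return {bucket: len(names) for bucket, names in customers.items()}


print(distinct_customers_per_bucket(ORDERS))
# {datetime.date(2022, 10, 2): 4, datetime.date(2022, 9, 18): 2}
```

Note that these buckets are fixed and do not overlap: weeks 39 and 40 share a bucket, but week 38 is never combined with week 39.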
The first days of a year are assigned to the last week of the previous year. You can check this with the following query:
select order_date,
extract(isoweek from order_date),
date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week)
from
unnest(generate_date_array(date("2021-12-01"),date("2023-01-14"))) order_date
order by 1
I am trying to optimize the query below to fetch all customers who have placed 4 or more orders in each of the last three months.
Customer ID  Feb  Mar  Apr
0001         4    5    6
0002         3    2    4
0003         4    2    3
In the above table, only the customer with Customer ID 0001 should be picked, as he consistently has 4 or more orders in a month.
Below is the query I have written. It pulls all customers with an average purchase frequency of 4 in the last 90 days, but it does not check that there were consistently 4 or more purchases in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, but I am not sure whether they are needed in this case.
I would count the orders per month instead of computing the average, and then retrieve the customers whose count is 4 or more in each of the last three months.
Also, I think you should select your interval using "month(CURRENT_DATE()) - 3" instead of a window of 90 days. Of course, you would need to handle the case where the current date falls in January, February, or March, and in that case go back to October, November, or December of the previous year.
I'm not familiar with Google BigQuery so I can't write your query but I hope this helps.
So I've found a solution to this using a WITH clause, as below:
WITH filtered_orders AS (
select
distinct customer_id ID,
extract(MONTH from date) Order_Month,
count(order_id) CountofOrders
from customer_order_lines lines
where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
group by ID, Order_Month
having CountofOrders>=4)
select distinct ID
from filtered_orders
group by ID
having count(Order_Month) =3;
Hope this helps!
One option is to first count the orders per month and then keep only the customers whose count meets your threshold in every month in which they purchased:
WITH ORDERS_BY_MONTH AS (
SELECT
DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
lines.customer_id Customer_ID,
COUNT(lines.order_id) PurchaseFrequency
FROM fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY PurchaseMonth, Customer_ID
)
SELECT
Customer_ID,
AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)
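The "every month must meet the threshold" rule can be sketched in Python on the example table. The function name `consistent_customers` and the expansion of the table into one row per order are assumptions made for illustration only.

```python
from collections import Counter

# Monthly order counts from the example table in the question.
TABLE = {
    "0001": {"Feb": 4, "Mar": 5, "Apr": 6},
    "0002": {"Feb": 3, "Mar": 2, "Apr": 4},
    "0003": {"Feb": 4, "Mar": 2, "Apr": 3},
}

# Expand to one (customer_id, month) pair per order, like rows of an orders table.
ORDERS = [
    (customer_id, month)
    for customer_id, counts in TABLE.items()
    for month, n in counts.items()
    for _ in range(n)
]


def consistent_customers(orders, months, threshold=4):
    """Customers with at least `threshold` orders in every month of `months`."""
    per_month = Counter(orders)  # (customer_id, month) -> number of orders
    customers = {customer_id for customer_id, _month in per_month}
    return sorted(
        c for c in customers
        if all(per_month[(c, m)] >= threshold for m in months)
    )


print(consistent_customers(ORDERS, ["Feb", "Mar", "Apr"]))
# ['0001']
```

A month with no orders counts as zero in this sketch, so a customer who ordered in only one or two of the months is excluded. Whether a given SQL query behaves the same way depends on whether it also checks the number of months per customer.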
The question I have is very similar to the question here, but I am using Presto SQL (on AWS Athena) and couldn't find information on loops in Presto.
To reiterate the issue, I want the query that:
Given table that contains: Day, Number of Items for this Day
I want: Day, Average Items for Last 7 Days before "Day"
So if I have a table that has data from Dec 25th to Jan 25th, my output table should have data from Jan 1st to Jan 25th. And for each day from Jan 1-25th, it will be the average number of items from last 7 days.
Is it possible to do this with Presto?
Maybe you can try this one.
The calendar common table expression (CTE) generates every date in a given range.
with calendar as (
select date_generated
from (
values (sequence(date'2021-12-25', date'2022-01-25', interval '1' day))
) as t1(date_array)
cross join unnest(date_array) as t2(date_generated)),
The temp CTE builds a date group for each date, containing that date and the 6 days before it.
temp as (select c1.date_generated as date_groups
, format_datetime(c2.date_generated, 'yyyy-MM-dd') as dates
from calendar c1, calendar c2
where c2.date_generated between c1.date_generated - interval '6' day and c1.date_generated
and c1.date_generated >= date'2021-12-25' + interval '6' day)
Output for this part:
date_groups  dates
2022-01-01   2021-12-26
2022-01-01   2021-12-27
2022-01-01   2021-12-28
2022-01-01   2021-12-29
2022-01-01   2021-12-30
2022-01-01   2021-12-31
2022-01-01   2022-01-01
The last part joins the day column from your table to each date and then groups by the date group:
select temp.date_groups as day
, avg(your_table.num_of_items) avg_last_7_days
from your_table
join temp on your_table.day = temp.dates
group by 1
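The logic of the three steps (calendar, date groups, join and average) can be sketched in Python as follows. The data is hypothetical (the item count is simply the day's position in the range), and the function name `trailing_7_day_average` is made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical data: one row per day from 2021-12-25 to 2022-01-25.
ITEMS_BY_DAY = {date(2021, 12, 25) + timedelta(days=i): i for i in range(32)}


def trailing_7_day_average(items_by_day):
    """For each day with 6 earlier calendar days available, average [day - 6, day]."""
    first_day = min(items_by_day)
    result = {}
    for day in sorted(items_by_day):
        if day < first_day + timedelta(days=6):
            continue  # not enough history yet, same filter as in the temp CTE
        window = [day - timedelta(days=offset) for offset in range(7)]
        # Like the join: only days that actually exist in the table contribute.
        values = [items_by_day[d] for d in window if d in items_by_day]
        result[day] = sum(values) / len(values)
    return result


averages = trailing_7_day_average(ITEMS_BY_DAY)
print(min(averages), averages[date(2022, 1, 1)])
# 2021-12-31 4.0
```

Because the window is defined by calendar dates, a missing day simply drops out of the average instead of pulling in an older row.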
You want a running average (AVG OVER):
select
day, amount,
avg(amount) over (order by day rows between 6 preceding and current row) as avg_amount
from mytable
order by day
offset 6;
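For comparison, here is a small Python sketch of what a `rows between 6 preceding and current row` frame computes. The amounts are hypothetical and the function name `running_average` is made up for illustration.

```python
def running_average(amounts, window=7):
    """Average of the current row and up to window - 1 preceding rows."""
    averages = []
    for i in range(len(amounts)):
        frame = amounts[max(0, i - window + 1): i + 1]
        averages.append(sum(frame) / len(frame))
    return averages


# Hypothetical amounts, one row per day, already ordered by day.
AMOUNTS = [10, 20, 30, 40, 50, 60, 70, 80]

# Drop the first 6 rows, as OFFSET 6 does, to keep only full 7-row frames.
print(running_average(AMOUNTS)[6:])
# [40.0, 50.0]
```

The frame counts rows, not calendar days, so this matches a 7-day average only when the table has exactly one row per day with no gaps.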
I tried many variations of the "running average" (which, thanks to Thorsten's answer, I now know is what I was looking for), but I couldn't get exactly the output I wanted together with the other columns in my table (which weren't included in my original question). This ended up working:
SELECT day, <other columns>, avg(amount) OVER (
PARTITION BY <other columns>
ORDER BY date(day) ASC
ROWS 6 PRECEDING) as avg_7_days_amount FROM table ORDER BY date(day) ASC
I have this query that shows the number of flights each month, but the month appears as a number and I want to convert it to text.
Here is the query:
select to_char(f.departuretime, 'yyyy-mm') month, count(*) numberofflights
from flight f
group by to_char(f.departuretime, 'yyyy-mm')
order by numberofflights desc;
output:
MONTH NUMBEROFFLIGHTS
2022-05 7
2022-11 5
2022-08 3
... ...
I want to display the months like "2022-MAY" or just "MAY" and so on.
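You can use the MONTH format element instead of MM to get the month's name instead of its number. Note that MONTH blank-pads the name to nine characters; MON gives the three-letter abbreviation (for example "2022-MAY"), and the fm prefix (fmMONTH) suppresses the padding: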
select to_char(f.departuretime, 'yyyy-MONTH') month, count(*) numberofflights
from flight f
group by to_char(f.departuretime, 'yyyy-MONTH')
order by numberofflights desc;
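As a cross-check of the expected output shape, the same grouping can be sketched in Python. The departure times are hypothetical, and the month abbreviations are hard-coded so the result does not depend on the locale.

```python
from collections import Counter
from datetime import datetime

MONTH_ABBR = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# Hypothetical departure times.
DEPARTURES = [
    datetime(2022, 5, 1, 8, 0),
    datetime(2022, 5, 17, 13, 30),
    datetime(2022, 11, 3, 21, 15),
]


def month_label(ts):
    """Format a timestamp as YYYY-MON, e.g. 2022-MAY."""
    return f"{ts.year}-{MONTH_ABBR[ts.month - 1]}"


def flights_per_month(departures):
    """Count flights per month label, most frequent month first."""
    return Counter(month_label(ts) for ts in departures).most_common()


print(flights_per_month(DEPARTURES))
# [('2022-MAY', 2), ('2022-NOV', 1)]
```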
I'm having a bit of trouble working with windowing and partitions in GoogleSQL.
My script currently takes the revenue of the last 7 days by category and class:
SELECT
date_id,
category,
class,
revenue as revenue_usd,
SUM(revenue)
OVER(PARTITION BY category, class
ORDER BY date_id
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) / 7 as l7d_avg_revenue_usd
FROM
(
SELECT
date_id,
category,
class,
SUM(revenue_usd_fx) as revenue
FROM
revenue_table
WHERE
date_id BETWEEN DATE('2021-01-01') AND DATE('2021-03-31')
GROUP BY 1,2,3
)
However, the problem with my script is that for date 2021-01-01 it takes only that date's revenue and still divides it by 7.
For date_ids that have fewer than 7 days of history available in the database, how do I take the average over however many days have passed so far?
(E.g. 2021-01-01 will take an average of 1 day, 2021-01-03 will take an average of 3 days, and anything from 2021-01-07 onward will take an average of 7 days.)
Thank you!
Use AVG() instead of SUM():
AVG(revenue) OVER (
PARTITION BY category, class
ORDER BY date_id
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
) as l7d_avg_revenue_usd
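To see why AVG fixes the first few days, here is a small Python sketch that computes both variants over the same 7-day frame. The revenue figures are hypothetical and the function name `sum_over_7_vs_avg` is made up for illustration; a single category and class is assumed.

```python
from datetime import date, timedelta

# Hypothetical daily revenue for one category/class.
REVENUE_BY_DAY = {
    date(2021, 1, 1): 70.0,
    date(2021, 1, 2): 70.0,
    date(2021, 1, 3): 70.0,
}


def sum_over_7_vs_avg(revenue_by_day):
    """Per day, return (SUM / 7, AVG) over the rows dated within [day - 6, day]."""
    result = {}
    for day in sorted(revenue_by_day):
        frame = [
            value for d, value in revenue_by_day.items()
            if day - timedelta(days=6) <= d <= day
        ]
        # SUM / 7 always divides by 7; AVG divides by the rows actually in the frame.
        result[day] = (sum(frame) / 7, sum(frame) / len(frame))
    return result


for day, (sum_div_7, avg) in sum_over_7_vs_avg(REVENUE_BY_DAY).items():
    print(day, sum_div_7, avg)
# 2021-01-01 10.0 70.0
# 2021-01-02 20.0 70.0
# 2021-01-03 30.0 70.0
```

The same reasoning means that a day missing from the table inside a later window also reduces the divisor, so AVG is an average over the days that have rows, not necessarily over 7 calendar days.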
I want to know the trick for finding the customers who have transacted in 3 consecutive months. These can be any 3 consecutive months, with any number of transactions in each.
Example: suppose there is a customer who transacts in January, keeps transacting until March, and then stops. I want the list of such customers from my database.
I am working on AWS Athena.
One method uses aggregation and window functions:
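select customer_id, yyyymm
from (select customer_id,
date_trunc('month', transactdate) as yyyymm,
-- month of this customer's row two positions earlier (one row per active month)
lag(date_trunc('month', transactdate), 2) over (partition by customer_id order by date_trunc('month', transactdate)) as prev_yyyymm_2
from t
where transactdate >= date '2017-01-01' and -- assumes transactdate is a DATE column
transactdate < date '2019-01-01'
group by customer_id, date_trunc('month', transactdate) -- one row per customer and month
)
where prev_yyyymm_2 = yyyymm - interval '2' month;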
This aggregates transactions by month and looks at the transaction date two rows earlier. The outer filter checks that that date is exactly 2 months earlier.
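The lag-by-two check can be sketched in Python to show why it is sufficient. The transactions are hypothetical and the function name is made up for illustration; months are turned into a running index (year * 12 + month) so that December and January count as adjacent.

```python
from datetime import date

# Hypothetical transactions: (customer_id, transaction date).
TRANSACTIONS = [
    ("A", date(2018, 1, 5)), ("A", date(2018, 2, 9)), ("A", date(2018, 3, 1)),
    ("B", date(2018, 1, 5)), ("B", date(2018, 3, 9)), ("B", date(2018, 4, 1)),
    ("C", date(2017, 11, 2)), ("C", date(2017, 12, 24)), ("C", date(2018, 1, 3)),
]


def customers_with_3_consecutive_months(transactions):
    """Customers who transacted in at least one run of 3 consecutive months."""
    months_by_customer = {}
    for customer_id, day in transactions:
        # Distinct active months per customer, as a running month index.
        months_by_customer.setdefault(customer_id, set()).add(day.year * 12 + day.month)
    result = []
    for customer_id, months in months_by_customer.items():
        ordered = sorted(months)
        # Same idea as LAG(..., 2): compare each month with the one two rows earlier.
        if any(cur - prev == 2 for prev, cur in zip(ordered, ordered[2:])):
            result.append(customer_id)
    return sorted(result)


print(customers_with_3_consecutive_months(TRANSACTIONS))
# ['A', 'C']
```

Since the months are distinct and sorted, a gap of exactly 2 between a row and the row two positions earlier can only happen when the month in between is present as well.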