SQL - How to sum revenue by customer over the last 7 days for each date

I want to sum the previous 7 days' revenue as of each date, for each customer. Some customers have missing dates, and there are many different customers, so I cannot use the LAG function. I previously tried window functions, but I could only partition by Customer_ID and could not restrict the window to a date range as well.
Some sample data as follows:
Customer_ID  Date      Revenue
1            01/02/21  $20
2            01/02/21  $30
1            02/02/21  $40
2            02/02/21  $50
1            03/02/21  $20
2            03/02/21  $60
1            04/02/21  $10
2            04/02/21  $80
1            05/02/21  $100
2            05/02/21  $40
1            06/02/21  $20
2            06/02/21  $30
1            07/02/21  $50
2            07/02/21  $70
1            08/02/21  $10
2            08/02/21  $20
1            09/02/21  $3
2            09/02/21  $40
This result would give the sum of the previous seven days revenue broken down by customer ID for each date. It is ordered by Customer_ID and Date
Customer_ID  Date      Revenue
1            01/02/21  $20
1            02/02/21  $60
1            03/02/21  $80
1            04/02/21  $90
1            05/02/21  $190
1            06/02/21  $210
1            07/02/21  $260
1            08/02/21  $250
1            09/02/21  $240
2            01/02/21  $30
2            02/02/21  $80
2            03/02/21  $140
2            04/02/21  $220
2            05/02/21  $260
2            06/02/21  $290
2            07/02/21  $360
2            08/02/21  $350
2            09/02/21  $340

select customer_id, date, sum(revenue) from customer_table where date >= sysdate-7 and date <= sysdate group by customer_id, date;
Hope this helps you

You can try a self join, where you match on:
t1.Customer_ID = t2.Customer_ID
t2.Date falling between six days before t1.Date and t1.Date itself.
Then apply SUM to t2.Revenue and group by the selected fields.
SELECT t1.Customer_ID,
t1.Date,
SUM(t2.Revenue) AS total
FROM tab t1
LEFT JOIN tab t2
ON t1.Customer_ID = t2.Customer_ID
-- BETWEEN needs the lower bound first: the 7-day window ends at t1.Date
AND t2.Date BETWEEN DATEADD(day, -6, t1.Date) AND t1.Date
GROUP BY t1.Customer_ID,
t1.Date
This approach would avoid the issue of missing dates for customers, as long as you are comparing dates instead of taking the "last 7 records" with LAG.
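To see the date-range self join handle a gap, here is a minimal sketch that runs the same idea in SQLite through Python's sqlite3 module. The table layout, the ISO date strings, and the missing 2021-02-03 row for customer 1 are assumptions made for illustration, and SQLite's date(d, '-6 days') stands in for DATEADD.

```python
import sqlite3

# Illustrative rows (customer_id, ISO date, revenue). Customer 1 deliberately
# has no row for 2021-02-03, and customer 2 only has two rows a week apart.
rows = [
    (1, "2021-02-01", 20), (1, "2021-02-02", 40), (1, "2021-02-04", 10),
    (1, "2021-02-05", 100), (1, "2021-02-06", 20), (1, "2021-02-07", 50),
    (1, "2021-02-08", 10), (1, "2021-02-09", 3),
    (2, "2021-02-01", 30), (2, "2021-02-08", 20),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (customer_id INTEGER, d TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)

# Each t1 row is matched with every t2 row of the same customer whose date
# lies in the 7-day window ending at t1.d (ISO strings compare correctly).
query = """
SELECT t1.customer_id, t1.d, SUM(t2.revenue) AS total
FROM tab t1
JOIN tab t2
  ON t1.customer_id = t2.customer_id
 AND t2.d BETWEEN date(t1.d, '-6 days') AND t1.d
GROUP BY t1.customer_id, t1.d
ORDER BY t1.customer_id, t1.d
"""
rolling = {(cust, d): total for cust, d, total in conn.execute(query)}

for key, total in rolling.items():
    print(key, total)
```

Because the window is defined by dates rather than by a row count, the missing day simply contributes nothing to the sum.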

with cte as (-- Customer_ID Date Revenue
select 1 customer_id, DATE( '01/02/2021','DD/MM/YYYY') Some_date, 20 Revenue
union all select 2 customer_id, DATE( '01/02/2021','DD/MM/YYYY') Some_date, 30 Revenue
union all select 1 customer_id, DATE( '03/02/2021','DD/MM/YYYY') Some_date, 20 Revenue
union all select 2 customer_id, DATE( '03/02/2021','DD/MM/YYYY') Some_date, 60 Revenue
union all select 1 customer_id, DATE( '04/02/2021','DD/MM/YYYY') Some_date, 10 Revenue
union all select 2 customer_id, DATE( '04/02/2021','DD/MM/YYYY') Some_date, 80 Revenue
union all select 1 customer_id, DATE( '05/02/2021','DD/MM/YYYY') Some_date, 100 Revenue
union all select 2 customer_id, DATE( '05/02/2021','DD/MM/YYYY') Some_date, 40 Revenue
union all select 1 customer_id, DATE( '06/02/2021','DD/MM/YYYY') Some_date, 20 Revenue
union all select 2 customer_id, DATE( '06/02/2021','DD/MM/YYYY') Some_date, 30 Revenue
union all select 1 customer_id, DATE( '07/02/2021','DD/MM/YYYY') Some_date, 50 Revenue
union all select 2 customer_id, DATE( '07/02/2021','DD/MM/YYYY') Some_date, 70 Revenue
union all select 1 customer_id, DATE( '08/02/2021','DD/MM/YYYY') Some_date, 10 Revenue
union all select 2 customer_id, DATE( '08/02/2021','DD/MM/YYYY') Some_date, 20 Revenue
union all select 1 customer_id, DATE( '09/02/2021','DD/MM/YYYY') Some_date, 3 Revenue
union all select 1 customer_id, DATE( '02/02/2021','DD/MM/YYYY') Some_date, 40 Revenue
union all select 2 customer_id, DATE( '02/02/2021','DD/MM/YYYY') Some_date, 50 Revenue
union all select 2 customer_id, DATE( '09/02/2021','DD/MM/YYYY') Some_date, 40 Revenue)
select customer_id, revenue
, DATE_TRUNC('week', Some_date ) week_number
, sum(revenue)
over(partition by customer_id,week_number
order by Some_date asc
rows between unbounded preceding and current row) volia
from cte

Related

How to use SUM() OVER (partition by)?

Imagine that from the 1st to the 3rd of November you sold a certain amount of goods (there are two types, A and B), and now you need to determine how much was sold in total for each day.
How can I compute the last two columns (total quantity and total amount per date) so that my table looks like this?
Date Type Quantity Amount Sum_Quantity Sum_Amount
01-11 A 2 100 5 300
01-11 B 3 200 5 300
02-11 A 1 700 3 950
02-11 B 2 250 3 950
03-11 A 2 600 7 800
03-11 B 5 200 7 800
And how can I query, if I want to take the results partitioned by month?
SELECT date,
type,
quantity,
amount,
-- Partition by date
SUM(quantity) OVER (PARTITION BY date) AS sum_quantity_date_part,
SUM(amount) OVER (PARTITION BY date) AS sum_amount_date_part,
-- Partition by month
SUM(quantity) OVER (
PARTITION BY EXTRACT(YEAR FROM date),
EXTRACT(MONTH FROM date)
) AS sum_quantity_month_part,
SUM(amount) OVER (
PARTITION BY EXTRACT(YEAR FROM date),
EXTRACT(MONTH FROM date)
) AS sum_amount_month_part
FROM sales
ORDER BY date, type
;
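As a rough illustration of what SUM(...) OVER (PARTITION BY date) produces, the sketch below reproduces the per-date totals from the question in plain Python. The tuple layout is an assumption; the point is that every row is kept and simply gains the total of its partition.

```python
from collections import defaultdict

# (date, type, quantity, amount) rows from the question's table.
sales = [
    ("01-11", "A", 2, 100), ("01-11", "B", 3, 200),
    ("02-11", "A", 1, 700), ("02-11", "B", 2, 250),
    ("03-11", "A", 2, 600), ("03-11", "B", 5, 200),
]

# First pass: accumulate one total per partition key (the date).
qty_by_date = defaultdict(int)
amt_by_date = defaultdict(int)
for day, _type, qty, amt in sales:
    qty_by_date[day] += qty
    amt_by_date[day] += amt

# Second pass: attach the partition totals to every row. Unlike GROUP BY,
# the number of rows does not change.
with_totals = [
    (day, kind, qty, amt, qty_by_date[day], amt_by_date[day])
    for day, kind, qty, amt in sales
]

for row in with_totals:
    print(row)
```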

Cumulative average and count over occurrences increasing in time

I am looking to calculate an average (over number of occurrences) and observation count over increasing dates per instance (take customer as an example instance) in Oracle SQL.
So the count will increase as date goes up, the average could go up or down.
I can do it for an individual case and a fixed time interval, but I would like to see a series for every customer, with every row a separate date where a sale occurred. Right now, I have a single row per customer. Here is the SQL summarizing the average and count for a fixed time interval:
SELECT AVG(bought_usd) as avg_bought
, COUNT(*) as num_of_interactions
, cust_id
FROM salesTable
WHERE obsdate >= DATE '2000-01-01'
AND obsdate <= DATE '2022-01-01'
GROUP BY cust_id
So for an input of:
the output should look like:
Use analytic functions:
SELECT "DATE",
cust,
AVG(bought_usd) OVER (PARTITION BY cust ORDER BY "DATE") AS avg,
COUNT(*) OVER (PARTITION BY cust ORDER BY "DATE") AS cnt
FROM salestable
ORDER BY cust, "DATE"
Note: DATE is a reserved word. You should not use it as an identifier.
Which, for the sample data:
CREATE TABLE salestable ("DATE", cust, bought_usd) AS
SELECT DATE '2010-10-01', 'Cust A', 100 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust A', 50 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust B', 120 FROM DUAL UNION ALL
SELECT DATE '2011-10-01', 'Cust B', 180 FROM DUAL;
Outputs:
DATE                 CUST    AVG  CNT
2010-10-01 00:00:00  Cust A  100  1
2010-12-18 00:00:00  Cust A  75   2
2010-12-18 00:00:00  Cust B  120  1
2011-10-01 00:00:00  Cust B  150  2
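The running average and count can be checked by hand with a small Python sketch of the same logic. It assumes at most one sale per customer per date; with several sales on the same date, the SQL default RANGE frame would treat them as peers and give each of them the same value, which this loop does not replicate.

```python
from itertools import groupby

# (date, customer, bought_usd) rows matching the sample data.
sales = [
    ("2010-10-01", "Cust A", 100), ("2010-12-18", "Cust A", 50),
    ("2010-12-18", "Cust B", 120), ("2011-10-01", "Cust B", 180),
]

def running_stats(rows):
    """Return (date, customer, running average, running count) per row."""
    out = []
    # Sort by customer, then date: the PARTITION BY / ORDER BY of the query.
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for cust, group in groupby(ordered, key=lambda r: r[1]):
        total = 0
        count = 0
        for day, _cust, usd in group:
            total += usd
            count += 1
            out.append((day, cust, total / count, count))
    return out

stats = running_stats(sales)
for row in stats:
    print(row)
```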

What to use in place of UNION in my query, or a more optimized version of the query without UNION and UNION ALL

I am counting the birthdays, sales, and orders for all 12 months from the customers table in SQL Server, as shown below.
In the customers table, birth_date, sale_date, and order_date are columns.
select 1 as ranking,'Birthdays' as Type,[MONTH],TOTAL
from ( select DATENAME(month, birth_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, birth_date)
)x
union
select 2 as ranking,'sales' as Type,[MONTH],TOTAL
from ( select DATENAME(month, sale_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, sale_date)
)x
union
select 3 as ranking,'Orders' as Type,[MONTH],TOTAL
from ( select DATENAME(month, order_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, order_date)
)x
And the output looks like this (just dummy data):
ranking  Type       MONTH     TOTAL
1        Birthdays  January   12
1        Birthdays  April     6
1        Birthdays  May       10
2        Sales      February  8
2        Sales      April     14
2        Sales      May       10
3        Orders     June      4
3        Orders     July      3
3        Orders     October   6
3        Orders     December  17
I want to get the counts for all three types without using UNION or UNION ALL, that is, with a single query statement (or a more optimized version of this query).
Another approach is to create a CTE with all available ranking values and use CROSS APPLY against it, as shown below.
WITH ranks(ranking) AS (
SELECT * FROM (VALUES (1), (2), (3)) v(r)
)
SELECT
r.ranking,
CASE WHEN r.ranking = 1 THEN 'Birthdays'
WHEN r.ranking = 2 THEN 'Sales'
WHEN r.ranking = 3 THEN 'Orders'
END AS Type,
DATENAME(month, CASE WHEN r.ranking = 1 THEN c.birth_date
WHEN r.ranking = 2 THEN c.sale_date
WHEN r.ranking = 3 THEN c.order_date
END) AS MONTH,
COUNT(*) AS TOTAL
FROM customers c
CROSS APPLY ranks r
GROUP BY r.ranking,
DATENAME(month, CASE WHEN r.ranking = 1 THEN c.birth_date
WHEN r.ranking = 2 THEN c.sale_date
WHEN r.ranking = 3 THEN c.order_date
END)
ORDER BY r.ranking, MONTH
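The row-multiplying idea can be tried out in SQLite from Python as a minimal sketch. SQLite has no CROSS APPLY or DATENAME, so this version assumes CROSS JOIN and a numeric month from strftime('%m', ...) as stand-ins, and the three customer rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (birth_date TEXT, sale_date TEXT, order_date TEXT)"
)
# Hypothetical rows, one per customer.
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    ("1990-01-15", "2021-04-02", "2021-06-10"),
    ("1985-01-20", "2021-04-18", "2021-07-01"),
    ("1992-05-05", "2021-05-09", "2021-06-21"),
])

# Every customer row is repeated once per ranking; the CASE expressions then
# pick the date column that belongs to that ranking.
query = """
WITH ranks(ranking) AS (VALUES (1), (2), (3))
SELECT r.ranking,
       CASE r.ranking WHEN 1 THEN 'Birthdays'
                      WHEN 2 THEN 'Sales'
                      ELSE 'Orders' END AS type,
       CAST(strftime('%m', CASE r.ranking WHEN 1 THEN c.birth_date
                                          WHEN 2 THEN c.sale_date
                                          ELSE c.order_date END) AS INTEGER) AS month_no,
       COUNT(*) AS total
FROM customers c
CROSS JOIN ranks r
GROUP BY r.ranking, month_no
ORDER BY r.ranking, month_no
"""
counts = conn.execute(query).fetchall()

for row in counts:
    print(row)
```

The table is scanned once and each row fans out three ways, instead of being scanned once per UNION branch.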

Get last known record per month in BigQuery

Account balance collection, that shows the account balance of a customer at a given day:
+---------------+---------+------------+
| customer_id | value | timestamp |
+---------------+---------+------------+
| 1 | -500 | 2019-10-12 |
| 1 | -300 | 2019-10-11 |
| 1 | -200 | 2019-10-10 |
| 1 | 0 | 2019-10-09 |
| 2 | 200 | 2019-09-10 |
| 1 | 600 | 2019-09-02 |
+---------------+---------+------------+
Notice, that customer #2 had no updates to his account balance in October.
I want to get the last account balance per customer per month. If there has been no account balance update for a customer in a given month, the last known account balance should be transferred to the current month. The result should look like that:
+---------------+---------+------------+
| customer_id | value | timestamp |
+---------------+---------+------------+
| 1 | -500 | 2019-10-12 |
| 2 | 200 | 2019-10-10 |
| 2 | 200 | 2019-09-10 |
| 1 | 600 | 2019-09-02 |
+---------------+---------+------------+
Since the account balance of customer #2 was not updated in October but in September, we create a copy of the row from September changing the date to October. Any ideas how to achieve this in BigQuery?
Below is for BigQuery Standard SQL
#standardSQL
WITH customers AS (
SELECT DISTINCT customer_id FROM `project.dataset.table`
), months AS (
SELECT month FROM (
SELECT DATE_TRUNC(MIN(timestamp), MONTH) min_month, DATE_TRUNC(MAX(timestamp), MONTH) max_month
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_month, max_month, INTERVAL 1 MONTH)) month
)
SELECT customer_id,
IFNULL(value, LEAD(value) OVER(win)) value,
IFNULL(timestamp, DATE_ADD(LEAD(timestamp) OVER(win), INTERVAL DATE_DIFF(month, LEAD(month) OVER(win), MONTH) MONTH)) timestamp
FROM months, customers
LEFT JOIN (
SELECT DATE_TRUNC(timestamp, MONTH) month, customer_id,
ARRAY_AGG(STRUCT(value, timestamp) ORDER BY timestamp DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.table`
GROUP BY month, customer_id
) USING(month, customer_id)
WINDOW win AS (PARTITION BY customer_id ORDER BY month DESC)
You can apply this to the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 customer_id, -500 value, DATE '2019-10-12' timestamp UNION ALL
SELECT 1, -300, '2019-10-11' UNION ALL
SELECT 1, -200, '2019-10-10' UNION ALL
SELECT 2, 200, '2019-09-10' UNION ALL
SELECT 2, 100, '2019-08-11' UNION ALL
SELECT 2, 50, '2019-07-12' UNION ALL
SELECT 1, 600, '2019-09-02'
), customers AS (
SELECT DISTINCT customer_id FROM `project.dataset.table`
), months AS (
SELECT month FROM (
SELECT DATE_TRUNC(MIN(timestamp), MONTH) min_month, DATE_TRUNC(MAX(timestamp), MONTH) max_month
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_month, max_month, INTERVAL 1 MONTH)) month
)
SELECT customer_id,
IFNULL(value, LEAD(value) OVER(win)) value,
IFNULL(timestamp, DATE_ADD(LEAD(timestamp) OVER(win), INTERVAL DATE_DIFF(month, LEAD(month) OVER(win), MONTH) MONTH)) timestamp
FROM months, customers
LEFT JOIN (
SELECT DATE_TRUNC(timestamp, MONTH) month, customer_id,
ARRAY_AGG(STRUCT(value, timestamp) ORDER BY timestamp DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.table`
GROUP BY month, customer_id
) USING(month, customer_id)
WINDOW win AS (PARTITION BY customer_id ORDER BY month DESC)
-- ORDER BY month DESC, customer_id
The result is:
Row customer_id value timestamp
1 1 -500 2019-10-12
2 2 200 2019-10-10
3 1 600 2019-09-02
4 2 200 2019-09-10
5 1 null null
6 2 100 2019-08-11
7 1 null null
8 2 50 2019-07-12
The following query should mostly answer your question by creating a 'month-end' record for each customer for every month and getting the most recent balance:
with
-- Generate a set of months
month_begins as (
select dt from unnest(generate_date_array('2019-01-01','2019-12-01', interval 1 month)) dt
),
-- Get the month ends
month_ends as (
select date_sub(date_add(dt, interval 1 month), interval 1 day) as month_end_date from month_begins
),
-- Cross Join and group so we get 1 customer record for every month to account for
-- situations where customer doesn't change balance in a month
user_month_ends as (
select
customer_id,
month_end_date
from `project.dataset.table`
cross join month_ends
group by 1,2
),
-- Fan out so for each month end, you get all balances prior to month end for each customer
values_prior_to_month_end as (
select
customer_id,
value,
timestamp,
month_end_date
from `project.dataset.table`
inner join user_month_ends using(customer_id)
where timestamp <= month_end_date
),
-- Order by most recent balance before month end, even if it was more than 1+ months ago
ordered as (
select
*,
row_number() over (partition by customer_id, month_end_date order by timestamp desc) as my_row
from values_prior_to_month_end
),
-- Finally, select only the most recent record for each customer per month
final as (
select
* except(my_row)
from ordered
where my_row = 1
)
select * from final
order by customer_id, month_end_date desc
A few caveats:
I did not order results to match your desired result set, and I also kept a month-end date to illustrate the concept. You can easily change the ordering and exclude unneeded fields.
In the month_begins CTE, I set a range of months into the future, so your result set will contain the most recent balance of 'future months'. To make this a bit prettier, consider changing '2019-12-01' to 'current_date()' and your query will always return to the end of the current month.
Your timestamp field looks to be dates, so I used date logic, but you should be able to apply the same principles to use timestamp logic if your underlying fields are actual timestamps.
In your result set, I'm not sure why your 2nd row (customer 2) would have a timestamp of '2019-10-10', that seems arbitrary as customer 2 has no 2nd balance record.
I purposefully split the logic into several CTEs so I could comment on each step easier, you could definitely perform several steps in the same code block for a more condensed query.
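The carry-forward logic (take the latest balance dated in or before each month) can be sketched in plain Python against the question's sample rows. The function name and the (year, month) representation are assumptions made for this illustration, not part of either query.

```python
from datetime import date

# (customer_id, value, timestamp) rows from the question.
balances = [
    (1, -500, date(2019, 10, 12)), (1, -300, date(2019, 10, 11)),
    (1, -200, date(2019, 10, 10)), (1, 0, date(2019, 10, 9)),
    (2, 200, date(2019, 9, 10)), (1, 600, date(2019, 9, 2)),
]

def last_balance_per_month(rows, months):
    """Map (customer, year, month) to the last known balance.

    months is a list of (year, month) tuples. A month with no update for a
    customer inherits that customer's most recent earlier balance.
    """
    result = {}
    for customer in sorted({r[0] for r in rows}):
        history = sorted((ts, value) for c, value, ts in rows if c == customer)
        for year, month in months:
            # All records dated in or before the given month.
            known = [v for ts, v in history if (ts.year, ts.month) <= (year, month)]
            if known:
                result[(customer, year, month)] = known[-1]
    return result

monthly = last_balance_per_month(balances, [(2019, 9), (2019, 10)])
for key, value in sorted(monthly.items()):
    print(key, value)
```

Customer 2 has no October record, so the September balance of 200 is reported for October as well.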

Moving average of 2 columns

Hello, I have a problem. I know how to calculate a moving average over the last 3 months using Oracle analytic functions, but my situation is a little different.
Month  ProductType  Sales  Average (HAVE TO FIND THIS)
1      A            10
1      B            12
1      C            17
2      A            21
3      C            2
3      B            21
4      B            23
5
6
7
8
9
9
So we have sales for each month and each product type... I need to calculate the moving average of the last 3 months and the particular product.
example:
For month 4 and product B it would be (21+0+12)/3
Any ideas ?
Another option is to use the windowing clause of analytic functions
with my_data as (
select 1 as month, 'A' as product, 10 as sales from dual union all
select 1 as month, 'B' as product, 12 as sales from dual union all
select 1 as month, 'C' as product, 17 as sales from dual union all
select 2 as month, 'A' as product, 21 as sales from dual union all
select 3 as month, 'C' as product, 2 as sales from dual union all
select 3 as month, 'B' as product, 21 as sales from dual union all
select 4 as month, 'B' as product, 23 as sales from dual
)
select
month,
product,
sales,
nvl(sum(sales)
over (partition by product order by month
range between 3 preceding and 1 preceding),0)/3 as average_sales
from my_data
order by month, product
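The arithmetic behind that window (sum of the three preceding months, with missing months counting as 0, divided by 3) can be checked with a short Python sketch. The dictionary layout and function name are assumptions for illustration only.

```python
# (month, product) -> sales, taken from the sample data.
sales = {
    (1, "A"): 10, (1, "B"): 12, (1, "C"): 17,
    (2, "A"): 21,
    (3, "C"): 2, (3, "B"): 21,
    (4, "B"): 23,
}

def moving_average(data, month, product, window=3):
    """Average of the `window` months before `month`; absent months count as 0."""
    total = sum(data.get((m, product), 0) for m in range(month - window, month))
    return total / window

# Month 4, product B: months 1..3 hold 12, nothing, and 21.
print(moving_average(sales, 4, "B"))
```

Dividing by a fixed 3 rather than by the number of rows found is what makes the empty month 2 count as a zero, matching the (21+0+12)/3 example in the question.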
SELECT month,
productType,
sales,
-- LAG counts rows, not months, so this assumes each product has a row for every month
(lag(sales, 3) over (partition by productType order by month) +
lag(sales, 2) over (partition by productType order by month) +
lag(sales, 1) over (partition by productType order by month)) / 3 AS moving_avg
FROM your_table_name