Cumulative average and count over occurrences increasing in time - sql

I want to calculate a running average (over the number of occurrences) and a running observation count over increasing dates, per instance (for example, per customer), in Oracle SQL.
The count will increase as the date advances; the average may go up or down.
I can do it for an individual case and a fixed time interval, but I would like a series for every customer, with one row per date on which a sale occurred. Right now I get a single row per customer. Here is the SQL that summarizes the average and count for a fixed time interval:
SELECT AVG(bought_usd) AS avg_bought
     , COUNT(*) AS num_of_interactions
     , cust_id
FROM salesTable
WHERE obsdate >= DATE '2000-01-01'
  AND obsdate <= DATE '2022-01-01'
GROUP BY cust_id
So for an input of:
the output should look like:

Use analytic functions:
SELECT "DATE",
       cust,
       AVG(bought_usd) OVER (PARTITION BY cust ORDER BY "DATE") AS avg,
       COUNT(*) OVER (PARTITION BY cust ORDER BY "DATE") AS cnt
FROM   salestable
ORDER BY cust, "DATE"
Note: DATE is a reserved word. You should not use it as an identifier.
Which, for the sample data:
CREATE TABLE salestable ("DATE", cust, bought_usd) AS
SELECT DATE '2010-10-01', 'Cust A', 100 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust A', 50 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust B', 120 FROM DUAL UNION ALL
SELECT DATE '2011-10-01', 'Cust B', 180 FROM DUAL;
Outputs:
DATE                 CUST    AVG  CNT
-------------------  ------  ---  ---
2010-10-01 00:00:00  Cust A  100    1
2010-12-18 00:00:00  Cust A   75    2
2010-12-18 00:00:00  Cust B  120    1
2011-10-01 00:00:00  Cust B  150    2
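For anyone who wants to check the numbers outside the database, here is a minimal Python sketch of what the two analytic functions compute. It is only an illustration, not Oracle's implementation: the function name `running_stats` is made up, and it assumes at most one sale per customer per date (with ties on the date, the SQL default frame would give tied rows the same value).

```python
from collections import defaultdict
from datetime import date

# Sample rows from the answer above: (sale date, customer, amount in USD).
sales = [
    (date(2010, 10, 1), "Cust A", 100),
    (date(2010, 12, 18), "Cust A", 50),
    (date(2010, 12, 18), "Cust B", 120),
    (date(2011, 10, 1), "Cust B", 180),
]

def running_stats(rows):
    """Return (customer, date, running average, running count) per sale.

    Hypothetical helper: walks the rows in (customer, date) order and keeps
    a running total and count per customer.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    out = []
    for day, cust, amount in sorted(rows, key=lambda r: (r[1], r[0])):
        totals[cust] += amount
        counts[cust] += 1
        out.append((cust, day, totals[cust] / counts[cust], counts[cust]))
    return out

result = running_stats(sales)
for row in result:
    print(row)
```

The second row for each customer should show the averages 75 and 150 with a count of 2, matching the table above.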

Related

SQL - How to sum revenue by customer over the last 7 days for each date

I want to sum the previous 7 days revenue from each date for each customer. There are some missing dates for some customers and various different customers so I cannot use a Lag function. I was previously using windows but I could only partition by customer_ID and could not partition by the date range as well.
Some sample data as follows:
Customer_ID  Date      Revenue
-----------  --------  -------
1            01/02/21  $20
2            01/02/21  $30
1            02/02/21  $40
2            02/02/21  $50
1            03/02/21  $20
2            03/02/21  $60
1            04/02/21  $10
2            04/02/21  $80
1            05/02/21  $100
2            05/02/21  $40
1            06/02/21  $20
2            06/02/21  $30
1            07/02/21  $50
2            07/02/21  $70
1            08/02/21  $10
2            08/02/21  $20
1            09/02/21  $3
2            09/02/21  $40
This result would give the sum of the previous seven days revenue broken down by customer ID for each date. It is ordered by Customer_ID and Date
Customer_ID  Date      Revenue
-----------  --------  -------
1            01/02/21  $20
1            02/02/21  $60
1            03/02/21  $80
1            04/02/21  $90
1            05/02/21  $190
1            06/02/21  $210
1            07/02/21  $260
1            08/02/21  $250
1            09/02/21  $240
2            01/02/21  $30
2            02/02/21  $80
2            03/02/21  $140
2            04/02/21  $220
2            05/02/21  $260
2            06/02/21  $290
2            07/02/21  $360
2            08/02/21  $350
2            09/02/21  $340
select customer_id, date, sum(revenue) from customer_table where date >= sysdate-7 and date <= sysdate group by customer_id, date;
Hope this helps you
You can try a self join, where you match on:
t1.Customer_ID = t2.Customer_ID
t2.Date falling within the seven days ending on t1.Date (t1.Date and the six days before it).
Then apply SUM to t2.Revenue and group by the selected fields.
SELECT t1.Customer_ID,
       t1.Date,
       SUM(t2.Revenue) AS total
FROM tab t1
LEFT JOIN tab t2
       ON t1.Customer_ID = t2.Customer_ID
      -- t2 rows dated from six days before t1.Date up to t1.Date itself
      AND t2.Date BETWEEN DATEADD(day, -6, t1.Date) AND t1.Date
GROUP BY t1.Customer_ID,
         t1.Date
This approach would avoid the issue of missing dates for customers, as long as you are comparing dates instead of taking the "last 7 records" with LAG.
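To see why comparing dates copes with gaps, here is a rough Python sketch of the same idea, run on customer 2's rows from the question. The function name `rolling_sum` is invented for the illustration, and the quadratic scan is only meant to mirror the self join, not to be efficient.

```python
from datetime import date, timedelta

# Customer 2's rows from the question: (customer_id, date, revenue).
rows = [
    (2, date(2021, 2, d), rev)
    for d, rev in [(1, 30), (2, 50), (3, 60), (4, 80), (5, 40),
                   (6, 30), (7, 70), (8, 20), (9, 40)]
]

def rolling_sum(rows, days=7):
    """For each (customer, date), sum revenue over the `days` days ending on
    that date. The window is defined by dates, so a missing day simply
    contributes nothing instead of shifting the window."""
    out = {}
    for cust, day, _ in rows:
        start = day - timedelta(days=days - 1)
        out[(cust, day)] = sum(
            rev for c, d, rev in rows if c == cust and start <= d <= day
        )
    return out

totals = rolling_sum(rows)
print(totals[(2, date(2021, 2, 9))])
```

For customer 2 this gives 360, 350 and 340 for 07/02, 08/02 and 09/02, in line with the expected result in the question.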
with cte as (-- Customer_ID Date Revenue
select 1 customer_id, DATE( '01/02/2021','DD/MM/YYYY') Some_date, 20 Revenue
union all select 2 customer_id, DATE( '01/02/2021','DD/MM/YYYY') Some_date, 30 Revenue
union all select 1 customer_id, DATE( '03/02/2021','DD/MM/YYYY') Some_date, 20 Revenue
union all select 2 customer_id, DATE( '03/02/2021','DD/MM/YYYY') Some_date, 60 Revenue
union all select 1 customer_id, DATE( '04/02/2021','DD/MM/YYYY') Some_date, 10 Revenue
union all select 2 customer_id, DATE( '04/02/2021','DD/MM/YYYY') Some_date, 80 Revenue
union all select 1 customer_id, DATE( '05/02/2021','DD/MM/YYYY') Some_date, 100 Revenue
union all select 2 customer_id, DATE( '05/02/2021','DD/MM/YYYY') Some_date, 40 Revenue
union all select 1 customer_id, DATE( '06/02/2021','DD/MM/YYYY') Some_date, 20 Revenue
union all select 2 customer_id, DATE( '06/02/2021','DD/MM/YYYY') Some_date, 30 Revenue
union all select 1 customer_id, DATE( '07/02/2021','DD/MM/YYYY') Some_date, 50 Revenue
union all select 2 customer_id, DATE( '07/02/2021','DD/MM/YYYY') Some_date, 70 Revenue
union all select 1 customer_id, DATE( '08/02/2021','DD/MM/YYYY') Some_date, 10 Revenue
union all select 2 customer_id, DATE( '08/02/2021','DD/MM/YYYY') Some_date, 20 Revenue
union all select 1 customer_id, DATE( '09/02/2021','DD/MM/YYYY') Some_date, 3 Revenue
union all select 1 customer_id, DATE( '02/02/2021','DD/MM/YYYY') Some_date, 40 Revenue
union all select 2 customer_id, DATE( '02/02/2021','DD/MM/YYYY') Some_date, 50 Revenue
union all select 2 customer_id, DATE( '09/02/2021','DD/MM/YYYY') Some_date, 40 Revenue)
select customer_id, revenue
, DATE_TRUNC('week', Some_date ) week_number
, sum(revenue)
over(partition by customer_id,week_number
order by Some_date asc
rows between unbounded preceding and current row) volia
from cte

BigQuery: Rolling daily count visitor's summary of payment

I have this data:
date        visitor_id  total_payment
----------  ----------  -------------
2022-01-01  A           20
2022-01-01  B           15
2022-01-01  C           20
2022-01-02  B           10
2022-01-02  D           25
I'd like a daily count of visitors whose cumulative total_payment is at least $20. The result I'm hoping for is:
date        count_visitor
----------  -------------
2022-01-01  2
2022-01-02  4
2022-01-01 is 2 because only A and C have paid at least $20 by then; on 2022-01-02 there are 2 more, because B's payments now sum to $25 and D paid $25.
Is there a query that can do this? I hope my description is clear. Thank you in advance.
You can use the query below as a solution.
First, I calculate the cumulative payments of each user.
Then, I find the earliest date on which each user's cumulative payment reaches $20.
In the last step, I count the number of users for each of those dates and also accumulate that number.
You don't need the first_day_users column in the output, but I kept it to make the code easier to follow.
Here is the query:
WITH
data AS(
SELECT "2022-01-01" AS date, "A" AS visitor_id, 20 AS total_payment UNION ALL
SELECT "2022-01-01" AS date, "B" AS visitor_id, 15 AS total_payment UNION ALL
SELECT "2022-01-01" AS date, "C" AS visitor_id, 20 AS total_payment UNION ALL
SELECT "2022-01-02" AS date, "B" AS visitor_id, 10 AS total_payment UNION ALL
SELECT "2022-01-02" AS date, "D" AS visitor_id, 25 AS total_payment
),
user_cumulatives as
(
SELECT
visitor_id,
date,
SUM(total_payment) OVER (PARTITION BY visitor_id ORDER BY date) as cumulative_payment
FROM data
),
user_first_dates as
(
select visitor_id, min(date) as date
from user_cumulatives
where cumulative_payment >= 20
group by 1
)
select date, count(*) as first_day_users, sum(count(*)) over (order by date) as count_visitor
from user_first_dates
group by 1
order by date
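The three steps of the query above can also be sketched in plain Python, which may help when checking the logic by hand. This is just an illustration on the question's data; the function name is made up, and unlike the SQL it reports every date that has a payment, not only dates on which someone first crosses the threshold.

```python
from collections import defaultdict

# Rows from the question: (date, visitor_id, total_payment).
payments = [
    ("2022-01-01", "A", 20),
    ("2022-01-01", "B", 15),
    ("2022-01-01", "C", 20),
    ("2022-01-02", "B", 10),
    ("2022-01-02", "D", 25),
]

def cumulative_qualifying_visitors(rows, threshold=20):
    """Map each date to the number of visitors whose cumulative payment has
    reached `threshold` on or before that date."""
    running = defaultdict(int)  # cumulative payment per visitor
    qualified = set()           # visitors that have reached the threshold
    counts = {}
    # ISO date strings sort chronologically, so a plain sort is enough.
    for day, visitor, amount in sorted(rows):
        running[visitor] += amount
        if running[visitor] >= threshold:
            qualified.add(visitor)
        counts[day] = len(qualified)  # last write per day is the day's total
    return counts

counts = cumulative_qualifying_visitors(payments)
print(counts)
```

On the sample data this yields 2 for 2022-01-01 and 4 for 2022-01-02, which is the result the question asks for.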
Welcome @Indri
The query below gives a running count of the rows per day where total_payment is greater than or equal to 20. I believe this should give you the answer you are looking for:
WITH data AS(
SELECT "2022-01-01" AS date, "A" AS visitor_id, 20 AS total_payment
UNION ALL
SELECT "2022-01-01" AS date, "B" AS visitor_id, 15 AS total_payment
UNION ALL
SELECT "2022-01-01" AS date, "C" AS visitor_id, 20 AS total_payment
UNION ALL
SELECT "2022-01-02" AS date, "B" AS visitor_id, 10 AS total_payment
UNION ALL
SELECT "2022-01-02" AS date, "D" AS visitor_id, 25 AS total_payment
)
SELECT
*,
COUNT(*) OVER(ORDER BY date)
FROM data
WHERE total_payment >= 20

How to calculate needed amount for supply order?

Table "client_orders":
date   ordered  id
-----  -------  --
28.05  50       1
23.06  60       2
24.05  50       1
25.06  130      2
Table "stock":
id  amount  date
--  ------  -----
1   60      23.04
2   90      25.04
1   10      24.04
2   10      24.06
I want to calculate the amount I need to order (to replenish the stock) and the date by which I need it. For instance, it should be:
-30 by 28.05 (60+10-50-50=-30) for id = 1
-90 by 25.06 (90-60+10-130=-90) for id = 2
I tried to do it with the LAG function, but the problem is that the stock here is not updated.
SELECT *,
SUM(amount - ordered) OVER (PARTITION BY sd.id ORDER BY d.date ASC)
FROM stock sd
LEFT JOIN (SELECT date,
id,
ordered
FROM client_orders) AS d
ON sd.id = d.id
Couldn't find anything similar on the web. Grateful if you share articles/examples how to do that.
You could make a union of the two tables and sum all stock amounts with the negative of ordered amounts. For the date you could instead take the corresponding maximum value.
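SELECT id,
       SUM(amount) AS net_amount,
       MAX(date)   AS needed_by
FROM (SELECT id,
             -ordered AS amount,
             date
      FROM client_orders
      UNION ALL  -- keep duplicate rows; a plain UNION would drop them
      SELECT id,
             amount,
             date
      FROM stock
     ) stock_and_orders
GROUP BY id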
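As a sanity check of the union-and-sum idea, here is a small Python sketch on the question's data. The names `net_position` and `as_key` are invented for the illustration, and it assumes the dates are DD.MM strings within the same year.

```python
from collections import defaultdict

# Data from the question.
client_orders = [("28.05", 50, 1), ("23.06", 60, 2),
                 ("24.05", 50, 1), ("25.06", 130, 2)]   # (date, ordered, id)
stock = [(1, 60, "23.04"), (2, 90, "25.04"),
         (1, 10, "24.04"), (2, 10, "24.06")]            # (id, amount, date)

def as_key(ddmm):
    """Turn 'DD.MM' into (month, day) so dates compare chronologically.
    Assumes all dates fall in the same year."""
    day, month = ddmm.split(".")
    return (int(month), int(day))

def net_position(orders, stock_rows):
    """Per id, return (stock minus orders, latest date seen)."""
    # Stock counts as positive movements, orders as negative ones.
    movements = [(i, amount, d) for i, amount, d in stock_rows]
    movements += [(i, -ordered, d) for d, ordered, i in orders]
    balance = defaultdict(int)
    latest = {}
    for item_id, amount, d in movements:
        balance[item_id] += amount
        if item_id not in latest or as_key(d) > as_key(latest[item_id]):
            latest[item_id] = d
    return {i: (balance[i], latest[i]) for i in balance}

position = net_position(client_orders, stock)
print(position)
```

This reproduces the figures worked out in the question: a shortfall of 30 by 28.05 for id 1 and of 90 by 25.06 for id 2.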

Identifying who spent more than a certain amount within any 30 day period?

I have a table that lists each customer's transactions along with the date they occurred and how much was spent. What I want to do is get a list of all customers who spent £3k or more within any 30-day period.
I can get a list of who spent £3k or more within the last 30 days using the code below, but I'm not sure how to adapt this to cover any 30-day period. Any help would be appreciated please!
select *
from
(
select customer_id, sum(spend) as total_spend
from transaction_table
where transaction_date between (current date - 30 days) and current date
group by customer_id
)
where total_spend >=3000
;
Try the following.
The idea is to calculate running sum of SPEND for last 30 days for each row.
WITH TRANSACTION_TABLE (CUSTOMER_ID, TRANSACTION_DATE, SPEND) AS
(
VALUES
(1, DATE ('2021-01-01'), 1000)
, (1, DATE ('2021-01-31'), 2000)
--, (1, DATE ('2021-02-01'), 2000)
)
SELECT DISTINCT CUSTOMER_ID
FROM
(
SELECT
CUSTOMER_ID
--, TRANSACTION_DATE, SPEND
, SUM (SPEND) OVER (PARTITION BY CUSTOMER_ID ORDER BY DAYS (TRANSACTION_DATE) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) AS SPEND_RTOTAL
FROM TRANSACTION_TABLE
)
WHERE SPEND_RTOTAL >= 3000
You can use SUM() as a window function with a RANGE frame covering the preceding 30 days. For example:
select *
from (
select t.*,
sum(t.spent) over(
partition by customer_id
order by julian_day(transaction_date)
range between 30 preceding and current row
) as total_spend
from transaction_table t
) x
where total_spend >= 3000
For the data set:
CUSTOMER_ID  TRANSACTION_DATE  SPENT
-----------  ----------------  -----
1            2021-10-01        2000
1            2021-10-15        1500
1            2021-12-01        1000
2            2021-11-01        2500
Result:
CUSTOMER_ID  TRANSACTION_DATE  SPENT  TOTAL_SPEND
-----------  ----------------  -----  -----------
1            2021-10-15        1500   3500
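If it helps to reason about the frame, the same check can be written as a short Python sketch over the data set above. The function name `big_spenders` is hypothetical; like `RANGE BETWEEN 30 PRECEDING AND CURRENT ROW` on a day number, the window here runs from 30 days before each transaction up to and including the transaction date.

```python
from datetime import date, timedelta

# Data set from the answer above: (customer_id, transaction_date, spent).
transactions = [
    (1, date(2021, 10, 1), 2000),
    (1, date(2021, 10, 15), 1500),
    (1, date(2021, 12, 1), 1000),
    (2, date(2021, 11, 1), 2500),
]

def big_spenders(rows, limit=3000, days=30):
    """Return the customers whose spend within any window ending on one of
    their own transaction dates (and starting `days` days earlier) reaches
    `limit`."""
    hits = set()
    for cust, day, _ in rows:
        start = day - timedelta(days=days)
        total = sum(s for c, d, s in rows if c == cust and start <= d <= day)
        if total >= limit:
            hits.add(cust)
    return hits

spenders = big_spenders(transactions)
print(spenders)
```

Only customer 1 qualifies, because the 2000 and 1500 transactions fall within the same window and add up to 3500.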

Writing subquery within SUM using values of 1 table

Now I have a table and I am trying to calculate for each book_id the total sales in the past 100 days for every day in the past 1 year.
book_id  location  seller  daily_sales  order_day
ABC      1         XYZ     100          2017-05-05
ABC      1         XYZ     120          2017-05-07
ABC      1         XYZ     40           2017-02-10
.
.
.
So what I am trying to expect in the result is:
book_id order_day sum
ABC 2017-05-05 100+40
ABC 2017-05-07 100+120+40
ABC 2017-02-10 40
For this I wrote a query like this:
select book_id, to_char(order_day),
SUM(case when order_day between order_day -100 and order_day then daily_sales else 0 end) sum
FROM bookDetailsTable
where location = 1 AND ORDER_DAY BETWEEN TO_DATE('20170725','YYYYMMDD') - 359 AND TO_DATE('20170725','YYYYMMDD')
group by seller, book_id, order_day
I guess I am doing something wrong and should instead write a SELECT statement within the SUM to select data for the past 100 days.
You should get the result with this
select A.book_id,
A.order_day,
( select sum(b.daily_sales)
from bookDetailsTable b
where A.book_id = B.book_id
and B.order_day between A.order_day -100 and A.order_day
)
from bookDetailsTable A
where A.order_day between ADD_MONTHS(trunc(sysdate),-12) and trunc(sysdate)
If you understand the principle of the query, you should be able to add your other restrictions, like seller or location
This is a perfect case for using analytic functions, specifically the SUM() analytic function, along with the windowing clause:
WITH bookdetailstable AS (SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 100 daily_sales, to_date('05/05/2016', 'dd/mm/yyyy') order_day FROM dual UNION ALL
SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 120 daily_sales, to_date('07/05/2016', 'dd/mm/yyyy') order_day FROM dual UNION ALL
SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 40 daily_sales, to_date('10/02/2016', 'dd/mm/yyyy') order_day FROM dual UNION ALL
SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 600 daily_sales, to_date('10/02/2017', 'dd/mm/yyyy') order_day FROM dual)
SELECT book_id,
to_char(order_day, 'yyyy-mm-dd') order_day,
total_sales_last_100_days
FROM (SELECT book_id,
order_day,
SUM(daily_sales) OVER (PARTITION BY book_id ORDER BY order_day
RANGE BETWEEN 100 PRECEDING AND CURRENT ROW) total_sales_last_100_days
FROM bookdetailstable
where order_day >= add_months(trunc(sysdate) - 100, -12))
where order_day >= add_months(trunc(SYSDATE), -12);
BOOK_ID ORDER_DAY  TOTAL_SALES_LAST_100_DAYS
------- ---------- -------------------------
ABC     2016-02-10                        40
ABC     2016-05-05                       140
ABC     2016-05-07                       260
ABC     2017-02-10                       600
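This simply says: get the sum of daily_sales for each book_id (you can think of the partition by clause as being similar to the group by clause - it simply defines the group of rows the function applies over), ordered by order_day, looking at the rows whose order_day falls within the 100 days before the current row's order_day, plus the current row itself. Because the frame is RANGE rather than ROWS, 100 PRECEDING is measured in days of order_day, not in a number of rows.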
If you needed to work out the cumulative sum for specific book_ids based on location (and seller and ....), then you would need to include the extra grouping columns in the partition by clause.
Since you want to restrict the results to the past year, and assuming you want the first row to return the sum for the past 100 days as well, rather than starting with the current day, you need to include 100 days prior to a year ago. Then you restrict the rows to the year's worth of data you're interested in.
That's because analytic functions work across the data after it's been filtered by the where clause, so if you want to include data from outside the current where clause, you're going to have to look for a way to include those rows and then do the additional filtering later.
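The effect of filtering before versus after the window calculation can be seen in a small Python sketch on the sample rows above (a single book_id, so no partitioning is needed). The cutoff date and the function name `window_sums` are made up for the illustration; the real query uses sysdate instead.

```python
from datetime import date, timedelta

# Sample rows from the answer above: (order_day, daily_sales) for book ABC.
sales = [
    (date(2016, 2, 10), 40),
    (date(2016, 5, 5), 100),
    (date(2016, 5, 7), 120),
    (date(2017, 2, 10), 600),
]

def window_sums(rows, days=100):
    """For each order_day, sum daily_sales over the rows dated within `days`
    days before it, up to and including that day."""
    return {
        day: sum(v for d, v in rows if day - timedelta(days=days) <= d <= day)
        for day, _ in rows
    }

cutoff = date(2016, 5, 1)  # hypothetical start of the reporting period

# Filter first, then window: the earliest rows lose their 100-day history.
filtered_first = window_sums([r for r in sales if r[0] >= cutoff])

# Window over all the rows needed, then filter: the history is kept.
windowed_first = {d: s for d, s in window_sums(sales).items() if d >= cutoff}

print(filtered_first[date(2016, 5, 5)], windowed_first[date(2016, 5, 5)])
```

With the filter applied first, 2016-05-05 reports only 100; with the filter applied afterwards it reports 140, because the 40 from 2016-02-10 is still visible to the window.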