BigQuery: Rolling daily count of visitors based on their total payment

I have this data:
date        visitor_id  total_payment
2022-01-01  A           20
2022-01-01  B           15
2022-01-01  C           20
2022-01-02  B           10
2022-01-02  D           25
I'd like a daily count of visitors whose total_payment (summed over time) is equal to or greater than $20. With that said, the result I'm hoping for is:
date        count_visitor
2022-01-01  2
2022-01-02  4
2022-01-01 is 2 because only A and C have paid $20 or more. On 2022-01-02 two more are added, because B's total reaches $25 (15 + 10) and D pays $25.
Is there a query that can do this? I hope my description is clear. Thank you in advance.

You can use the following query as a solution.
First, I calculate the cumulative payment of each user.
Then, for each user, I find the earliest date on which the cumulative payment reaches $20 or more.
Finally, I count the number of users for each of those first dates and accumulate that count.
You don't need the first_day_users column in the output, but I kept it to make the code easier to understand.
Here is the query:
WITH
data AS (
  SELECT "2022-01-01" AS date, "A" AS visitor_id, 20 AS total_payment UNION ALL
  SELECT "2022-01-01" AS date, "B" AS visitor_id, 15 AS total_payment UNION ALL
  SELECT "2022-01-01" AS date, "C" AS visitor_id, 20 AS total_payment UNION ALL
  SELECT "2022-01-02" AS date, "B" AS visitor_id, 10 AS total_payment UNION ALL
  SELECT "2022-01-02" AS date, "D" AS visitor_id, 25 AS total_payment
),
-- Running total of payments per visitor, in date order.
user_cumulatives AS (
  SELECT
    visitor_id,
    date,
    SUM(total_payment) OVER (PARTITION BY visitor_id ORDER BY date) AS cumulative_payment
  FROM data
),
-- First date on which each visitor's running total reaches 20.
user_first_dates AS (
  SELECT visitor_id, MIN(date) AS date
  FROM user_cumulatives
  WHERE cumulative_payment >= 20
  GROUP BY 1
)
-- Newly qualifying visitors per date, plus the running total of that count.
SELECT
  date,
  COUNT(*) AS first_day_users,
  SUM(COUNT(*)) OVER (ORDER BY date) AS count_visitor
FROM user_first_dates
GROUP BY 1
ORDER BY date
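One caveat with this approach: user_first_dates only has rows for dates on which at least one visitor first reaches $20, so a day with no newly qualifying visitors is missing from the output instead of repeating the previous running total. The sketch below fills those gaps with a calendar. It assumes BigQuery Standard SQL, stores the dates as DATE values rather than strings, builds the calendar with GENERATE_DATE_ARRAY over the data's own date range, and adds a hypothetical visitor E on 2022-01-04 purely to create days without newcomers.

```sql
-- Sketch (BigQuery Standard SQL): running count of visitors whose cumulative
-- payment has reached 20, with one output row per calendar day.
WITH data AS (
  SELECT DATE '2022-01-01' AS date, 'A' AS visitor_id, 20 AS total_payment UNION ALL
  SELECT DATE '2022-01-01', 'B', 15 UNION ALL
  SELECT DATE '2022-01-01', 'C', 20 UNION ALL
  SELECT DATE '2022-01-02', 'B', 10 UNION ALL
  SELECT DATE '2022-01-02', 'D', 25 UNION ALL
  -- Hypothetical extra row: E never reaches 20, so nobody new qualifies
  -- on 2022-01-03 or 2022-01-04.
  SELECT DATE '2022-01-04', 'E', 5
),
user_cumulatives AS (
  SELECT
    visitor_id,
    date,
    SUM(total_payment) OVER (PARTITION BY visitor_id ORDER BY date) AS cumulative_payment
  FROM data
),
user_first_dates AS (
  -- First day on which each visitor's cumulative payment reaches 20.
  SELECT visitor_id, MIN(date) AS date
  FROM user_cumulatives
  WHERE cumulative_payment >= 20
  GROUP BY visitor_id
),
calendar AS (
  -- One row per day between the first and last date in the data.
  SELECT cal_date
  FROM UNNEST(GENERATE_DATE_ARRAY(
    (SELECT MIN(date) FROM data),
    (SELECT MAX(date) FROM data),
    INTERVAL 1 DAY)) AS cal_date
)
SELECT
  c.cal_date AS date,
  -- COUNT of a column ignores the NULLs produced by days without a match.
  COUNT(f.visitor_id) AS first_day_users,
  SUM(COUNT(f.visitor_id)) OVER (ORDER BY c.cal_date) AS count_visitor
FROM calendar AS c
LEFT JOIN user_first_dates AS f
  ON f.date = c.cal_date
GROUP BY c.cal_date
ORDER BY c.cal_date
```

With this data, the rows for 2022-01-03 and 2022-01-04 should show first_day_users = 0 while count_visitor stays at 4.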

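Welcome, @Indri!
The query below gives a running count of the rows where total_payment is greater than or equal to 20. I believe this should give you the answer you are looking for, with one caveat: it checks each payment row on its own rather than each visitor's cumulative total, so with the question's data B (15 + 10) is never counted and the running count on 2022-01-02 is 3 rather than 4: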
WITH data AS (
  SELECT "2022-01-01" AS date, "A" AS visitor_id, 20 AS total_payment
  UNION ALL
  SELECT "2022-01-01" AS date, "B" AS visitor_id, 15 AS total_payment
  UNION ALL
  SELECT "2022-01-01" AS date, "C" AS visitor_id, 20 AS total_payment
  UNION ALL
  SELECT "2022-01-02" AS date, "B" AS visitor_id, 10 AS total_payment
  UNION ALL
  SELECT "2022-01-02" AS date, "D" AS visitor_id, 25 AS total_payment
)
SELECT
  *,
  -- Running count of the qualifying rows, ordered by date.
  COUNT(*) OVER (ORDER BY date) AS count_visitor
FROM data
WHERE total_payment >= 20

Related

Cumulative average and count over occurrences increasing in time

I am looking to calculate an average (over number of occurrences) and observation count over increasing dates per instance (take customer as an example instance) in Oracle SQL.
So the count increases as the date goes up, while the average can go up or down.
I can do it for an individual case and a fixed time interval, but I would like to see a series for every customer, with every row a separate date where a sale occurred. Right now, I have a single row per customer. Here is the SQL summarizing the average and count for a fixed time interval:
SELECT AVG(bought_usd) AS avg_bought,
       COUNT(*) AS num_of_interactions,
       cust_id
FROM salesTable
WHERE obsdate >= DATE '2000-01-01'
  AND obsdate <= DATE '2022-01-01'
GROUP BY cust_id
So for an input of:
the output should look like:
Use analytic functions:
SELECT "DATE",
       cust,
       AVG(bought_usd) OVER (PARTITION BY cust ORDER BY "DATE") AS avg,
       COUNT(*)        OVER (PARTITION BY cust ORDER BY "DATE") AS cnt
FROM salestable
ORDER BY cust, "DATE"
Note: DATE is a reserved word. You should not use it as an identifier.
Which, for the sample data:
CREATE TABLE salestable ("DATE", cust, bought_usd) AS
SELECT DATE '2010-10-01', 'Cust A', 100 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust A', 50 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust B', 120 FROM DUAL UNION ALL
SELECT DATE '2011-10-01', 'Cust B', 180 FROM DUAL;
Outputs:
DATE                 CUST    AVG  CNT
2010-10-01 00:00:00  Cust A  100  1
2010-12-18 00:00:00  Cust A  75   2
2010-12-18 00:00:00  Cust B  120  1
2011-10-01 00:00:00  Cust B  150  2
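Because both analytic functions have an ORDER BY but no explicit window frame, they use the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so two sales by the same customer on the same date are peers and get the same running average and count. The sketch below spells that frame out for the salestable above and adds a ROWS-based count for comparison; using ROWID as a tie-breaker is an assumption, and any unique column would do.

```sql
-- Sketch (Oracle): the same running average/count with the frame written out.
SELECT "DATE",
       cust,
       -- RANGE: rows with the same "DATE" for a customer are peers and are
       -- all included in each other's window.
       AVG(bought_usd) OVER (PARTITION BY cust ORDER BY "DATE"
                             RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS avg_range,
       COUNT(*)        OVER (PARTITION BY cust ORDER BY "DATE"
                             RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt_range,
       -- ROWS: counts physical rows; ROWID only breaks ties between same-day
       -- sales so that their order is deterministic.
       COUNT(*)        OVER (PARTITION BY cust ORDER BY "DATE", ROWID
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt_rows
FROM salestable
ORDER BY cust, "DATE"
```

With the four sample rows there are no same-day sales per customer, so cnt_range and cnt_rows agree; they only diverge once a customer has two sales on one date.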

How to calculate needed amount for supply order?

Table "client_orders":
date   ordered  id
28.05  50       1
23.06  60       2
24.05  50       1
25.06  130      2
Table "stock":
id  amount  date
1   60      23.04
2   90      25.04
1   10      24.04
2   10      24.06
I want to calculate how much I need to order (to replenish the stock) and by what date. For instance, it should be:
30 by 28.05 (60+10-50-50=-30) for id = 1
90 by 25.06 (90-60+10-130=-90) for id = 2
I tried to do it with the LAG function, but the problem is that the stock is not updated.
SELECT *,
       SUM(amount - ordered) OVER (PARTITION BY sd.id ORDER BY d.date ASC)
FROM stock sd
LEFT JOIN (SELECT date,
                  id,
                  ordered
           FROM client_orders) AS d
  ON sd.id = d.id
I couldn't find anything similar on the web. I'd be grateful if you could share articles or examples of how to do this.
You could take the union of the two tables and sum the stock amounts together with the negated ordered amounts. For the date, you could take the corresponding maximum value instead.
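SELECT id,
       SUM(amount) AS amount,
       MAX(date) AS last_date
FROM (SELECT id,
             -ordered AS amount,   -- orders reduce the stock
             date
      FROM client_orders
      UNION ALL                    -- UNION would drop identical rows and skew the sum
      SELECT id,
             amount,
             date
      FROM stock
     ) stock_and_orders
GROUP BY id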
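The query above returns one net total per id. To also see when the stock runs out (the "by what date" part of the question), a running balance over the same union is one option. This is a sketch that assumes the date columns hold real DATE values (the dd.mm strings in the question would not sort chronologically) and that the database supports window functions.

```sql
-- Sketch: running stock balance per id. Stock additions are positive,
-- client orders are negative. Assumes "date" is a DATE column.
WITH movements AS (
  SELECT id, amount, date FROM stock
  UNION ALL                       -- keep duplicate rows; UNION would drop them
  SELECT id, -ordered AS amount, date FROM client_orders
)
SELECT id,
       date,
       amount,
       -- Balance after applying every movement up to and including this one.
       SUM(amount) OVER (PARTITION BY id ORDER BY date
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_balance
FROM movements
ORDER BY id, date;
```

Reading the sample dates as days of one year, the running balance for id 1 first drops below zero on 28.05 (-30), and for id 2 on 25.06 (-90), which matches the amounts in the question.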

Add a column with customers orders count at the time they passed the order

I have the following table
order_id  created_at  customer_id
1         2020-01-02  11
2         2020-02-03  12
3         2020-02-03  11
I would like to add a column "customer_orders_count" that assigns to each transaction the number of orders the customer had placed up to that point, i.e. obtain this table:
order_id  created_at  customer_id  customer_orders_count
1         2020-01-02  11           1
2         2020-02-03  12           1
3         2020-02-03  11           2
My problem is that I can't find how to calculate a local "customer_orders_count" that depends on each order. I only managed to add a column with the global "customer_orders_count", so, for example, for the first row (order_id=1) I get customer_orders_count=2, whereas I would like it to be 1.
Does anyone have an idea?
Use cumulative count:
with mytable as (
  select 1 as order_id, date '2020-01-02' as created_at, 11 as customer_id union all
  select 2, '2020-02-03', 12 union all
  select 3, '2020-02-03', 11
)
select *, count(*) over (partition by customer_id order by created_at) as customer_orders_count
from mytable
order by order_id
Use row_number():
select t.*,
row_number() over (partition by customer_id order by created_at) as customer_order_count
from t;
This is subtly different from using a cumulative count(). This version guarantees that the numbers for a given customer are never duplicated, even when the dates are the same. A cumulative count has no such guarantee.
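To see that difference, the sketch below adds a hypothetical fourth order (order_id 4) placed by customer 11 on the same day as order 3 and computes both columns side by side. It is written in standard SQL and relies on the SQL-standard default frame for an ordered window, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats same-day rows as peers.

```sql
-- Sketch: cumulative COUNT(*) vs ROW_NUMBER() when a customer has two
-- orders on the same day. Order 4 is hypothetical sample data.
WITH mytable AS (
  SELECT 1 AS order_id, DATE '2020-01-02' AS created_at, 11 AS customer_id UNION ALL
  SELECT 2, DATE '2020-02-03', 12 UNION ALL
  SELECT 3, DATE '2020-02-03', 11 UNION ALL
  SELECT 4, DATE '2020-02-03', 11
)
SELECT order_id,
       customer_id,
       created_at,
       -- Default RANGE frame: same-day orders are peers and share one count.
       COUNT(*) OVER (PARTITION BY customer_id ORDER BY created_at) AS cumulative_count,
       -- ROW_NUMBER never repeats; order_id breaks same-day ties deterministically.
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at, order_id) AS row_num
FROM mytable
ORDER BY order_id
```

Under that default frame, orders 3 and 4 should both get cumulative_count = 3, while row_num numbers them 2 and 3.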

Take the sum of the last 7 days from the observed date in BigQuery

I have a table for which I want to compute the sum of revenue over the last 7 days from the observed day. Here is my table:
with temp as
(
  select DATE('2019-06-29') as transaction_date, "x" as id, 0 as revenue
  union all
  select DATE('2019-06-30') as transaction_date, "x" as id, 80 as revenue
  union all
  select DATE('2019-07-04') as transaction_date, "x" as id, 64 as revenue
  union all
  select DATE('2019-07-06') as transaction_date, "x" as id, 64 as revenue
  union all
  select DATE('2019-07-11') as transaction_date, "x" as id, 75 as revenue
  union all
  select DATE('2019-07-12') as transaction_date, "x" as id, 0 as revenue
)
select * from temp
I want to take the sum of the last 7 days for each transaction_date. For instance, for the last record, which has transaction_date = 2019-07-12, I would like to add another column that adds up the revenue for the last 7 days from 2019-07-12 (that is, back to 2019-07-05), so the value of the new rollup_revenue column would be 0 + 75 + 64 = 139. Likewise, I need to compute the rollup for all the dates for every ID.
Note: the ID may or may not appear every day.
I have tried self join but I am unable to figure it out.
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
SUM(revenue) OVER(
PARTITION BY id ORDER BY UNIX_DATE(transaction_date)
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
) rollup_revenue
FROM `project.dataset.temp`
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.temp` AS (
SELECT DATE '2019-06-29' AS transaction_date, 'x' AS id, 0 AS revenue UNION ALL
SELECT '2019-06-30', 'x', 80 UNION ALL
SELECT '2019-07-04', 'x', 64 UNION ALL
SELECT '2019-07-06', 'x', 64 UNION ALL
SELECT '2019-07-11', 'x', 75 UNION ALL
SELECT '2019-07-12', 'x', 0
)
SELECT *,
SUM(revenue) OVER(
PARTITION BY id ORDER BY UNIX_DATE(transaction_date)
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
) rollup_revenue
FROM `project.dataset.temp`
-- ORDER BY transaction_date
with the result:
Row transaction_date id revenue rollup_revenue
1 2019-06-29 x 0 0
2 2019-06-30 x 80 80
3 2019-07-04 x 64 144
4 2019-07-06 x 64 208
5 2019-07-11 x 75 139
6 2019-07-12 x 0 139
One option uses a correlated subquery to find the rolling sum:
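SELECT
  t1.id,
  t1.transaction_date,
  t1.revenue,
  (SELECT SUM(t2.revenue)
   FROM temp t2
   WHERE t2.id = t1.id  -- only sum revenue for the same id
     -- BETWEEN is inclusive at both ends, so INTERVAL 7 DAY spans 8 calendar days;
     -- use INTERVAL 6 DAY to match RANGE BETWEEN 6 PRECEDING AND CURRENT ROW.
     AND t2.transaction_date BETWEEN DATE_SUB(t1.transaction_date, INTERVAL 7 DAY)
                                 AND t1.transaction_date) AS rev_7_days
FROM temp t1
ORDER BY
  t1.id,
  t1.transaction_date;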
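The same rolling sum can also be written as a self-join with GROUP BY, which avoids the correlated subquery. This sketch assumes temp is the table (or CTE) from the question and that each id has at most one row per transaction_date (otherwise the join would double-count). It uses INTERVAL 6 DAY so the window matches RANGE BETWEEN 6 PRECEDING AND CURRENT ROW in the first answer.

```sql
-- Sketch (BigQuery Standard SQL): 7-day rolling revenue per id via a self-join.
-- Each t1 row is joined to every t2 row of the same id that falls within
-- the current day and the 6 days before it.
SELECT
  t1.id,
  t1.transaction_date,
  t1.revenue,
  SUM(t2.revenue) AS rollup_revenue
FROM temp AS t1
JOIN temp AS t2
  ON t2.id = t1.id
 AND t2.transaction_date BETWEEN DATE_SUB(t1.transaction_date, INTERVAL 6 DAY)
                             AND t1.transaction_date
GROUP BY t1.id, t1.transaction_date, t1.revenue
ORDER BY t1.id, t1.transaction_date
```

With the sample data this should reproduce the rollup_revenue values shown above: 0, 80, 144, 208, 139, 139.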

Writing subquery within SUM using values of 1 table

I have a table, and for each book_id I am trying to calculate the total sales over the past 100 days, for every day in the past year.
book_id  location  seller  daily_sales  order_day
ABC      1         XYZ     100          2017-05-05
ABC      1         XYZ     120          2017-05-07
ABC      1         XYZ     40           2017-02-10
...
So the result I expect is:
book_id  order_day   sum
ABC      2017-05-05  100+40
ABC      2017-05-07  100+120+40
ABC      2017-02-10  40
To do this, I wrote the following query:
select book_id, to_char(order_day),
       SUM(case when order_day between order_day - 100 and order_day then daily_sales else 0 end) sum
FROM bookDetailsTable
where location = 1
  AND ORDER_DAY BETWEEN TO_DATE('20170725','YYYYMMDD') - 359 AND TO_DATE('20170725','YYYYMMDD')
group by seller, book_id, order_day
I guess I am doing something wrong, and I should write a SELECT statement inside the SUM to select the data for the past 100 days.
You should get the result with this:
select A.book_id,
       A.order_day,
       ( select sum(B.daily_sales)
         from bookDetailsTable B
         where A.book_id = B.book_id
           and B.order_day between A.order_day - 100 and A.order_day
       ) as total_sales_last_100_days
from bookDetailsTable A
where A.order_day between ADD_MONTHS(trunc(sysdate), -12) and trunc(sysdate)
If you understand the principle of the query, you should be able to add your other restrictions, such as seller or location.
This is a perfect case for using analytic functions, specifically the SUM() analytic function, along with the windowing clause:
WITH bookdetailstable AS (SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 100 daily_sales, to_date('05/05/2016', 'dd/mm/yyyy') order_day FROM dual UNION ALL
SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 120 daily_sales, to_date('07/05/2016', 'dd/mm/yyyy') order_day FROM dual UNION ALL
SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 40 daily_sales, to_date('10/02/2016', 'dd/mm/yyyy') order_day FROM dual UNION ALL
SELECT 'ABC' book_id, 1 LOCATION, 'XYZ' seller, 600 daily_sales, to_date('10/02/2017', 'dd/mm/yyyy') order_day FROM dual)
SELECT book_id,
to_char(order_day, 'yyyy-mm-dd') order_day,
total_sales_last_100_days
FROM (SELECT book_id,
order_day,
SUM(daily_sales) OVER (PARTITION BY book_id ORDER BY order_day
RANGE BETWEEN 100 PRECEDING AND CURRENT ROW) total_sales_last_100_days
FROM bookdetailstable
where order_day >= add_months(trunc(sysdate) - 100, -12))
where order_day >= add_months(trunc(SYSDATE), -12);
BOOK_ID ORDER_DAY TOTAL_SALES_LAST_100_DAYS
------- ---------- -------------------------
ABC 2016-02-10 40
ABC 2016-05-05 140
ABC 2016-05-07 260
ABC 2017-02-10 600
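This simply says: get the sum of daily_sales for each book_id (you can think of the partition by clause as being similar to the group by clause; it simply defines the group of rows the function applies over), ordered by order_day, looking at the rows whose order_day falls within the 100 days before the current row's order_day, plus the current row. Because the window uses RANGE rather than ROWS, the offset of 100 is measured in days, not in rows.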
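To make the difference between the two windowing modes concrete: with RANGE, the offset 100 is measured in the units of the ORDER BY key (days, for a DATE), while ROWS counts physical rows. The sketch below reuses the sample rows from the answer (with ISO date literals and the unused columns dropped) and computes both.

```sql
-- Sketch (Oracle): RANGE vs ROWS with an offset of 100.
WITH bookdetailstable AS (
  SELECT 'ABC' book_id, 40 daily_sales, DATE '2016-02-10' order_day FROM dual UNION ALL
  SELECT 'ABC', 100, DATE '2016-05-05' FROM dual UNION ALL
  SELECT 'ABC', 120, DATE '2016-05-07' FROM dual UNION ALL
  SELECT 'ABC', 600, DATE '2017-02-10' FROM dual
)
SELECT book_id,
       order_day,
       -- Rows whose order_day is at most 100 days before the current order_day.
       SUM(daily_sales) OVER (PARTITION BY book_id ORDER BY order_day
                              RANGE BETWEEN 100 PRECEDING AND CURRENT ROW) AS sum_last_100_days,
       -- Up to 100 physically preceding rows plus the current one, however old.
       SUM(daily_sales) OVER (PARTITION BY book_id ORDER BY order_day
                              ROWS BETWEEN 100 PRECEDING AND CURRENT ROW) AS sum_last_101_rows
FROM bookdetailstable
ORDER BY order_day;
```

Ordered by date, the RANGE column should be 40, 140, 260, 600 and the ROWS column 40, 140, 260, 860: for 2017-02-10, only the RANGE window excludes the 2016 sales, which are more than 100 days earlier.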
If you needed to work out the cumulative sum for specific book_ids based on location (and seller and ....), then you would need to include the extra grouping columns in the partition by clause.
Since you want to restrict the results to the past year, and assuming you want the first row to return the total for the past 100 days as well (rather than just that day's sales), you need to include the 100 days before the start of that year. Then you restrict the rows to the year's worth of data you're interested in.
That's because analytic functions work across the data after it's been filtered by the where clause, so if you want to include data from outside the current where clause, you're going to have to look for a way to include those rows and then do the additional filtering later.