How to perform rolling sum in BigQuery

I have sample data in BigQuery:
with temp as (
select DATE("2016-10-02") date_field , 200 as salary
union all
select DATE("2016-10-09"), 500
union all
select DATE("2016-10-16"), 350
union all
select DATE("2016-10-23"), 400
union all
select DATE("2016-10-30"), 190
union all
select DATE("2016-11-06"), 550
union all
select DATE("2016-11-13"), 610
union all
select DATE("2016-11-20"), 480
union all
select DATE("2016-11-27"), 660
union all
select DATE("2016-12-04"), 690
union all
select DATE("2016-12-11"), 810
union all
select DATE("2016-12-18"), 950
union all
select DATE("2016-12-25"), 1020
union all
select DATE("2017-01-01"), 680
) ,
temp2 as (
select * , DATE("2017-01-01") as current_date
from temp
)
select * from temp2
I want to perform a rolling sum on this table. As an example, I have set the current date to 2017-01-01. With this as the current date, I want to go back 30 days and take the sum of the salary field. So, with 2017-01-01 as the current date, the total returned should be for the month of December 2016, which is 690+810+950+1020 = 3470. How can I do this using Standard SQL?

Below is a BigQuery Standard SQL query for a rolling SUM over the last 30 days (excluding the current day):
#standardSQL
SELECT *,
SUM(salary) OVER(
ORDER BY UNIX_DATE(date_field)
RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING
) AS rolling_30_days_sum
FROM `project.dataset.your_table`
You can test and experiment with the query above using the sample data from your question:
#standardSQL
WITH temp AS (
SELECT DATE("2016-10-02") date_field , 200 AS salary UNION ALL
SELECT DATE("2016-10-09"), 500 UNION ALL
SELECT DATE("2016-10-16"), 350 UNION ALL
SELECT DATE("2016-10-23"), 400 UNION ALL
SELECT DATE("2016-10-30"), 190 UNION ALL
SELECT DATE("2016-11-06"), 550 UNION ALL
SELECT DATE("2016-11-13"), 610 UNION ALL
SELECT DATE("2016-11-20"), 480 UNION ALL
SELECT DATE("2016-11-27"), 660 UNION ALL
SELECT DATE("2016-12-04"), 690 UNION ALL
SELECT DATE("2016-12-11"), 810 UNION ALL
SELECT DATE("2016-12-18"), 950 UNION ALL
SELECT DATE("2016-12-25"), 1020 UNION ALL
SELECT DATE("2017-01-01"), 680
)
SELECT *,
SUM(salary) OVER(
ORDER BY UNIX_DATE(date_field)
RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING
) AS rolling_30_days_sum
FROM temp
-- ORDER BY date_field
which returns:
Row date_field salary rolling_30_days_sum
1 2016-10-02 200 null
2 2016-10-09 500 200
3 2016-10-16 350 700
4 2016-10-23 400 1050
5 2016-10-30 190 1450
6 2016-11-06 550 1440
7 2016-11-13 610 1490
8 2016-11-20 480 1750
9 2016-11-27 660 1830
10 2016-12-04 690 2300
11 2016-12-11 810 2440
12 2016-12-18 950 2640
13 2016-12-25 1020 3110
14 2017-01-01 680 3470
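To cross-check the window frame, here is a plain-Python sketch (not BigQuery) that mimics `RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING` on the sample data. For each row, it sums the salaries dated 1 to 30 days earlier and returns None for an empty window, as SQL SUM does. The function name `rolling_sum` is hypothetical.

```python
from datetime import date, timedelta

# Sample data from the question: (date_field, salary)
rows = [
    (date(2016, 10, 2), 200), (date(2016, 10, 9), 500),
    (date(2016, 10, 16), 350), (date(2016, 10, 23), 400),
    (date(2016, 10, 30), 190), (date(2016, 11, 6), 550),
    (date(2016, 11, 13), 610), (date(2016, 11, 20), 480),
    (date(2016, 11, 27), 660), (date(2016, 12, 4), 690),
    (date(2016, 12, 11), 810), (date(2016, 12, 18), 950),
    (date(2016, 12, 25), 1020), (date(2017, 1, 1), 680),
]


def rolling_sum(rows, first=30, last=1):
    """Sum salaries dated between `first` and `last` days before each row.

    Mirrors RANGE BETWEEN first PRECEDING AND last PRECEDING over the day
    number; an empty window yields None, like SQL SUM over no rows.
    """
    totals = []
    for day, _ in rows:
        lo = day - timedelta(days=first)
        hi = day - timedelta(days=last)
        window = [salary for d, salary in rows if lo <= d <= hi]
        totals.append(sum(window) if window else None)
    return totals


for (day, salary), total in zip(rows, rolling_sum(rows)):
    print(day, salary, total)
```

Because the data is weekly, each 30-day window holds the previous four rows. The nested loop is O(n²), which is fine for a sanity check; on real data the SQL window function does the work.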

This is not exactly a "rolling sum", but it answers exactly "I want to go back 30 days and take sum of salary field. Hence, with 2017-01-01 being the current date, the total that should be returned is for the month of December":
with temp as (
select DATE("2016-10-02") date_field , 200 as salary
union all
select DATE("2016-10-09"), 500
union all
select DATE("2016-10-16"), 350
union all
select DATE("2016-10-23"), 400
union all
select DATE("2016-10-30"), 190
union all
select DATE("2016-11-06"), 550
union all
select DATE("2016-11-13"), 610
union all
select DATE("2016-11-20"), 480
union all
select DATE("2016-11-27"), 660
union all
select DATE("2016-12-04"), 690
union all
select DATE("2016-12-11"), 810
union all
select DATE("2016-12-18"), 950
union all
select DATE("2016-12-25"), 1020
union all
select DATE("2017-01-01"), 680
) ,
temp2 as (
select * , DATE("2017-01-01") as current_date_x
from temp
)
select SUM(salary)
from temp2
WHERE date_field BETWEEN DATE_SUB(current_date_x, INTERVAL 30 DAY) AND DATE_SUB(current_date_x, INTERVAL 1 DAY)
Result: 3470
Note that I couldn't use current_date as a column name, because it gets replaced by the actual current date.
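The filtered-sum approach can be cross-checked in the same way. Below is a minimal Python sketch that assumes the same sample rows. Here `as_of` (a hypothetical name) plays the role of `current_date_x`, and both ends of the range are inclusive, as with SQL `BETWEEN`.

```python
from datetime import date, timedelta

# Sample data from the question: (date_field, salary)
rows = [
    (date(2016, 10, 2), 200), (date(2016, 10, 9), 500),
    (date(2016, 10, 16), 350), (date(2016, 10, 23), 400),
    (date(2016, 10, 30), 190), (date(2016, 11, 6), 550),
    (date(2016, 11, 13), 610), (date(2016, 11, 20), 480),
    (date(2016, 11, 27), 660), (date(2016, 12, 4), 690),
    (date(2016, 12, 11), 810), (date(2016, 12, 18), 950),
    (date(2016, 12, 25), 1020), (date(2017, 1, 1), 680),
]


def last_n_days_total(rows, as_of, n=30):
    """Sum salaries dated from as_of - n days through as_of - 1 day (inclusive)."""
    lo = as_of - timedelta(days=n)
    hi = as_of - timedelta(days=1)
    return sum(salary for d, salary in rows if lo <= d <= hi)


print(last_n_days_total(rows, date(2017, 1, 1)))
```

For 2017-01-01 the range is 2016-12-02 to 2016-12-31, which picks up exactly the four December rows.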

Related

CONSECUTIVE DAYS QUERY [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 11 months ago.
I have an Oracle database connection with data (SELECT * FROM SALES) as shown in the picture. I want a query that tells me which 3 consecutive days have a sum of PREMIUM_TOTAL > 100.
I have tried LEAD, LAG and DATEDIFF but failed. I'm also new at this, so any hints are welcome.

If you want 3 rows, one from each of 3 successive days, then you can use a recursive query:
WITH successive_days (day, products, total, depth) AS (
SELECT entry_date,
TO_CHAR(product_id),
premium_total,
1
FROM table_name
UNION ALL
SELECT s.day + 1,
s.products || ',' || t.product_id,
s.total + t.premium_total,
s.depth + 1
FROM successive_days s
INNER JOIN table_name t
ON (s.day + 1 = t.entry_date)
WHERE s.depth < 3
)
SELECT day AS final_day, products, total
FROM successive_days
WHERE depth = 3
AND total >= 100;
Which, for the sample data:
CREATE TABLE table_name (product_id, entry_date, premium_total) AS
SELECT 1, DATE '2022-03-01', 1 FROM DUAL UNION ALL
SELECT 2, DATE '2022-03-01', 20 FROM DUAL UNION ALL
SELECT 4, DATE '2022-03-02', 30 FROM DUAL UNION ALL
SELECT 5, DATE '2022-03-03', 30 FROM DUAL UNION ALL
SELECT 10, DATE '2022-03-21', 12 FROM DUAL UNION ALL
SELECT 11, DATE '2022-03-31', 40.5 FROM DUAL UNION ALL
SELECT 13, DATE '2022-03-05', 70 FROM DUAL UNION ALL
SELECT 12, DATE '2022-03-05', 80 FROM DUAL UNION ALL
SELECT 14, DATE '2022-03-05', 10 FROM DUAL UNION ALL
SELECT 20, DATE '2022-03-06', 20 FROM DUAL UNION ALL
SELECT 21, DATE '2022-03-07', 30 FROM DUAL UNION ALL
SELECT 22, DATE '2022-03-07', 40 FROM DUAL UNION ALL
SELECT 30, DATE '2022-03-08', 20 FROM DUAL UNION ALL
SELECT 31, DATE '2022-03-09', 50 FROM DUAL UNION ALL
SELECT 40, DATE '2022-03-10', 2 FROM DUAL;
Outputs:
FINAL_DAY            PRODUCTS   TOTAL
2022-03-07 00:00:00  13,20,21   120
2022-03-07 00:00:00  13,20,22   130
2022-03-07 00:00:00  12,20,21   130
2022-03-07 00:00:00  12,20,22   140
2022-03-09 00:00:00  21,30,31   100
2022-03-09 00:00:00  22,30,31   110
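To see why exactly these six combinations come back, here is a plain-Python sketch of the same idea (an illustration, not Oracle code). It groups the rows by day, takes one row from each of three successive days, and keeps the combinations whose total reaches the threshold. It uses `>= 100` to match the query above, even though the question asks for `> 100`. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from itertools import product

# Sample data from the answer: (product_id, entry_date, premium_total)
sales = [
    (1, date(2022, 3, 1), 1), (2, date(2022, 3, 1), 20),
    (4, date(2022, 3, 2), 30), (5, date(2022, 3, 3), 30),
    (10, date(2022, 3, 21), 12), (11, date(2022, 3, 31), 40.5),
    (13, date(2022, 3, 5), 70), (12, date(2022, 3, 5), 80),
    (14, date(2022, 3, 5), 10), (20, date(2022, 3, 6), 20),
    (21, date(2022, 3, 7), 30), (22, date(2022, 3, 7), 40),
    (30, date(2022, 3, 8), 20), (31, date(2022, 3, 9), 50),
    (40, date(2022, 3, 10), 2),
]


def successive_day_triples(sales, threshold=100):
    """One row from each of three successive days, kept if the total >= threshold."""
    by_day = defaultdict(list)
    for product_id, day, premium in sales:
        by_day[day].append((product_id, premium))

    results = []
    for start in sorted(by_day):
        days = [start + timedelta(days=i) for i in range(3)]
        if not all(d in by_day for d in days):
            continue  # a gap in the calendar breaks the chain, like the recursive join
        for combo in product(*(by_day[d] for d in days)):
            total = sum(premium for _, premium in combo)
            if total >= threshold:
                results.append((days[-1], [pid for pid, _ in combo], total))
    return results


for final_day, products, total in successive_day_triples(sales):
    print(final_day, ",".join(map(str, products)), total)
```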
If you want all the rows (at least 3) that fall within 3 successive days, then you can use MATCH_RECOGNIZE:
SELECT MIN(entry_date) AS start_day,
MAX(entry_date) AS final_day,
LISTAGG(product_id, ',') WITHIN GROUP (ORDER BY entry_date) AS products,
SUM(premium_total) AS total
FROM table_name
MATCH_RECOGNIZE(
ORDER BY entry_date
MEASURES
MATCH_NUMBER() AS mno
ALL ROWS PER MATCH
AFTER MATCH SKIP TO NEXT ROW
PATTERN (first_day+ second_day+ third_day* final_day)
DEFINE
first_day AS FIRST(entry_date) = entry_date,
second_day AS FIRST(entry_date) + 1 = entry_date,
third_day AS FIRST(entry_date) + 2 = entry_date,
final_day AS FIRST(entry_date) + 2 = entry_date
AND SUM(premium_total) >= 100
)
GROUP BY mno;
Which, for the sample data, outputs:
START_DAY            FINAL_DAY            PRODUCTS           TOTAL
2022-03-05 00:00:00  2022-03-07 00:00:00  12,13,14,20,21,22  250
2022-03-05 00:00:00  2022-03-07 00:00:00  13,14,20,21,22     170
2022-03-05 00:00:00  2022-03-07 00:00:00  13,20,21,22        160
2022-03-06 00:00:00  2022-03-08 00:00:00  20,21,22,30        110
2022-03-07 00:00:00  2022-03-09 00:00:00  21,22,30,31        140
2022-03-07 00:00:00  2022-03-09 00:00:00  22,30,31           110
db<>fiddle here

Get List of Last 15 Days Date in SQL

Can SQL return a list of the dates of the last 15 days in a single query?
We can get today's date with
select current_date()
We can also get the date 15 days ago with
select date_add(current_date(), -15)
But how can we show the list of all the dates in the last 15 days?
For example, the output would be:
2020-05-17,
2020-05-18,
2020-05-19,
2020-05-20,
2020-05-21,
2020-05-22,
2020-05-23,
2020-05-24,
2020-05-25,
2020-05-26,
2020-05-27,
2020-05-28,
2020-05-29,
2020-05-30,
2020-05-31

In Hive or Spark-SQL:
select date_add (date_add(current_date,-15),s.i) as dt
from ( select posexplode(split(space(15),' ')) as (i,x)) s
Result (16 rows, from 15 days ago through today; this run was on 2020-06-02):
2020-05-18
2020-05-19
2020-05-20
2020-05-21
2020-05-22
2020-05-23
2020-05-24
2020-05-25
2020-05-26
2020-05-27
2020-05-28
2020-05-29
2020-05-30
2020-05-31
2020-06-01
2020-06-02
See also this answer.
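The Hive/Spark trick works because `split(space(15), ' ')` yields 16 empty strings, so `posexplode` produces positions 0 to 15. Each position is then added to the date 15 days ago. Here is a minimal Python sketch of the same arithmetic. It passes "today" in explicitly (a hypothetical parameter) so that it reproduces the result above.

```python
from datetime import date, timedelta


def last_n_days(today, n=15):
    """Dates from `today - n days` through `today`, i.e. n + 1 dates.

    Mirrors date_add(date_add(current_date, -n), i) for i = 0..n, the
    positions that posexplode(split(space(n), ' ')) produces.
    """
    start = today - timedelta(days=n)
    return [start + timedelta(days=i) for i in range(n + 1)]


for d in last_n_days(date(2020, 6, 2)):
    print(d)
```

In this sketch, changing `range(n + 1)` to `range(n)` drops today and gives 15 dates ending yesterday, like the question's example.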

WITH
cte AS ( SELECT 1 num UNION ALL SELECT 2 UNION ALL ... UNION ALL SELECT 15 )
SELECT DATEADD(CURRENT_DATE(), -num)
FROM cte;
Or, for example:
WITH
cte1 AS ( SELECT 1 num UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 ),
cte2 AS ( SELECT 0 num
UNION ALL SELECT 1
UNION ALL SELECT 2 )
SELECT DATEADD(CURRENT_DATE(), -cte1.num - cte2.num * 5)
FROM cte1, cte2;
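The second query relies on `FROM cte1, cte2` being a cross join. That gives 5 x 3 = 15 pairs, and `cte1.num + cte2.num * 5` maps them onto the offsets 1 to 15 with no gaps or duplicates. Here is a Python sketch of that arithmetic. It assumes a fixed "today" of 2020-06-01 for reproducibility, which yields exactly the 2020-05-17 to 2020-05-31 list from the question. The names are hypothetical.

```python
from datetime import date, timedelta
from itertools import product

CTE1 = [1, 2, 3, 4, 5]  # cte1.num
CTE2 = [0, 1, 2]        # cte2.num


def last_15_days(today):
    # FROM cte1, cte2 is a cross join: every (a, b) pair, 15 in total.
    offsets = [a + b * 5 for a, b in product(CTE1, CTE2)]
    # SQL returns the rows in no particular order; sort here for readability.
    return sorted(today - timedelta(days=off) for off in offsets)


for d in last_15_days(date(2020, 6, 1)):
    print(d)
```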

Best 10 of 12 in SQL

I'm scoring a running race series. Members get points at each monthly race based on where they finish, and their total score is the sum of their best 10 of 12 monthly races. How do I get that for each member?
tblRacePoints
memnum - Membership number
RaceNo - YYYYMM, e.g., 201910
Points
For each member, I want their total score over all races, their total score for their best 10 of 12, and each of their two lowest scores for the year. Not everyone has done all the races, so some members may have fewer than 12 entries for the year.
How do I write a query to do this, and then to rank them by their best 10/12 points?

If you are using an MSSQL (SQL Server) database, you can use ROW_NUMBER as below to get the required output. The same logic works in many other databases too.
Note: the table structure is just an assumption.
WITH your_table(player_id,dt,points)
AS
(
SELECT 1,'20190101', 100 UNION ALL SELECT 1,'20190201', 200 UNION ALL
SELECT 1,'20190301', 300 UNION ALL SELECT 1,'20190401', 400 UNION ALL
SELECT 1,'20190501', 500 UNION ALL SELECT 1,'20190601', 600 UNION ALL
SELECT 1,'20190701', 700 UNION ALL SELECT 1,'20190801', 800 UNION ALL
SELECT 1,'20190901', 900 UNION ALL SELECT 1,'20191001', 1000 UNION ALL
SELECT 1,'20191101', 1100 UNION ALL SELECT 1,'20191201', 1200 UNION ALL
SELECT 2,'20190101', 400 UNION ALL SELECT 2,'20190201', 200 UNION ALL
SELECT 2,'20190301', 300 UNION ALL SELECT 2,'20190401', 400 UNION ALL
SELECT 2,'20190501', 500 UNION ALL SELECT 2,'20190601', 600 UNION ALL
SELECT 2,'20190701', 700 UNION ALL SELECT 2,'20190801', 800 UNION ALL
SELECT 2,'20190901', 900 UNION ALL SELECT 2,'20191001', 1000 UNION ALL
SELECT 2,'20191101', 1100 UNION ALL SELECT 2,'20191201', 1200
)
SELECT
player_id,
YEAR(dt) Year,
SUM(Points) total_point
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY player_id, YEAR(dt) ORDER BY Points DESC) RN
FROM your_table
)A
WHERE RN <= 10
GROUP BY player_id, YEAR(dt)
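Here is a plain-Python sketch of the same ranking logic, assuming the answer's sample rows. It sorts each player's points for a year in descending order (the role of `ROW_NUMBER() ... ORDER BY Points DESC`) and keeps the first 10 (`WHERE RN <= 10`). It also computes the all-race total and the two lowest scores that the question asks for. The function and key names are hypothetical.

```python
from collections import defaultdict

# Sample rows from the answer: (player_id, dt as 'YYYYMMDD', points)
rows = [(1, f"2019{m:02d}01", 100 * m) for m in range(1, 13)]
rows += [
    (2, f"2019{m:02d}01", p)
    for m, p in zip(range(1, 13), [400, 200, 300, 400, 500, 600,
                                   700, 800, 900, 1000, 1100, 1200])
]


def season_summary(rows, keep=10):
    """Per (player, year): total of all races, best `keep` total, two lowest."""
    by_player_year = defaultdict(list)
    for player, dt, points in rows:
        by_player_year[(player, dt[:4])].append(points)

    summary = {}
    for key, points in by_player_year.items():
        ranked = sorted(points, reverse=True)  # RN = 1 is the best result
        summary[key] = {
            "total": sum(points),
            "best": sum(ranked[:keep]),        # WHERE RN <= keep
            "lowest_two": sorted(points)[:2],  # fewer if fewer races
        }
    return summary


summary = season_summary(rows)
# Rank by the best-10 total, highest first.
ranking = sorted(summary, key=lambda k: summary[k]["best"], reverse=True)
for rank, key in enumerate(ranking, 1):
    print(rank, key, summary[key])
```

A member with fewer than 10 races simply keeps all of them, which matches how `RN <= 10` behaves. Ties in points do not change the sum.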

How to calculate MTD and QTD by YTD value in Oracle

The data in my table t1 looks like this:
date dealer YTD_Value
2018-01 A 1100
2018-02 A 2000
2018-03 A 3000
2018-04 A 4200
2018-05 A 5000
2018-06 A 5500
2017-01 B 100
2017-02 B 200
2017-03 B 550
... ... ...
I want to write a SQL query against this table that returns the result below:
date dealer YTD_Value MTD_Value QTD_Value
2018-01 A 1100 1100 1100
2018-02 A 2000 900 2000
2018-03 A 3000 1000 3000
2018-04 A 4200 1200 1200
2018-05 A 5000 800 2000
2018-06 A 5500 500 2500
2017-01 B 100 100 100
2017-02 B 200 100 200
2017-03 B 550 350 550
... ... ... ... ...
'YTD' means Year to date
'MTD' means Month to date
'QTD' means Quarter to date
So the MTD and QTD values for dealer 'A' in '2018-01' should be the same as the YTD value.
The MTD value for dealer 'A' in '2018-06' should equal the YTD value in '2018-06' minus the YTD value in '2018-05'. The QTD value in '2018-06' should equal the YTD value in '2018-06' minus the YTD value in '2018-03', or equivalently the sum of the MTD values for 2018-04, 2018-05 and 2018-06.
The same rules apply to other dealers, such as B.
How can I write the SQL to achieve this purpose?
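The rules above can be expressed directly. Below is a minimal Python sketch (not Oracle SQL) that applies them to the sample rows. It assumes months are 'YYYY-MM' strings and that a missing earlier month counts as 0, and it uses 550 for B's 2017-03 YTD value, as the expected result does. The function name is hypothetical.

```python
def mtd_qtd(rows):
    """rows: (month 'YYYY-MM', dealer, ytd). Returns rows extended with MTD, QTD."""
    ytd = {(dealer, month): value for month, dealer, value in rows}
    out = []
    for month, dealer, value in rows:
        year, mm = month[:4], int(month[5:])
        # MTD: subtract last month's YTD; January starts a new year, so nothing.
        prev = ytd.get((dealer, f"{year}-{mm - 1:02d}"), 0) if mm > 1 else 0
        # QTD: subtract the YTD at the end of the previous quarter (03, 06, 09);
        # the first quarter has no earlier quarter in the same year.
        q_start = 3 * ((mm - 1) // 3) + 1  # 1, 4, 7 or 10
        prev_q = ytd.get((dealer, f"{year}-{q_start - 1:02d}"), 0) if q_start > 1 else 0
        out.append((month, dealer, value, value - prev, value - prev_q))
    return out


t1 = [
    ("2018-01", "A", 1100), ("2018-02", "A", 2000), ("2018-03", "A", 3000),
    ("2018-04", "A", 4200), ("2018-05", "A", 5000), ("2018-06", "A", 5500),
    ("2017-01", "B", 100), ("2017-02", "B", 200), ("2017-03", "B", 550),
]
for row in mtd_qtd(t1):
    print(*row)
```

The lookups are keyed by dealer as well as month, so two dealers sharing a year never mix.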

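The QTD calculation is tricky, but you can do this query without subqueries. The basic idea is to use lag() to get the monthly value. Then use a max() analytic function over the earlier quarter-end months ('03', '06', '09') of the same year to get the YTD value at the end of the previous quarter. Assuming YTD never decreases within a year, that maximum is the most recent quarter-end value.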
Of course, the first quarter of the year has no such value, so a coalesce() is needed.
Try this:
with t(dte, dealer, YTD_Value) as (
select '2018-01', 'A', 1100 from dual union all
select '2018-02', 'A', 2000 from dual union all
select '2018-03', 'A', 3000 from dual union all
select '2018-04', 'A', 4200 from dual union all
select '2018-05', 'A', 5000 from dual union all
select '2018-06', 'A', 5500 from dual union all
select '2017-01', 'B', 100 from dual union all
select '2017-02', 'B', 200 from dual union all
select '2017-03', 'B', 550 from dual
)
select t.*,
(YTD_Value - lag(YTD_Value, 1, 0) over (partition by dealer, substr(dte, 1, 4) order by dte)) as MTD_Value,
(YTD_Value -
coalesce(max(case when substr(dte, -2) in ('03', '06', '09') then YTD_VALUE end) over
(partition by dealer, substr(dte, 1, 4) order by dte rows between unbounded preceding and 1 preceding
), 0
)
) as QTD_Value
from t
order by 1
Here is a db<>fiddle.

The following query should do the job. It uses a CTE that translates the varchar date column to dates, and then a few joins to recover the value to compare.
I tested it in this db fiddle and the output matches your expected results.
WITH cte AS (
SELECT TO_DATE(my_date, 'YYYY-MM') my_date, dealer, ytd_value FROM my_table
)
SELECT
TO_CHAR(ytd.my_date, 'YYYY-MM') my_date,
ytd.ytd_value,
ytd.dealer,
ytd.ytd_value - NVL(mtd.ytd_value, 0) mtd_value,
ytd.ytd_value - NVL(qtd.ytd_value, 0) qtd_value
FROM
cte ytd
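LEFT JOIN cte mtd ON mtd.my_date = ADD_MONTHS(ytd.my_date, -1) AND mtd.dealer = ytd.dealer
AND EXTRACT(YEAR FROM mtd.my_date) = EXTRACT(YEAR FROM ytd.my_date)
LEFT JOIN cte qtd ON qtd.my_date = ADD_MONTHS(TRUNC(ytd.my_date, 'Q'), -1) AND qtd.dealer = ytd.dealer
AND EXTRACT(YEAR FROM qtd.my_date) = EXTRACT(YEAR FROM ytd.my_date)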
ORDER BY dealer, my_date
PS: date is a reserved word in most RDBMSs (including Oracle), so I renamed that column to my_date in the query.

You can use the lag() analytic function together with sum() over (...) as a window aggregate:
select "date",dealer,YTD_Value,MTD_Value,
sum(MTD_Value) over (partition by dealer, qt order by "date")
as QTD_Value
from
(
with t("date",dealer,YTD_Value) as
(
select '2018-01','A',1100 from dual union all
select '2018-02','A',2000 from dual union all
select '2018-03','A',3000 from dual union all
select '2018-04','A',4200 from dual union all
select '2018-05','A',5000 from dual union all
select '2018-06','A',5500 from dual union all
select '2017-01','B', 100 from dual union all
select '2017-02','B', 200 from dual union all
select '2017-03','B', 550 from dual
)
select t.*,
t.YTD_Value - nvl(lag(t.YTD_Value)
over (partition by dealer, substr("date",1,4) order by "date"),0)
as MTD_Value,
substr("date",1,4)||to_char(to_date("date",'YYYY-MM'),'Q')
as qt,
substr("date",1,4) as year
from t
order by year desc, "date"
)
order by year desc, "date";
Rextester Demo

help me in executing the sql query

I have a table like the one below. I want to calculate the sum of Amount for the first 5% of customers, then the next 20%, the next 25%, the next 25%, and finally the remainder. This is just a sample of the DB table.
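5% = 1 customer, so the sum is 100
Next 20% = 4 customers, so the sum is 1800 (200+500+300+800)
Next 25% = 5 customers, so the sum is 2900 (600+800+500+400+600)
Next 25% = 5 customers, so the sum is 2500 (300+800+300+800+300)
Rest = 4 customers, so the sum is 1400 (800+100+200+300)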
Cus_ID Amount
1004 100
1064 200
1126 500
1280 300
1678 800
1719 600
1862 800
2109 500
2892 400
2957 600
3097 300
3205 800
3399 300
3460 800
4169 300
4380 800
4689 100
4886 200
4906 300
Result
5% 20% 25% next 25% Rest
100 1800 2900 2500 1400

WITH T(Cus_ID,Amount ) AS
(
SELECT 1004, 100 UNION ALL
SELECT 1064, 200 UNION ALL
SELECT 1126, 500 UNION ALL
SELECT 1280, 300 UNION ALL
SELECT 1678, 800 UNION ALL
SELECT 1719, 600 UNION ALL
SELECT 1862, 800 UNION ALL
SELECT 2109, 500 UNION ALL
SELECT 2892, 400 UNION ALL
SELECT 2957, 600 UNION ALL
SELECT 3097, 300 UNION ALL
SELECT 3205, 800 UNION ALL
SELECT 3399, 300 UNION ALL
SELECT 3460, 800 UNION ALL
SELECT 4169, 300 UNION ALL
SELECT 4380, 800 UNION ALL
SELECT 4689, 100 UNION ALL
SELECT 4886, 200 UNION ALL
SELECT 4906, 300
), T2 AS
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY Cus_ID) AS RN,
ROW_NUMBER() OVER (ORDER BY Cus_ID)/ CAST(COUNT(*) OVER() AS FLOAT) AS Pct
FROM T
), T3(Amount, Grp) AS
(
SELECT a.Amount, CASE WHEN ISNULL(b.Pct,0) < 0.05 THEN 1
WHEN b.Pct < 0.25 THEN 2
WHEN b.Pct < 0.50 THEN 3
WHEN b.Pct < 0.75 THEN 4
ELSE 5
END
FROM T2 a LEFT JOIN T2 b ON b.RN=a.RN-1
)
SELECT SUM(Amount) AS Amount, Grp
FROM T3
GROUP BY Grp
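The bucketing can be cross-checked outside SQL Server. Joining each row to the previous one (`b.RN = a.RN - 1`) means each customer is classified by `(RN - 1) / COUNT(*)`, the share of customers ranked before it. Below is a plain-Python sketch of that rule, assuming customers are ordered by Cus_ID as in the answer. The function name is hypothetical.

```python
# Amounts from the question, ordered by Cus_ID (1004, 1064, ..., 4906)
amounts = [100, 200, 500, 300, 800, 600, 800, 500, 400, 600,
           300, 800, 300, 800, 300, 800, 100, 200, 300]


def bucket_sums(amounts, cuts=(0.05, 0.25, 0.50, 0.75)):
    """Sum amounts per bucket; a row's bucket is the first cut above its prior share."""
    n = len(amounts)
    sums = [0] * (len(cuts) + 1)
    for i, amount in enumerate(amounts):  # i == RN - 1
        share_before = i / n              # == previous row's Pct (0 for the first row)
        group = next((g for g, cut in enumerate(cuts) if share_before < cut), len(cuts))
        sums[group] += amount
    return sums


print(bucket_sums(amounts))
```

The cumulative cut-offs 0.05, 0.25, 0.50 and 0.75 correspond to the first 5%, then the next 20%, 25% and 25%, with everything else in the last bucket.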
