Take sum of last 7 days from the observed date in BigQuery

I have a table on which I want to compute the sum of revenue over the last 7 days from the observed day. Here is my table:
with temp as
(
select DATE('2019-06-29') as transaction_date, "x"as id, 0 as revenue
union all
select DATE('2019-06-30') as transaction_date, "x"as id, 80 as revenue
union all
select DATE('2019-07-04') as transaction_date, "x"as id, 64 as revenue
union all
select DATE('2019-07-06') as transaction_date, "x"as id, 64 as revenue
union all
select DATE('2019-07-11') as transaction_date, "x"as id, 75 as revenue
union all
select DATE('2019-07-12') as transaction_date, "x"as id, 0 as revenue
)
select * from temp
I want the sum of revenue over the last 7 days for each transaction_date. For instance, for the last record, which has transaction_date = 2019-07-12, I would like to add a column that sums revenue over the last 7 days from 2019-07-12 (that is, back to 2019-07-05), so the value of the new rollup_revenue column would be 0 + 75 + 64 = 139. Likewise, I need to compute the rollup for all the dates for every ID.
Note - the ID may or may not appear daily.
I have tried self join but I am unable to figure it out.

Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
SUM(revenue) OVER(
PARTITION BY id ORDER BY UNIX_DATE(transaction_date)
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
) rollup_revenue
FROM `project.dataset.temp`
You can test and play with the above using the sample data from your question, as in the example below
#standardSQL
WITH `project.dataset.temp` AS (
SELECT DATE '2019-06-29' AS transaction_date, 'x' AS id, 0 AS revenue UNION ALL
SELECT '2019-06-30', 'x', 80 UNION ALL
SELECT '2019-07-04', 'x', 64 UNION ALL
SELECT '2019-07-06', 'x', 64 UNION ALL
SELECT '2019-07-11', 'x', 75 UNION ALL
SELECT '2019-07-12', 'x', 0
)
SELECT *,
SUM(revenue) OVER(
PARTITION BY id ORDER BY UNIX_DATE(transaction_date)
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
) rollup_revenue
FROM `project.dataset.temp`
-- ORDER BY transaction_date
with result
Row transaction_date id revenue rollup_revenue
1 2019-06-29 x 0 0
2 2019-06-30 x 80 80
3 2019-07-04 x 64 144
4 2019-07-06 x 64 208
5 2019-07-11 x 75 139
6 2019-07-12 x 0 139
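To make the frame semantics concrete, here is a minimal Python sketch that emulates `RANGE BETWEEN 6 PRECEDING AND CURRENT ROW` over the day number. It is not BigQuery code; the function name `rolling_revenue` and the `days_back` parameter are made up for illustration. The key point is that the frame is defined by date values, not by row positions, so gaps between dates are handled naturally.

```python
from datetime import date, timedelta

# Sample rows from the question: (transaction_date, id, revenue)
rows = [
    (date(2019, 6, 29), "x", 0),
    (date(2019, 6, 30), "x", 80),
    (date(2019, 7, 4), "x", 64),
    (date(2019, 7, 6), "x", 64),
    (date(2019, 7, 11), "x", 75),
    (date(2019, 7, 12), "x", 0),
]


def rolling_revenue(rows, days_back=6):
    """Emulate SUM(revenue) OVER (PARTITION BY id ORDER BY day
    RANGE BETWEEN days_back PRECEDING AND CURRENT ROW)."""
    out = []
    for day, key, revenue in rows:
        start = day - timedelta(days=days_back)
        # The frame is every row of the same id whose date lies in [start, day].
        total = sum(r for d, k, r in rows if k == key and start <= d <= day)
        out.append((day, key, revenue, total))
    return out


result = rolling_revenue(rows)
for row in result:
    print(row)
```

On the sample data this reproduces the rollup_revenue column shown above (0, 80, 144, 208, 139, 139).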

One option uses a correlated subquery to find the rolling sum:
SELECT
t1.id,
t1.transaction_date,
t1.revenue,
(SELECT SUM(t2.revenue) FROM temp t2
WHERE t2.id = t1.id -- restrict the sum to the same ID
AND t2.transaction_date
BETWEEN DATE_SUB(t1.transaction_date, INTERVAL 7 DAY) AND
t1.transaction_date) AS rev_7_days -- BETWEEN is inclusive on both ends
FROM temp t1
ORDER BY
t1.transaction_date;

Related

How can I do a rolling 12-month sum when some year-month values are missing?

I am calculating rolling sum as such:
select
city,
month_year,
person,
sum(total) over (partition by person,city order by month_year rows between 11 preceding and current row) rolling_one_year
from
(select
city,
month_year,
person,
sum(amount_dollar) as total
from db1 d
group by 1,2,3) ;
However, sometimes not every person has a value for every month_year. E.g. a rolling 12-month sum is as below if we had consecutive month values:
But what if a month was missing for a person, e.g. 202208? According to the logic above, it would sum 202201 - 202301, which as we know is 13 months.
How can I adapt my code above to ensure that the range of months selected is within 1 year?
A possible solution is to LEFT JOIN your data to the calendar table.
Here is a guide on how to create the calendar table if you don't have one.
Create a date table in hive
You should use a logical window frame (RANGE) instead of ROWS. Consider the query below.
WITH monthly_total AS (
SELECT '201911' year_month, 4 total UNION ALL
SELECT '201912' year_month, 10 total UNION ALL
SELECT '202201' year_month, 1 total UNION ALL
SELECT '202202' year_month, 3 total UNION ALL
SELECT '202203' year_month, 9 total UNION ALL
SELECT '202204' year_month, 4 total UNION ALL
SELECT '202205' year_month, 2 total UNION ALL
SELECT '202206' year_month, 8 total UNION ALL
SELECT '202207' year_month, 6 total UNION ALL
SELECT '202209' year_month, 3 total UNION ALL
SELECT '202210' year_month, 10 total UNION ALL
SELECT '202211' year_month, 1 total UNION ALL
SELECT '202212' year_month, 3 total UNION ALL
SELECT '202301' year_month, 50 total
)
SELECT *, SUM(total) OVER w AS rolling_12m_sum
FROM monthly_total
WINDOW w AS (
ORDER BY CAST(SUBSTR(year_month, 1, 4) AS INTEGER) * 12 + CAST(SUBSTR(year_month, 5, 2) AS INTEGER)
RANGE BETWEEN 11 PRECEDING AND CURRENT ROW
) ORDER BY year_month;
I've ignored partition by person,city for simplicity.
Below would be helpful in case you're not familiar with RANGE
https://learnsql.com/blog/difference-between-rows-range-window-functions/
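As a rough illustration of why RANGE behaves differently from ROWS here, the following Python sketch applies both frame types to the sample data from the query above. The helper names (`month_index`, `rolling_range`, `rolling_rows`) are invented for this sketch; `month_index` mirrors the `year * 12 + month` expression used in the ORDER BY.

```python
# Sample data from the query above; 202208 is missing.
monthly_total = [
    ("201911", 4), ("201912", 10), ("202201", 1), ("202202", 3),
    ("202203", 9), ("202204", 4), ("202205", 2), ("202206", 8),
    ("202207", 6), ("202209", 3), ("202210", 10), ("202211", 1),
    ("202212", 3), ("202301", 50),
]


def month_index(year_month):
    """Map 'YYYYMM' to a consecutive integer, as year * 12 + month."""
    return int(year_month[:4]) * 12 + int(year_month[4:6])


def rolling_range(data, months=12):
    """Logical frame: sum rows whose month index is within the last `months` months."""
    out = {}
    for ym, _ in data:
        hi = month_index(ym)
        lo = hi - (months - 1)
        out[ym] = sum(t for ym2, t in data if lo <= month_index(ym2) <= hi)
    return out


def rolling_rows(data, rows=12):
    """Physical frame: sum the current row and the `rows - 1` rows before it."""
    out = {}
    for i, (ym, _) in enumerate(data):
        out[ym] = sum(t for _, t in data[max(0, i - rows + 1): i + 1])
    return out


by_range = rolling_range(monthly_total)
by_rows = rolling_rows(monthly_total)
print(by_range["202301"], by_rows["202301"])
```

For 202301 the logical frame covers 202202 through 202301, while the row-based frame reaches back one row further to 202201 because 202208 is absent.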

Oracle Running Subtraction

I have the below data. I want to subtract the first row from Total Qty (80) and then subtract the rest of the rows from QTY from the previous row of QTY1.
QTY QTY1 DATE TOTAL QTY
2 78 01-JAN-20 80
1 77 15-JAN-20
46 31 22-JAN-20
16 15 27-JAN-20
Is there a way to do this? Any help is greatly appreciated. Thanks
select
t.*
,first_value(TOTAL_QTY)over(order by DT) - sum(QTY)over(order by DT) as QTY1
from t;
Full example with your sample data:
with T(QTY, DT, TOTAL_QTY) as (
select 2 , to_date('01-JAN-20','dd-mon-yy'),80 from dual union all
select 1 , to_date('15-JAN-20','dd-mon-yy'),null from dual union all
select 46, to_date('22-JAN-20','dd-mon-yy'),null from dual union all
select 16, to_date('27-JAN-20','dd-mon-yy'),null from dual
)
select
t.*
,first_value(TOTAL_QTY)over(order by DT) - sum(QTY)over(order by DT) as QTY1
from t;
Result:
QTY DT TOTAL_QTY QTY1
2 2020-01-01 80 78
1 2020-01-15 77
46 2020-01-22 31
16 2020-01-27 15
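The same arithmetic can be sketched outside the database: take the first TOTAL_QTY value and subtract the running sum of QTY. This is only a Python illustration of the logic; the rows are assumed to be already ordered by date, and the variable names are made up.

```python
from itertools import accumulate

# Sample data from the question, already ordered by date.
qty = [2, 1, 46, 16]
total_qty = [80, None, None, None]

# first_value(TOTAL_QTY) over (order by DT): the value on the first row.
opening = total_qty[0]

# sum(QTY) over (order by DT): running total of QTY up to each row.
running = list(accumulate(qty))

qty1 = [opening - r for r in running]
print(qty1)
```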
SQL tables represent unordered sets. Your question seems to rely on the ordering of the rows. Let me assume you have a column that represents the ordering.
Use a cumulative sum:
select t.*,
sum(total_qty) over () - sum(qty) over (order by <ordering col>) as qty1
from t;
Here is a db<>fiddle.
Something like this (the CTE is just your data). If you add any more stuff later (in the total_qty column), it would also get added to the total_qty calculation, as would be typical for additions to, and subtractions from, inventory.
with d as
(select 2 qty, 78 qty1 , to_date('01-JAN-20','dd-mon-rr') datecol, 80 total_qty from dual union all
select 1, 77, to_date('15-JAN-20','dd-mon-rr'),null from dual union all
select 46 , 31, to_date('22-JAN-20','dd-mon-rr'),null from dual union all
select 16 , 15 , to_date('27-JAN-20','dd-mon-rr'),null from dual
)
select sum(total_qty) over (order by datecol) - sum(qty) over (order by datecol)
from d
You can do:
select
qty,
first_value(total_qty) over(order by date)
- sum(qty) over(order by date) as qty1,
date, total_qty
from t
order by date

Percentile for Year-to-Day (successive YtD)

I have the following data:
ID |MPERIOD|FRDATE |FR
===+=======+==========+==
100|2017M01|01.01.2017|60 \ \ \
101|2017M01|02.01.2017|75 > YtD 2017M01 | |
103|2017M01|08.01.2017|48 / > Ytd 2017M02 |
104|2017M02|06.02.2017|55 | > YtD 2017M03
105|2017M02|15.02.2017|63 / |
106|2017M03|18.03.2017|41 |
107|2017M03|22.03.2017|71 /
...|.......|..........|..
I need to calculate 80% percentile for each month and for YtD in (up to) that month (from start of year up to current calculation moment).
I use the following SQL query:
SELECT DISTINCT mperiod,
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY mperiod),2) "80%_FR",
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY SUBSTR(mperiod,1,4)),2) "80%_FR_YtD"
FROM mytable
ORDER BY 1
If I run this query on the last day of a month, when I do not yet have data for the following month, it correctly calculates the YtD value. For example, if I have data for the first six months and none for the seventh, and calculate this for the sixth month, then the year partition OVER (PARTITION BY SUBSTR(mperiod,1,4)) gives the correct YtD value. But if I have data after this month, it will be included in the PARTITION BY, and the calculation will not stop at that moment.
How can I calculate YtD retroactively, for previous months? For example, the YtD calculation for the third month should include only the first three months of the year, not all months in the year.
Since you can't use a windowing clause or add in additional order by columns in PERCENTILE_CONT (boo!), here's one way of achieving your aims. N.B. it's not pretty, and I'm sure it won't be terrifically performant, but it should work at least!
WITH mytable AS (SELECT 100 ID, '2017M01' mperiod, to_date('01/01/2017', 'dd/mm/yyyy') frdate, 60 fr FROM dual UNION ALL
SELECT 101 ID, '2017M01' mperiod, to_date('02/01/2017', 'dd/mm/yyyy') frdate, 75 fr FROM dual UNION ALL
SELECT 103 ID, '2017M01' mperiod, to_date('08/01/2017', 'dd/mm/yyyy') frdate, 48 fr FROM dual UNION ALL
SELECT 104 ID, '2017M02' mperiod, to_date('06/02/2017', 'dd/mm/yyyy') frdate, 55 fr FROM dual UNION ALL
SELECT 105 ID, '2017M02' mperiod, to_date('15/02/2017', 'dd/mm/yyyy') frdate, 63 fr FROM dual UNION ALL
SELECT 106 ID, '2017M03' mperiod, to_date('18/03/2017', 'dd/mm/yyyy') frdate, 41 fr FROM dual UNION ALL
SELECT 107 ID, '2017M03' mperiod, to_date('22/03/2017', 'dd/mm/yyyy') frdate, 71 fr FROM dual UNION ALL
SELECT 108 ID, '2016M12' mperiod, to_date('22/12/2016', 'dd/mm/yyyy') frdate, 42 fr FROM dual UNION ALL
SELECT 109 ID, '2016M11' mperiod, to_date('22/11/2016', 'dd/mm/yyyy') frdate, 32 fr FROM dual),
unpckd AS (SELECT mt.ID,
mt.mperiod,
mt.frdate,
mt.fr,
CASE WHEN substr(mt.mperiod, -2) <= d.id THEN SUBSTR(mt.mperiod, 1, 5) || to_char(d.id, 'fm09')
END new_mperiod,
d.id dummy_id
FROM mytable mt
INNER JOIN (SELECT LEVEL ID
FROM dual
CONNECT BY LEVEL <= 12) d ON substr(mt.mperiod, -2) <= d.id),
res AS (SELECT mperiod,
new_mperiod,
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY CASE WHEN mperiod = new_mperiod THEN mperiod END),2) fr_80,
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY new_mperiod),2) fr_80_ytd
FROM unpckd)
SELECT DISTINCT new_mperiod mperiod,
fr_80 "80%_FR",
fr_80_ytd "80%_FR_YtD"
FROM res
WHERE new_mperiod = mperiod
ORDER BY 1;
MPERIOD 80%_FR 80%_FR_YtD
-------- ---------- ----------
2016M11 32 32
2016M12 42 40
2017M01 69 69
2017M02 61.4 65.4
2017M03 65 69.4
This works by doing a partial cross join between the numbers 1 to 12 (12 months in the year) and the last two digits of the mperiod. Once we have that, we now know the overall ytd period that the rows belong to (ie. number 1 will match to the 2017M01, 2 will match to 2017M01 and 2017M02, etc), so you can now produce a label for this calculated value (which I've called new_mperiod) and use that to partition against.
It's obviously going to be inefficient (since the partial cross join will generate more rows than necessary for a year that doesn't have data for all its months, which get filtered out later), but I can't think of a better way of doing it.
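To check the numbers independently of the SQL, here is a small Python sketch of the same idea: for each period, compute the percentile over that month's rows and over all rows of the same year up to and including that month. The `percentile_cont` helper is my own implementation of linear interpolation between the two nearest ranks, which is how I understand Oracle's PERCENTILE_CONT to behave; treat it as an assumption rather than a reference implementation.

```python
import math

# (mperiod, fr) pairs from the sample data in the answer above.
rows = [
    ("2017M01", 60), ("2017M01", 75), ("2017M01", 48),
    ("2017M02", 55), ("2017M02", 63),
    ("2017M03", 41), ("2017M03", 71),
    ("2016M12", 42), ("2016M11", 32),
]


def percentile_cont(values, p):
    """Continuous percentile: interpolate linearly between neighbouring ranks."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)  # zero-based fractional rank
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def monthly_and_ytd(rows, p=0.8):
    """Return {mperiod: (monthly percentile, year-to-date percentile)}."""
    out = {}
    for period in sorted({m for m, _ in rows}):
        year = period[:4]
        month_vals = [fr for m, fr in rows if m == period]
        # YtD: same year, any month up to and including this one.
        ytd_vals = [fr for m, fr in rows if m[:4] == year and m <= period]
        out[period] = (
            round(percentile_cont(month_vals, p), 2),
            round(percentile_cont(ytd_vals, p), 2),
        )
    return out


result = monthly_and_ytd(rows)
for period, (monthly, ytd) in result.items():
    print(period, monthly, ytd)
```

The string comparison `m <= period` works here only because the `YYYYMnn` labels sort in calendar order within a year.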

Oracle: Need to calculate rolling average for past 3 months where we have more than one submission per month

I've seen many examples of rolling averages in Oracle, but none do quite what I desire.
This is my raw data
DATE SCORE AREA
----------------------------
01-JUL-14 60 A
01-AUG-14 45 A
01-SEP-14 45 A
02-SEP-14 50 A
01-OCT-14 30 A
02-OCT-14 45 A
03-OCT-14 50 A
01-JUL-14 60 B
01-AUG-14 45 B
01-SEP-14 45 B
02-SEP-14 50 B
01-OCT-14 30 B
02-OCT-14 45 B
03-OCT-14 50 B
This is the desired result for my rolling average
MMYY AVG AREA
-------------------------
JUL-14 60 A
AUG-14 52.5 A
SEP-14 50 A
OCT-14 44 A
JUL-14 60 B
AUG-14 52.5 B
SEP-14 50 B
OCT-14 44 B
The way I need it to work is that for each MMYY, I need to look back 3 months, and AVG the scores per dept. So for example,
For Area A in OCT, in the last 3 months from oct, there were 6 studies, (45+45+50+30+45+50)/6 = 44.1
Normally I would write the query like so
SELECT
AREA,
TO_CHAR(T.DT,'MMYY') MMYY,
ROUND(AVG(SCORE)
OVER (PARTITION BY AREA ORDER BY TO_CHAR(T.DT,'MMYY') ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),1)
AS AVG
FROM T
This will look over the last 3 entries, not the last 3 months.
One way to do this is to mix aggregation functions with analytic functions. The key idea for average is to avoid using avg() and instead do a sum() divided by a count(*).
SELECT AREA, TO_CHAR(T.DT, 'MMYY') AS MMYY,
SUM(SCORE) / COUNT(*) as AvgScore,
SUM(SUM(SCORE)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) / SUM(COUNT(*)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM t
GROUP BY AREA, TO_CHAR(T.DT, 'MMYY') ;
Note the order by clause. If your data spans years, then using the MMYY format poses problems. It is better to use a format such as YYYY-MM for months, because the alphabetical ordering is the same as the natural ordering.
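The sum-divided-by-count idea can be sketched in Python as two steps: aggregate to one row per area and month, then sum the monthly totals and counts over the last three monthly rows. The function name `rolling_avg` and the `window` parameter are invented for this sketch, and like the `ROWS BETWEEN 2 PRECEDING` frame it assumes every month has at least one row.

```python
from datetime import date

# Area A from the question; area B has identical scores.
area_a = [
    (date(2014, 7, 1), 60), (date(2014, 8, 1), 45),
    (date(2014, 9, 1), 45), (date(2014, 9, 2), 50),
    (date(2014, 10, 1), 30), (date(2014, 10, 2), 45),
    (date(2014, 10, 3), 50),
]
rows = [(dt, score, area) for area in ("A", "B") for dt, score in area_a]


def rolling_avg(rows, window=3):
    """Average of all scores in the current and previous `window - 1` monthly rows."""
    # Step 1: GROUP BY area, month -> (sum of scores, number of scores).
    monthly = {}
    for dt, score, area in rows:
        key = (area, dt.year, dt.month)
        total, count = monthly.get(key, (0, 0))
        monthly[key] = (total + score, count + 1)

    # Step 2: per area, add up sums and counts over the last `window` monthly rows.
    out = {}
    for area in sorted({k[0] for k in monthly}):
        keys = sorted(k for k in monthly if k[0] == area)
        for i, key in enumerate(keys):
            frame = keys[max(0, i - window + 1): i + 1]
            total = sum(monthly[k][0] for k in frame)
            count = sum(monthly[k][1] for k in frame)
            out[key] = round(total / count, 1)
    return out


result = rolling_avg(rows)
for key, avg in result.items():
    print(key, avg)
```

Dividing the summed totals by the summed counts weights every submission equally, whereas averaging the monthly averages would give a month with one submission the same weight as a month with three.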
You can specify also ranges, not only rows.
SELECT
AREA,
TO_CHAR(T.DT,'MMYY') MMYY,
ROUND(AVG(SCORE)
OVER (PARTITION BY AREA
ORDER BY DT RANGE BETWEEN INTERVAL '3' MONTH PRECEDING AND CURRENT ROW))
AS AVG
FROM T
Since CURRENT ROW is the default, just ORDER BY DT RANGE INTERVAL '3' MONTH PRECEDING should work as well. Perhaps you have to do some fine-tuning, I did not test the behaviour regarding the 28/29/30/31 days per month issue.
Check the Oracle Windowing Clause for further details.
SQL> WITH DATA AS(
2 SELECT to_date('01-JUL-14','DD-MON-RR') dt, 60 score, 'A' area FROM dual UNION ALL
3 SELECT to_date('01-AUG-14','DD-MON-RR') dt, 45 score, 'A' area FROM dual UNION ALL
4 SELECT to_date('01-SEP-14','DD-MON-RR') dt, 45 score, 'A' area FROM dual UNION ALL
5 SELECT to_date('02-SEP-14','DD-MON-RR') dt, 50 score, 'A' area FROM dual UNION ALL
6 SELECT to_date('01-OCT-14','DD-MON-RR') dt, 30 score, 'A' area FROM dual UNION ALL
7 SELECT to_date('02-OCT-14','DD-MON-RR') dt, 45 score, 'A' area FROM dual UNION ALL
8 SELECT to_date('03-OCT-14','DD-MON-RR') dt, 50 score, 'A' area FROM dual UNION ALL
9 SELECT to_date('01-JUL-14','DD-MON-RR') dt, 60 score, 'B' area FROM dual UNION ALL
10 SELECT to_date('01-AUG-14','DD-MON-RR') dt, 45 score, 'B' area FROM dual UNION ALL
11 SELECT to_date('01-SEP-14','DD-MON-RR') dt, 45 score, 'B' area FROM dual UNION ALL
12 SELECT to_date('02-SEP-14','DD-MON-RR') dt, 50 score, 'B' area FROM dual UNION ALL
13 SELECT to_date('01-OCT-14','DD-MON-RR') dt, 30 score, 'B' area FROM dual UNION ALL
14 SELECT to_date('02-OCT-14','DD-MON-RR') dt, 45 score, 'B' area FROM dual UNION ALL
15 SELECT to_date('03-OCT-14','DD-MON-RR') dt, 50 score, 'B' area FROM dual)
16 SELECT TO_CHAR(T.DT, 'MON-RR') AS MMYY,
17 round(
18 SUM(SUM(SCORE)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)/
19 SUM(COUNT(*)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),1)
20 AS avg_score,
21 AREA
22 FROM data t
23 GROUP BY AREA, TO_CHAR(T.DT, 'MON-RR')
24 /
MMYY AVG_SCORE A
------ ---------- -
JUL-14 60 A
AUG-14 52.5 A
SEP-14 50 A
OCT-14 44.2 A
JUL-14 60 B
AUG-14 52.5 B
SEP-14 50 B
OCT-14 44.2 B
8 rows selected.
SQL>
From next time, I would expect you to provide the create and insert statements so that we don't have to spend time on preparing a test case.
And, why YY format? Haven't you seen the Y2K bug? Please use YYYY format.

Moving average of 2 columns

Hello, I have a problem. I know how to calculate a moving average over the last 3 months using Oracle analytic functions... but my situation is a little different.
Month-----ProductType-----Sales----------Average(HAVE TO FIND THIS)
1---------A---------------10
1---------B---------------12
1---------C---------------17
2---------A---------------21
3---------C---------------2
3---------B---------------21
4---------B---------------23
5
6
7
8
9
So we have sales for each month and each product type... I need to calculate the moving average of the last 3 months for the particular product.
Example:
For month 4 and Product B it would be (21+0+12)/3
Any ideas ?
Another option is to use the windowing clause of analytic functions
with my_data as (
select 1 as month, 'A' as product, 10 as sales from dual union all
select 1 as month, 'B' as product, 12 as sales from dual union all
select 1 as month, 'C' as product, 17 as sales from dual union all
select 2 as month, 'A' as product, 21 as sales from dual union all
select 3 as month, 'C' as product, 2 as sales from dual union all
select 3 as month, 'B' as product, 21 as sales from dual union all
select 4 as month, 'B' as product, 23 as sales from dual
)
select
month,
product,
sales,
nvl(sum(sales)
over (partition by product order by month
range between 3 preceding and 1 preceding),0)/3 as average_sales
from my_data
order by month, product
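For readers who want to sanity-check the frame `range between 3 preceding and 1 preceding`, here is a small Python sketch of the same calculation. The function name `prior_three_month_avg` is made up; the point is that the frame is defined by month values, so a month with no row for the product simply contributes 0 to the sum while the divisor stays 3.

```python
# (month, product, sales) rows from the sample data above.
my_data = [
    (1, "A", 10), (1, "B", 12), (1, "C", 17),
    (2, "A", 21),
    (3, "C", 2), (3, "B", 21),
    (4, "B", 23),
]


def prior_three_month_avg(rows, month, product):
    """Sum of sales for `product` in months [month - 3, month - 1], divided by 3."""
    total = sum(
        sales
        for m, p, sales in rows
        if p == product and month - 3 <= m <= month - 1
    )
    return total / 3


# Month 4, product B: months 1..3 hold 12, (nothing), 21.
print(prior_three_month_avg(my_data, 4, "B"))
```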
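-- Alternative using LAG. Note that LAG counts rows, not months, so this only
-- matches the 3-month average when every product has a row for every month.
SELECT month,
productType,
sales,
(lag(sales, 3, 0) over (partition by productType order by month) +
lag(sales, 2, 0) over (partition by productType order by month) +
lag(sales, 1, 0) over (partition by productType order by month)) / 3 AS moving_avg
FROM your_table_name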