BigQuery conditional running sum - google-bigquery

Using SQL (Standard BigQuery), I need to calculate a running average of the most recent 4 weeks of sales in which flag was FALSE. The average is actually a baseline, so it does not include the current week's sales.
week flag sales
1 FALSE 3
2 FALSE 1
3 FALSE 3
4 FALSE 0
5 FALSE 3
6 FALSE 6
7 TRUE 3
8 TRUE 1
9 FALSE 3
10 FALSE 9
11 FALSE 6
12 FALSE 4
13 TRUE 4
14 TRUE 2
15 FALSE 1
For example, week 6 has (week2+week3+week4+week5)/4=(1+3+0+3)/4=7/4=1.75.
For, say, week 10, the running average should not include week 7 and week 8 since flag is true. Week 10 should be (week4+week5+week6+week9)/4=(0+3+6+3)/4=3.
The whole table should look like this:
week avg
1 NULL
2 NULL
3 NULL
4 NULL
5 1.75
6 1.75
7 3
8 3
9 3
10 3
11 5.25
12 6
13 5.5
14 5.5
15 5.5
I've been trying to adapt the answer here:
SQL Select Statement For Calculating A Running Average Column
Thanks,
Jim

Below is for BigQuery Standard SQL
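#standardSQL
SELECT
  week,
  (
    -- average of the 4 most recent earlier weeks where flag is FALSE;
    -- NULL until 4 such weeks exist
    SELECT IF(COUNT(1) = 4, AVG(sales), NULL)
    FROM (
      SELECT sales FROM UNNEST(arr) WHERE NOT flag ORDER BY week DESC LIMIT 4
    )
  ) AS avg
FROM (
  -- arr holds (week, flag, sales) for every week before the current one
  SELECT week, ARRAY_AGG(STRUCT(week, flag, sales)) OVER(win) arr
  FROM `project.dataset.table`
  WINDOW win AS (ORDER BY week ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
-- ORDER BY week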
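For engines that cannot run ARRAY_AGG over a window frame, the same baseline can be expressed with correlated subqueries. This is a swapped-in technique, not the answer's method: the sketch assumes SQLite through Python's built-in sqlite3 module, stores flag as 0/1, and uses a hypothetical table name, sales_by_week. A FALSE week s counts toward the baseline of week t when fewer than 4 FALSE weeks lie strictly between them.

```python
import sqlite3

# Data from the question: (week, flag, sales). SQLite has no BOOLEAN type,
# so flag is stored as 0 (FALSE) / 1 (TRUE).
ROWS = [
    (1, 0, 3), (2, 0, 1), (3, 0, 3), (4, 0, 0), (5, 0, 3),
    (6, 0, 6), (7, 1, 3), (8, 1, 1), (9, 0, 3), (10, 0, 9),
    (11, 0, 6), (12, 0, 4), (13, 1, 4), (14, 1, 2), (15, 0, 1),
]

# For week t, keep the FALSE weeks s before t that have fewer than 4 FALSE
# weeks strictly between s and t, i.e. the (up to) 4 most recent ones.
# Average them only when exactly 4 exist; otherwise the CASE yields NULL.
QUERY = """
SELECT t.week,
       (SELECT CASE WHEN COUNT(*) = 4 THEN AVG(s.sales) END
          FROM sales_by_week AS s
         WHERE s.flag = 0
           AND s.week < t.week
           AND (SELECT COUNT(*)
                  FROM sales_by_week AS b
                 WHERE b.flag = 0
                   AND b.week > s.week
                   AND b.week < t.week) < 4) AS avg
  FROM sales_by_week AS t
 ORDER BY t.week
"""


def baseline(rows):
    """Return [(week, baseline average or None), ...] for the given rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_by_week (week INTEGER, flag INTEGER, sales INTEGER)")
    con.executemany("INSERT INTO sales_by_week VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for week, avg in baseline(ROWS):
        print(week, avg)
```

The counting subquery stands in for "ORDER BY week DESC LIMIT 4" and assumes week values are unique.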

Related

Burndown analysis in SQL Server Management Studio

I'm trying to prepare my data to create a burndown visual. As you can see, the Rate column isn't simply A - B: it carries forward the previous value when B is NULL.
I've tried some CASE statements using LAG and sums, but to no avail.
Some direction on the CASE statement, or a better solution, would be ideal.
For example, this is how my data looks:
ID A B
1 20 NULL
2 20 3
3 20 NULL
4 20 7
5 20 NULL
6 20 NULL
7 20 NULL
8 20 5
9 20 7
And I want a Rate column that looks like this:
ID A B Rate
1 20 NULL 20
2 20 3 17
3 20 NULL 17
4 20 7 10
5 20 NULL 10
6 20 NULL 10
7 20 NULL 10
8 20 5 5
9 20 7 -2
Thanks to @Larnu for the guidance.
Here is the solution when your data is partitioned by some group ID and ordered by some date or row ID.
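SELECT
GROUP_ID,
ROW_ID,
COL_A,
COL_B,
-- A minus the running total of B within each group (NULL counted as 0)
COL_A - (SUM(ISNULL(COL_B,0)) OVER (PARTITION BY GROUP_ID ORDER BY ROW_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS Rate
FROM [table] -- TABLE is a reserved word in T-SQL, so the placeholder name is bracketed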
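The running-total idea can be tried outside SQL Server. This sketch assumes SQLite 3.25 or later (for window functions) via Python's sqlite3 module, a hypothetical table name burndown, and a single group, so PARTITION BY is dropped; COALESCE replaces ISNULL, which is T-SQL specific.

```python
import sqlite3

# (ID, A, B) rows from the question; None is stored as NULL.
ROWS = [
    (1, 20, None), (2, 20, 3), (3, 20, None), (4, 20, 7), (5, 20, None),
    (6, 20, None), (7, 20, None), (8, 20, 5), (9, 20, 7),
]

# Rate = A minus the running total of B, treating NULL as 0, so rows where
# B is NULL simply repeat the previous Rate.
QUERY = """
SELECT ID, A, B,
       A - SUM(COALESCE(B, 0)) OVER (
               ORDER BY ID
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Rate
  FROM burndown
 ORDER BY ID
"""


def burndown(rows):
    """Return [(ID, A, B, Rate), ...] ordered by ID."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE burndown (ID INTEGER, A INTEGER, B INTEGER)")
    con.executemany("INSERT INTO burndown VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in burndown(ROWS):
        print(row)
```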

Create Date in Google Big Query using Newly Created Column

I have a view that converts fiscal-year periods to calendar periods, creating a new column called "NewPeriod". I would then like to create a date from this "NewPeriod" column with the DATE() function, DATE(Year, NewPeriod, 1). I am unable to use NewPeriod in the DATE function; is there a way to accomplish this in the same view?
SELECT distinct
company_code,
Period,
Year,
CASE COMPANY_CODE
WHEN 1 THEN CASE Period
WHEN 4 THEN 1
WHEN 5 THEN 2
WHEN 6 THEN 3
WHEN 7 THEN 4
WHEN 8 THEN 5
WHEN 9 THEN 6
WHEN 10 THEN 7
WHEN 11 THEN 8
WHEN 12 THEN 9
WHEN 1 THEN 10
WHEN 2 THEN 11
WHEN 3 THEN 12
ELSE
Period
END
Else Period
END AS NewPeriod,
FROM
`table`
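Because the alias NewPeriod is not visible elsewhere in the same SELECT list (which is what the question runs into), a common workaround is to compute it in a subquery or CTE and build the date in the outer query; in BigQuery the outer expression would be DATE(Year, NewPeriod, 1). The sketch below shows that shape in SQLite via Python's sqlite3, with date(printf(...)) standing in for BigQuery's DATE(year, month, day). The table name periods and the sample rows are hypothetical, and the nested CASE is swapped for the equivalent arithmetic ((Period + 8) % 12) + 1, which maps period 4 to 1, 12 to 9, 1 to 10 and 3 to 12. As in the question, Year is left unchanged.

```python
import sqlite3

# Hypothetical sample rows: (company_code, Period, Year).
ROWS = [(1, 4, 2020), (1, 3, 2020), (1, 12, 2020), (2, 5, 2020)]

# The inner query computes NewPeriod; only the outer query can use it.
# For company 1, ((Period + 8) % 12) + 1 reproduces the question's CASE
# mapping; other companies (and out-of-range periods) keep Period as is.
QUERY = """
SELECT company_code, Period, Year, NewPeriod,
       date(printf('%04d-%02d-01', Year, NewPeriod)) AS NewDate
  FROM (SELECT DISTINCT company_code, Period, Year,
               CASE WHEN company_code = 1 AND Period BETWEEN 1 AND 12
                    THEN ((Period + 8) % 12) + 1
                    ELSE Period
               END AS NewPeriod
          FROM periods) AS p
 ORDER BY company_code, Year, Period
"""


def calendar_dates(rows):
    """Return [(company_code, Period, Year, NewPeriod, NewDate), ...]."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE periods (company_code INTEGER, Period INTEGER, Year INTEGER)")
    con.executemany("INSERT INTO periods VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in calendar_dates(ROWS):
        print(row)
```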

Convert columns into rows

This is the current result set of my query:
Question Sol25A Sol25B Sol25C Sol40A Sol40B
======================================================
A 1 4 2 6 0
B 2 3 2 1 9
C 6 7 1 0 8
======================================================
Total = 9 14 5 7 17
======================================================
And I want the result in this form:
Product Total
===============
Sol25A 9
Sol25B 14
Sol25C 5
Sol40A 7
Sol40B 17
Could you please provide the query? It would be a great help.
I would suggest that you unpivot using cross apply and then aggregate:
select product, sum(val) as Total
from t cross apply
(values ('Sol25A', Sol25A), ('Sol25B', Sol25B), ('Sol25C', Sol25C),
('Sol40A', Sol40A), ('Sol40B', Sol40B)
) v(product, val)
group by product;
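CROSS APPLY with a VALUES list is SQL Server syntax. Where it is not available, a UNION ALL unpivot does the same job. This sketch assumes SQLite via Python's sqlite3 and loads rows A, B and C from the question into a hypothetical table t; the Total row is left out because it is what the query produces.

```python
import sqlite3

# Rows A, B and C from the question's result set.
ROWS = [
    ("A", 1, 4, 2, 6, 0),
    ("B", 2, 3, 2, 1, 9),
    ("C", 6, 7, 1, 0, 8),
]

# Each SELECT turns one column into (product, val) rows; UNION ALL stacks
# them, and the outer GROUP BY sums each product.
QUERY = """
SELECT product, SUM(val) AS Total
  FROM (SELECT 'Sol25A' AS product, Sol25A AS val FROM t
        UNION ALL SELECT 'Sol25B', Sol25B FROM t
        UNION ALL SELECT 'Sol25C', Sol25C FROM t
        UNION ALL SELECT 'Sol40A', Sol40A FROM t
        UNION ALL SELECT 'Sol40B', Sol40B FROM t) AS v
 GROUP BY product
 ORDER BY product
"""


def unpivot_totals(rows):
    """Return [(product, Total), ...] ordered by product."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (Question TEXT, Sol25A INTEGER, Sol25B INTEGER,"
        " Sol25C INTEGER, Sol40A INTEGER, Sol40B INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for product, total in unpivot_totals(ROWS):
        print(product, total)
```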

SQL - Select rows after reaching minimum value/threshold

I'm using SQL Server Management Studio. My data set is as follows:
ID Days Value Threshold
A 1 10 30
A 2 20 30
A 3 34 30
A 4 25 30
A 5 20 30
B 1 5 15
B 2 10 15
B 3 12 15
B 4 17 15
B 5 20 15
I want a query that selects, for each ID, only the rows from the point where the threshold is first reached. I also want a new days column that starts at 1 on the first selected row. For the dataset above, the expected output would look like this:
ID Days Value Threshold NewDayColumn
A 3 34 30 1
A 4 25 30 2
A 5 20 30 3
B 4 17 15 1
B 5 20 15 2
It doesn't matter if the value drops below the threshold in later rows; I want to number the first row where the threshold is crossed as 1 and keep counting rows for that ID.
Thank you!
You can use window functions for this. Here is one method:
select t.*, row_number() over (partition by id order by days) as newDayColumn
from (select t.*,
min(case when value > threshold then days end) over (partition by id) as threshold_days
from t
) t
where days >= threshold_days;
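The same query shape can be run outside SQL Server. This sketch assumes SQLite 3.25 or later (for window functions) via Python's sqlite3 and a hypothetical table name, readings. Like the answer, it tests Value > Threshold; switching to >= would also count a row that hits the threshold exactly.

```python
import sqlite3

# (ID, Days, Value, Threshold) rows from the question.
ROWS = [
    ("A", 1, 10, 30), ("A", 2, 20, 30), ("A", 3, 34, 30), ("A", 4, 25, 30), ("A", 5, 20, 30),
    ("B", 1, 5, 15), ("B", 2, 10, 15), ("B", 3, 12, 15), ("B", 4, 17, 15), ("B", 5, 20, 15),
]

# threshold_days = first day per ID whose Value exceeds Threshold.
# ROW_NUMBER() is evaluated after the WHERE filter, so numbering restarts
# at 1 on the first kept row of each ID.
QUERY = """
SELECT ID, Days, Value, Threshold,
       ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Days) AS NewDayColumn
  FROM (SELECT r.*,
               MIN(CASE WHEN Value > Threshold THEN Days END)
                   OVER (PARTITION BY ID) AS threshold_days
          FROM readings AS r) AS t
 WHERE Days >= threshold_days
 ORDER BY ID, Days
"""


def rows_after_threshold(rows):
    """Return [(ID, Days, Value, Threshold, NewDayColumn), ...]."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (ID TEXT, Days INTEGER, Value INTEGER, Threshold INTEGER)")
    con.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in rows_after_threshold(ROWS):
        print(row)
```

IDs that never exceed their threshold get a NULL threshold_days and therefore drop out entirely.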

How to get the average of every three records in a column starting from first record in MS Access/SQL?

I am working on something where I am stuck on getting the average of every three (or four, or five) consecutive records in a column, starting from the first record. Say I have a table with this data:
ID_Col1 | Value_Col2
1 | 1.5
2 | 2
3 | 2.5
4 | 3
5 | 3.5
6 | 4
7 | 4.5
8 | 5
9 | 5.5
10 | 6
For an average of every three records, the required output is:
every_three_records_average_Column
none
none
average(1.5, 2, 2.5)
average(2, 2.5, 3)
average(2.5, 3, 3.5)
average(3, 3.5, 4)
average(3.5, 4, 4.5)
average(4, 4.5, 5)
average(4.5, 5, 5.5)
average(5, 5.5, 6)
Does anyone have an idea how to get this kind of output with an SQL query?
Any help would be much appreciated.
Thanks,
Honey
SELECT
T1.[ID_Col1], T2.[ID_Col1], T3.[ID_Col1],
T1.[Value_Col2], T2.[Value_Col2], T3.[Value_Col2],
(T1.[Value_Col2] + T2.[Value_Col2] + T3.[Value_Col2]) / 3 AS Avg3
FROM (Source AS T1
INNER JOIN Source AS T2
ON T1.[ID_Col1] = T2.[ID_Col1] - 1)
INNER JOIN Source AS T3
ON T2.[ID_Col1] = T3.[ID_Col1] - 1
Consider a correlated aggregate subquery filtering on last three IDs:
SELECT myTable.ID_Col1, myTable.Value_Col2,
(SELECT Avg(sub.Value_Col2)
FROM myTable As sub
WHERE sub.ID_Col1 >= myTable.ID_Col1 - 2
AND sub.ID_Col1 <= myTable.ID_Col1
AND myTable.ID_Col1 >= 3) As LastThreeAvg
FROM myTable;
Output
ID_Col1 Value_Col2 LastThreeAvg
1 1.5
2 2
3 2.5 2
4 3 2.5
5 3.5 3
6 4 3.5
7 4.5 4
8 5 4.5
9 5.5 5
10 6 5.5
However, if ID_Col1 is an AutoNumber field, there is no guarantee that its values stay consecutive (deleted records, for example, leave gaps). Therefore, a calculated row number, RowNo, is needed in both the derived table and the aggregate subquery. Because MS Access SQL has no CTEs, the query becomes a bit verbose:
SELECT dT.ID_Col1, dT.Value_Col2,
  (SELECT Avg(sub.Value_Col2)
   FROM
     (SELECT ID_Col1, Value_Col2,
        (SELECT Count(*)
         FROM myTable As c
         WHERE c.ID_Col1 <= myTable.ID_Col1) As RowNo
      FROM myTable) As sub
   WHERE sub.RowNo >= dT.RowNo - 2
     AND sub.RowNo <= dT.RowNo
     AND dT.RowNo >= 3) As LastThreeAvg
FROM
  (SELECT ID_Col1, Value_Col2,
     (SELECT Count(*)
      FROM myTable As c
      WHERE c.ID_Col1 <= myTable.ID_Col1) As RowNo
   FROM myTable) As dT
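To see why the row number matters, the sketch below runs an ID-based window and a row-number-based window over hypothetical data with gaps in ID_Col1. It assumes SQLite via Python's sqlite3; the WITH clause is SQLite syntax that Access lacks, used here only to avoid repeating the RowNo derived table.

```python
import sqlite3

# Hypothetical rows with gaps in ID_Col1 (IDs 3 and 6 are missing),
# as can happen after AutoNumber records are deleted.
ROWS = [(1, 1.5), (2, 2.0), (4, 2.5), (5, 3.0), (7, 3.5)]

# Window defined on ID values ("the last three IDs"), which may cover
# fewer than three rows when IDs are missing.
BY_ID = """
SELECT t.ID_Col1,
       (SELECT AVG(s.Value_Col2)
          FROM myTable AS s
         WHERE s.ID_Col1 >= t.ID_Col1 - 2
           AND s.ID_Col1 <= t.ID_Col1
           AND t.ID_Col1 >= 3) AS LastThreeAvg
  FROM myTable AS t
 ORDER BY t.ID_Col1
"""

# Window defined on a computed row number: RowNo counts the IDs <= the
# current ID, so it runs 1, 2, 3, ... regardless of gaps.
BY_ROW_NUMBER = """
WITH numbered AS (
    SELECT m.ID_Col1, m.Value_Col2,
           (SELECT COUNT(*) FROM myTable AS c
             WHERE c.ID_Col1 <= m.ID_Col1) AS RowNo
      FROM myTable AS m
)
SELECT d.ID_Col1,
       (SELECT AVG(s.Value_Col2)
          FROM numbered AS s
         WHERE s.RowNo >= d.RowNo - 2
           AND s.RowNo <= d.RowNo
           AND d.RowNo >= 3) AS LastThreeAvg
  FROM numbered AS d
 ORDER BY d.ID_Col1
"""


def run(query, rows=ROWS):
    """Load rows into an in-memory myTable and return the query's rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (ID_Col1 INTEGER PRIMARY KEY, Value_Col2 REAL)")
    con.executemany("INSERT INTO myTable VALUES (?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(run(BY_ID))
    print(run(BY_ROW_NUMBER))
```

With these rows, the ID-based window for ID 4 averages only IDs 2 and 4, while the row-number window averages IDs 1, 2 and 4.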
SELECT
(
SELECT Avg(A.Value_Col2)
FROM myTable As A
WHERE A.ID_Col1 >= C.ID_Col1 and A.ID_Col1 < C.ID_Col1 + [MyParam]
) As Result
FROM myTable As C
WHERE C.ID_Col1 + [MyParam] -1 <= (SELECT MAX (D.ID_Col1) From myTable As D)
Explanation:
Outer query: for each record C in myTable, up to the record whose ID is MyParam - 1 below the highest ID (MyParam is 3, 4, or 5 in the question), so that every window holds a full MyParam records.
This is the WHERE clause of the outer query: FROM myTable As C WHERE C.ID_Col1 + [MyParam] -1 <= (SELECT MAX (D.ID_Col1) From myTable As D)
Inner query: calculate the average Value_Col2 of MyParam records, starting at the current record.
This is the SELECT Avg(A.Value_Col2) together with its WHERE clause: WHERE A.ID_Col1 >= C.ID_Col1, where C.ID_Col1 is the current ID, and no more than [MyParam] records: A.ID_Col1 < C.ID_Col1 + [MyParam].
Test
MyTable:
ID_Col1 Value_Col2
1 1.5
2 2
3 2.5
4 3
5 3.5
6 4
7 4.5
8 5
9 5.5
10 6
11 6.5
12 7
13 7.5
14 8
15 8.5
16 9
17 9.5
Result for MyParam = 3
Result
2
2.5
3
3.5
4
4.5
5
5.5
6
6.5
7
7.5
8
8.5
9
Result for MyParam = 5
Result
2.5
3
3.5
4
4.5
5
5.5
6
6.5
7
7.5
8
8.5
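The test above can be reproduced with a parameterized version of the query. This sketch assumes SQLite via Python's sqlite3, where the Access parameter [MyParam] becomes the named parameter :n. Each Result is the average of the current record and the next MyParam - 1 records, so each window is aligned at its first record rather than its last.

```python
import sqlite3

# The 17 test rows: ID_Col1 = 1..17, Value_Col2 = 1.5, 2.0, ..., 9.5.
ROWS = [(i, 1 + 0.5 * i) for i in range(1, 18)]

# Same logic as the answer; :n plays the role of [MyParam].
QUERY = """
SELECT (SELECT AVG(A.Value_Col2)
          FROM myTable AS A
         WHERE A.ID_Col1 >= C.ID_Col1
           AND A.ID_Col1 < C.ID_Col1 + :n) AS Result
  FROM myTable AS C
 WHERE C.ID_Col1 + :n - 1 <= (SELECT MAX(D.ID_Col1) FROM myTable AS D)
 ORDER BY C.ID_Col1
"""


def forward_averages(n, rows=ROWS):
    """Return the list of Result values for window size n."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (ID_Col1 INTEGER PRIMARY KEY, Value_Col2 REAL)")
    con.executemany("INSERT INTO myTable VALUES (?, ?)", rows)
    result = [r[0] for r in con.execute(QUERY, {"n": n})]
    con.close()
    return result


if __name__ == "__main__":
    print(forward_averages(3))
    print(forward_averages(5))
```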