Running Total of all Previous Rows BigQuery


I have a BigQuery table that looks like this:
ID SessionNumber CountOfAction Category
1 1 1 B
1 2 3 A
1 3 1 A
1 4 4 B
1 5 5 B
I am trying to get the running total of CountOfAction over all previous rows where Category = 'A'. The final output should be:
ID SessionNumber CountOfAction
1 1 0 --no previous rows have countofAction for category = A
1 2 0 --no previous rows have countofAction for category = A
1 3 3 --previous row (Row 2) has countofAction = 3 for category = A
1 4 4 --previous rows (Row 2 and 3) have countofAction = 3 and 1 for category = A
1 5 4 --previous rows (Row 2 and 3) have countofAction = 3 and 1 for category = A
Below is the query I have written, but it doesn't give me the desired output:
select
ID,
SessionNumber,
SUM(CountOfAction) OVER(PARTITION BY ID ORDER BY SessionNumber ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as CumulativeCountofAction
From Table1 where category = 'A'
I would really appreciate any help on this! Thanks in advance

Filtering on category in the where clause removes every (id, sessionNumber) row whose category is not 'A', which is not what you want: those rows must still appear in the output.
Instead, you can use aggregation and a conditional sum():
select
id,
sessionNumber,
sum(sum(if(category = 'A', countOfAction, 0))) over(
partition by id
order by sessionNumber
rows between unbounded preceding and 1 preceding
) CumulativeCountofAction
from mytable t
group by id, sessionNumber
order by id, sessionNumber

Below is for BigQuery Standard SQL
#standardSQL
SELECT ID, SessionNumber,
IFNULL(SUM(IF(category = 'A', CountOfAction, 0)) OVER(win), 0) AS CountOfAction
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY ID ORDER BY SessionNumber ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
Applied to the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 ID, 1 SessionNumber, 1 CountOfAction, 'B' Category UNION ALL
SELECT 1, 2, 3, 'A' UNION ALL
SELECT 1, 3, 1, 'A' UNION ALL
SELECT 1, 4, 4, 'B' UNION ALL
SELECT 1, 5, 5, 'B'
)
SELECT ID, SessionNumber,
IFNULL(SUM(IF(category = 'A', CountOfAction, 0)) OVER(win), 0) AS CountOfAction
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY ID ORDER BY SessionNumber ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
the result is:
Row ID SessionNumber CountOfAction
1 1 1 0
2 1 2 0
3 1 3 3
4 1 4 4
5 1 5 4
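To try the conditional running total without a BigQuery project, here is a minimal sketch that runs the same window logic in SQLite through Python's sqlite3 module. Several things are assumptions for illustration only: the table name `sessions`, the function name `running_total_a`, the use of CASE and COALESCE in place of BigQuery's IF and IFNULL, and an SQLite build of 3.25 or later (needed for window functions).

```python
import sqlite3

# Sample rows from the question: (ID, SessionNumber, CountOfAction, Category)
SAMPLE = [
    (1, 1, 1, "B"),
    (1, 2, 3, "A"),
    (1, 3, 1, "A"),
    (1, 4, 4, "B"),
    (1, 5, 5, "B"),
]

QUERY = """
SELECT ID, SessionNumber,
       COALESCE(
           SUM(CASE WHEN Category = 'A' THEN CountOfAction ELSE 0 END)
               OVER (PARTITION BY ID ORDER BY SessionNumber
                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
           0) AS CountOfAction
FROM sessions
ORDER BY ID, SessionNumber
"""


def running_total_a(rows):
    """Return (ID, SessionNumber, running total of category A over previous rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions "
        "(ID INTEGER, SessionNumber INTEGER, CountOfAction INTEGER, Category TEXT)"
    )
    conn.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in running_total_a(SAMPLE):
    print(row)
```

The frame ends at 1 PRECEDING, so the first row of each ID has an empty frame and SUM returns NULL there; COALESCE turns that into 0.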

Related

Flag rows that appear between rows with specific strings

Let's say I have a table like this:
user_id order action
1 1 start
1 2 other
1 3 other
1 4 end
1 5 other
2 1 start
2 2 other
2 3 end
2 4 other
2 5 start
2 6 other
2 7 end
And I want to create a new column that flags the rows that appear between "start" and "end" events for each user (ordering by "order"):
user_id order action is_between_start_and_end
1 1 start NULL
1 2 other 1
1 3 other 1
1 4 end NULL
1 5 other NULL
2 1 start NULL
2 2 other 1
2 3 end NULL
2 4 other NULL
2 5 start NULL
2 6 other 1
2 7 end NULL
How can I achieve this?

Consider below approach
select * except(grp),
if(
countif(action = 'end') over (partition by user_id, grp order by `order`) = 0
and action != 'start', 1, null
) as is_between_start_and_end
from (
select *,
countif(action = 'start') over (partition by user_id order by `order`) as grp
from your_table
)
Applied to the sample data in your question, this produces the flags shown in the expected output above.
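The two-step idea (number the groups by counting starts, then check that no end has been seen yet inside the group) can be tried locally with a small sketch in SQLite via Python's sqlite3. The table name `events`, the column name `ord` (standing in for the reserved word `order`) and the function `flag_rows` are made up for illustration, and SUM over a boolean comparison replaces BigQuery's COUNTIF. The `grp > 0` guard is an addition that is not in the query above: it keeps rows that come before a user's first start from being flagged.

```python
import sqlite3

# Sample rows from the question: (user_id, ord, action)
SAMPLE = [
    (1, 1, "start"), (1, 2, "other"), (1, 3, "other"), (1, 4, "end"),
    (1, 5, "other"),
    (2, 1, "start"), (2, 2, "other"), (2, 3, "end"), (2, 4, "other"),
    (2, 5, "start"), (2, 6, "other"), (2, 7, "end"),
]

QUERY = """
SELECT user_id, ord,
       CASE WHEN grp > 0                 -- a start has been seen for this user
                 AND action <> 'start'
                 AND SUM(action = 'end')
                         OVER (PARTITION BY user_id, grp ORDER BY ord) = 0
            THEN 1 END AS is_between_start_and_end
FROM (
    -- grp increases by one at every start row
    SELECT *, SUM(action = 'start')
                  OVER (PARTITION BY user_id ORDER BY ord) AS grp
    FROM events
)
ORDER BY user_id, ord
"""


def flag_rows(rows):
    """Return (user_id, ord, flag) where flag is 1 between start and end, else None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, ord INTEGER, action TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in flag_rows(SAMPLE):
    print(row)
```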

This can be solved with window functions.
with tbl as (
Select 1 as user_id, 1 as order_it,"start" as action
Union all select 1 , 2 ,"other"
Union all select 1 , 3 ,"other"
Union all select 1 , 4 ,"end"
Union all select 1 , 5 ,"other"
Union all select 2 , 1 ,"start"
Union all select 2 , 2 ,"other"
Union all select 2 , 3 ,"end"
Union all select 2 , 4 ,"other"
Union all select 2 , 5 ,"start"
Union all select 2 , 6 ,"other"
Union all select 2 , 7 ,"end"
),
helper as (
Select *,
countif(action="end") over win_before as ends,
countif(action="start") over win_before as starts,
first_value(if(action="end" or action="start",action,null) ignore nulls) over (partition by user_id order by order_it rows between current row and unbounded following) as end_to_come
from tbl
window win_before as (partition by user_id order by order_it rows between unbounded preceding and current row)
order by user_id,order_it
)
select *,
if(end_to_come="end" and starts-ends=1,1,null) as is_between_start_and_end
from helper
order by user_id,order_it

This should work, but it could surely be optimized further.
with input as (
select 1 user_id, 1 as order_, 'start' action union all
select 1, 2, 'other' union all
select 1, 3, 'other' union all
select 1 , 4 , 'end' union all
select 1 , 5 , 'other' union all
select 2 , 1 , 'start' union all
select 2 , 2 , 'other' union all
select 2 , 3 , 'end' union all
select 2 , 4 , 'other' union all
select 2 , 5 , 'start' union all
select 2 , 6 , 'other' union all
select 2 , 7 , 'end'
)
select
*,
if (
order_ > max(if(action = 'start', order_, null))
over(partition by user_id order by order_ range between unbounded preceding and current row) and
order_ < min(if(action = 'end', order_, null))
over(partition by user_id order by order_ range between current row and unbounded following) and
coalesce(order_ not between
max(if(action = 'end', order_, null))
over(partition by user_id order by order_ range between unbounded preceding and 1 preceding)
and min(if(action = 'start', order_, null))
over(partition by user_id order by order_ range between 1 following and unbounded following), true)
, 1, null) as flag
from input
order by 1,2
Edit: It should also take into account weird cases where for instance a 3rd user has end > other > start > other > end > other in that order. The flag should only apply to the 4th item. If you have start > other > start > other > end however, it's unclear if items 2,3,4 or 4 or 2,4 should be flagged. I think it would only flag 4 here
Edit2: This version should flag 2,3,4
if (
order_ > max(if(action = 'start', order_, null))
over(partition by user_id order by order_ range between unbounded preceding and 1 preceding) and
order_ < min(if(action = 'end', order_, null))
over(partition by user_id order by order_ range between current row and unbounded following) and
coalesce(max(if(action = 'start', order_, null))
over(partition by user_id order by order_ range between unbounded preceding and 1 preceding) >
max(if(action = 'end', order_, null))
over(partition by user_id order by order_ range between unbounded preceding and current row),true)
, 1, null) as flag

Selecting top most row in Bigquery based on conditions

I have a huge table in which one product ID sometimes has multiple specifications. I want to select the newest one, but unfortunately I don't have any date information. Please consider this example dataset:
Row ID Type Sn Sn_Ind
1 3 SLN SL20 20
2 1 SL SL 0
3 2 SL SL 0
4 1 M SL21 10
5 3 M SL21 10
6 1 SLN SL20 20
I used the query below to group the products and give them row numbers:
with cleanedMasterData as(
SELECT *
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Sn DESC, Sn_Ind DESC) AS rn
FROM `project.dataset.table`
)
-- where rn = 1
)
select * from cleanedMasterData
Please find below the example table after cleaning
Row ID Type Sn Sn_Ind rn
1 1 SL SL 0 1
2 1 M SL21 10 2
3 1 SLN SL20 20 3
4 2 SL SL 0 1
5 3 M SL21 10 1
6 3 SLN SL20 20 2
For IDs 2 and 3, I can simply select the top row with where rn = 1, but for ID 1 my preferred row would be row 2, because that is the newest.
My question is: how do I prioritise a value in a column so that I get the desired result, like this:
Row ID Type Sn Sn_Ind rn
1 1 M SL21 10 1
2 2 SL SL 0 1
3 3 M SL21 10 1
The values in the Sn column are fixed (for example SL, SL20, SL19, SL21). Could I somehow give a weight to these values, create a temporary column holding that weight, and sort on it?
Thank you for your support in advance!!

Consider below
SELECT *
FROM `project.dataset.table`
WHERE TRUE
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY IF(Sn = 'SL', 0, 1) DESC, Sn DESC) = 1
Applied to the sample data in your question, this returns the desired output shown above.
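The ordering trick (push 'SL' to the back with a 0/1 priority, then sort the rest by Sn descending) can be checked with a small SQLite sketch run through Python's sqlite3. SQLite has no QUALIFY clause, so a subquery with a filter on rn is swapped in here; the table name `products` and the function `newest_per_id` are made up for illustration. Note that Sn is compared as text, which works for the two-digit suffixes in the sample.

```python
import sqlite3

# Sample rows from the question: (ID, Type, Sn, Sn_Ind)
SAMPLE = [
    (3, "SLN", "SL20", 20),
    (1, "SL", "SL", 0),
    (2, "SL", "SL", 0),
    (1, "M", "SL21", 10),
    (3, "M", "SL21", 10),
    (1, "SLN", "SL20", 20),
]

QUERY = """
SELECT ID, Type, Sn, Sn_Ind
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY ID
               -- plain 'SL' gets priority 0, so it is picked only as a last resort
               ORDER BY CASE WHEN Sn = 'SL' THEN 0 ELSE 1 END DESC, Sn DESC
           ) AS rn
    FROM products
)
WHERE rn = 1
ORDER BY ID
"""


def newest_per_id(rows):
    """Return one row per ID, preferring the highest non-'SL' Sn value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (ID INTEGER, Type TEXT, Sn TEXT, Sn_Ind INTEGER)")
    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in newest_per_id(SAMPLE):
    print(row)
```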

It wasn't difficult, I tried a few things and it worked out. If anyone can optimize the below solution even more that would be awesome.
first the dataset
#standardSQL
WITH `project.dataset.table` AS (
SELECT 3 ID, 'SLN' Type, 'SL20' Sn, 20 Sn_Ind UNION ALL
SELECT 1 , 'SL' , 'SL' , 0 UNION ALL
SELECT 2 , 'SL' , 'SL' , 0 UNION ALL
SELECT 1 , 'M' , 'SL21' , 10 UNION ALL
SELECT 3 , 'M' , 'SL21' , 10 UNION ALL
SELECT 1 , 'SLN' , 'SL20' , 20
), weightage as (
SELECT
*,
MAX(CASE Sn WHEN 'SL' THEN 0 ELSE 1 END) OVER (PARTITION BY Sn) AS weightt,
FROM
`project.dataset.table`
ORDER BY
weightt DESC, Sn DESC
), main as (
select * EXCEPT(rn, weightt)
from (
select * ,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY weightt DESC, Sn DESC) AS rn
from weightage )
where rn = 1
)
select * from main
after this, I can get the desired result
Row ID Type Sn Sn_Ind
1 1 M SL21 10
2 2 SL SL 0
3 3 M SL21 10

count zeros between 1s in same column

I've data like this.
ID IND
1 0
2 0
3 1
4 0
5 1
6 0
7 0
I want to count the zeros before each 1, so that the output looks like this:
ID IND OUT
1 0 0
2 0 0
3 1 2
4 0 0
5 1 1
6 0 0
7 0 2
Is it possible without pl/sql? I tried to find the differences between row numbers but couldn't achieve it.

The match_recognize clause, introduced in Oracle 12.1, can make quick work of such "row pattern recognition" problems. The solution is just a bit complex due to the special treatment of a "last row" with IND = 0, but it is straightforward otherwise.
As usual, the with clause is not part of the solution; I include it to test the query. Remove it and use your actual table and column names.
with
inputs (id, ind) as (
select 1, 0 from dual union all
select 2, 0 from dual union all
select 3, 1 from dual union all
select 4, 0 from dual union all
select 5, 1 from dual union all
select 6, 0 from dual union all
select 7, 0 from dual
)
select id, ind, out
from inputs
match_recognize(
order by id
measures case classifier() when 'Z' then 0
when 'O' then count(*) - 1
else count(*) end as out
all rows per match
pattern ( Z* ( O | X ) )
define Z as ind = 0, O as ind != 0
);
ID IND OUT
---------- ---------- ----------
1 0 0
2 0 0
3 1 2
4 0 0
5 1 1
6 0 0
7 0 2

You can treat this as a gaps-and-islands problem. You can define the "islands" by the number of "1"s on or after each row. Then use a window function:
select t.*,
(case when ind = 1 or row_number() over (order by id desc) = 1
then sum(1 - ind) over (partition by grp)
else 0
end) as num_zeros
from (select t.*,
sum(ind) over (order by id desc) as grp
from t
) t;
If id is sequential with no gaps, you can do this without a subquery:
select t.*,
(case when ind = 1 or row_number() over (order by id desc) = 1
then id - coalesce(lag(case when ind = 1 then id end ignore nulls) over (order by id), min(id) over () - 1)
else 0
end)
from t;
I would suggest removing the case conditions and just using the then clause for the expression, so the value is on all rows.
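As a rough check of the gaps-and-islands version, here is a sketch that runs the subquery form in SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later). The function name `count_zeros` is made up for illustration; the table name `t` and the columns follow the query above.

```python
import sqlite3

# Sample rows from the question: (id, ind)
SAMPLE = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 0), (7, 0)]

QUERY = """
SELECT id, ind,
       CASE WHEN ind = 1 OR ROW_NUMBER() OVER (ORDER BY id DESC) = 1
            THEN SUM(1 - ind) OVER (PARTITION BY grp)  -- zeros in this island
            ELSE 0
       END AS num_zeros
FROM (
    -- grp = number of 1s on or after the row, so each 1 closes an island
    SELECT id, ind, SUM(ind) OVER (ORDER BY id DESC) AS grp
    FROM t
)
ORDER BY id
"""


def count_zeros(rows):
    """Return (id, ind, num_zeros) for each input row, ordered by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, ind INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in count_zeros(SAMPLE):
    print(row)
```

On the sample, the islands are ids 1 to 3 (grp 2), ids 4 to 5 (grp 1) and ids 6 to 7 (grp 0); the count is reported on the row holding the 1, or on the last row for the trailing zeros.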

What is the most efficient SQL query to find the max N values for every entities in a table

I wrote these 2 queries, the first one is keeping duplicates and the second one is dropping them
Does anyone know a more efficient way to achieve this?
Queries are for MSSQL, returning the top 3 values
1-
SELECT TMP.entity_id, TMP.value
FROM(
SELECT TAB.entity_id, LEAD(TAB.entity_id, 3, 0) OVER(ORDER BY TAB.entity_id, TAB.value) AS next_id, TAB.value
FROM mytable TAB
) TMP
WHERE TMP.entity_id <> TMP.next_id
2-
SELECT TMP.entity_id, TMP.value
FROM(
SELECT TMX.entity_id, LEAD(TMX.entity_id, 3, 0) OVER(ORDER BY TMX.entity_id, TMX.value) AS next_id, TMX.value
FROM(
SELECT TAB.entity_id, LEAD(TAB.entity_id, 1, 0) OVER(ORDER BY TAB.entity_id, TAB.value) AS next_id, TAB.value, LEAD(TAB.value, 1, 0) OVER(ORDER BY TAB.entity_id, TAB.value) AS next_value
FROM mytable TAB
) TMX
WHERE TMX.entity_id <> TMX.next_id OR TMX.value <> TMX.next_value
) TMP
WHERE TMP.entity_id <> TMP.next_id
Example:
Table:
entity_id value
--------- -----
1 9
1 11
1 12
1 3
2 25
2 25
2 5
2 37
3 24
3 9
3 2
3 15
Result Query 1 (25 appears twice for entity_id 2):
entity_id value
--------- -----
1 9
1 11
1 12
2 25
2 25
2 37
3 9
3 15
3 24
Result Query 2 (25 appears only once for entity_id 2):
entity_id value
--------- -----
1 9
1 11
1 12
2 5
2 25
2 37
3 9
3 15
3 24

You can use ROW_NUMBER, which keeps duplicates, as follows:
select entity_id, value from
(select t.*, row_number() over (partition by entity_id order by value desc) as rn
from your_Table t) x where rn <= 3
You can use DENSE_RANK together with DISTINCT to drop the duplicates as follows (plain RANK would leave a gap after the tied 25s, so entity_id 2 would return only two distinct values):
select distinct entity_id, value from
(select t.*, dense_rank() over (partition by entity_id order by value desc) as rn
from your_Table t) x where rn <= 3
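To compare the two ranking approaches on the sample data, here is a small sketch using SQLite through Python's sqlite3 rather than MSSQL. The table name `mytable` follows the question; the function `top_n` and its `distinct` flag are made up for illustration. With `distinct=False` it numbers rows with ROW_NUMBER (duplicates kept); with `distinct=True` it uses DENSE_RANK plus SELECT DISTINCT (duplicates dropped).

```python
import sqlite3

# Sample rows from the question: (entity_id, value)
SAMPLE = [
    (1, 9), (1, 11), (1, 12), (1, 3),
    (2, 25), (2, 25), (2, 5), (2, 37),
    (3, 24), (3, 9), (3, 2), (3, 15),
]


def top_n(rows, n=3, distinct=False):
    """Return the n largest values per entity_id, ordered by entity_id, value."""
    # Only these two fixed strings are ever interpolated into the SQL text.
    rank_fn = "DENSE_RANK" if distinct else "ROW_NUMBER"
    select_kw = "SELECT DISTINCT" if distinct else "SELECT"
    query = f"""
        {select_kw} entity_id, value
        FROM (
            SELECT entity_id, value,
                   {rank_fn}() OVER (PARTITION BY entity_id ORDER BY value DESC) AS rn
            FROM mytable
        )
        WHERE rn <= ?
        ORDER BY entity_id, value
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (entity_id INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


print(top_n(SAMPLE))                 # duplicates kept
print(top_n(SAMPLE, distinct=True))  # duplicates dropped
```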

How to GROUP BY in SQL and then mark as 0,1

I need to GROUP BY item_id and check if user_id in any of those matches a variable. If so, I want it to = 1, if not 0.
for example, imagine table like this:
item_id, user_id
1 1
1 3
2 4
2 1
2 7
2 3
3 4
3 6
4 8
4 1
5 3
IF (user_id = 3,1,0) AS match,
Want my Query to come back as
item_id, match
1 1
2 1
3 0
4 0
5 1
where "1" marks every item_id group that contains user_id 3.

You need the right aggregation function:
select item_id,
max(case when user_id = 3 then 1 else 0 end) as hasmatch
from t
group by item_id
order by item_id
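A quick way to confirm the conditional aggregation is to run it against the sample data in SQLite through Python's sqlite3, as in the sketch below. The table name `t` follows the query above; the function `has_match` and the bound parameter for the user id are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (item_id, user_id)
SAMPLE = [
    (1, 1), (1, 3),
    (2, 4), (2, 1), (2, 7), (2, 3),
    (3, 4), (3, 6),
    (4, 8), (4, 1),
    (5, 3),
]

QUERY = """
SELECT item_id,
       MAX(CASE WHEN user_id = ? THEN 1 ELSE 0 END) AS hasmatch
FROM t
GROUP BY item_id
ORDER BY item_id
"""


def has_match(rows, user_id):
    """Return (item_id, 1 or 0) depending on whether user_id appears in the group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (item_id INTEGER, user_id INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(QUERY, (user_id,)).fetchall()
    conn.close()
    return result


print(has_match(SAMPLE, 3))
```

MAX works here because a single matching row yields 1, which wins over any number of 0s in the same group.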

In MySQL, true is 1 and false is 0, so you can just do:
SELECT item_id, MAX(user_id = 3) AS has_match
FROM table
GROUP BY 1
You can even count the number of matches:
SELECT item_id, SUM(user_id = 3) AS matches
FROM table
GROUP BY 1
GROUP BY 1 is short for GROUP BY item_id, as item_id is the first select expression.

I would do it as follows:
SELECT
A.item_id, ISNULL(B.count, 0)
FROM
(SELECT DISTINCT item_id 'item_id' FROM myTable) AS A
LEFT JOIN
(
SELECT item_id, count(*) 'count'
FROM myTable WHERE user_id = 3
GROUP BY item_id
) AS B
ON A.item_id = B.item_id
