Summing numbers across multiple columns in BigQuery - sql

I have a query which returns many columns which are either 1 or 0 depending on a users interaction with many points of a website, my data looks like this:
UserID Variable_1 Variable_2 Variable_3 Variable_4 Variable_5
User 1 1 0 1 0 0
User 2 0 0 1 0 0
User 3 0 0 0 0 1
User 4 0 1 1 1 1
User 5 1 0 0 0 1
Each variable is defined with it's own line of code like:
MAX(IF(LOWER(hits_product.productbrand) LIKE "Variable_1",1,0)) AS Variable_1,
I'd like to have one column that sums up all the rows per user. which looks like this:
UserID Total Variable_1 Variable_2 Variable_3 Variable_4 Variable_5
User 1 2 1 0 1 0 0
User 2 3 1 1 1 0 0
User 3 0 0 0 0 0 0
User 4 5 1 1 1 1 1
User 5 3 1 0 1 0 1
What is the most elegant way to achieve this?

Even though it happen that for OP's particular case simple COUNT(DISTINCT) will suffice - I still wanted to answer original question of how to sum up all numerical columns into one Total without having dependency on number and names of those columns
Below is for BigQuery Standard SQL
#standardSQL
SELECT
UserID,
( SELECT SUM(CAST(value AS INT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d+),?')) value
) Total,
* EXCEPT(UserID)
FROM t
This can be tested / played with using dummy data from question
#standardSQL
WITH t AS (
SELECT 'User 1' UserID, 1 Variable_1, 0 Variable_2, 1 Variable_3, 0 Variable_4, 0 Variable_5 UNION ALL
SELECT 'User 2', 1, 1, 1, 0, 0 UNION ALL
SELECT 'User 3', 0, 0, 0, 0, 0 UNION ALL
SELECT 'User 4', 1, 1, 1, 1, 1 UNION ALL
SELECT 'User 5', 1, 0, 1, 0, 1
)
SELECT
UserID,
( SELECT SUM(CAST(value AS INT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d+),?')) value
) Total,
* EXCEPT(UserID)
FROM t
ORDER BY UserID
result is
Row UserID Total Variable_1 Variable_2 Variable_3 Variable_4 Variable_5
1 User 1 2 1 0 1 0 0
2 User 2 3 1 1 1 0 0
3 User 3 0 0 0 0 0 0
4 User 4 5 1 1 1 1 1
5 User 5 3 1 0 1 0 1

A simple method uses a subquery or CTE:
select t.*, (v1 + v2 + v3 . . . ) as total
from (<your query here>
) t;
Not knowing what the data looks like, it is quite possible that count(distinct hits_product.productbrand) would also do the trick.

How about defining multiple variable columns into one repeated 'variables' column, of KeyValue messages, where a key would be your variable name and value a number, it can greatly simplify your calculation.

Related

count zeros between 1s in same column

I've data like this.
ID IND
1 0
2 0
3 1
4 0
5 1
6 0
7 0
I want to count the zeros before the value 1. So that, the output will be like below.
ID IND OUT
1 0 0
2 0 0
3 1 2
4 0 0
5 1 1
6 0 0
7 0 2
Is it possible without pl/sql? I tried to find the differences between row numbers but couldn't achieve it.
The match_recognize clause, introduced in Oracle 12.1, can do quick work of such "row pattern recognition" problems. The solution is just a bit complex due to the special treatment of a "last row" with ID = 0, but it is straightforward otherwise.
As usual, the with clause is not part of the solution; I include it to test the query. Remove it and use your actual table and column names.
with
inputs (id, ind) as (
select 1, 0 from dual union all
select 2, 0 from dual union all
select 3, 1 from dual union all
select 4, 0 from dual union all
select 5, 1 from dual union all
select 6, 0 from dual union all
select 7, 0 from dual
)
select id, ind, out
from inputs
match_recognize(
order by id
measures case classifier() when 'Z' then 0
when 'O' then count(*) - 1
else count(*) end as out
all rows per match
pattern ( Z* ( O | X ) )
define Z as ind = 0, O as ind != 0
);
ID IND OUT
---------- ---------- ----------
1 0 0
2 0 0
3 1 2
4 0 0
5 1 1
6 0 0
7 0 2
You can treat this as a gaps-and-islands problem. You can define the "islands" by the number of "1"s one or after each row. Then use a window function:
select t.*,
(case when ind = 1 or row_number() over (order by id desc) = 1
then sum(1 - ind) over (partition by grp)
else 0
end) as num_zeros
from (select t.*,
sum(ind) over (order by id desc) as grp
from t
) t;
If id is sequential with no gaps, you can do this without a subquery:
select t.*,
(case when ind = 1 or row_number() over (order by id desc) = 1
then id - coalesce(lag(case when ind = 1 then id end ignore nulls) over (order by id), min(id) over () - 1)
else 0
end)
from t;
I would suggest removing the case conditions and just using the then clause for the expression, so the value is on all rows.

Select first date in which an event happen for each id

I have a series of Ids, some of them activate a product on certain month and that product remains activated for an X period of time, while others do not activate the product.
I want to create a column which indicates in which month the user activates the product or a NULL if the user doesn't activate it.
I've tried using a partition like the following:
SELECT id, fl_testdrive, month_dt,
CASE WHEN fl_testdrive = 1 then min(month_dt) OVER(PARTITION BY id ORDER BY month_dt ROWS UNBOUNDED PRECEDING) else 0 end as month_testdrive
FROM Table_1
However, when I try this solution, in the column month_testdrive, I do not obtain the first month in which the user appears, indepently of if he/she activated that product in that month or on a later one.
This is what I get with my query
Id flag_testdrive month_dt month_testdrive
1 0 1 1
1 0 2 1
1 1 3 1
1 1 4 1
2 0 2 0
2 0 3 0
3 1 4 4
3 1 5 4
What I'd expect:
Id flag_testdrive month_dt month_testdrive
1 0 1 3
1 0 2 3
1 1 3 3
1 1 4 3
2 0 2 0
2 0 3 0
3 1 4 4
3 1 5 4
This solution is a second best but is also fine:
Id flag_testdrive month_dt month_testdrive
1 0 1 0
1 0 2 0
1 1 3 3
1 1 4 3
2 0 2 0
2 0 3 0
3 1 4 4
3 1 5 4
You want CASE expression inside MIN():
MIN(CASE WHEN fl_testdrive = 1 THEN month_dt ELSE 0 END) OVER(PARTITION BY id, flag_testdrive ORDER BY month_dt ROWS UNBOUNDED PRECEDING)
Here's an option for you:
DECLARE #Testdate TABLE(
id INT
,flag_testdrive INT
,month_dt INT
)
INSERT INTO #Testdate (
[id]
, [flag_testdrive]
, [month_dt]
)
VALUES(1,0,1)
,(1,0,2)
,(1,1,3)
,(1,1,4)
,(2,0,2)
,(2,0,3)
,(3,1,4)
,(3,1,5)
SELECT
*
,COALESCE((SELECT MIN([aa].[month_dt]) FROM #Testdate aa
WHERE aa.[id] = a.id
AND aa.[flag_testdrive] = 1), 0) AS month_testdrive
FROM #Testdate a
Return the minimum month_dt for a given id only if flag_testdrive=1, wrapped in coalesce to return 0 instead of NULL.

Adjusting table based on previous values in BigQuery

I have a table that looks like below:
ID|Date |X| Flag |
1 |1/1/16|2| 0
2 |1/1/16|0| 0
3 |1/1/16|0| 0
1 |2/1/16|0| 0
2 |2/1/16|1| 0
3 |2/1/16|2| 0
1 |3/1/16|2| 0
2 |3/1/16|1| 0
3 |3/1/16|2| 0
I'm trying to make it so that flag is populated if X=2 in the PREVIOUS month. As such, it should look like this:
ID|Date |X| Flag |
1 |1/1/16|2| 0
2 |1/1/16|0| 0
3 |1/1/16|0| 0
1 |2/1/16|2| 1
2 |2/1/16|1| 0
3 |2/1/16|2| 0
1 |3/1/16|2| 1
2 |3/1/16|1| 0
3 |3/1/16|2| 1
I use this in SQL:
`select ID, date, X, flag into Work_Table from t
(
Select ID, date, X, flag,
Lag(X) Over (Partition By ID Order By date Asc) As Prev into Flag_table
From Work_Table
)
Update [dbo].[Flag_table]
Set flag = 1
where prev = '2'
UPDATE t
Set t.flag = [dbo].[Flag_table].flag FROM T
JOIN [dbo].[Flag_table]
ON t.ID= [dbo].[Flag_table].ID where T.date = [dbo].[Flag_table].date`
However I cannot do this in Bigquery. Any ideas?
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, dt, x,
IF(LAG(x = 2) OVER(PARTITION BY id ORDER BY dt), 1, 0) flag
FROM `project.dataset.work_table`
You can test / play with it using dummy data from your question as
#standardSQL
WITH `project.dataset.work_table` AS (
SELECT 1 id, '1/1/16' dt, 2 x, 0 flag UNION ALL
SELECT 2, '1/1/16', 0, 0 UNION ALL
SELECT 3, '1/1/16', 0, 0 UNION ALL
SELECT 1, '2/1/16', 0, 0 UNION ALL
SELECT 2, '2/1/16', 1, 0 UNION ALL
SELECT 3, '2/1/16', 2, 0 UNION ALL
SELECT 1, '3/1/16', 2, 0 UNION ALL
SELECT 2, '3/1/16', 1, 0 UNION ALL
SELECT 3, '3/1/16', 2, 0
)
SELECT id, dt, x,
IF(LAG(x = 2) OVER(PARTITION BY id ORDER BY dt), 1, 0) flag
FROM `project.dataset.work_table`
ORDER BY dt, id
with result as
Row id dt x flag
1 1 1/1/16 2 0
2 2 1/1/16 0 0
3 3 1/1/16 0 0
4 1 2/1/16 0 1
5 2 2/1/16 1 0
6 3 2/1/16 2 0
7 1 3/1/16 2 0
8 2 3/1/16 1 0
9 3 3/1/16 2 1

SQL Query to filter record for particular record count

I have a table which have Identity, RecordId, Type, Reading And IsDeleted columns. Identity is primary key that is auto increment, RecordId is integer that can have duplicate values, Type is a type of reading that can be either 'one' or 'average', Reading is integer that contains any integer value, and IsDeleted is bit that can be 0 or 1 i.e. false or true.
Now, I want the query that contains all the records of table in such a manner that if COUNT(Id) for each RecordId is greater than 2 then display all the records of that RecordId.
If COUNT(Id) == 2 for that specific RecordId and Reading value of both i.e. 'one' or 'average' type of the records are same then display only average record.
If COUNT(Id) ==1 then display only that record.
For example :
Id RecordId Type Reading IsDeleted
1 1 one 4 0
2 1 one 5 0
3 1 one 6 0
4 1 average 5 0
5 2 one 1 0
6 2 one 3 0
7 2 average 2 0
8 3 one 2 0
9 3 average 2 0
10 4 one 5 0
11 4 average 6 0
12 5 one 7 0
Ans result can be
Id RecordId Type Reading IsDeleted
1 1 one 4 0
2 1 one 5 0
3 1 one 6 0
4 1 average 5 0
5 2 one 1 0
6 2 one 3 0
7 2 average 2 0
9 3 average 2 0
10 4 one 5 0
11 4 average 6 0
12 5 one 7 0
In short I want to skip the 'one' type reading which have an average reading with same value and its count for 'one' type reading not more than one.
Check out this program
DECLARE #t TABLE(ID INT IDENTITY,RecordId INT,[Type] VARCHAR(10),Reading INT,IsDeleted BIT)
INSERT INTO #t VALUES
(1,'one',4,0),(1,'one',5,0),(1,'one',6,0),(1,'average',5,0),(2,'one',1,0),(2,'one',3,0),
(2,'average',2,0),(3,'one',2,0),(3,'average',2,0),(4,'one',5,0),(4,'average',6,0),(5,'one',7,0),
(6,'average',6,0),(6,'average',6,0),(7,'one',6,0),(7,'one',6,0)
--SELECT * FROM #t
;WITH GetAllRecordsCount AS
(
SELECT *,Cnt = COUNT(RecordId) OVER(PARTITION BY RecordId ORDER BY RecordId)
FROM #t
)
-- Condition 1 : When COUNT(RecordId) for each RecordId is greater than 2
-- then display all the records of that RecordId.
, GetRecordsWithCountMoreThan2 AS
(
SELECT * FROM GetAllRecordsCount WHERE Cnt > 2
)
-- Get all records where count = 2
, GetRecordsWithCountEquals2 AS
(
SELECT * FROM GetAllRecordsCount WHERE Cnt = 2
)
-- Condition 3 : When COUNT(RecordId) == 1 then display only that record.
, GetRecordsWithCountEquals1 AS
(
SELECT * FROM GetAllRecordsCount WHERE Cnt = 1
)
-- Condition 1: When COUNT(RecordId) > 2
SELECT * FROM GetRecordsWithCountMoreThan2 UNION ALL
-- Condition 2 : When COUNT(RecordId) == 2 for that specific RecordId and Reading value of
-- both i.e. 'one' or 'average' type of the records are same then display only
-- average record.
SELECT t1.* FROM GetRecordsWithCountEquals2 t1
JOIN (Select RecordId From GetRecordsWithCountEquals2 Where [Type] = ('one') )X
ON t1.RecordId = X.RecordId
AND t1.Type = 'average' UNION ALL
-- Condition 2: When COUNT(RecordId) = 1
SELECT * FROM GetRecordsWithCountEquals1
Result
ID RecordId Type Reading IsDeleted Cnt
1 1 one 4 0 4
2 1 one 5 0 4
3 1 one 6 0 4
4 1 average5 0 4
5 2 one 1 0 3
6 2 one 3 0 3
7 2 average2 0 3
9 3 average2 0 2
11 4 average6 0 2
12 5 one 7 0 1
;with a as
(
select Id,RecordId,Type,Reading,IsDeleted, count(*) over (partition by RecordId, Reading) cnt,
row_number() over (partition by RecordId, Reading order by Type, RecordId) rn
from table
)
select Id,RecordId,Type,Reading,IsDeleted
from a where cnt <> 2 or rn = 1
Assuming your table is named the_table, let's do this:
select main.*
from the_table as main
inner join (
select recordId, count(Id) as num, count(distinct Reading) as reading_num
from the_table
group by recordId
) as counter on counter.recordId=main.recordId
where num=1 or num>2 or reading_num=2 or main.type='average';
Untested, but it should be some variant of that.
EDIT TEST HERE ON FIDDLE
The short summary is that we want to join the table with an aggregated version of o=itself, then filter it based in the count criteria you mentioned (num=1, then show it; num=2, show just average record if reading numbers are the same otherwise show both; num>2, show all records).

MySQL GROUP BY quantity

MySQL table is like this (VoteID is PK):
VoteID VoteValue CommentID
1 -1 1
2 -1 1
3 1 1
4 -1 1
5 1 2
6 1 2
7 -1 2
I need a result:
CommentID Plus Minus
1 1 3
2 2 1
Sum of "Pluses", Sum of "Minuses" groupped by CommentID
Is it possible to get desired in one SQL expression?
SELECT
CommentID,
SUM(CASE WHEN VoteValue > 0 THEN 1 ELSE 0 END) AS PLUS,
SUM(CASE WHEN VoteValue < 0 THEN 1 ELSE 0 END) AS MINUS
FROM
mytable
GROUP BY
CommentID
You need a query in the lines of:
SELECT CommentID,
SUM(IF(VoteValue > 0, 1, 0)) AS Plus,
SUM(IF(VoteValue < 0, 1, 0)) AS Minus
FROM yourTable
GROUP BY CommentID
ORDER BY CommentID