Populate column based on row values BigQuery Standard SQL - google-bigquery

I have a table, let's say:
Name A B C D
------- --- --- --- ---
alpha 0 1 0 0.6
beta 0.6 0 0 0.1
gama 0 0 0 0.4
Now I want to populate two columns (Result and Class) based on the values of A, B, C and D.
The condition is: if the value in any of the fields (A, B, C, D) is > 0.5, the Result column should contain "F"; otherwise it should contain "P". The names of the columns whose value is > 0.5 should also be listed in Class (for example, "A,D").
For better understanding, here is the result I want:
Name A B C D Result Class
------- --- --- --- --- -------- -------
alpha 0 1 0 0.6 F B,D
beta 0.6 0 0 0.1 F A
gama 0 0 0 0.4 P NULL
I am new to BigQuery and need help. What would be a workaround?
This is what I have done so far:
SELECT *, CASE WHEN (A > .5 OR B > .5 OR C > .5 OR D >.5)
THEN 'F'
ELSE 'P' END AS Result AND Class....//here i am stuck
FROM table1
Actually, I have no idea how to build this exact script. I was able to do the first part, populating the Result column with "F" and "P", but could not get Class to populate with the column names.

Since you are analysing each column individually, I assume you do not have a large number of columns. Therefore, I created a simple JavaScript User Defined Function (UDF) that checks each row's values and returns the column's name whenever the condition is met.
I have used the provided sample data to test the below query.
#javaScript UDF
CREATE TEMP FUNCTION class(A FLOAT64, B FLOAT64, C FLOAT64, D FLOAT64)
RETURNS String
LANGUAGE js AS """
var class_array=[];
if(A > 0.5){class_array.push("A");}
if(B > 0.5){class_array.push("B");}
if(C > 0.5){class_array.push("C");}
if(D > 0.5){class_array.push("D");}
return class_array;
""";
#sample data
WITH data as (
SELECT "alpha" as Name, 0 as A, 1 as B, 0 as C, 0.6 as D UNION ALL
SELECT "beta", 0.6, 0, 0, 0.1 UNION ALL
SELECT "gama", 0, 0, 0, 0.4
)
Select name, A,B,C,D,
CASE WHEN (A > .5 OR B > .5 OR C > .5 OR D >.5) THEN "F" ELSE "P" END AS Result,
IF(class(A,B,C,D) is null , null, class(A,B,C,D)) as Class from data
And the output,
Row name A B C D Result Class
1 alpha 0 1 0 0.6 F B,D
2 beta 0.6 0 0 0.1 F A
3 gama 0 0 0 0.4 P
As shown within the UDF, each row's values are analysed and, if the condition is met, the column's name is manually added to an array of strings. In addition, note that the JS UDF returns a String, not an array. It automatically converts the previously created Array to String.
Lastly, I should point out that it is not possible to retrieve the column name within a query in this context. You can, however, retrieve it in other scenarios using INFORMATION_SCHEMA.
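For reference, a minimal sketch of the INFORMATION_SCHEMA lookup mentioned above. The dataset name mydataset and the table name table1 are placeholders for illustration, not names taken from the question's actual project:

```sql
-- List the column names of one table, in declaration order.
-- mydataset and 'table1' are placeholder names.
SELECT column_name, data_type
FROM mydataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'table1'
ORDER BY ordinal_position;
```

This returns the column names as data, so they could drive a generated query, but it does not make them available per row inside the query above.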

Below is for BigQuery Standard SQL
Using a JavaScript UDF helps in many cases, but it should be avoided if the problem can be solved with plain SQL, as in the example below
#standardSQL
SELECT *,
( SELECT IF(LOGICAL_OR(val > 0.5), 'F', 'P')
FROM UNNEST([A,B,C,D]) val
) AS Result,
( SELECT STRING_AGG(['A','B','C','D'][OFFSET(pos)])
FROM UNNEST([A,B,C,D]) val WITH OFFSET pos
WHERE val > 0.5
) AS Class
FROM `project.dataset.table`
You can test and play with the above using the sample data from your question, as in the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'alpha' name, 0 A, 1 B, 0 C, 0.6 D UNION ALL
SELECT 'beta', 0.6, 0, 0, 0.1 UNION ALL
SELECT 'gamma', 0, 0, 0, 0.4
)
SELECT *,
( SELECT IF(LOGICAL_OR(val > 0.5), 'F', 'P')
FROM UNNEST([A,B,C,D]) val
) AS Result,
( SELECT STRING_AGG(['A','B','C','D'][OFFSET(pos)])
FROM UNNEST([A,B,C,D]) val WITH OFFSET pos
WHERE val > 0.5
) AS Class
FROM `project.dataset.table`
with output as
Row name A B C D Result Class
1 alpha 0.0 1 0 0.6 F B,D
2 beta 0.6 0 0 0.1 F A
3 gamma 0.0 0 0 0.4 P null

Related

How to only SELECT rows with non-zero and non-null columns efficiently in Big Query?

I have a table with a large number of columns in BigQuery.
The table has a lot of rows in which some column values are 0/0.0 or null.
For example
Row A B C D E F
1 "abc" 0 null "xyz" 0 0.0
2 "bcd" 1 5 "wed" 4 65.5
I need to select only those rows in which every Integer and Float column is non-zero and no value is NULL. Basically, I need just Row 2 in the above table.
I know I can do this by using this query for each of the columns
SELECT * FROM table WHERE (B IS NOT NULL AND B != 0) AND
.
.
.
But I have a lot of columns, and writing a condition like this for each of them would be difficult. Is there a better approach to handle this?
Below example for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT "abc" a, 0 b, NULL c, "xyz" d, 0 e, 0.0 f UNION ALL
SELECT "bcd", 1, 5, "wed", 4, 65.5
)
SELECT *
FROM `project.dataset.table` t
WHERE NOT REGEXP_CONTAINS(TO_JSON_STRING(t), r':(?:0(?:\.0+)?|null)[,}]')
with output
Row a b c d e f
1 bcd 1 5 wed 4 65.5

convert cell value to respective column in PostgreSQL

Here's the sample:
select * from tmp
--output
A B Value
---------------------
a x 1
b x 2
a y 3
b y 4
c y 5
After a SQL command grouping on B column, I'd like to make each value of column A to be a separate column as illustrated below:
B a b c
----------------------------
x 1 2 null
y 3 4 5
Is there any specific terminology for this transformation? Thanks!
This transformation is usually called a pivot (or crosstab). Take the max of Value under a condition for each target column and group by the anchor column (B in your case). Note that you need one output column for each distinct value expected in column A.
select b,
max(case when A='a' then Value else null end)a,
max(case when A='b' then Value else null end)b,
max(case when A='c' then Value else null end)c
from tmp
group by 1
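A possible variant of the same conditional aggregation, assuming PostgreSQL 9.4 or later, uses the aggregate FILTER clause instead of CASE. It is only a sketch against the tmp table from the question:

```sql
-- Same pivot written with FILTER (PostgreSQL 9.4+).
-- Rows that do not match the filter are ignored by max(),
-- so missing combinations come out as NULL.
select b,
       max(value) filter (where a = 'a') as a,
       max(value) filter (where a = 'b') as b,
       max(value) filter (where a = 'c') as c
from tmp
group by 1
order by 1;
```

Grouping and ordering by position (1) avoids ambiguity, since the output has two columns named b.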

Replicate constant output based on the occurrence of specific events

I have a table with events (say X, Y, Z are random events and A, B are the ones I want to track). If I find event A, I want to output 1 on the current and following rows and if I find B I output -1 on the current and following rows, before I find any of them (A or B) I output 0. How do I do that using Hive (SQL)?
event | output | ordercol
X 0 1
Y 0 2
Z 0 3
B -1 4
X -1 5
X -1 6
B -1 7
X -1 8
A 1 9
X 1 10
B -1 11
Z -1 12
I know this could be accomplished using joins but I'm looking for a more elegant solution (maybe using Window Functions - I've tried dense_rank() and row_count() with no success)
According to this documentation, you can use last_value() with its skip-nulls flag and some additional logic:
select event,
(case last_value(case when event in ('A', 'B') then event end, true) over
(order by ordercol)
when 'A' then 1
when 'B' then -1
else 0
end)
from e;
This capability is called IGNORE NULLS in the standard and in other databases.
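For comparison, here is a sketch of the same idea in the standard-style IGNORE NULLS syntax, as accepted by Oracle, for example. It is not Hive syntax, and the alias tracked_state is just an illustrative name:

```sql
-- Carry the most recent A/B event forward, skipping the NULLs
-- produced for every other event, then map it to 1 / -1 / 0.
select event,
       case last_value(case when event in ('A', 'B') then event end)
                ignore nulls over (order by ordercol)
         when 'A' then 1
         when 'B' then -1
         else 0
       end as tracked_state
from e;
```

With only ORDER BY in the window, the default frame runs from the first row to the current row, so the last non-null value is the most recent A or B seen so far.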

Select query for the percentage of rows that have a given value?

If I have the table below, how do I write a SELECT query to return any TYPE where the percentage of rows that have a value of 1 is greater than 50%?
So in this case, it would only return B, as 66% of rows with TYPE B have a value of 1.
TYPE VALUE
-------------
A 0
A 0
A 1
A 0
B 0
B 1
B 1
C 0
C 0
C 0
You can use conditional aggregation:
select type
from t
group by type
having avg(case when value = 1 then 1.0 else 0.0 end) > 0.5;
You can include the avg() expression on the select to get the proportion.
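As a sketch of that last remark, the same expression can be repeated in the select list (the alias proportion is an illustrative name):

```sql
-- Return each qualifying type together with its share of rows where value = 1.
select type,
       avg(case when value = 1 then 1.0 else 0.0 end) as proportion
from t
group by type
having avg(case when value = 1 then 1.0 else 0.0 end) > 0.5;
```

On the sample data only B qualifies, since 2 of its 3 rows have a value of 1.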

SQL Server Conditional filtering

I essentially would like to execute a statement where x=1 AND a=2. If a<>2 then return results for filter x=1.
I've tried OR statement but it ignores my a=2 filter if there is a scenario where a does equal 2. Example below
select *
from dbo.test
where (x=1 and a=2)
or
x=1
For output purposes below: type = x and Id = a
Expected result when Type = 1.
Id(a) Name Person Type(x)
2 a Mike 1
7 b Jim 1
3 c Tom 1
4 d Tim 1
5 e Dave 1
Expected result when Type = 1 and Id = 2
Id(a) Name Person Type(x)
2 a Mike 1
Expected result when Type = 1 and Id <> 2 (scenario when there is no '2' in Id column)
Id(a) Name Person Type(x)
8 a Mike 1
7 b Jim 1
3 c Tom 1
4 d Tim 1
5 e Dave 1
The issue is not when Id = 2. It is returning Type = 1 when Id <> 2. Does that mean a case statement?
select *
from dbo.test
where (x=1 and a=2)
or x=1
Is the same as:
select *
from dbo.test
where x=1
AND requires both conditions be met, and OR requires one condition to be met. The value of a is irrelevant.
UPDATE:
You can get what you're after using the RANK() function in conjunction with a CASE statement:
;WITH cte AS (SELECT *,RANK() OVER(ORDER BY CASE WHEN id = 2 THEN 1 ELSE 2 END)RNK
FROM Table1
WHERE Type = 1)
SELECT *
FROM cte
WHERE RNK = 1
If id = 2 is present in the table, only that record will be returned, otherwise all records will be returned.
Demo: SQL Fiddle
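An alternative that states the rule more directly is sketched below, assuming the question's table dbo.test with columns x and a. A row with x = 1 is kept either if it is the a = 2 row or if no such row exists at all:

```sql
-- Keep x = 1 rows; restrict to a = 2 only when an a = 2 row is present.
SELECT t.*
FROM dbo.test AS t
WHERE t.x = 1
  AND (t.a = 2
       OR NOT EXISTS (SELECT 1
                      FROM dbo.test AS s
                      WHERE s.x = 1 AND s.a = 2));
```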
I'm guessing that you want to return a single row.
insert into test(x,a,v) values (0, 0, 'No good')
insert into test(x,a,v) values (1, 0, 'OK')
insert into test(x,a,v) values (1, 2, 'Better')
You want the row that has x=1 a=2 if it is there
You want the row that has x=1 if there is none better
You want nothing if there is no x=1 row
If that's the case then it is a bit tricky. You can score the rows using a CASE statement and then pick the row that has the best value. This may give you more than one row back if there is a tie.
SELECT v
FROM test
WHERE CASE WHEN x=1 AND a=2 THEN 1000
WHEN x=1 AND a<>2 THEN 100
ELSE 0 END =
(SELECT MAX(CASE WHEN x=1 AND a=2 THEN 1000
WHEN x=1 AND a<>2 THEN 100
ELSE 0 END)
FROM test)
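A shorter way to write the same "best available row" selection in SQL Server might be TOP (1) WITH TIES. This is a sketch against the three sample rows above, not a tested replacement for the scoring query:

```sql
-- Rank a = 2 rows first; WITH TIES returns every row sharing the best rank.
SELECT TOP (1) WITH TIES v
FROM test
WHERE x = 1
ORDER BY CASE WHEN a = 2 THEN 0 ELSE 1 END;
```

With the sample rows this returns 'Better'. If the a = 2 row is removed it returns 'OK', and if there is no x = 1 row it returns nothing.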