How to search for multiple matches using the IN operator in BigQuery? - google-bigquery

Right now I am filtering my rows by using the WHERE clause and 2 conditional statements. It seems somewhat inefficient that I am writing 2 conditions. Would it be possible to check whether "amznbida" and "ksga" are in the array with only one statement?
#standardSQL
-- Get all the keys
SELECT
*
FROM `encoded-victory-198215.DFP_TEST.test3`
WHERE
"amznbida" IN UNNEST(ARRAY(SELECT name FROM UNNEST(keywords)))
AND
"ksga" IN UNNEST(ARRAY(SELECT name FROM UNNEST(keywords)))

Just remove the UNNEST(ARRAY( part and leave the subquery - you should be fine.
working example:
SELECT
*,
t in (select * from unnest(a)) condition
FROM unnest([
struct('a' as t, ['a', 'b', 'c'] as a),
('b',['r', 'f'])
])
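Applied to the query from the question, that suggestion would look roughly like this (a sketch keeping the original table and keyword names; it still has two conditions, but each is now just a plain IN over the subquery):
#standardSQL
SELECT *
FROM `encoded-victory-198215.DFP_TEST.test3`
WHERE
"amznbida" IN (SELECT name FROM UNNEST(keywords))
AND
"ksga" IN (SELECT name FROM UNNEST(keywords))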

Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `encoded-victory-198215.DFP_TEST.test3`
WHERE 2 = (SELECT COUNT(DISTINCT name) FROM UNNEST(keywords) WHERE name IN ("amznbida", "ksga"))
You can test and play with the above using dummy data, as below
#standardSQL
WITH `encoded-victory-198215.DFP_TEST.test3` AS (
SELECT
ARRAY<STRUCT<value ARRAY<STRING>, name STRING>>[
STRUCT(['ksg-1', 'ksg-2'], 'ksga'), STRUCT(['amznbid-1', 'amznbid-2'], 'amznbida')
] keywords,
1 impression UNION ALL
SELECT
ARRAY<STRUCT<value ARRAY<STRING>, name STRING>>[
STRUCT(['xxx-1', 'xxx-2'], 'xxxa'), STRUCT(['amznbid-1', 'amznbid-2'], 'amznbida')
] keywords,
2 impression
)
SELECT *
FROM `encoded-victory-198215.DFP_TEST.test3`
WHERE 2 = (SELECT COUNT(DISTINCT name) FROM UNNEST(keywords) WHERE name IN ("amznbida", "ksga"))
with result
Row keywords.value keywords.name impression
1 ksg-1 ksga 1
ksg-2
amznbid-1 amznbida
amznbid-2
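If you later need to check more than two keywords, a hedged generalization of the same idea (not from the original answer) is to compare the count against the length of the keyword list, so the literal 2 is not hardcoded:
#standardSQL
SELECT *
FROM `encoded-victory-198215.DFP_TEST.test3`
WHERE ARRAY_LENGTH(['amznbida', 'ksga']) = (
  SELECT COUNT(DISTINCT name)
  FROM UNNEST(keywords)
  WHERE name IN UNNEST(['amznbida', 'ksga'])
)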

Related

PostgreSQL: Select unique rows where distinct values are in list

Say that I have the following table:
with data as (
select 'John' "name", 'A' "tag", 10 "count"
union all select 'John', 'B', 20
union all select 'Jane', 'A', 30
union all select 'Judith', 'A', 40
union all select 'Judith', 'B', 50
union all select 'Judith', 'C', 60
union all select 'Jason', 'D', 70
)
I know there are a number of distinct tag values, namely (A, B, C, D).
I would like to select the unique names that only have the tag A
I can get close by doing
-- wrong!
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1
however, this will include unique names that only have 1 distinct tag, regardless of what tag it is.
I am using PostgreSQL, although having more generic solutions would be great.
You're almost there - you already have groups with one tag, now just test if it is the tag you want:
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1 and max(tag)='A'
(Note that max could be min as well - SQL just doesn't have a single() aggregate function, but that's a different story.)
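If you are on PostgreSQL specifically, a hedged alternative (not from the original answer) is the bool_and aggregate, which asserts that every tag in the group is 'A':
-- assumes the "data" CTE from the question
select
distinct("name")
from data
group by "name"
having bool_and(tag = 'A')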
You can use not exists here:
select distinct "name"
from data d
where "tag" = 'A'
and not exists (
select * from data d2
where d2."name" = d."name" and d2."tag" != d."tag"
);
This is one possible way of solving it:
select
distinct("name")
from data
where "name" not in (
-- create list of names we want to exclude
select distinct name from data where "tag" != 'A'
)
But I don't know if it's the best or most efficient one.
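For the sample data in the question, each of the three queries above should return just Jane - the only name whose only tag is A.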

Count unique within combination of json keys in BigQuery

In BigQuery I do have a json stored in 1 column like this:
{"key1": "value1", "key3":"value3"}
{"key2": "value2"}
{"key3": "value3"}
What I'd like to know is how to calculate the number of unique combinations, bearing in mind that there can be 100+ different keys, so avoiding listing them explicitly would be beneficial.
In the example above the end result will be 2, because the first and third rows match on "key3", while the second doesn't match anything.
I understand how to build this by writing an app that calculates it, but I would like to see if there is any solution possible with a single query.
If your JSON values are formatted with no spaces after the :, then you can treat this as string manipulations:
with t as (
select '{"key1":"value1", "key3":"value3"}' as kv union all
select '{"key2":"value2"}' union all
select '{"key3":"value3"}'
)
select x, count(*)
from t cross join
unnest(regexp_extract_all(t.kv, '"[^,]+"')) x
group by x
having count(*) = 1;
With the spaces, you can use replace() to get rid of them:
with t as (
select '{"key1": "value1", "key3":"value3"}' as kv union all
select '{"key2": "value2"}' union all
select '{"key3": "value3"}'
)
select replace(x, '": "', '":"'), count(*)
from t cross join
unnest(regexp_extract_all(t.kv, '"[^,]+"')) x
group by 1
having count(*) = 1;
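If the goal is the single number 2 from the question (a sketch, assuming that number is the count of key/value pairs that appear in exactly one row), you can wrap the same query in a count:
with t as (
select '{"key1":"value1", "key3":"value3"}' as kv union all
select '{"key2":"value2"}' union all
select '{"key3":"value3"}'
)
select count(*) as unique_count
from (
select x
from t cross join
unnest(regexp_extract_all(t.kv, '"[^,]+"')) x
group by x
having count(*) = 1
);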

BigQuery: use the WHERE clause to filter on a column that does not always exist in the table

I need to create some kind of uniform query for multiple tables. Some tables contain a certain column with a type; if that column exists, I need to apply filtering on it. I don't know how to do this.
I have for example two tables
table_customer_1
CustomerId, CustomerType
1, 1
2, 1
3, 2
Table_customer_2
Customerid
4
5
6
The query needs to be something like the one below and should work for both tables (the table name will be replaced by the customer that uses the query):
With input1 as(
SELECT
(CASE WHEN exists(customerType) THEN customerType ELSE "0" END) as customerType, *
FROM table_customer_1)
SELECT * from input1
WHERE customerType != 2
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `project.dataset.table` t
WHERE SAFE_CAST(IFNULL(JSON_EXTRACT_SCALAR(TO_JSON_STRING(t), '$.CustomerType'), '0') AS INT64) != 2
Or, as a simplification, you can skip the cast to INT64 and compare as STRING:
#standardSQL
SELECT *
FROM `project.dataset.table` t
WHERE IFNULL(JSON_EXTRACT_SCALAR(TO_JSON_STRING(t), '$.CustomerType'), '0') != '2'
The above will work for whatever table you put in place of project.dataset.table: either project.dataset.table_customer_1 or project.dataset.table_customer_2 - so quite generic, I think.
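A hedged way to test this without touching your real tables is to fake the data with the sample rows from the question (project.dataset.table here is just the placeholder name used above):
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 CustomerId, 1 CustomerType UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 2
)
SELECT *
FROM `project.dataset.table` t
WHERE IFNULL(JSON_EXTRACT_SCALAR(TO_JSON_STRING(t), '$.CustomerType'), '0') != '2'
This should return CustomerId 1 and 2 and drop CustomerId 3; for Table_customer_2, which has no CustomerType column at all, the IFNULL(..., '0') branch kicks in and every row comes back.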
I can think of no good reason for doing this. However, it is possible by playing with the scoping rules for subqueries:
SELECT t.*
FROM (SELECT t.*,
(SELECT customerType -- will choose from tt if available, otherwise x
FROM table_customer_1 tt
WHERE tt.Customerid = t.Customerid
) as customerType
FROM (SELECT t.* EXCEPT (Customerid)
FROM table_customer_1 t
) t CROSS JOIN
(SELECT 0 as customerType) x
) t
WHERE customerType <> 2

Extracting keywords out of a string and showing them in another column

I need to write a query to extract specific names out of a string and have them show in another column. For example, a column has these values:
Column:
Row 1: jasdhj31e31jh123hkkj,12l1,3jjds,Amin,02323rdcsnj
Row 2: jasnasc8918212,ahsahkdjjMina67,
Row 3: kasdhakshd,asda,asdasd,121,121,Sina878788kasas
Keywords: Amin, Mina, Sina
How could I get these keywords in another column? I don't want to insert another column, but if that's the only solution let me know.
Any help appreciated!
Below is for BigQuery Standard SQL
#standardSQL
WITH keywords AS (
SELECT keyword
FROM UNNEST(SPLIT('Amin,Mina,Sina')) keyword
)
SELECT str, STRING_AGG(keyword) keywords_in_str
FROM `project.dataset.table`
CROSS JOIN keywords
WHERE REGEXP_CONTAINS(str, CONCAT(r'(?i)', keyword))
GROUP BY str
You can test and play with the above using dummy data from your question, as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'jasdhMINAj31e31jh123hkkj,12l1,3jjds,Amin,02323rdcsnj' str UNION ALL
SELECT 'jasnasc8918212,ahsahkdjjMina67,' UNION ALL
SELECT 'kasdhakshd,asda,asdasd,121,121,Sina878788kasas'
), keywords AS (
SELECT keyword
FROM UNNEST(SPLIT('Amin,Mina,Sina')) keyword
)
SELECT str, STRING_AGG(keyword) keywords_in_str
FROM `project.dataset.table`
CROSS JOIN keywords
WHERE REGEXP_CONTAINS(str, CONCAT(r'(?i)', keyword))
GROUP BY str
with results as
Row str keywords_in_str
1 jasdhMINAj31e31jh123hkkj,12l1,3jjds,Amin,02323rdcsnj Amin,Mina
2 jasnasc8918212,ahsahkdjjMina67, Mina
3 kasdhakshd,asda,asdasd,121,121,Sina878788kasas Sina
To count the number of keywords:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'jasdhMINAj31e31jh123hkkj,12l1,3jjds,Amin,02323rdcsnj' str UNION ALL
SELECT 'jasnasc8918212,ahsahkdjjMina67,' UNION ALL
SELECT 'kasdhakshd,asda,asdasd,121,121,Sina878788kasas'
)
select
  str,
  array(
    select as struct
      countif(lower(x) = "amin") amin,
      countif(lower(x) = "mina") mina,
      countif(lower(x) = "sina") sina
    from unnest(x) x
  ) keyword
from (
  select str, regexp_extract_all(str, "(?i)(Amin|Mina|Sina)") x
  from `project.dataset.table`
)
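Alternatively (a hedged variant, not from the original answer), you can reuse the CROSS JOIN approach from above and simply add a COUNT, which counts how many of the keywords appear in each string without hardcoding one COUNTIF per keyword:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'jasdhMINAj31e31jh123hkkj,12l1,3jjds,Amin,02323rdcsnj' str UNION ALL
SELECT 'jasnasc8918212,ahsahkdjjMina67,' UNION ALL
SELECT 'kasdhakshd,asda,asdasd,121,121,Sina878788kasas'
), keywords AS (
SELECT keyword
FROM UNNEST(SPLIT('Amin,Mina,Sina')) keyword
)
SELECT str, STRING_AGG(keyword) keywords_in_str, COUNT(keyword) keywords_count
FROM `project.dataset.table`
CROSS JOIN keywords
WHERE REGEXP_CONTAINS(str, CONCAT(r'(?i)', keyword))
GROUP BY str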

Using a case statement as an if statement

I am attempting to create an IF statement in BigQuery. I have built a concept that works, but it does not select the data from a table; I can only get it to display 1 or 0.
Example:
SELECT --AS STRUCT
CASE
WHEN (
Select Count(1) FROM ( -- If the records are the same, then return = 0, if the records are not the same then > 1
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Prior_Filtered`
Except Distinct
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Latest_Filtered`
)
)= 0
THEN
(Select * from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Latest`) -- This does not work:
-- Scalar subquery cannot have more than one column unless using SELECT AS STRUCT to build STRUCT values at [16:4]
END
SELECT --AS STRUCT
CASE
WHEN (
Select Count(1) FROM ( -- If the records are the same, then return = 0, if the records are not the same then > 1
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Prior_Filtered`
Except Distinct
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Latest_Filtered`
)
)= 0
THEN 1 --- This does work
Else
0
END
How can I get this query to return results from an existing table?
Your question is still a little generic, so my answer is as well - it just mimics your use case to the extent I can reverse-engineer it from your comments.
So, in the code below, project.dataset.yourtable mimics your table, whereas
project.dataset.yourtable_Prior_Filtered and project.dataset.yourtable_Latest_Filtered mimic your respective views.
#standardSQL
WITH `project.dataset.yourtable` AS (
SELECT 'aaa' cols, 'prior' filter UNION ALL
SELECT 'bbb' cols, 'latest' filter
), `project.dataset.yourtable_Prior_Filtered` AS (
SELECT cols FROM `project.dataset.yourtable` WHERE filter = 'prior'
), `project.dataset.yourtable_Latest_Filtered` AS (
SELECT cols FROM `project.dataset.yourtable` WHERE filter = 'latest'
), check AS (
SELECT COUNT(1) > 0 changed FROM (
SELECT DISTINCT cols FROM `project.dataset.yourtable_Latest_Filtered`
EXCEPT DISTINCT
SELECT DISTINCT cols FROM `project.dataset.yourtable_Prior_Filtered`
)
)
SELECT t.* FROM `project.dataset.yourtable` t
CROSS JOIN check WHERE check.changed
the result is
Row cols filter
1 aaa prior
2 bbb latest
if you changed your table to
WITH `project.dataset.yourtable` AS (
SELECT 'aaa' cols, 'prior' filter UNION ALL
SELECT 'aaa' cols, 'latest' filter
) ......
the result will be
Row cols filter
Query returned zero records.
I hope this gives you the right direction.
Added more explanations:
I could be wrong - but per your question - it looks like you have one table project.dataset.yourtable and two views project.dataset.yourtable_Prior_Filtered and project.dataset.yourtable_Latest_Filtered which represent the state of your table prior to and after some event.
So, the first three CTEs in the answer above just mimic the table and views you described in your question.
They are here so you can see the concept and play with it without any extra work before adjusting it to your real use case.
For your real use case you should omit them and use your real table and view names and whatever columns they have.
So the query for you to play with is:
#standardSQL
WITH check AS (
SELECT COUNT(1) > 0 changed FROM (
SELECT DISTINCT cols FROM `project.dataset.yourtable_Latest_Filtered`
EXCEPT DISTINCT
SELECT DISTINCT cols FROM `project.dataset.yourtable_Prior_Filtered`
)
)
SELECT t.* FROM `project.dataset.yourtable` t
CROSS JOIN check WHERE check.changed
It should be a very simple IF statement in any language.
Unfortunately no! It cannot be done with just a simple IF, and if you see fit you can submit a feature request to the BigQuery team for whatever you think makes sense.