How do I find elements in an array in BigQuery?

I am trying to search for a row that has certain key value pairs in an array. A row in my BigQuery table would look something like this.
{
  "ip": "192.168.1.1",
  "cookie": [
    {
      "key": "apple",
      "value": "red"
    },
    {
      "key": "orange",
      "value": "orange"
    },
    {
      "key": "grape",
      "value": "purple"
    }
  ]
}
I thought about using an implicit UNNEST or CROSS JOIN like the following, but it didn't work: unnesting just creates a separate row per array element, so no single unnested row can ever match both key/value conditions at once.
SELECT ip
FROM table t, t.cookie c
WHERE (c.key = "grape" AND c.value = "purple") AND (c.key = "orange" AND c.value = "orange")
This link is really close to what I want to do, except it uses legacy SQL rather than standard SQL.

#standardSQL
SELECT ip
FROM yourTable
WHERE (
  SELECT COUNT(1)
  FROM UNNEST(cookie) AS pair
  WHERE pair IN (('grape', 'purple'), ('orange', 'orange'))
) >= 2
You can test it with the dummy data below:
#standardSQL
WITH yourTable AS (
  SELECT '192.168.1.1' AS ip, [('apple', 'red'), ('orange', 'orange'), ('grape', 'purple')] AS cookie UNION ALL
  SELECT '192.168.1.2', [('abc', 'xyz')]
)
SELECT ip
FROM yourTable
WHERE (
  SELECT COUNT(1)
  FROM UNNEST(cookie) AS pair
  WHERE pair IN (('grape', 'purple'), ('orange', 'orange'))
) >= 2
If you need to output the ip when at least one pair is in the array, change >= 2 to >= 1 in the WHERE clause.
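For reference, the at-least-one variant is the same query with the relaxed count:
#standardSQL
SELECT ip
FROM yourTable
WHERE (
  -- >= 1: at least one of the listed pairs must be present
  SELECT COUNT(1)
  FROM UNNEST(cookie) AS pair
  WHERE pair IN (('grape', 'purple'), ('orange', 'orange'))
) >= 1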

Mikhail's solution is good if it is guaranteed that there are no duplicate pairs in the cookie array. But if there could be duplicates, here is an alternative solution:
#standardSQL
WITH yourTable AS (
  SELECT
    '192.168.1.1' AS ip,
    [('apple', 'red'), ('orange', 'orange'), ('grape', 'purple')] AS cookie UNION ALL
  SELECT
    '192.168.1.2',
    [('abc', 'xyz'), ('orange', 'orange'), ('orange', 'orange')]
)
SELECT ip
FROM yourTable t
WHERE (
  ('grape', 'purple') IN UNNEST(t.cookie) AND
  ('orange', 'orange') IN UNNEST(t.cookie)
)
This results in only:
ip
-----------
192.168.1.1
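To see why duplicates matter: with the counting approach, the duplicated ('orange', 'orange') pair in 192.168.1.2 alone reaches the count of 2, so that ip would be returned even though it has no ('grape', 'purple') pair. A quick sketch to verify:
#standardSQL
-- Counting approach against the duplicate data: returns BOTH ips,
-- because the two ('orange', 'orange') copies in 192.168.1.2 already count as 2
WITH yourTable AS (
  SELECT '192.168.1.1' AS ip, [('apple', 'red'), ('orange', 'orange'), ('grape', 'purple')] AS cookie UNION ALL
  SELECT '192.168.1.2', [('abc', 'xyz'), ('orange', 'orange'), ('orange', 'orange')]
)
SELECT ip
FROM yourTable
WHERE (
  SELECT COUNT(1)
  FROM UNNEST(cookie) AS pair
  WHERE pair IN (('grape', 'purple'), ('orange', 'orange'))
) >= 2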

Related

How to select rows based on a JSON-format field value?

I have a table; one of the table fields is in JSON format:
my_table
id   json_field
--   -------------------------------------------------------------------
1    {"to_status": 7, "to_status_name": "In Progress", "role": "admin"}
2    {"to_status": 3, "to_status_name": "Completed", "role": "admin"}
3    {"to_status": 2, "to_status_name": "Completed", "role": "customer"}
How can I select all rows where "to_status" is 3 or 2?
Can anyone help?
You can use
SELECT * FROM my_table
WHERE JSON_EXTRACT(json_field, '$.to_status') IN (3, 2)
or
SELECT * FROM my_table
WHERE json_field->>'$.to_status' IN (3, 2)
to select what you need.
The extracted JSON values are strings, so you need to test for string values or cast the extracted to_status values to an integer:
SELECT * FROM my_table WHERE json_field->>'to_status' in ('2','3');
or
SELECT * FROM my_table WHERE (json_field->>'to_status')::int in (2,3);
But then you can get the same results extracting the to_status_name:
SELECT * FROM my_table WHERE json_field->>'to_status_name' = 'Completed';
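Note that the two answers above target different dialects: JSON_EXTRACT and the '$.to_status' path syntax are MySQL, while ->>'to_status' with a ::int cast is PostgreSQL. A self-contained PostgreSQL sketch using the data from the question (jsonb as the column type is my assumption):
-- PostgreSQL sketch; jsonb as the column type is an assumption
CREATE TABLE my_table (id int, json_field jsonb);
INSERT INTO my_table VALUES
  (1, '{"to_status": 7, "to_status_name": "In Progress", "role": "admin"}'),
  (2, '{"to_status": 3, "to_status_name": "Completed", "role": "admin"}'),
  (3, '{"to_status": 2, "to_status_name": "Completed", "role": "customer"}');

-- ->> returns text, so cast before comparing to integers
SELECT * FROM my_table
WHERE (json_field->>'to_status')::int IN (2, 3);  -- returns rows 2 and 3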

How to convert an array of key values to columns in BigQuery / GoogleSQL?

I have an array in BigQuery that looks like the following:
SELECT params FROM mySource;
[
{
key: "name",
value: "apple"
},{
key: "color",
value: "red"
},{
key: "delicious",
value: "yes"
}
]
Which looks like this:
params
[{ key: "name", value: "apple" },{ key: "color", value: "red" },{ key: "delicious", value: "yes" }]
How do I change my query so that the table looks like this:
name    color   delicious
-----   -----   ---------
apple   red     yes
Currently I'm able to accomplish this with:
SELECT
  (
    SELECT p.value
    FROM UNNEST(params) AS p
    WHERE p.key = "name"
  ) as name,
  (
    SELECT p.value
    FROM UNNEST(params) AS p
    WHERE p.key = "color"
  ) as color,
  (
    SELECT p.value
    FROM UNNEST(params) AS p
    WHERE p.key = "delicious"
  ) as delicious
FROM mySource;
But I'm wondering if there is a way to do this without manually specifying the key name for each. We may not know all the names of the keys ahead of time.
Thanks!
Consider the approach below:
select * except(id) from (
  select to_json_string(t) id, param.*
  from mySource t, unnest(params) param
)
pivot (min(value) for key in ('name', 'color', 'delicious'))
If applied to the sample data in your question, the output is:
name    color   delicious
-----   -----   ---------
apple   red     yes
As you can see, you still need to specify the key names, but the whole query is much simpler and more manageable.
Meanwhile, the above query can be enhanced with EXECUTE IMMEDIATE so that the list of key names is auto-generated. I have at least a few answers using that technique, so search for it here on SO if you want (I just do not want to create duplicates here).
Here is my attempt, based on Mikhail's answer here:
-- DDL for sample view
create or replace view sample.sampleview as
with _data as (
  select 1 as id,
    array(
      select struct("name" as key, "apple" as value)
      union all
      select struct("color" as key, "red" as value)
      union all
      select struct("delicious" as key, "yes" as value)
    ) as _arr
  union all
  select 2 as id,
    array(
      select struct("name" as key, "orange" as value)
      union all
      select struct("color" as key, "orange" as value)
      union all
      select struct("delicious" as key, "yes" as value)
    )
)
select * from _data
Execute immediate
declare sql string;
set sql = (
  select concat(
    "select id,",
    string_agg(
      concat("max(if (key = '", key, "',value,NULL)) as ", key)
    ),
    ' from sample.sampleview,unnest(_arr) group by id'
  )
  from (
    select key
    from sample.sampleview, unnest(_arr)
    group by key
  )
);
execute immediate sql;
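For reference, the statement this builds and hands to EXECUTE IMMEDIATE comes out like the following (formatted for readability; the order of the generated columns depends on the GROUP BY over keys and is not guaranteed):
select id,
  max(if (key = 'name',value,NULL)) as name,
  max(if (key = 'color',value,NULL)) as color,
  max(if (key = 'delicious',value,NULL)) as delicious
from sample.sampleview,unnest(_arr)
group by id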

SQL equivalent of the MongoDB "$all" operator on a repeated string

Suppose the following Data structure:
MongoDB: {id: ObjectId, colors: String[]}
SQL: Column ID (Integer), Column COLORS (Repeated String)
Suppose the following MongoDB query:
collection.find({colors: {$all: ["blue", "orange", "yellow"]} })
What is the equivalent operator/notation for "$all" in SQL? Note that, unlike $in, $all looks for documents (rows) whose field matches ALL of the values, not just some of them.
Assuming there are no duplicates in the repeated values, you can use:
select s.*
from sql s
where (select count(*)
       from unnest(s.colors) color
       where color in ('blue', 'orange', 'yellow')
      ) = 3;
The "3" is the size of the list. If there are duplicates, then use count(distinct color) instead.
If you don't want to "remember" 3, you can use:
with color_list as (
  select color
  from unnest(array['blue', 'orange', 'yellow']) color
)
select s.*
from sql s
where (select count(*)
       from unnest(s.colors) color join
            color_list cl
            using (color)
      ) = (select count(*) from color_list);
Or even:
select s.*
from sql s
where not exists (select 1
                  from unnest(array['blue', 'orange', 'yellow']) my_color left join
                       unnest(s.colors) color
                       on my_color = color
                  where color is null
                 );
Below is for BigQuery Standard SQL
#standardSQL
create temp function check_all(arr ANY TYPE, match ANY TYPE) as (
  array_length(array(
    select distinct m from unnest(match) m
    join unnest(arr) a on m = a
  )) = array_length(array(
    select distinct m from unnest(match) m
  ))
);
select *
from `project.dataset.table`
where check_all(colors, ['blue', 'orange', 'yellow'])
If applied to the dummy sample data below:
with `project.dataset.table` as (
  select 1 id, ['blue', 'orange', 'yellow', 'black'] colors union all
  select 2, ['blue', 'pink', 'yellow', 'green'] union all
  select 3, ['red', 'orange', 'blue', 'pink', 'yellow', 'green']
)
the output is:
id   colors
--   ---------------------------------------
1    blue, orange, yellow, black
3    red, orange, blue, pink, yellow, green

Why is Snowflake changing the order of JSON values when flattening into a list?

I have JSON objects stored in the table and I am trying to write a query to get the first element from that JSON.
Replication Script
create table staging.par.test_json (id int, val varchar(2000));
insert into staging.par.test_json values (1, '{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}');
insert into staging.par.test_json values (2,'
{
"list": [
{
"element": "Wholesale jeweler"
},
{
"element": "Fashion"
},
{
"element": "Industry"
},
{
"element": "Jewelry store"
},
{
"element": "Business service"
},
{
"element": "Corporate office"
}
]
}');
with cte_get_cats AS
(
select id,
val as category_list
from staging.par.test_json
),
cats_parse AS
(
select id,
parse_json(category_list) as c
from cte_get_cats
),
distinct_cats as
(
select id,
INDEX,
UPPER(cast(value:element AS varchar)) As c
from
cats_parse,
LATERAL flatten(INPUT => c:"list")
order by 1,2
) ,
cat_array AS
(
SELECT
id,
array_agg(DISTINCT c) AS sds_categories
FROM
distinct_cats
GROUP BY 1
),
sds_cats AS
(
select id,
cast(sds_categories[0] AS varchar) as sds_primary_category
from cat_array
)
select * from sds_cats;
Values: Categories
{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}
Flattening it to a list gives me
["Plumber","Craft","Plumbing","Electrics","Electrical","Tradesperson","Home services","Housekeepings","Electrical Goods"]
Issue:
The order of this is not always the same. Snowflake seems to change the ordering; sometimes it reorders alphabetically.
How can I make this stable? I do not want the order to be changed.
The problem is the way you're using ARRAY_AGG:
array_agg(DISTINCT c) AS sds_categories
Specifying it like that gives Snowflake no guidance on how the contents of the array should be arranged. You should not assume that the arrays will be created in the same order as their input records; they might be, but it's not guaranteed. So you probably want to do:
array_agg(DISTINCT c) within group (order by index) AS sds_categories
But that won't work: if you use DISTINCT c, the value of index for each c is unknown. Perhaps you don't need DISTINCT; then this will work:
array_agg(c) within group (order by index) AS sds_categories
If you do need DISTINCT, you need to somehow associate an index with each distinct c value. One way is to use the MIN function on index in the input. Here's the full query:
with cte_get_cats AS
(
select id,
val as category_list
from staging.par.test_json
),
cats_parse AS
(
select id,
parse_json(category_list) as c
from cte_get_cats
),
distinct_cats as
(
select id,
MIN(INDEX) AS index,
UPPER(cast(value:element AS varchar)) As c
from
cats_parse,
LATERAL flatten(INPUT => c:"list")
group by 1,3
) ,
cat_array AS
(
SELECT
id,
array_agg(c) within group (order by index) AS sds_categories
FROM
distinct_cats
GROUP BY 1
),
sds_cats AS
(
select id,
cast(sds_categories[0] AS varchar) as sds_primary_category
from cat_array
)
select * from sds_cats;
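A condensed sketch of the same fix, without the intermediate CTEs and ignoring the DISTINCT handling discussed above (same table as the replication script; value and index are the standard output columns of Snowflake's FLATTEN):
select t.id,
       -- WITHIN GROUP pins the aggregation order to the array position
       array_agg(upper(f.value:element::varchar))
         within group (order by f.index) as sds_categories
from staging.par.test_json t,
     lateral flatten(input => parse_json(t.val):"list") f
group by t.id;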

How to use a subset of the row columns when converting to JSON?

I have a table t with some columns a, b and c. I use the following query to convert rows to a JSON array of objects:
SELECT COALESCE(JSON_AGG(t ORDER BY c), '[]'::json)
FROM t
This returns as expected:
[
{
"a": ...,
"b": ...,
"c": ...
},
{
"a": ...,
"b": ...,
"c": ...
}
]
Now I want the same result, but with only columns a and b in the output. I will still use column c for ordering. The best I came up with is the following:
SELECT COALESCE(JSON_AGG(JSON_BUILD_OBJECT('a', a, 'b', b) ORDER BY c), '[]'::json)
FROM t
[
{
"a": ...,
"b": ...
},
{
"a": ...,
"b": ...
}
]
Although this works fine, I am wondering if there is a more elegant way to do this. It frustrates me that I have to manually define the JSON properties. Of course, I understand that I have to enumerate the columns a and b, but it's odd that I have to copy/paste the corresponding JSON property name, which is exactly the same as the column name anyway.
Is there another way to do this?
You can use row_to_json instead of manually building the object:
CREATE TABLE foobar (a text, b text, c text);
INSERT INTO foobar VALUES
('1', 'LOREM', 'A'),
('2', 'LOREM', 'B'),
('3', 'LOREM', 'C');
--Using CTE
WITH tmp AS (
SELECT a, b FROM foobar ORDER BY c
)
SELECT json_agg(row_to_json(t)) FROM tmp t
--Using subquery
SELECT
json_agg(row_to_json(t))
FROM
(SELECT a, b FROM foobar ORDER BY c) t;
EDIT: As you stated, result order is a strict requirement. In this case you could use a row constructor to build the JSON object:
--Using a type to build json with desired keys
CREATE TYPE mytype AS (a text, b text);
SELECT
json_agg(
to_json(
CAST(
ROW(a, b) AS mytype
)
)
ORDER BY c)
FROM
foobar;
--Ignoring column names...
SELECT
json_agg(
to_json(
ROW(a, b)
)
ORDER BY c)
FROM
foobar;
SQL Fiddle here.
Perform the ordering in a subquery or CTE and then apply json_agg:
SELECT COALESCE(JSON_AGG(t2), '[]'::json)
FROM (SELECT a, b FROM t ORDER BY c) t2
Alternatively, use jsonb. The jsonb type allows deletion of items by specifying their key:
SELECT coalesce(jsonb_agg(row_to_json(t)::jsonb - 'c'
order by c), '[]'::jsonb)
FROM t
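A slightly shorter spelling of the same jsonb trick is to use to_jsonb directly; a sketch, assuming the same table t with columns a, b, c:
-- to_jsonb converts the whole row to jsonb, then '-' drops the unwanted key
SELECT COALESCE(jsonb_agg(to_jsonb(t) - 'c' ORDER BY c), '[]'::jsonb)
FROM t;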