BigQuery: Select entire repeated field with group

BigQuery: Select entire repeated field with group - google-bigquery

I'm using LegacySQL, but am not strictly limited to it. (though it does have some methods I find useful, "HASH" for example).
Anyhow, the simple task is that I want to group by one top level column, while still keeping the first instance of a nested+repeated set of data alongside.
So, the following "works", and produces nested output:
SELECT
cd,
subarray.*
FROM [magicalfairy.land]
And now I attempt to just grab the entire first subarray (honestly, I don't expect this to work of course)
The following is what doesn't work:
SELECT
cd,
FIRST(subarray.*)
FROM [magicalfairy.land]
GROUP BY cd
Any alternate approaches would be appreciated.
Edit, for data behaviour example.
If Input data was roughly:
[
{
"cd": "something",
"subarray": [
{
"hello": 1,
"world": 1
},
{
"hello": 2,
"world": 2
}
]
},
{
"cd": "something",
"subarray": [
{
"hello": 1,
"world": 1
},
{
"hello": 2,
"world": 2
}
]
}
]
Would expect to get out:
[
{
"cd": "something",
"subarray": [
{
"hello": 1,
"world": 1
},
{
"hello": 2,
"world": 2
}
]
}
]

You'll have a much better time preserving the structure using standard SQL, e.g.:
WITH T AS (
SELECT
cd,
ARRAY<STRUCT<x INT64, y BOOL>>[
STRUCT(off, MOD(off, 2) = 0),
STRUCT(off - 1, false)] AS subarray
FROM UNNEST([1, 2, 1, 2]) AS cd WITH OFFSET off)
SELECT
cd,
ANY_VALUE(subarray) AS subarray
FROM T
GROUP BY cd;
ANY_VALUE will return some value of subarray for each group. If you wanted to concatenate the arrays instead, you could use ARRAY_CONCAT_AGG.
to run this against your table - try below
SELECT
cd,
ANY_VALUE(subarray) AS subarray
FROM `magicalfairy.land`
GROUP BY cd

Try below (BigQuery Standard SQL)
SELECT cd, subarray
FROM (
SELECT cd, subarray,
ROW_NUMBER() OVER(PARTITION BY cd) AS num
FROM `magicalfairy.land`
) WHERE num = 1
This gives you expected result - equivalent of "ANY ARRAY"
This solution can be extended to "FIRST ARRAY" by adding ORDER BY sort_col into OVER() clause - assuming that sort_col defines the logical order

Related

In BigQuery, how do I check if two ARRAY of STRUCTs are equal

I have a query that outputs two array of structs:
SELECT modelId, oldClassCounts, newClassCounts
FROM `xyz`
GROUP BY 1
How do I create another column that is TRUE if oldClassCounts = newClassCounts?
Here is a sample result in JSON:
[
{
"modelId": "FBF21609-65F8-4076-9B22-D6E277F1B36A",
"oldClassCounts": [
{
"id": "A041EBB1-E041-4944-B231-48BC4CCE025B",
"count": "33"
},
{
"id": "B8E4812B-A323-47DD-A6ED-9DF877F501CA",
"count": "82"
}
],
"newClassCounts": [
{
"id": "A041EBB1-E041-4944-B231-48BC4CCE025B",
"count": "33"
},
{
"id": "B8E4812B-A323-47DD-A6ED-9DF877F501CA",
"count": "82"
}
]
}
]
I want the equality column to be TRUE if oldClassCounts and newClassCounts are exactly the same like the output above.
Anything else should be false.

I would go about with this solution
#standardSQL
WITH xyz AS (
SELECT "FBF21609-65F8-4076-9B22-D6E277F1B36A" AS modelId,
[STRUCT("A041EBB1-E041-4944-B231-48BC4CCE025B" as id, "33" as count),
STRUCT("B8E4812B-A323-47DD-A6ED-9DF877F501CA" as id, "82" as count)] AS oldClassCounts,
[STRUCT("A041EBB1-E041-4944-B231-48BC4CCE025B" as id, "33" as count),
STRUCT("B8E4812B-A323-47DD-A6ED-9DF877F501CA" as id, "82" as count)] as newClassCounts),
o as (SELECT modelId, id, count, array_length(oldClassCounts) as len FROM xyz, UNNEST(oldClassCounts) as old_c),
n as (SELECT modelId, id, count, array_length(newClassCounts) as len FROM xyz, UNNEST(newClassCounts) as new_c),
uneq as (select * from o except distinct select * from n)
select xyz.*, IF(uneq.modelId is not null, false, true) as equal from xyz left join (select distinct modelId from uneq) uneq on xyz.modelId = uneq.modelId
It works regardless of the order or having duplicates within the arrays. The idea is that we treat each of the arrays as a separate temporary table removing all elements that exist in one but not the other (using except distinct) and then have an extra check for the length of the arrays in case there are duplicates e.g.
"FBF21609-65F8-4076-9B22-D6E277F1B36A" AS modelId,
[STRUCT("A041EBB1-E041-4944-B231-48BC4CCE025B" as id, "33" as count),
STRUCT("B8E4812B-A323-47DD-A6ED-9DF877F501CA" as id, "82" as count),
STRUCT("B8E4812B-A323-47DD-A6ED-9DF877F501CA" as id, "82" as count)]

I would consider comparing the result of TO_JSON_STRING function applied on both of these arrays.
In the query it would be done in the following way:
SELECT modelId,
oldClassCounts,
newClassCounts,
CASE WHEN TO_JSON_STRING(oldClassCounts) = TO_JSON_STRING(newClassCounts)
THEN true
ELSE false
END
FROM `xyz`;
I'm not sure about GROUP BY 1 part, because non of the fields are grouped or aggregated.
It is not going to work, if the order of elements in the array is going to be different. This solution is not perfect, but worked for the data you provided.

Can't hide node "data"

Column data is jsonb
SELECT
json_agg(shop_order)
FROM (
SELECT data from shop_order
WHERE data->'contacts'->'customer'->>'phone' LIKE '%1234567%' LIMIT 3 OFFSET 3
) shop_order
and here result as array:
[
{
"data": {
"id": 211111,
"cartCount": 4,
"created_at": "2020-10-28T12:58:33.387Z",
"modified_at": "2020-10-28T12:58:33.387Z"
}
}
]
Nice. But... I need to hide node data.
The result must be
[
{
"id": 211111,
"cartCount": 4,
"created_at": "2020-10-28T12:58:33.387Z",
"modified_at": "2020-10-28T12:58:33.387Z"
}
]
Is it possible?

you should be able to perform a second select on the result. then specificaly select data
SELECT (result->>'data') as result,
FROM result
example

Get data from any items in array

PosgreSQL 9.5
Field type: jsonb
Here json
{
"options": [
{
"name": "method"
},
{
"name": "flavor"
},
{
"name": "weight",
"value": {
"name": "300g"
}
}
]
}
And here query that get value of item (weight) with index = 2 from array:
SELECT
id,
product.data #>'{title,en}' AS title_en,
product.data #>>'{options, 2, value, name }' as options_weight_value
FROM product
Nice. It's work fine.
But the problem that weight can be in any index in array. First or second and so on.
So I need to get value of name (300g) in node "weight" .
I need smt like this:
SELECT
id,
product.data #>'{title,en}' AS title_en,
product.data #>>'{options, *, value, name, weight }' as options_weight_value
FROM product
Is it possible ?

I think I found solution:
SELECT
id,
p.data #>'{title,en}' AS title_en,
p.data #>'{weight,qty}' AS weight_qty,
(select *
from jsonb_array_elements(p.data -> 'options') AS options_array
where
options_array ->> 'name' = 'weight'
) #>'{value,name}' as options_weight
from product p
And now find value of weight(if exist) in any array's item. In this example it = 300g

Postgresql search if exists in nested jsonb

I'm new with jsonb request and i got a problem. Inside an 'Items' table, I have 'id' and 'data' jsonb. Here is what can look like a data:
[
{
"paramId": 3,
"value": "dog"
},
{
"paramId": 4,
"value": "cat"
},
{
"paramId": 5,
"value": "fish"
},
{
"paramId": 6,
"value": "",
"fields": [
{
"paramId": 3,
"value": "cat"
},
{
"paramId": 4,
"value": "dog"
}
]
},
{
"paramId": 6,
"value": "",
"fields": [
{
"paramId": 5,
"value": "cat"
},
{
"paramId": 3,
"value": "dog"
}
]
}
]
The value in data is always an array with object inside but sometimes the object can have a 'fields' value with objects inside. It is maximum one level deep.
How can I select the id of the items which as for example an object containing "paramId": 3 and "value": "cat" and also have an object with "paramId": 5 and "value" LIKE '%ish%'.
I already have found a way to do that when the object is on level 0
SELECT i.*
FROM items i
JOIN LATERAL jsonb_array_elements(i.data) obj3(val) ON obj.val->>'paramId' = '3'
JOIN LATERAL jsonb_array_elements(i.data) obj5(val) ON obj2.val->>'paramId' = '5'
WHERE obj3.val->>'valeur' = 'cat'
AND obj5.val->>'valeur' LIKE '%ish%';
but I don't know how to search inside the fields array if fields exists.
Thank you in advance for you help.
EDIT:
It looks like my question is not clear. I will try to make it better.
What I want to do is to find all the 'item' having in the 'data' column objects who match my search criteria. This without looking if the objects are at first level or inside a 'fields' key of an object.
Again for example. This record should be selected if I search:
'paramId': 3 AND 'value': 'cat
'paramId': 4 AND 'value': LIKE '%og%'
the matching ones are in the 'fields' key of the object with 'paramId': 6 and I don't know how to do that.

This can be expressed using a JSON/Path expression without the need for unnesting everything
To search for paramId = 3 and value = 'cat'
select *
from items
where data #? '$[*] ? ( (#.paramId == 3 && #.value == "cat") || exists( #.fields[*] ? (#.paramId == 3 && #.value == "cat")) )'
The $[*] part iterates over all elements of the first level array. To check the elements in the fields array, the exists() operator is used to nest the expression. #.fields[*] iterates over all elements in the fields array and applies the same expression again. I don't see a way how repeating the values could be avoided though.
For a "like" condition, you can use like_regex:
select *
from items
where data #? '$[*] ? ( (#.paramId == 4 && #.value like_regex ".*og.*") || exists( #.fields[*] ? (#.paramId == 4 && #.value like_regex ".*og.*")) )'

For now I have found a solution but it is not really clean and I don't know how it will perform in production with 10M records.
SELECT i.id, i.data
FROM ( -- A;
select it.id, it.data, i as value
from items it,
jsonb_array_elements(it.data) i
union
select it.id, it.data, f as value
from items it,
jsonb_array_elements(it.data) i,
jsonb_array_elements(i -> 'fields') f
) as i
WHERE (i.value ->> 'paramId' = '5' -- B1;
AND i.value ->> 'value' LIKE '%ish%')
OR (i.value ->> 'paramId' = '3' -- B2;
AND i.value ->> 'value' = 'cat')
group by i.id, i.data
having COUNT(*) >= 2; -- C;
A: I "flatten" the first and second level (second level is in 'fields' key)
B1, B2: These are my search criteria
C: I make sure the fields have all the criteria matching. If 3 criteria --> COUNT(*) >=3
It really doesn't look clean to me. It is working for dev purpose but I think there is a better way to do it.
If somebody have an idea Big thanks to him/her!

jsonb LIKE query on nested objects in an array

My JSON data looks like this:
[{
"id": 1,
"payload": {
"location": "NY",
"details": [{
"name": "cafe",
"cuisine": "mexican"
},
{
"name": "foody",
"cuisine": "italian"
}
]
}
}, {
"id": 2,
"payload": {
"location": "NY",
"details": [{
"name": "mbar",
"cuisine": "mexican"
},
{
"name": "fdy",
"cuisine": "italian"
}
]
}
}]
given a text "foo" I want to return all the tuples that have this substring. But I cannot figure out how to write the query for the same.
I followed this related answer but cannot figure out how to do LIKE.
This is what I have working right now:
SELECT r.res->>'name' AS feature_name, d.details::text
FROM restaurants r
, LATERAL (SELECT ARRAY (
SELECT * FROM json_populate_recordset(null::foo, r.res#>'{payload,
details}')
)
) AS d(details)
WHERE d.details #> '{cafe}';
Instead of passing the whole text of cafe I want to pass ca and get the results that match that text.

Your solution can be simplified some more:
SELECT r.res->>'name' AS feature_name, d.name AS detail_name
FROM restaurants r
, jsonb_populate_recordset(null::foo, r.res #> '{payload, details}') d
WHERE d.name LIKE '%oh%';
Or simpler, yet, with jsonb_array_elements() since you don't actually need the row type (foo) at all in this example:
SELECT r.res->>'name' AS feature_name, d->>'name' AS detail_name
FROM restaurants r
, jsonb_array_elements(r.res #> '{payload, details}') d
WHERE d->>'name' LIKE '%oh%';
db<>fiddle here
But that's not what you asked exactly:
I want to return all the tuples that have this substring.
You are returning all JSON array elements (0-n per base table row), where one particular key ('{payload,details,*,name}') matches (case-sensitively).
And your original question had a nested JSON array on top of this. You removed the outer array for this solution - I did the same.
Depending on your actual requirements the new text search capability of Postgres 10 might be useful.

I ended up doing this(inspired by this answer - jsonb query with nested objects in an array)
SELECT r.res->>'name' AS feature_name, d.details::text
FROM restaurants r
, LATERAL (
SELECT * FROM json_populate_recordset(null::foo, r.res#>'{payload, details}')
) AS d(details)
WHERE d.details LIKE '%oh%';
Fiddle here - http://sqlfiddle.com/#!15/f2027/5

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

BigQuery: Select entire repeated field with group - google-bigquery

Related

In BigQuery, how do I check if two ARRAY of STRUCTs are equal

Can't hide node "data"

Get data from any items in array

Postgresql search if exists in nested jsonb

jsonb LIKE query on nested objects in an array

Categories

Resources