BigQuery JSON Array extraction - google-bigquery

I have this JSON (opening brace added; it was missing):
{
  "type": "list",
  "data": [
    {
      "id": "5bc7a3396fbc71aaa1f744e3",
      "type": "company",
      "url": "/companies/5bc7a3396fbc71aaa1f744e3"
    },
    {
      "id": "5b0aa0ac6e378450e980f89a",
      "type": "company",
      "url": "/companies/5b0aa0ac6e378450e980f89a"
    }
  ],
  "url": "/contacts/5802b14755309dc4d75d184d/companies",
  "total_count": 2,
  "has_more": false
}
I want to dynamically create one column per company, holding its id, for example:
company_0                    company_1
5bc7a3396fbc71aaa1f744e3     5b0aa0ac6e378450e980f89a
I tried BigQuery's JSON functions but couldn't get them to work.
Thank you.

Consider below approach:
select * except(json) from (
  select json, json_extract_scalar(line, '$.id') company, offset
  from your_table t, unnest(json_extract_array(json, '$.data')) line with offset
  where json_extract_scalar(line, '$.type') = 'company'
)
pivot (any_value(company) company for offset in (0, 1))
If applied to the sample data in your question, the output is:
company_0                    company_1
5bc7a3396fbc71aaa1f744e3     5b0aa0ac6e378450e980f89a
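The IN list above hardcodes offsets 0 and 1. If the number of companies varies, one option is to build the offset list first and run the pivot via EXECUTE IMMEDIATE. A minimal untested sketch, assuming the same your_table and json names as above:
-- build the pivot IN list from the data itself, then run the pivot dynamically
DECLARE offsets STRING;
SET offsets = (
  SELECT STRING_AGG(DISTINCT CAST(offset AS STRING))
  FROM your_table t, UNNEST(JSON_EXTRACT_ARRAY(json, '$.data')) line WITH OFFSET
  WHERE JSON_EXTRACT_SCALAR(line, '$.type') = 'company'
);
EXECUTE IMMEDIATE FORMAT("""
  SELECT * EXCEPT(json) FROM (
    SELECT json, JSON_EXTRACT_SCALAR(line, '$.id') company, offset
    FROM your_table t, UNNEST(JSON_EXTRACT_ARRAY(json, '$.data')) line WITH OFFSET
    WHERE JSON_EXTRACT_SCALAR(line, '$.type') = 'company'
  )
  PIVOT (ANY_VALUE(company) company FOR offset IN (%s))
""", offsets);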

Related

How to extract a value in a JSON table on BigQuery?

I have a JSON table with over 30,000 rows. The rows differ in shape, like these:
JSON_columns
------------
{
  "level": 20,
  "nickname": "ABCDE",
  "mission_name": "take_out_the_trash",
  "mission_day": "150",
  "duration": "0",
  "properties": []
}
{
  "nickname": "KLMNP",
  "mission_name": "recycle",
  "mission_day": "180",
  "properties": [{
    "key": "bottle",
    "value": {
      "string_value": "blue_bottle"
    }
  }, {
    "key": "bottleRecycle",
    "value": {
      "string_value": "true"
    }
  }, {
    "key": "price",
    "value": {
      "float_value": 21.99
    }
  }, {
    "key": "cost",
    "value": {
      "float_value": 15.39
    }
  }]
}
I want to take the sum of the costs in the table. But first, I need to extract the cost values from the table.
I tried the code below, but it returns null:
SELECT JSON_VALUE('$.properties[3].value.float_value') AS profit
FROM `missions.missions_study`
WHERE mission_name = "recycle"
My question is, how can I extract the cost values right, and sum them?
A common way to extract the cost from your JSON is shown below. (Your query returns null because JSON_VALUE is given only the path string as its single argument; the JSON column must come first. Also, relying on a fixed position like $.properties[3] is fragile, so filter on the key instead.)
WITH sample_table AS (
  SELECT '{"level":20,"nickname":"ABCDE","mission_name":"take_out_the_trash","mission_day":"150","duration":"0","properties":[]}' json
  UNION ALL
  SELECT '{"nickname":"KLMNP","mission_name":"recycle","mission_day":"180","properties":[{"key":"bottle","value":{"string_value":"blue_bottle"}},{"key":"bottleRecycle","value":{"string_value":"true"}},{"key":"price","value":{"float_value":21.99}},{"key":"cost","value":{"float_value":15.39}}]}' json
)
SELECT SUM(cost) AS total FROM (
  SELECT CAST(JSON_VALUE(prop, '$.value.float_value') AS FLOAT64) AS cost
  FROM sample_table, UNNEST(JSON_QUERY_ARRAY(json, '$.properties')) prop
  WHERE JSON_VALUE(json, '$.mission_name') = 'recycle'
    AND JSON_VALUE(prop, '$.key') = 'cost'
);
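The same pattern generalizes. For example, a hedged variant (reusing the sample_table above) that totals every float-valued property by key in a single pass:
-- sum each numeric property (price, cost, ...) across all rows
SELECT
  JSON_VALUE(prop, '$.key') AS key,
  SUM(CAST(JSON_VALUE(prop, '$.value.float_value') AS FLOAT64)) AS total
FROM sample_table, UNNEST(JSON_QUERY_ARRAY(json, '$.properties')) prop
WHERE JSON_VALUE(prop, '$.value.float_value') IS NOT NULL
GROUP BY key;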

Update one key value in JSON using Presto

I have a JSON column (_col0) like below and want to update only the 'name' part of the JSON to a new value.
{
  "id": "1234",
  "name": "Demo 1",
  "attributes": [
    {
      "id": "1122",
      "name": "affiliate",
      "type": "number"
    }
  ],
  "behaviors": [
    {
      "id": "246685",
      "name": "Email Send",
      "scheduleOption": null,
      "defaultTimeFilterEnabled": true,
      "schema": []
    }
  ]
}
I wanted to change only the value of the outer "name" parameter from 'Demo 1' to 'Demo 2'. The SQL I tried does change the name parameter, but it sets everything else to null.
select transform_values(cast(json_parse(_col0) as MAP(varchar, json)), (k, v) -> if(k='name','Demo 2')) from table1
if() has an overload with 3 parameters, the 3rd being the value for the false case; use it to return the current value (you will need to cast either your varchar literal to json or the json value to varchar):
-- sample data
WITH dataset (json_str) AS (
  VALUES ('{
    "id":"1234",
    "name":"Demo 1",
    "attributes":[
      {
        "id": "1122",
        "name": "affiliate",
        "type": "number"
      }
    ],
    "behaviors": [
      {
        "id": "246685",
        "name": "Email Send",
        "scheduleOption": null,
        "defaultTimeFilterEnabled": true,
        "schema": []
      }
    ]
  }')
)
-- query
select transform_values(
    cast(json_parse(json_str) as MAP(varchar, json)),
    (k, v) -> if(k = 'name', cast('Demo 2' as json), v)
  )
from dataset
Output:
_col0
{behaviors=[{"id":"246685","name":"Email Send","scheduleOption":null,"defaultTimeFilterEnabled":true,"schema":[]}], name="Demo 2", attributes=[{"id":"1122","name":"affiliate","type":"number"}], id="1234"}
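Note that the output above is rendered as a SQL MAP, not as JSON text, and a map does not guarantee the original key order. If you need a JSON string back, a minimal sketch (same dataset as above) is to cast the transformed map back to json and serialize it with json_format:
-- serialize the updated map back to a JSON string
select json_format(cast(
    transform_values(
      cast(json_parse(json_str) as MAP(varchar, json)),
      (k, v) -> if(k = 'name', cast('Demo 2' as json), v)
    ) as json))
from dataset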

Bigquery update / insert in nested arrays and arrays of structs

Editing the question to give a better view.
There are 2 tables - Staging and Core.
I am having trouble copying the data from Staging to Core.
Conditions
If id, year, and local_id match in both staging and core -> the data for that specific array row should be updated from staging to core
If id does not match in staging and core -> a new row should be inserted in CORE with values from STAGING
If id matches but either local_id or year does not match -> a new row should be inserted into the data array.
BigQuery schema for STAGING
[
  {
    "name": "id",
    "type": "STRING"
  },
  {
    "name": "content",
    "type": "STRING"
  },
  {
    "name": "createdAt",
    "type": "TIMESTAMP"
  },
  {
    "name": "sourceFileName",
    "type": "STRING"
  },
  {
    "name": "data",
    "type": "RECORD",
    "fields": [
      {
        "name": "local_id",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "year",
        "type": "INTEGER",
        "mode": "NULLABLE"
      },
      {
        "name": "country",
        "type": "STRING",
        "mode": "NULLABLE"
      }
    ]
  }
]
BigQuery schema for CORE
[
  {
    "name": "id",
    "type": "STRING"
  },
  {
    "name": "content",
    "type": "STRING"
  },
  {
    "name": "createdAt",
    "type": "TIMESTAMP"
  },
  {
    "name": "data",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "local_id",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "year",
        "type": "INTEGER",
        "mode": "NULLABLE"
      },
      {
        "name": "country",
        "type": "STRING",
        "mode": "NULLABLE"
      }
    ]
  }
]
BigQuery content for staging:
{"id":"1","content":"content1","createdAt":"2020-07-23 12:46:15.054410 UTC","sourceFileName":"abc.json","data":{"local_id":"123","year":2018,"country":"PL"}}
{"id":"1","content":"content3","createdAt":"2020-07-23 12:46:15.054410 UTC","sourceFileName":"abc.json","data":{"local_id":"123","year":2021,"country":"SE"}}
{"id":"2","content":"content4","createdAt":"2020-07-23 12:46:15.054410 UTC","sourceFileName":"abc.json","data":{"local_id":"334","year":2021,"country":"AZ"}}
{"id":"2","content":"content5","createdAt":"2020-07-23 12:46:15.054410 UTC","sourceFileName":"abc.json","data":{"local_id":"337","year":2021,"country":"NZ"}}
BigQuery content for core:
{"id":"1","content":"content1","createdAt":"2020-07-23 12:46:15.054410 UTC","data":[{"local_id":"123","year":2018,"country":"SE"},{"local_id":"33","year":2019,"country":"PL"},{"local_id":"123","year":2020,"country":"SE"}]}
Try using the MERGE statement:
MERGE `dataset.destination` D
USING (select id, array(select data) data from `dataset.source`) S
ON D.id = S.id
WHEN MATCHED THEN
UPDATE SET data = S.data
WHEN NOT MATCHED THEN
INSERT (id, data) VALUES(S.id, S.data)
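Note that this MERGE replaces the whole data array of a matching id (and fails if staging has several rows for one id). A minimal, untested sketch that covers all three conditions, assuming the STAGING and CORE schemas above, aggregates staging per id and rebuilds the array:
MERGE `dataset.core` C
USING (
  -- collect all staged data rows per id, so each id matches at most once
  SELECT id, ARRAY_AGG(data) AS staged
  FROM `dataset.staging`
  GROUP BY id
) S
ON C.id = S.id
WHEN MATCHED THEN UPDATE SET data = ARRAY_CONCAT(
  -- keep existing rows with no (local_id, year) match in staging
  ARRAY(SELECT d FROM UNNEST(C.data) d
        WHERE NOT EXISTS (
          SELECT 1 FROM UNNEST(S.staged) n
          WHERE n.local_id = d.local_id AND n.year = d.year)),
  -- append the staged rows: updated versions plus new (local_id, year) pairs
  S.staged)
WHEN NOT MATCHED THEN
  INSERT (id, data) VALUES (S.id, S.staged)
Only id and data are handled here, mirroring the answer above; content and createdAt would need their own update items.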
I was finally able to nail the problem.
To merge the two records, I had to resort to subqueries doing some of the work. I still think there is room for improvement in this code.
-- INSERT IDs
INSERT `deep_test.main_table` (people_id)
(
  SELECT DISTINCT(people_id) FROM `deep_test.staging_test`
  WHERE people_id NOT IN (SELECT people_id FROM `deep_test.main_table`)
);

-- UPDATE TALENT RECORD
UPDATE `deep_test.main_table` gold
SET talent = B.talent
FROM (
  SELECT
    gold.people_id AS people_id,
    ARRAY_AGG(aggregated_stage.talent) AS talent
  FROM `deep_test.main_table` gold
  JOIN (
    SELECT
      A.people_id,
      A.talent
    FROM (
      SELECT
        ARRAY_AGG(t ORDER BY t.createdAt DESC LIMIT 1)[OFFSET(0)] A
      FROM `deep_test.staging_test` t
      GROUP BY
        t.people_id,
        t.talent.people_l_id,
        t.talent.fiscalYear
    )
  ) AS aggregated_stage
  ON gold.people_id = aggregated_stage.people_id
  WHERE aggregated_stage.talent IS NOT NULL
  GROUP BY people_id
) B
WHERE B.people_id = gold.people_id;

-- UPDATE COUNTRY CODE
UPDATE `deep_test.core` core
SET core.country_code = countries.number
FROM (
  SELECT
    people_id,
    (SELECT country FROM UNNEST(talent) AS d ORDER BY d.fiscalYear DESC LIMIT 1) AS country
  FROM `deep_test.core`
) B, `deep_test.countries` countries
WHERE core.people_id = B.people_id
  AND countries.code = B.country;
This creates a subquery and assigns the results to a variable, which can then be used as a table for querying and for joining with another table.
To create an array field, use the ARRAY() function.
To append to an array field, use the ARRAY_CONCAT() function.
This query can be used to satisfy the "update if present" requirement:
UPDATE `destination` d
SET d.data = ARRAY_CONCAT(d.data, ARRAY(
    SELECT s.data
    FROM `source` s
    WHERE d.id = s.id))
WHERE d.id IN (SELECT id FROM `source` s)
https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#update_using_joins
https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#creating_arrays_from_subqueries
https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#combining_arrays

Trying to construct PostgreSQL Query to extract from JSON a text value in an object, in an array, in an object, in an array, in an object

I am constructing an interface between a PostgreSQL system and a SQL Server system and am attempting to "flatten" the structure of the JSON data to facilitate this. I'm very experienced in SQL Server but I'm new to both PostgreSQL and JSON.
The JSON contains essentially two types of structure: those of type "text" or "textarea" where the value I want is in an object named value (the first two cases below) and those of type "select" where the value object points to an id object in a lower-level options array (the third case below).
{
  "baseGroupId": {
    "fields": [
      {
        "id": "1f53",
        "name": "Location",
        "type": "text",
        "options": [],
        "value": "Over the rainbow"
      },
      {
        "id": "b547",
        "name": "Description",
        "type": "textarea",
        "options": [],
        "value": "A place of wonderful discovery"
      },
      {
        "id": "c12f",
        "name": "Assessment",
        "type": "select",
        "options": [
          {
            "id": "e5fd",
            "name": "0"
          },
          {
            "id": "e970",
            "name": "1"
          },
          {
            "id": "0ff4",
            "name": "2"
          },
          {
            "id": "2db3",
            "name": "3"
          },
          {
            "id": "241f",
            "name": "4"
          },
          {
            "id": "3f52",
            "name": "5"
          }
        ],
        "value": "241f"
      }
    ]
  }
}
Those with a sharp eye will see that the value of the last value object "241f" can also be seen within the options array against one of the id objects. When nested like this I need to extract the value of the corresponding name, in this case "4".
The JSON-formatted information is in the table customfield, field textvalue. Its datatype is text, but I'm coercing it to json. I was originally getting array set errors when trying to apply the criteria in a WHERE clause, and then I read about using a LATERAL subquery instead. It now runs but returns all the options, not just the one matching the value.
I'm afraid I couldn't get an SQL Fiddle working to reproduce my results, but I would really appreciate an examination of my query to see if the problem can be spotted.
with cte_custombundledfields as
(
  select
    textvalue
    , cfname
    , json_array_elements(textvalue::json -> 'baseGroupId' -> 'fields') ->> 'name' as name
    , json_array_elements(textvalue::json -> 'baseGroupId' -> 'fields') ->> 'value' as value
    , json_array_elements(textvalue::json -> 'baseGroupId' -> 'fields') ->> 'type' as type
  from customfield
)
, cte_custombundledfieldsoptions as
(
  select *
    , json_array_elements(json_array_elements(textvalue::json -> 'baseGroupId' -> 'fields') -> 'options') ->> 'name' as value2
  from cte_custombundledfields x
    , LATERAL json_array_elements(x.textvalue::json -> 'baseGroupId' -> 'fields') y
    , LATERAL json_array_elements(y -> 'options') z
  where type = 'select'
    and z ->> 'id' = x.value
)
select *
from cte_custombundledfieldsoptions
I posted a much-simplified rewrite of this question, which was answered by Bergi:
How do I query a string from JSON based on another string within the JSON in PostgreSQL?
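For completeness, a minimal sketch of one way to get there, assuming the customfield/textvalue names from the question: expand fields exactly once with LATERAL, and resolve the option name in a scalar subquery only for 'select' fields.
-- one output row per field; option lookup only for type = 'select'
select
  y ->> 'name' as name,
  case when y ->> 'type' = 'select'
       then (select z ->> 'name'
             from json_array_elements(y -> 'options') z
             where z ->> 'id' = y ->> 'value')
       else y ->> 'value'
  end as value
from customfield x
, LATERAL json_array_elements(x.textvalue::json -> 'baseGroupId' -> 'fields') y;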

Query to extract ids from a deeply nested json array object in Presto

I'm using Presto and trying to extract all 'id' values where 'source' = 'dd' from a nested JSON structure like the following.
{
  "results": [
    {
      "docs": [
        {
          "id": "apple1",
          "source": "dd"
        },
        {
          "id": "apple2",
          "source": "aa"
        },
        {
          "id": "apple3",
          "source": "dd"
        }
      ],
      "group": 99806
    }
  ]
}
I expect to extract the ids [apple1, apple3] into a column.
What is the right way to achieve this in Presto?
If your data has a regular structure as in the example you posted, you can use a combination of parsing the value as JSON, casting it to a structured SQL type (array/map/row), and then using array-processing functions to filter, transform, and extract the elements you want:
WITH data(value) AS (VALUES '{
  "results": [
    {
      "docs": [
        {
          "id": "apple1",
          "source": "dd"
        },
        {
          "id": "apple2",
          "source": "aa"
        },
        {
          "id": "apple3",
          "source": "dd"
        }
      ],
      "group": 99806
    }
  ]
}'),
parsed(value) AS (
  SELECT cast(json_parse(value) AS row(results array(row(docs array(row(id varchar, source varchar)), "group" bigint))))
  FROM data
)
SELECT
  transform(                                    -- extract the id from the resulting docs
    filter(                                     -- filter docs with source = 'dd'
      flatten(                                  -- flatten all docs arrays into a single doc array
        transform(value.results, r -> r.docs)   -- extract the docs arrays from the result array
      ),
      doc -> doc.source = 'dd'),
    doc -> doc.id)
FROM parsed
The query above produces:
_col0
------------------
[apple1, apple3]
(1 row)
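If you would rather have one id per row than a single array value, a hedged variant (reusing the parsed CTE above) can UNNEST the flattened docs and filter with a plain WHERE clause:
-- expand the array of rows into columns, one doc per row
SELECT doc_id
FROM parsed
CROSS JOIN UNNEST(flatten(transform(value.results, r -> r.docs))) AS t(doc_id, doc_source)
WHERE doc_source = 'dd'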