Transform JSON to ARRAY<MAP> in Athena/Presto - sql

I have a table available in Athena which has one column with JSON structured as follows:
{
"455a9410-29a8-48a3-ad22-345afa3cd295":
{
"legacy_id": 1599677886,
"w_ids":
[
"845254682",
"831189092"
]
},
"5e74c911-0b63-4b84-8ad4-77dd9bed7b53":
{
"legacy_id": 1599707069,
"w_ids":
[
"1032024432"
]
},
"7b988890-20ff-4279-94df-198369a58848":
{
"legacy_id": 1601097861,
"w_ids":
[
"1032024432"
]
}
}
I would like to convert this into an ARRAY in the following format:
[
{"new_id"="455a9410-29a8-48a3-ad22-345afa3cd295","legacy_id"=1599677886,"w_ids"=["845254682","831189092"]},
{"new_id"="5e74c911-0b63-4b84-8ad4-77dd9bed7b53","legacy_id"=1599707069,"w_ids"=["1032024432"]},
{"new_id"="7b988890-20ff-4279-94df-198369a58848","legacy_id"=1601097861,"w_ids"=["1032024432"]}
]
I was already able to extract legacy_id and w_ids with the following statement, but I am struggling to add the original key as a value:
with example_data as
(
select * from (
VALUES('{ "455a9410-29a8-48a3-ad22-345afa3cd295": { "legacy_id": 1599677886, "w_ids": [ "845254682", "831189092" ] }, "5e74c911-0b63-4b84-8ad4-77dd9bed7b53": { "legacy_id": 1599707069, "w_ids": [ "1032024432" ] }, "7b988890-20ff-4279-94df-198369a58848": { "legacy_id": 1601097861, "w_ids": [ "1032024432" ] }}')
) as t(col)
)
select *
    ,transform(
        map_values(cast(json_parse(col) AS map(varchar, json))),
        entry -> MAP_FROM_ENTRIES(ARRAY[
            ('legacy_id', json_extract(entry, '$.legacy_id')),
            ('w_ids', json_extract(entry, '$.w_ids'))])
    )
from example_data;

One approach is to use map_values over transform_values instead of transform over map_values; transform_values passes the key into the lambda, so you can emit it as new_id:
select map_values(
transform_values(
cast(json_parse(col) AS map(varchar, json)),
(key, entry)->MAP_FROM_ENTRIES(
ARRAY [('new_id', cast(key as json)),
('legacy_id', json_extract(entry, '$.legacy_id')),
('w_ids', json_extract(entry, '$.w_ids')) ]
)
)
)
from example_data;
Output:
_col0
[{new_id='455a9410-29a8-48a3-ad22-345afa3cd295', legacy_id=1599677886, w_ids=['845254682','831189092']}, {new_id='5e74c911-0b63-4b84-8ad4-77dd9bed7b53', legacy_id=1599707069, w_ids=['1032024432']}, {new_id='7b988890-20ff-4279-94df-198369a58848', legacy_id=1601097861, w_ids=['1032024432']}]
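An equivalent way to keep the key, if you prefer to stay with transform, is to go through map_entries, which turns the map into an array of (key, value) rows. A minimal, untested sketch, assuming your engine supports the row subscript operator e[1]/e[2] (Presto 0.217+, i.e. Athena engine version 2):
-- sketch only: map_entries carries the key alongside the value,
-- so the lambda can read both via positional row subscripts
select transform(
    map_entries(cast(json_parse(col) AS map(varchar, json))),
    e -> MAP_FROM_ENTRIES(ARRAY[
        ('new_id', cast(e[1] as json)),
        ('legacy_id', json_extract(e[2], '$.legacy_id')),
        ('w_ids', json_extract(e[2], '$.w_ids'))
    ])
)
from example_data;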

Related

Query an array element in a JSONB Object

I have a jsonb column called data in a table called reports. Here is what reports.id = 1 looks like:
[
{
"Product": [
{
"productIDs": [
"ABC1",
"ABC2"
],
"groupID": "Food123"
},
{
"productIDs": [
"EFG1"
],
"groupID": "Electronic123"
}
],
"Package": [
{
"groupID": "Electronic123"
}
],
"type": "Produce"
},
{
"Product": [
{
"productIDs": [
"ABC1",
"ABC2"
],
"groupID": "Clothes123"
}
],
"Package": [
{
"groupID": "Food123"
}
],
"type": "Wearables"
}
]
and here is what reports.id = 2 looks like:
[
{
"Product": [
{
"productIDs": [
"XYZ1",
"XYZ2"
],
"groupID": "Food123"
}
],
"Package": [],
"type": "Wearable"
},
{
"Product": [
{
"productIDs": [
"ABC1",
"ABC2"
],
"groupID": "Clothes123"
}
],
"Package": [
{
"groupID": "Food123"
}
],
"type": "Wearables"
}
]
I am trying to get a list of all entries in the reports table where at least one element of the data column has the following:
type = Produce AND
any element of the Product array has a groupID that starts with Food
So from the example above, this query will only return the first element, since:
type = Produce
groupID starts with Food for the first element of the Product array
The second element will be filtered out because its type is not Produce.
I am not sure how to express the AND condition for groupID. Here is what I have tried, to get all entries with type Produce:
select * from reports r, jsonb_to_recordset(r.data) as items(type text) where items.type like 'Produce';
Sample structure and result: dbfiddle
select r.*
from reports r
cross join jsonb_array_elements(r.data) l1
cross join jsonb_array_elements(l1.value -> 'Product') l2
where l1 ->> 'type' = 'Produce'
and l2.value ->> 'groupID' ~ '^Food';
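Note that the cross joins can return the same report several times if more than one Product element matches. A hedged variant using EXISTS (same predicates, just moved into a subquery) returns each matching report once:
select r.*
from reports r
where exists (
    -- same unnesting as above, but inside EXISTS so r is emitted at most once
    select 1
    from jsonb_array_elements(r.data) l1
    cross join jsonb_array_elements(l1.value -> 'Product') l2
    where l1 ->> 'type' = 'Produce'
      and l2.value ->> 'groupID' ~ '^Food'
);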

How to unpack Array to Rows in Snowflake?

I have a table that looks like the following in Snowflake:
ID | CODES
2 | [ { "list": [ { "item": "CODE1" }, { "item": "CODE2" } ] } ]
And I want to make it into:
ID | CODES
2 | 'CODE1'
2 | 'CODE2'
So far I've tried:
SELECT ID,CODES[0]:list
FROM MY_TABLE
But that only gets me as far as:
ID | CODES
2 | [ { "item": "CODE1" }, { "item": "CODE2" } ]
How can I break out every 'item' element from every index of this list into its own row with each CODE as a string?
Update: here is the answer I got working at the same time as the answer below; it looks like we both used FLATTEN:
SELECT ID,f.value:item
FROM MY_TABLE,
lateral flatten(input => MY_TABLE.CODES[0]:list) f
As you note, you have hard-coded your access into the codes via codes[0], which gives you the first item from that array. If you use FLATTEN, you can access all of the objects of that array.
WITH my_table(id,codes) AS (
SELECT 2, parse_json('[ { "list": [ { "item": "CODE1" }, { "item": "CODE2" } ] } ]')
)
SELECT ID, c.*
FROM my_table,
table(flatten(codes)) c;
gives:
ID | SEQ | KEY | PATH | INDEX | VALUE | THIS
2 | 1 | | [0] | 0 | { "list": [{ "item": "CODE1" }, { "item": "CODE2" }]} | [{ "list": [{"item": "CODE1"}, { "item": "CODE2" }]}]
Now you want to loop across the items in list, so we use another FLATTEN on that:
WITH my_table(id,codes) AS (
SELECT 2, parse_json('[ { "list": [ { "item": "CODE1" }, { "item": "CODE2" } ] } ]')
)
SELECT ID, c.value, l.value
FROM my_table,
table(flatten(codes)) c,
table(flatten(c.value:list)) l;
gives:
2 {"list":[{"item": "CODE1"},{"item":"CODE2"}]} {"item":"CODE1"}
2 {"list":[{"item": "CODE1"},{"item":"CODE2"}]} {"item":"CODE2"}
From there you can pull apart l.value to access the parts you need.
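To land on the exact ID | CODES shape from the question, cast the item out of the VARIANT; a sketch of the final query:
WITH my_table(id, codes) AS (
    SELECT 2, parse_json('[ { "list": [ { "item": "CODE1" }, { "item": "CODE2" } ] } ]')
)
-- ::string casts the VARIANT field to a plain string
SELECT id, l.value:item::string AS code
FROM my_table,
     table(flatten(codes)) c,
     table(flatten(c.value:list)) l;
which returns one row per code: 2 | CODE1 and 2 | CODE2.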

Postgres/JSON Updating nested array elements

Given the input JSON from a table with a column named '_value', I would like to replace the object under the field "sc" with the value of the name field inside sc.
The JSON before updating looks like this:
{
"iProps": [
{
"value": {
"rules": [
{
"ao": {
"sc": {
"web_link": "abc.com",
"name": "name"
}
}
},
{
"ao": {
"sc": ""
}
}
]
}
}
]
}
The JSON after updating should look like this:
{
"iProps": [
{
"value": {
"rules": [
{
"ao": {
"sc": "name"
}
},
{
"ao": {
"sc": ""
}
}
]
}
}
]
}
I tried the query below to get to the 'rules' array, but I am having difficulty proceeding further with parsing and updating.
WITH values AS (
SELECT iprop -> 'value' -> 'rules' AS value FROM
table t,jsonb_array_elements(t._value->'iProps') AS
iprop )
SELECT *
from values, jsonb_array_elements(values.ao)
throws following error
ERROR: column values.ao does not exist
LINE 26: from values, jsonb_array_elements(values.ao)
^
SQL state: 42703
Character: 1396
You can try the query below, assuming your structure is constant and the data type of your column is JSONB. Note the index1 - 1 and index2 - 1: WITH ORDINALITY numbers elements from 1, while jsonb paths are 0-based.
with cte as (
    select
        vals2 -> 'ao' -> 'sc' -> 'name' as namevalue,
        ('{iProps,' || index1 - 1 || ',value,rules,' || index2 - 1 || ',ao,sc}')::text[] as json_path
    from
        table_,
        jsonb_array_elements(value_ -> 'iProps') with ordinality arr1(vals1, index1),
        jsonb_array_elements(vals1 -> 'value' -> 'rules') with ordinality arr2(vals2, index2)
)
update table_
set value_ = jsonb_set(value_, cte.json_path, cte.namevalue, false)
from cte
where cte.namevalue is not null;
DEMO
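If you want to sanity-check the path arithmetic before running the update, here is a minimal standalone sketch of the same jsonb_set call, with the 0-based path hard-coded:
-- replaces the sc object of the first rule with the text value "name"
select jsonb_set(
    '{"iProps":[{"value":{"rules":[{"ao":{"sc":{"web_link":"abc.com","name":"name"}}},{"ao":{"sc":""}}]}}]}'::jsonb,
    '{iProps,0,value,rules,0,ao,sc}',
    '"name"'::jsonb,
    false
);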

BigQuery concat nested array json

I have data that looks like:
{
"Attributes": [
{
"values": [
{
"value": "20003"
},
{
"value": "30075"
},
{
"value": "40060"
}
],
"name": "price"
}
],
"attr2" : "val"
}
The output I want is all of the values in the nested JSON array concatenated:
price, "20003, 30075, 40060"
I tried some queries but failed to get the correct output.
You can use JSON_EXTRACT_ARRAY and ARRAY_TO_STRING:
WITH test_json AS (
SELECT
'''{
"Attributes": [
{
"values": [
{
"value": "20003"
},
{
"value": "30075"
},
{
"value": "40060"
}
],
"name": "price"
}
],
"attr2" : "val"
}''' AS json_string
),
values_concatenated AS (
SELECT ARRAY_TO_STRING(
ARRAY(
SELECT JSON_VALUE(json_values, '$.value')
FROM UNNEST((SELECT JSON_EXTRACT_ARRAY(json_string, '$.Attributes[0].values') AS json_values FROM test_json)) as json_values
),
', '
) as values
)
SELECT
(select json_value(json_string, '$.Attributes[0].name') from test_json),
(select values from values_concatenated)
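If Attributes can hold more than one element, a hedged generalization (same functions, but unnesting the outer array instead of hard-coding $.Attributes[0]) would look like:
SELECT
  JSON_VALUE(attr, '$.name') AS name,
  ARRAY_TO_STRING(
    ARRAY(
      -- pull each value out of the values array of this attribute
      SELECT JSON_VALUE(v, '$.value')
      FROM UNNEST(JSON_EXTRACT_ARRAY(attr, '$.values')) AS v
    ),
    ', '
  ) AS concatenated_values
FROM test_json,
UNNEST(JSON_EXTRACT_ARRAY(json_string, '$.Attributes')) AS attr;
This emits one name/values row per element of the Attributes array.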

Invalid column name on stream analytics, when column exists

Given this dataset:
[
{
"dataChannelId": 8516,
"measures": [
{
"dateTime": "2019-01-01T12:00:00",
},
{
"dateTime": "2019-01-02T12:00:00",
}
}]
And this query:
WITH
temp AS
(
SELECT
dataChannelId,
arrayElement.ArrayValue as element
FROM GriegInputStream
CROSS APPLY GetArrayElements([measures]) AS arrayElement
)
SELECT
temp.dataChannelId as sensorId, temp.element.dateTime, temp.element.value,temp.element.unit,temp.element.maxValue, temp.element.minValue
INTO
Sensoroutput
FROM
temp
I get an 'invalid column name, does not exist' error on dataChannelId, but measures seems to work fine. How can I access this value without Stream Analytics complaining?
Your sample JSON data lacks a square bracket ].
Corrected sample data:
[
{
"dataChannelId": 8516,
"measures": [
{
"dateTime": "2019-01-01T12:00:00",
},
{
"dateTime": "2019-01-02T12:00:00",
}
]
}
]
Query SQL:
WITH
temp AS
(
SELECT
jsoninput.dataChannelId,
arrayElement.ArrayValue as element
FROM jsoninput
CROSS APPLY GetArrayElements(jsoninput.measures) AS arrayElement
)
SELECT
temp.dataChannelId as sensorId, temp.element.dateTime
INTO
Sensoroutput
FROM
temp
Output:
sensorId | dateTime
8516 | 2019-01-01T12:00:00
8516 | 2019-01-02T12:00:00