How to PARSE_JSON from Snowflake Field? - sql

I have a table that looks like:
ID|FIELD1
1|[ { "list": [ {} ] } ]
2|[ { "list": [ { "item": "" } ] } ]
3|[ { "list": [ { "item": "Tag1" }, { "item": "Tag2" } ] } ]
And I want to get all the tags associated to this specific query such that I can just get a list:
Tag1,Tag2
I've tried
SELECT PARSE_JSON(FIELD1[0]['list'][0]['item']) FROM MY_TABLE
WHERE PARSE_JSON(FIELD1[0]['list'][0]) != '{}'
But I get
JSON: garbage in the numeric literal: 65-310 , pos 7
How can I properly unpack these values in SQL?
UPDATE: Clumsy Solution
SELECT LISTAGG(CODES,'\',\'') AS PROMO_CODES
FROM
(SELECT DISTINCT FIELD1[0]['list'][0]['item'] AS CODES FROM MY_TABLE
WHERE FIELD1[0]['list'][0] IS NOT NULL
AND FIELD1[0]['list'][0] != '{}'
AND FIELD1[0]['list'][0]['item'] != ''
)

Please have a look into below knowledge article, if this helps in your case:
https://community.snowflake.com/s/article/Dynamically-extracting-JSON-using-LATERAL-FLATTEN

As I see, the Clumsy Solution does not provide the correct result. It shows only Tag1. So here's my solution:
select LISTAGG( v.VALUE:item, ',' ) from MY_TABLE,
lateral flatten (parse_json(FIELD1[0]):list) v
WHERE v.VALUE:item <> '';
I would recommend to add DISTINCT to prevent duplicate tags in the output:
select LISTAGG( DISTINCT v.VALUE:item, ',' ) from MY_TABLE,
lateral flatten (parse_json(FIELD1[0]):list) v
WHERE v.VALUE:item <> '';
If there are more items in the FIELD1 array (ie 0,1,2), you may use this one:
select LISTAGG( DISTINCT v.VALUE:item, ',' ) from MY_TABLE,
lateral flatten(FIELD1) f,
lateral flatten (parse_json(f.VALUE):list) v
WHERE v.VALUE:item <> '';

Related

Parsing nested JSON fields in Snowflake

I am pretty new to Snowflake and I am now trying to parse a JSON field and pull its attributes to return in the response.
I tried a few variations but every time, the attribute is populating as null.
attributes column in my table has this JSON:
{
"Status": [
"ACTIVE"
],
"Coverence": [
{
"Sub": [
{
"EndDate": [
"2020-06-22"
],
"Source": [
"Test"
],
"Id": [
"CovId1"
],
"Type": [
"CovType1"
],
"StartDate": [
"2019-06-22"
],
"Status": [
"ACTIVE"
]
}
]
}
]
}
What I tried:
SELECT DISTINCT *
from
(
TRIM(mt."attributes":Status, '[""]')::string as STATUS,
TRIM(r.value:"Sub"."Id", '[""]')::string as ID,
TRIM(r.value:"Sub"."Source", '[""]')::string as SOURCE
from "myTable" mt,
lateral flatten ( input => mt."attributes":"Coverence", outer => true) r
)
GROUP BY
STATUS,
ID,
SOURCE;
Later I tried:
SELECT DISTINCT *
from
(
TRIM(mt."attributes":Status, '[""]')::string as STATUS,
TRIM(r.value:"Id", '[""]')::string as ID,
TRIM(r.value:"Source", '[""]')::string as SOURCE
from "myTable" mt,
lateral flatten ( input => mt."attributes":"Coverence":"Sub", outer => true) r
)
GROUP BY
STATUS,
ID,
SOURCE;
But nothing worked. The STATUS is populating as expected. But ID and SOURCE are populating null.
Am I missing something or have I done something dumb? Please shed some light.
Assuming that Coverence could contain multiple Sub, therefore FLATTEN twice. At lowest level only first element is chosen (EndDate[0], Source[0] etc):
SELECT
mt."attributes":Status[0]::TEXT AS Status
,r2.value:EndDate[0]::TEXT AS EndDate
,r2.value:Source[0]::TEXT AS Source
,r2.value:Id[0]::TEXT AS Id
FROM myTable AS mt,
LATERAL FLATTEN(input => mt."attributes",
path => 'Coverence',
outer => true) r1,
LATERAL FLATTEN(input => r1.value,
path => 'Sub',
outer => true) r2;
Output:
All your elements are array type and the overall JSON is not making much sense... so to access all your individual elements, you have to use [] notation and then you can access the element values. You don't need to use flatten also, if you just have to access individual elements via index.

how can I make result of ARRAY_AGG() function json parsable

I have a query that selects the rows from joined table as an array using ARRAY_AGG() function.
select
entity_number,
ARRAY_AGG('{"property_id":"'||property_id||'","value":"'||value||'"}') entity_properties from entities
join entity_properties
on entities.id = entity_properties.entity_id
where entities.id in (
select entity_id from entity_properties
where value = '6258006d824a25dabdb39a79.pdf'
)
group by entities.id;
what I get is:
[
{
"entity_number":"P1718238009-1",
"entity_properties":"[
\"{\"property_id\":\"006109cd-a100-437c-a683-f13413b448e6\",\"value\":\"Rozilik berildi\"}\",
\"{\"property_id\":\"010f5e23-d66f-4414-b54b-9647afc6762b\",\"value\":\"6258006d824a25dabdb39a79.pdf\"}\",
\"{\"property_id\":\"0a01904e-1ca0-40ef-bbe1-c90eaddea3fc\",\"value\":\"6260c9e9b06e4c2cc492c470_2634467.pdf\"}\"
]"
}
]
As you can see, it is not json parsable
To parse entity_properties as array of objects I need the data in this format
[
{
"entity_number":"P1718238009-1",
"entity_properties":[
{"property_id":"006109cd-a100-437c-a683-f13413b448e6","value":"Rozilik berildi"},
{"property_id":"010f5e23-d66f-4414-b54b-9647afc6762b","value":"6258006d824a25dabdb39a79.pdf"},
{"property_id":"0a01904e-1ca0-40ef-bbe1-c90eaddea3fc","value":"6260c9e9b06e4c2cc492c470_2634467.pdf"}
]
}
]
Can I achieve what I want with ARRAY_AGG()? How?
If not, what approach should I take?
Try using json_agg and json_build_object function
like this:
select
entity_number,
json_agg(json_build_object('property_id', property_id, 'value', value)) entity_properties from entities
join entity_properties
on entities.id = entity_properties.entity_id
where entities.id in (
select entity_id from entity_properties
where value = '6258006d824a25dabdb39a79.pdf'
)
group by entities.id;
Using a simplified sample data this query provides the first step of the aggregation
with tab as (
select * from (values
(1,'a','x'),
(1,'b','y'),
(2,'c','z')
) tab(entity_number,property_id,value)
)
select
entity_number,
json_agg( json_build_object('property_id', property_id, 'value', value)) entity_properties
from tab
group by 1
;
entity_number|entity_properties |
-------------+----------------------------------------------------------------------------+
1|[{"property_id" : "a", "value" : "x"}, {"property_id" : "b", "value" : "y"}]|
2|[{"property_id" : "c", "value" : "z"}]
Additional aggregation returns the final json array
with tab as (
select * from (values
(1,'a','x'),
(1,'b','y'),
(2,'c','z')
) tab(entity_number,property_id,value)
),
tab2 as (
select
entity_number,
json_agg( json_build_object('property_id', property_id, 'value', value)) entity_properties
from tab
group by 1
)
select
json_agg(
json_build_object(
'entity_number',
entity_number,
'entity_properties',
entity_properties
)
)
from tab2
[
{
"entity_number": 1,
"entity_properties": [
{
"value": "x",
"property_id": "a"
},
{
"value": "y",
"property_id": "b"
}
]
},
{
"entity_number": 2,
"entity_properties": [
{
"value": "z",
"property_id": "c"
}
]
}
]
Note that I used jsonb_pretty to format the output.

SQL Generate JSON string using FOR JSON PATH with 2 ROOTs

I have a very simple table containing 5 columns and the table will only hold 1 record at a time. I'm to generate a JSON string from the record and send it to an endpoint.
This is how the JSON string are to be formatted. As you can see it contains 2 'roots' and this is giving me a hard time getting the correct format
{
"fields": [
{
"fieldName": "Brand",
"values": [
"FORD"
]
},
{
"fieldName": "Engine",
"values": [
"V12"
]
},
{
"fieldName": "Location",
"values": [
"Monaco"
]
}
],
"categories": [
{
"fieldName": "Colour",
"values": [
[
{
"name": "Blue"
}
]
]
},
{
"fieldName": "Interior",
"values": [
[
{
"name": "Red"
}
]
]
}
]
}
This is my table containing the 5 columns
I have managed to create 2 separate SQL queries to get the JSON string. But I can't figure out how do it in one select.
SELECT (
SELECT X.* FROM (
SELECT CASE WHEN CarName IS NOT NULL THEN 'Brand' ELSE NULL END AS fieldName,
CarName AS [value]
FROM [dbo].[JSONBODY]
UNION
SELECT CASE WHEN Engine IS NOT NULL THEN 'Engine' ELSE NULL END AS fieldName,
Engine AS [value]
FROM [dbo].[JSONBODY]
UNION
SELECT CASE WHEN [location] IS NOT NULL THEN 'Location' ELSE NULL END AS fieldName,
[Location] AS [value]
FROM [dbo].[JSONBODY] ) X
FOR JSON PATH, ROOT('fields'))
SELECT (
SELECT Y.* FROM (
SELECT CASE WHEN Colour IS NOT NULL THEN 'Colour' ELSE NULL END AS fieldName,
JSON_QUERY('[["' + Colour + '"]]') AS 'value.name'
FROM [dbo].[JSONBODY]
UNION
SELECT CASE WHEN Interior IS NOT NULL THEN 'Interior' ELSE NULL END AS fieldName,
JSON_QUERY('[["' + Interior + '"]]') AS 'value.name'
FROM [dbo].[JSONBODY]) Y
FOR JSON PATH, ROOT('categories'))
And here are the 2 JSON strings:
{"fields":[{"fieldName":"Brand","value":"Ford"},{"fieldName":"Engine","value":"V6"},{"fieldName":"Location","value":"Boston"}]}
{"categories":[{"fieldName":"Colour","value":{"name":[["Blue"]]}},{"fieldName":"Interior","value":{"name":[["Black"]]}}]}
Question 1:
Is it possible to create the JSON string through a single SQL Select? And how can I do it?
Question 2:
If a column value is NULL it is excluded automatically from the JSON string. But I had to add the fieldName to the select and had hoped it would have exclude it from the JSON string if the corresponding field was NULL. However it creates a {}, in the JSON string. And this is not accepted when calling the endpoint. So is there another way to do it when a column value is NULL? I can of course delete it from the JSON string afterwards....
Hope the above makes sense
To do it as a single SELECT you can just UNION ALL the two results together
You can unpivot the values, then check them afterwards for nulls.
Unfortunately, SQL Server does not have JSON_AGG, so you have to bodge it with STRING_AGG and STRING_ESCAPE
SELECT
v.fieldName,
value = JSON_QUERY('[' + STRING_AGG('"' + STRING_ESCAPE(v.value, 'json') + '"', ',') + ']')
FROM [dbo].[JSONBODY] jb
CROSS APPLY (VALUES
('Brand', jb.Brand),
('Engine', jb.Engine),
('Location', jb.Location)
) v(fieldName, value)
GROUP BY
v.fieldName
FOR JSON PATH, ROOT('fields');
UNION ALL
SELECT
v.fieldName,
[value.name] = JSON_QUERY('[[' + STRING_AGG('"' + STRING_ESCAPE(v.value, 'json') + '"', ',') + ']]')
FROM [dbo].[JSONBODY] jb
CROSS APPLY (VALUES
('Colour', jb.Colour),
('Interior', jb.Interior)
) v(fieldName, value)
GROUP BY
v.fieldName
FOR JSON PATH, ROOT('categories');
If you know you will only ever have one row, you can simplify it by removing the GROUP BY
SELECT (
SELECT
v.fieldName,
value = JSON_QUERY('["' + STRING_ESCAPE(v.value, 'json') + '"]')
FROM [dbo].[JSONBODY] jb
CROSS APPLY (VALUES
('Brand', jb.Brand),
('Engine', jb.Engine),
('Location', jb.Location)
) v(fieldName, value)
WHERE v.value IS NOT NULL
FOR JSON PATH, ROOT('fields')
)
UNION ALL
SELECT (
SELECT
v.fieldName,
[value.name] = JSON_QUERY('[["' + STRING_ESCAPE(v.value, 'json') + '"]]')
FROM [dbo].[JSONBODY] jb
CROSS APPLY (VALUES
('Colour', jb.Colour),
('Interior', jb.Interior)
) v(fieldName, value)
WHERE v.value IS NOT NULL
FOR JSON PATH, ROOT('categories')
);
db<>fiddle

In BigQuery, how do I check if two ARRAY of STRUCTs are equal

I have a query that outputs two array of structs:
SELECT modelId, oldClassCounts, newClassCounts
FROM `xyz`
GROUP BY 1
How do I create another column that is TRUE if oldClassCounts = newClassCounts?
Here is a sample result in JSON:
[
{
"modelId": "FBF21609-65F8-4076-9B22-D6E277F1B36A",
"oldClassCounts": [
{
"id": "A041EBB1-E041-4944-B231-48BC4CCE025B",
"count": "33"
},
{
"id": "B8E4812B-A323-47DD-A6ED-9DF877F501CA",
"count": "82"
}
],
"newClassCounts": [
{
"id": "A041EBB1-E041-4944-B231-48BC4CCE025B",
"count": "33"
},
{
"id": "B8E4812B-A323-47DD-A6ED-9DF877F501CA",
"count": "82"
}
]
}
]
I want the equality column to be TRUE if oldClassCounts and newClassCounts are exactly the same like the output above.
Anything else should be false.
I would go about with this solution
#standardSQL
WITH xyz AS (
SELECT "FBF21609-65F8-4076-9B22-D6E277F1B36A" AS modelId,
[STRUCT("A041EBB1-E041-4944-B231-48BC4CCE025B" as id, "33" as count),
STRUCT("B8E4812B-A323-47DD-A6ED-9DF877F501CA" as id, "82" as count)] AS oldClassCounts,
[STRUCT("A041EBB1-E041-4944-B231-48BC4CCE025B" as id, "33" as count),
STRUCT("B8E4812B-A323-47DD-A6ED-9DF877F501CA" as id, "82" as count)] as newClassCounts),
o as (SELECT modelId, id, count, array_length(oldClassCounts) as len FROM xyz, UNNEST(oldClassCounts) as old_c),
n as (SELECT modelId, id, count, array_length(newClassCounts) as len FROM xyz, UNNEST(newClassCounts) as new_c),
uneq as (select * from o except distinct select * from n)
select xyz.*, IF(uneq.modelId is not null, false, true) as equal from xyz left join (select distinct modelId from uneq) uneq on xyz.modelId = uneq.modelId
It works regardless of the order or having duplicates within the arrays. The idea is that we treat each of the arrays as a separate temporary table removing all elements that exist in one but not the other (using except distinct) and then have an extra check for the length of the arrays in case there are duplicates e.g.
"FBF21609-65F8-4076-9B22-D6E277F1B36A" AS modelId,
[STRUCT("A041EBB1-E041-4944-B231-48BC4CCE025B" as id, "33" as count),
STRUCT("B8E4812B-A323-47DD-A6ED-9DF877F501CA" as id, "82" as count),
STRUCT("B8E4812B-A323-47DD-A6ED-9DF877F501CA" as id, "82" as count)]
I would consider comparing the result of TO_JSON_STRING function applied on both of these arrays.
In the query it would be done in the following way:
SELECT modelId,
oldClassCounts,
newClassCounts,
CASE WHEN TO_JSON_STRING(oldClassCounts) = TO_JSON_STRING(newClassCounts)
THEN true
ELSE false
END
FROM `xyz`;
I'm not sure about GROUP BY 1 part, because non of the fields are grouped or aggregated.
It is not going to work, if the order of elements in the array is going to be different. This solution is not perfect, but worked for the data you provided.

Postgres: How to alter jsonb value type for each element in an array?

I have a jsonb column named items in Postgres 10.12 like this:
{
"items": [
{
"itemQty": 2,
"itemName": "snake"
},
{
"itemQty": 1,
"itemName": "x kodiyum"
}
]
}
Now I want to convert itemQty type to string for every array element so that the new values are like this:
{
"items": [
{
"itemQty": "2",
"itemName": "snake"
},
{
"itemQty": "1",
"itemName": "x kodiyum"
}
]
}
How do I do this? I have gone through the documentation for Postgres jsonb and couldn't figure out.
On the server-side, I am using Spring boot and Hibernate with com.vladmihalcea.hibernate.type.json (Hibernate Types 52) if it helps.
Thanks
You could unnest the array, modify the elements, and then rebuild it. Assuming that the primary key of your table is id, that would be:
select jsonb_build_object(
'items', jsonb_agg(
jsonb_build_object(
'itemQty', (x.obj ->> 'itemQty')::text,
'itemName', x.obj ->> 'Name'
)
)
)new_items
from mytable t
cross join lateral jsonb_array_elements(t.items -> 'items') as x(obj)
group by id
Note that the explicit cast to ::text is not really needed here, as ->> extract text values anyway: I kept it because it makes the intent clearer.
If you want an update statement:
update mytable t
set items = (
select jsonb_build_object(
'items', jsonb_agg(
jsonb_build_object(
'itemQty', (x.obj ->> 'itemQty')::text,
'itemName', x.obj ->> 'Name'
)
)
)
from jsonb_array_elements(t.items -> 'items') as x(obj)
)
Demo on DB Fiddle