How to parse this JSON file in Snowflake?

So I have a column in a Snowflake table that stores JSON data but the column is of a varchar data type.
The JSON looks like this:
{
  "FLAGS": [],
  "BANNERS": {},
  "TOOLS": {
    "game.appConfig": {
      "type": [
        "small",
        "normal",
        "huge"
      ],
      "flow": [
        "control",
        "noncontrol"
      ]
    }
  },
  "PLATFORM": {}
}
I want to filter only the data inside TOOLS and get the following result:

+----------------+-------+
| TOOLS_ID       | TOOLS |
+----------------+-------+
| game.appConfig | type  |
| game.appConfig | flow  |
+----------------+-------+
How can I achieve this?

I assumed that TOOLS can contain more than one tool ID, so I wrote this query:
with mydata as (
    select '{
        "FLAGS": [],
        "BANNERS": {},
        "TOOLS": {
            "game.appConfig": {
                "type": [ "small", "normal", "huge" ],
                "flow": [ "control", "noncontrol" ]
            }
        },
        "PLATFORM": {}
    }' as v1
)
select main.KEY as TOOLS_ID, sub.KEY as TOOLS
from mydata,
     lateral flatten( parse_json(v1):"TOOLS" ) main,
     lateral flatten( main.VALUE ) sub;
+----------------+-------+
| TOOLS_ID | TOOLS |
+----------------+-------+
| game.appConfig | flow |
| game.appConfig | type |
+----------------+-------+

Assuming the column name is C1 and table name T1:

select a.t:"TOOLS":"game.appConfig"::string
from (select parse_json(to_variant(C1)) t from T1) a
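
To get the same two-column result as the first answer against the real table, the double FLATTEN can be pointed at C1 directly; a minimal sketch, assuming C1 holds the JSON text shown above:

select main.KEY as TOOLS_ID, sub.KEY as TOOLS
from T1,
     -- parse the varchar column, then flatten the TOOLS object and each tool's keys
     lateral flatten( parse_json(C1):"TOOLS" ) main,
     lateral flatten( main.VALUE ) sub;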

Related

Issues creating JSON object on Snowflake using SQL

I am new to Snowflake and I am trying to create a new table from an existing table, converting some rows into JSON format.
What I have in Snowflake (table name: lex):

+------+--------+------+------+
| area | source | type | date |
+------+--------+------+------+
| One  | modak  | good | 2021 |
+------+--------+------+------+

What I want to achieve in Snowflake:

+------+-------------------------------------------------------------------+
| area | sources                                                           |
+------+-------------------------------------------------------------------+
| One  | [{"source": "modak","period": {"type": "good","date": "2021"} }] |
+------+-------------------------------------------------------------------+

Any direction on how to go about it using SQL will be appreciated.
You can use object_construct to produce JSON objects:
select area,
       array_construct(
           object_construct( 'source', source,
                             'period', object_construct( 'type', type, 'date', date ) )
       ) sources
from values ('One','modak','good','2021') tmp (area, source, type, date);
+------+-------------------------------------------------------------------------+
| AREA | SOURCES |
+------+-------------------------------------------------------------------------+
| One | [ { "period": { "date": "2021", "type": "good" }, "source": "modak" } ] |
+------+-------------------------------------------------------------------------+
I also used array_construct to add the brackets ([]).
OBJECT_CONSTRUCT:
https://docs.snowflake.com/en/sql-reference/functions/object_construct.html
ARRAY_CONSTRUCT:
https://docs.snowflake.com/en/sql-reference/functions/array_construct.html
PS: The order of the JSON elements is not important when accessing them:
select parse_json(j):source, parse_json(j):period.type
from
values
('{ "period": { "date": "2021", "type": "good" }, "source": "modak" }'),
('{"source": "modak","period": {"type": "good","date": "2021"} }') tmp(j);
+----------------------+---------------------------+
| PARSE_JSON(J):SOURCE | PARSE_JSON(J):PERIOD.TYPE |
+----------------------+---------------------------+
| "modak" | "good" |
| "modak" | "good" |
+----------------------+---------------------------+
So the complete answer is:
WITH lex AS (
    SELECT * FROM VALUES
        ('One', 'modak', 'good', 2021)
        v(area, source, type, date)
)
SELECT l.area,
       ARRAY_AGG(OBJECT_CONSTRUCT('source', l.source,
                                  'period', OBJECT_CONSTRUCT('type', l.type, 'date', l.date))) AS sources
FROM lex AS l
GROUP BY 1
ORDER BY 1;
which gives:
+------+------------------------------------------------------------------------+
| AREA | SOURCES                                                                |
+------+------------------------------------------------------------------------+
| One  | [ { "period": { "date": 2021, "type": "good" }, "source": "modak" } ] |
+------+------------------------------------------------------------------------+
Gokhan's answer shows how to build a static array with ARRAY_CONSTRUCT, but if you have a dynamic input like:
WITH lex AS (
    SELECT * FROM VALUES
        ('One', 'modak', 'good', 2021),
        ('One', 'modak', 'bad', 2022),
        ('Two', 'modak', 'good', 2021)
        v(area, source, type, date)
)
and you want the result below, you will need to use ARRAY_AGG (the combined query follows the table):
+------+------------------------------------------------------------------------------------------------------------------------------------------+
| AREA | SOURCES                                                                                                                                  |
+------+------------------------------------------------------------------------------------------------------------------------------------------+
| One  | [ { "period": { "date": 2021, "type": "good" }, "source": "modak" }, { "period": { "date": 2022, "type": "bad" }, "source": "modak" } ] |
| Two  | [ { "period": { "date": 2021, "type": "good" }, "source": "modak" } ]                                                                   |
+------+------------------------------------------------------------------------------------------------------------------------------------------+

BigQuery: best use of UNNEST arrays

I really need some help. I have a big JSON file that I ingested into BigQuery, and I want to write a query that uses UNNEST twice. The data looks like this:
{
  "categories": [
    {
      "id": 1,
      "name": "C0",
      "properties": [
        {
          "name": "Property_1",
          "value": {
            "type": "String",
            "value": "11111"
          }
        },
        {
          "name": "Property_2",
          "value": {
            "type": "String",
            "value": "22222"
          }
        }
      ]
    }
  ]
}
And I want a query that gives me something like this result:

+-------------+---------+------------+------------+
| Category_ID | Name_ID | Property_1 | Property_2 |
+-------------+---------+------------+------------+
| 1           | C0      | 11111      | 22222      |
+-------------+---------+------------+------------+
I already made something like this, but it's not working:

SELECT
    c.id as Category_ID,
    c.name as Name_ID,
    p.value.value as p.name
FROM `DataBase-xxxxxx` CROSS JOIN
    UNNEST(categories) AS c,
    UNNEST(c.properties) AS p;
Thank you so much 🙏
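
A sketch that should produce the pivoted row, keeping the table reference from the attempt and assuming the two property names are known up front; the invalid dotted alias is dropped and the properties are pivoted with conditional aggregation:

SELECT
    c.id AS Category_ID,
    c.name AS Name_ID,
    -- pivot each named property into its own column (property names assumed fixed)
    MAX(IF(p.name = 'Property_1', p.value.value, NULL)) AS Property_1,
    MAX(IF(p.name = 'Property_2', p.value.value, NULL)) AS Property_2
FROM `DataBase-xxxxxx` t
CROSS JOIN UNNEST(t.categories) AS c
CROSS JOIN UNNEST(c.properties) AS p
GROUP BY c.id, c.name;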

List comprehension equivalent in Kusto

I recently started to use the Azure Resource Graph Explorer to obtain resource information. KQL is a new thing I'm figuring out along the way, and one of the problems I need help with is a means to manipulate an array of dictionaries into just an array of string values.
As an example:
Consider the following data
{
  "customerId": "201",
  "orders": [
    { "dept": "/packaging/fruits" },
    { "dept": "/packaging/vegetables" }
  ]
}
With the following query:
Customers
| where customerId == 201
| project customerId, orders
The result would be as follows:

customerId | orders
201        | [ { "dept": "/packaging/fruits" }, { "dept": "/packaging/vegetables" } ]

My question is, how can I modify the query to produce the following result:

customerId | orders
201        | [ "/packaging/fruits", "/packaging/vegetables" ]

I tried to go through the KQL documentation, but can't seem to find the correct method to achieve the above. Any help would be much appreciated!
In Kusto, you could use mv-apply:
datatable(customerId:int, orders:dynamic)
[
    201, dynamic([
        { "dept": "/packaging/fruits" },
        { "dept": "/packaging/vegetables" }
    ]),
    201, dynamic([
        { "dept": "/packaging2/fruits2" },
        { "dept": "/packaging2/vegetables2" }
    ])
]
| where customerId == 201
| mv-apply orders on (
    summarize orders = make_list(orders.dept)
)
customerId | orders
201        | [ "/packaging/fruits", "/packaging/vegetables"]
201        | [ "/packaging2/fruits2", "/packaging2/vegetables2"]
In ARG, mv-apply isn't supported, so you can use mv-expand instead, tagging each row with a random value first so that its expanded rows can be grouped back together:
datatable(customerId:int, orders:dynamic)
[
    201, dynamic([
        { "dept": "/packaging/fruits" },
        { "dept": "/packaging/vegetables" }
    ]),
    201, dynamic([
        { "dept": "/packaging2/fruits2" },
        { "dept": "/packaging2/vegetables2" }
    ])
]
| where customerId == 201
| extend rn = rand()
| mv-expand orders
| summarize orders = make_list(orders.dept) by rn, customerId
| project-away rn
customerId | orders
201        | [ "/packaging/fruits", "/packaging/vegetables"]
201        | [ "/packaging2/fruits2", "/packaging2/vegetables2"]

How to unpack Array to Rows in Snowflake?

I have a table that looks like the following in Snowflake:
ID | CODES
2 | [ { "list": [ { "item": "CODE1" }, { "item": "CODE2" } ] } ]
And I want to make it into:
ID | CODES
2 | 'CODE1'
2 | 'CODE2'
So far I've tried
SELECT ID,CODES[0]:list
FROM MY_TABLE
But that only gets me as far as:
ID | CODES
2 | [ { "item": "CODE1" }, { "item": "CODE2" } ]
How can I break out every 'item' element from every index of this list into its own row with each CODE as a string?
Update: here is the answer I got working at the same time as the answer below; it looks like we both used FLATTEN:
SELECT ID,f.value:item
FROM MY_TABLE,
lateral flatten(input => MY_TABLE.CODES[0]:list) f
As you note, you have hard-coded your access into the codes via codes[0], which gives you the first item from that array. If you use FLATTEN, you can access all of the objects of that outer array.
WITH my_table(id, codes) AS (
    SELECT 2, parse_json('[ { "list": [ { "item": "CODE1" }, { "item": "CODE2" } ] } ]')
)
SELECT ID, c.*
FROM my_table,
     table(flatten(codes)) c;
gives:
ID | SEQ | KEY | PATH | INDEX | VALUE                                                   | THIS
2  | 1   |     | [0]  | 0     | { "list": [ { "item": "CODE1" }, { "item": "CODE2" }]} | [ { "list": [{"item": "CODE1"}, { "item": "CODE2" }]}]
Now you want to loop across the items in list, so we use another FLATTEN on that:
WITH my_table(id, codes) AS (
    SELECT 2, parse_json('[ { "list": [ { "item": "CODE1" }, { "item": "CODE2" } ] } ]')
)
SELECT ID, c.value, l.value
FROM my_table,
     table(flatten(codes)) c,
     table(flatten(c.value:list)) l;
gives:
2 {"list":[{"item": "CODE1"},{"item":"CODE2"}]} {"item":"CODE1"}
2 {"list":[{"item": "CODE1"},{"item":"CODE2"}]} {"item":"CODE2"}
From there you can pull apart l.value to access the parts you need; the final extraction is sketched below.
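
A minimal sketch of that last step, casting each item to a string to match the desired two-row output:

WITH my_table(id, codes) AS (
    SELECT 2, parse_json('[ { "list": [ { "item": "CODE1" }, { "item": "CODE2" } ] } ]')
)
SELECT ID, l.value:item::string AS codes
FROM my_table,
     table(flatten(codes)) c,
     -- the second flatten walks the inner list; ::string strips the JSON quotes
     table(flatten(c.value:list)) l;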

Convert table in SQL server into JSON string for migration into DocumentDB

I have a table called DimCompany in SQL Server like so:
+----+---------+--------+
| id | Company | Budget |
+----+---------+--------+
| 1 | abc | 111 |
| 2 | def | 444 |
+----+---------+--------+
I would like to convert this table into a json file like so:
{
  "DimCompany": {
    "id": 1,
    "companydetails": [
      {
        "columnid": "1",
        "columnfieldname": "Company",
        "columnfieldvalue": "abc"
      },
      {
        "columnid": "2",
        "columnfieldname": "Budget",
        "columnfieldvalue": "111"
      }
    ]
  }
},
{
  "DimCompany": {
    "id": 2,
    "companydetails": [
      {
        "columnid": "1",
        "columnfieldname": "Company",
        "columnfieldvalue": "def"
      },
      {
        "columnid": "2",
        "columnfieldname": "Budget",
        "columnfieldvalue": "444"
      }
    ]
  }
}
where columnid is a value from sys.columns matched against the column field name. I've tried doing this by unpivoting the table, joining sys.columns on the field name where sys.objects.name = 'DimCompany', and putting this in a view, then querying the view to get JSON output for migration into DocumentDB.
However, I would like to avoid UNPIVOT and directly form a query that gets the desired output.
I'm curious whether this is possible in SQL Server or in any other tool.
Without using UNPIVOT and doing it yourself, the following SQL:
if object_id(N'dbo.DimCompany') is not null drop table dbo.DimCompany;
create table dbo.DimCompany (
    id int not null identity(1,1),
    Company nvarchar(50) not null,
    Budget float not null
);
insert dbo.DimCompany (Company, Budget) values
    ('abc', 111),
    ('def', 444);
go

select id as 'DimCompany.id',
    (
        select columnid = cast(sc.column_id as nvarchar), columnfieldname, columnfieldvalue
        from (
            select N'Company', Company from dbo.DimCompany DC2 where DC2.id = DC1.id
            union
            select N'Budget', cast(Budget as nvarchar) from dbo.DimCompany DC2 where DC2.id = DC1.id
        ) keyValues (columnfieldname, columnfieldvalue)
        join sys.columns sc on sc.object_id = object_id(N'dbo.DimCompany') and sc.name = columnfieldname
        for json path
    ) as 'DimCompany.companydetails'
from dbo.DimCompany DC1
for json path, without_array_wrapper;
Produces the following JSON as per your example:
{
  "DimCompany": {
    "id": 1,
    "companydetails": [
      {
        "columnid": "2",
        "columnfieldname": "Company",
        "columnfieldvalue": "abc"
      },
      {
        "columnid": "3",
        "columnfieldname": "Budget",
        "columnfieldvalue": "111"
      }
    ]
  }
},
{
  "DimCompany": {
    "id": 2,
    "companydetails": [
      {
        "columnid": "2",
        "columnfieldname": "Company",
        "columnfieldvalue": "def"
      },
      {
        "columnid": "3",
        "columnfieldname": "Budget",
        "columnfieldvalue": "444"
      }
    ]
  }
}
Things to note:
The sys.columns column_id values start at 1 for the dbo.DimCompany.id column. Subtract 1 before casting if that's a requirement (see the tweak below).
Using without_array_wrapper removes the surrounding [] characters, per your example, but isn't really valid JSON as a result.
I doubt this would be scalable for tables with large numbers of columns.
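
If the columnid values do need to start at 1, as in the example output, a minimal tweak to the inner select (assuming that offset is the requirement):

-- shift column_id down by 1 so Company becomes "1" and Budget becomes "2"
select columnid = cast(sc.column_id - 1 as nvarchar), columnfieldname, columnfieldvalue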