Extracting Array elements in Presto w/o using unnest function - sql

I have a requirement around this data where I need to extract array elements but I still want to keep them grouped, which means I cannot use the unnest function. Below is the sample data:
[
  { "emp_id": 8291828, "name": "bruce" },
  { "emp_id": 8291823, "name": "Rolli" }
]
My data is in the same format as above, i.e. array(row(emp_id varchar, name varchar)). What I need is to get rid of the array, so that the data looks like:
{ "emp_id": 8291828, "name": "bruce", },
{ "emp_id": 8291823, "name": "Rolli" }
I would appreciate it if anyone can help me with this.

You could use element_at if you have a sequence table (1, 2, 3, ...).
with numbers as (
    select *
    from (
        values (1), (2), (3)
    ) as x (i)
),
emp as (
    select *
    from (
        values (ARRAY[
            cast(ROW(8291828, 'bruce') as row(emp_id bigint, name varchar)),
            cast(ROW(8291823, 'Rolli') as row(emp_id bigint, name varchar))
        ])
    ) as emp (records)
)
select element_at(emp.records, i) record
from numbers n
cross join emp
where n.i <= cardinality(emp.records);
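For reference, element_at(array, index) in Presto is 1-based and, unlike the [] subscript operator (which raises an error), returns NULL when the index is past the end of the array, so the cardinality filter above keeps the output clean. A quick sanity check:

select element_at(ARRAY['a', 'b', 'c'], 2); -- 'b'
select element_at(ARRAY['a', 'b', 'c'], 5); -- NULL (the subscript [5] would fail)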

In Snowflake, how do I parse an unnamed JSON array and access each key with key value rather than array slicing methods?

I have a JSON array in Snowflake that I need to parse into a table, i.e., convert all the information into a row. Assume my table includes two columns: id and json_tag.
An example row is as below:
json_tag:
[
  {
    "key": "app.name",
    "value": "myapp1"
  },
  {
    "key": "device.name",
    "value": "myiPhone11"
  },
  {
    "key": "iOS.dist",
    "value": "latestDist5"
  }
]
The structure is not fixed; another row can have 5 or even 6 entries with new key names. An example is below:
json_tag:
[
  {
    "key": "app.name",
    "value": "myapp2"
  },
  {
    "key": "app.cost",
    "value": "$2.99"
  },
  {
    "key": "device.name",
    "value": "myiPhoneX"
  },
  {
    "key": "device.color",
    "value": "gold"
  },
  {
    "key": "iOS.dist",
    "value": "latestDist4.9"
  }
]
What I want is to work on this table (without creating a new one) so that each JSON row is split into columns as below:
id | app.name | app.cost | device.name | device.color | iOS.dist
1  | myapp1   | null     | myiPhone11  | null         | latestDist5
2  | myapp2   | $2.99    | myiPhoneX   | gold         | latestDist4.9
I tried the following snippet:
with parsed_tb as (
    select id,
           to_variant(parse_json(json_tag)) as parsed_json_tag
    from mytable
)
select parsed_json_tag[0]:value::varchar as app_name,
       parsed_json_tag[1]:value::varchar as app_cost,
       parsed_json_tag[2]:value::varchar as device_name
from parsed_tb;
As you can imagine, the snippet above does not work when a row has no app.cost key, or when rows differ in the number of keys and values.
I tried the lateral flatten command in Snowflake, but it creates many rows and I cannot figure out how to put them into columns on the same row. I also tried the recursive command, but could not achieve it.
So my question is:
1. How can I access a key by its name rather than slicing the array as I do above? This would solve my problem, I guess.
2. If the solution I imagine in #1 does not exist, how can I attain the table above?
I have found a solution to this problem with the help of the following thread:
Mysql, reshape data from long / tall to wide
The solution is to first parse the JSON column alongside a unique identifier, then lateral flatten, and then group the results by that unique identifier (reshaping the table from long to wide, as given in the link above).
So for the examples I provided above, the solution would be as follows:
with tb as (
    select id,
           to_variant(parse_json(json_tag)) as parsed_json_tag
    from mytable
),
flattened as (
    select id,
           VALUE:value::string as value,
           VALUE:key::string as key
    from tb, lateral flatten(input => tb.parsed_json_tag)
)
select id,
       MAX( IFF( key = 'app.name', value, NULL ) ) AS app_name,
       MAX( IFF( key = 'app.cost', value, NULL ) ) AS app_cost,
       MAX( IFF( key = 'device.name', value, NULL ) ) AS device_name,
       MAX( IFF( key = 'device.color', value, NULL ) ) AS device_color,
       MAX( IFF( key = 'iOS.dist', value, NULL ) ) AS ios_dist
from flattened
group by 1;
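As a side note, the MAX(IFF(...)) block is a manual pivot; the same long-to-wide reshaping can also be written with Snowflake's PIVOT clause. A sketch reusing the flattened CTE above (the output column aliases in the AS clause are my own naming):

select *
from (select id, key, value from flattened)
pivot (max(value) for key in ('app.name', 'app.cost', 'device.name', 'device.color', 'iOS.dist'))
as p (id, app_name, app_cost, device_name, device_color, ios_dist);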

Add data type to JSON values query

I am building the following view on SQL Server. The data is extracted from a SOAP API via Data Factory and stored in a SQL table. I union two pieces of the same code because the API returns output either as objects or as arrays.
This query works; however, when I try to order or filter the view I get this error:
JSON text is not properly formatted. Unexpected character '.' is found at position 12.
/*Object - invoiceDetails*/
SELECT
XML.[CustomerID],
XML.[SiteId],
XML.[Date],
JSON_VALUE(m.[value], '$.accountNbr') AS [AccountNumber],
JSON_VALUE(m.[value], '$.actualUsage') AS [ActualUsage]
FROM stage.Bill XML
CROSS APPLY openjson(XML.xmldata) AS n
CROSS APPLY openjson(n.value, '$.invoiceDetails') AS m
WHERE XML.XmlData IS NOT NULL
AND ISJSON (XML.xmldata) > 0
AND n.type = 5
AND m.type = 5
UNION ALL
/*Array - invoiceDetails*/
SELECT
XML.[CustomerID],
XML.[SiteId],
XML.[Date],
JSON_VALUE(o.[value], '$.accountNbr') AS [AccountNumber],
JSON_VALUE(o.[value], '$.actualUsage') AS [ActualUsage]
FROM stage.Bill XML
CROSS APPLY openjson(XML.xmldata) AS n
CROSS APPLY openjson(n.value) AS m
CROSS APPLY openjson(m.value, '$.invoiceDetails') AS o
WHERE XML.XmlData IS NOT NULL
AND ISJSON (XML.xmldata) > 0
AND n.type = 4
I did a small exercise using the WITH clause to assign data types to the values and noticed I was then able to order and filter the view. So I believe the way forward is to add data types to this query. My problem now is that I don't know how to add a WITH clause to my query to make it work.
Any advice?
My apologies, you might find the XML naming and prefixes confusing. In my first tests I expected to receive XML data from Data Factory, so my table and columns include XML prefixes. I later noticed that Data Factory delivers JSON data, so I started querying the data as JSON, but I have not changed the table name and prefixes.
Thank you for your comments to enrich this question. I took the example below from a YouTube channel (not sure if I am allowed to mention the channel's name: https://www.youtube.com/watch?v=yl9jKGgASTY&t=474s). I believe it is a similar approach. Could you please help with how I could add the WITH clause in order to specify data types?
DECLARE @json NVARCHAR(MAX)
SET @json =
N'{
"OrderHeader": [
{
"OrderID": 100,
"CustomerID": 2000,
"OrderDetail": [
{
"ProductID": 2000,
"UnitPrice": 350
},
{
"ProductID": 5000,
"UnitPrice": 800
},
{
"ProductID": 9000,
"UnitPrice": 200
}
]
}
]
}'
SELECT
JSON_VALUE(a.value, '$.OrderID') AS OrderID,
JSON_VALUE(a.value, '$.CustomerID') AS CustomerID,
JSON_VALUE(a.value, '$.ProductID') AS ProductID,
JSON_VALUE(a.value, '$.UnitPrice') AS UnitPrice
FROM OPENJSON(@json, '$.OrderHeader') AS a
CROSS APPLY OPENJSON(a.value, 'OrderDetail') AS b
You have a couple of typos:
The second OPENJSON should have the path starting with $.
The third and fourth SELECT columns should use b.value, not a.value.
DECLARE @json NVARCHAR(MAX)
SET @json =
N'{
"OrderHeader": [
{
"OrderID": 100,
"CustomerID": 2000,
"OrderDetail": [
{
"ProductID": 2000,
"UnitPrice": 350
},
{
"ProductID": 5000,
"UnitPrice": 800
},
{
"ProductID": 9000,
"UnitPrice": 200
}
]
}
]
}'
SELECT
JSON_VALUE(a.value, '$.OrderID') AS OrderID,
JSON_VALUE(a.value, '$.CustomerID') AS CustomerID,
JSON_VALUE(b.value, '$.ProductID') AS ProductID,
JSON_VALUE(b.value, '$.UnitPrice') AS UnitPrice
FROM OPENJSON(@json, '$.OrderHeader') AS a
CROSS APPLY OPENJSON(a.value, '$.OrderDetail') AS b;
An alternative syntax is to use OPENJSON on each object with an explicit schema:
SELECT
oh.OrderID,
oh.CustomerID,
od.ProductID,
od.UnitPrice
FROM OPENJSON(#json, '$.OrderHeader')
WITH (
OrderID int,
CustomerID int,
OrderDetail nvarchar(max) AS JSON
) AS oh
CROSS APPLY OPENJSON(oh.OrderDetail)
WITH (
ProductID int,
UnitPrice decimal(18,9)
) AS od;
db<>fiddle
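Applying the same idea to the view in the question should also fix the ordering/filtering error, since the values come out typed instead of as raw JSON text. A sketch for the object branch only, assuming accountNbr is a string and actualUsage is numeric in the source JSON (the column sizes are guesses):

SELECT
    B.[CustomerID],
    B.[SiteId],
    B.[Date],
    m.AccountNumber,
    m.ActualUsage
FROM stage.Bill B
CROSS APPLY OPENJSON(B.xmldata) AS n
CROSS APPLY OPENJSON(n.value, '$.invoiceDetails')
WITH (
    AccountNumber varchar(50) '$.accountNbr',
    ActualUsage decimal(18,4) '$.actualUsage'
) AS m
WHERE B.XmlData IS NOT NULL
AND ISJSON(B.xmldata) > 0
AND n.type = 5;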

How to parse this array of dicts and extract key columns from a big query external table

I have this giant array of dicts loaded from JSON in a date-partitioned BigQuery external table, with the table structure below:
Field name | Type    | Mode
meta       | Record  | Nullable
Messages   | String  | Repeated
date       | Integer | Nullable
Every "Messages" Field is in its own row/record in my Bigquery table (New_line_delimited_Json)
I am trying to parse the "messages" field/column to extract some fields Key1 and Key2 which happens to be inside an Array (of dicts). For sake of simplicity ,below is the snippet of json of which "messages" is a field that I am trying to unnest/explode.
Ignore this schema; updated schema below.
[
{
"meta": {
"table": "FEED",
"source": "CP1"
},
"Messages": [
"{
"Key1":"2022-01-10",
"Key2":"H21257061"
}"
],
"date": "20220110"
},
{
"meta": {
"table": "FEED",
"source": "CP1"
},
"Messages": [
"{
"Key1":"2022-01-11",
"Key2":"H21257062"
}"
],
"date": "20220111"
}
]
updated schema on 01/17
{
"meta": {
"table": "FEED",
"source": "CP1"
},
"Messages": [
"{
"Key1":"2022-01-10",
"Key2":"H21257061"
}",
"{
"Key1":"2022-01-10",
"Key2":"H21257062"
}"
],
"date": "20220110"
},
So far I have tried this, but I am getting the SQL output of Key1 and Key2 as NULLs:
WITH table AS (SELECT Messages as array_column FROM `project.dataset.table` )
SELECT
json_extract_scalar(flattened_array, '$.Messages.key1') as key1,
json_extract_scalar(flattened_array, '$.Messages.key2') as key2
FROM table t
CROSS JOIN UNNEST(t.array_column) AS flattened_array
This is still a little ambiguous, so I assume the below correctly represents your table (at least it matches the structure/schema in your question).
If my assumption is correct, consider the approach below:
select * except(id) from (
select to_json_string(t) id, kv[offset(0)] as key, kv[safe_offset(1)] as value
from your_table t,
t.messages as message,
unnest([struct( split(translate(message, '"', ''), ':') as kv)])
)
pivot (min(value) for key in ('Key1', 'Key2'))
If/when applied to the above sample data, the output is one row per source record, with Key1 and Key2 as columns (2022-01-10 / H21257061 and 2022-01-11 / H21257062 for the sample).
Edit: Trying to help you further. OK, so it looks like your table is as below, with Messages holding the key/value pairs as one plain string rather than a repeated field.
In this case, try the below (a light modification of the previous version):
select * except(id) from (
select to_json_string(t) id, kv[offset(0)] as key, kv[safe_offset(1)] as value
from your_table t,
unnest(regexp_extract_all(messages, r'"[^"]+":"[^"]+"')) as message,
unnest([struct( split(translate(message, '"', ''), ':') as kv)])
)
pivot (min(value) for key in ('Key1', 'Key2'))
with the output again pivoted into Key1 and Key2 columns.
But obviously, I would use the simplest approach below:
select
json_extract_scalar(messages, '$.Key1') as Key1,
json_extract_scalar(messages, '$.Key2') as Key2
from your_table
SELECT
JSON_QUERY(message,"$.Key1") as Key1,
JSON_QUERY(message,"$.Key2") as Key2
FROM
`project.dataset.table` as table
CROSS JOIN UNNEST(table.Messages) as message
The CROSS JOIN UNNEST flattens the array, returning a row for each message. After that, JSON_QUERY extracts the needed values from the JSON string.
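One caveat: in BigQuery, JSON_QUERY returns a JSON-formatted string, so string values keep their double quotes, while JSON_EXTRACT_SCALAR (or JSON_VALUE) returns the bare scalar, which is usually what you want in a column:

SELECT
    JSON_QUERY('{"Key1":"2022-01-10"}', '$.Key1') as quoted, -- "2022-01-10"
    JSON_EXTRACT_SCALAR('{"Key1":"2022-01-10"}', '$.Key1') as bare -- 2022-01-10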

BigQuery select expect double nested column

I am trying to remove a column from a BigQuery table and I've followed the instructions as stated here:
https://cloud.google.com/bigquery/docs/manually-changing-schemas#deleting_a_column_from_a_table_schema
This did not work directly as the column I'm trying to remove is nested twice in a struct. The following SO questions are relevant but none of them solve this exact case.
Single nested field:
BigQuery select * except nested column
Double nested field (solution has all fields in the schema enumerated, which is not useful for me as my schema is huge):
BigQuery: select * replace from multiple nested column
I've tried adapting the above solutions and I think I'm close but can't quite get it to work.
This one will remove the field, but returns only the nested field, not the whole table (for the examples, I want to remove a.b.field_name; see the example schema at the end):
SELECT AS STRUCT * EXCEPT(a), a.* REPLACE (
(SELECT AS STRUCT a.b.* EXCEPT (field_name)) AS b
)
FROM `table`
This next attempt gives me an error: Scalar subquery produced more than one element:
WITH a_tmp AS (
SELECT AS STRUCT a.* REPLACE (
(SELECT AS STRUCT a.b.* EXCEPT (field_name)) AS b
)
FROM `table`
)
SELECT * REPLACE (
(SELECT AS STRUCT a.* FROM a_tmp) AS a
)
FROM `table`
Is there a generalised way to solve this? Or am I forced to use the enumerated solution in the 2nd link?
Example Schema:
[
  {
    "name": "a",
    "type": "RECORD",
    "fields": [
      {
        "name": "b",
        "type": "RECORD",
        "fields": [
          {
            "name": "field_name",
            "type": "STRING"
          },
          {
            "name": "other_field_name",
            "type": "STRING"
          }
        ]
      }
    ]
  }
]
I would like the final schema to be the same but without field_name.
Below is for BigQuery Standard SQL
#standardSQL
SELECT * REPLACE(
(SELECT AS STRUCT(SELECT AS STRUCT a.b.* EXCEPT (field_name)) b)
AS a)
FROM `project.dataset.table`
You can test and play with it using dummy data as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT STRUCT<b STRUCT<field_name STRING, other_field_name STRING>>(STRUCT('1', '2')) a
)
SELECT * REPLACE(
(SELECT AS STRUCT(SELECT AS STRUCT a.b.* EXCEPT (field_name)) b)
AS a)
FROM `project.dataset.table`
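One hedge on the pattern above: in this schema a contains nothing but b, so the inner SELECT AS STRUCT can rebuild a from scratch. If a had sibling fields next to b, you would keep them by using REPLACE on a.* instead, a sketch:

#standardSQL
SELECT * REPLACE(
    (SELECT AS STRUCT a.* REPLACE(
        (SELECT AS STRUCT a.b.* EXCEPT (field_name)) AS b))
    AS a)
FROM `project.dataset.table`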

MS SQL json query/where clause nested array items

I have JSON data that I can query using CROSS APPLY OPENJSON(...), which gets slow once you start adding multiple cross applies or once your JSON document gets too large. So I wanted to add an index on the data I'm trying to filter on, but I can't get the syntax for nested array items to work without using a cross apply, and as such I can't create an index, since you can't use a cross apply when making an index. According to the MS docs I should just be able to do:
JSON_QUERY(my_column, '$.parentItem.nestedItemsArray1.nestedItemsArray2')
I should then be able to get all the values of the nested array items to query on, and improve performance by adding an index, something like this:
ALTER TABLE mytable
ADD vdata AS JSON_QUERY(my_column,
    '$.parentItem.nestedItemsArray1.nestedItemsArray2')
CREATE INDEX idx_json_my_column ON mytable(vdata)
but the above $.array.arrayitems syntax doesn't work?
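For what it's worth, the computed-column-plus-index pattern from the MS docs does work for scalar paths via JSON_VALUE; it's only array-valued paths, which need OPENJSON to expand, that can't be indexed this way. A sketch with a made-up scalar property:

-- $.parentItem.id is a hypothetical scalar path, not from the real document
ALTER TABLE mytable
ADD vParentId AS JSON_VALUE(my_column, '$.parentItem.id')
CREATE INDEX idx_json_parent_id ON mytable (vParentId)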
On a side note, I can't help but think in relational terms, where normally in SQL you would index a column of data like so:
col
---
1|
2|
3|
But JSON data seems to get flattened, so when I use JSON_QUERY as per the MS example I get "1,2,3". I assume I want to index an array of values rather than a flattened version, unless the index will return the inner data of the flattened version?
My plug-and-play working example:
declare @mydata table (
    ID int NOT NULL,
    jsondata varchar(max) NOT NULL
)
INSERT INTO @mydata (id, jsondata)
VALUES (789, '{ "Id": "12345", "FinanceProductResults": [ { "Term": 12, "AnnualMileage": 5000, "Deposits": 0, "ProductResults": [] }, { "Term": 18, "AnnualMileage": 30000, "Deposits": 15000, "ProductResults": [] }, { "Term": 24, "AnnualMileage": 5000, "Deposits": 0, "ProductResults": [ { "Key": "HP", "Payment": 460.28 } ] }, { "Term": 24, "AnnualMileage": 10000, "Deposits": 0, "ProductResults": [ { "Key": "HP", "Payment": 500.32 } ] }]}')
SELECT
    j_Id
    ,JSON_Value (c.value, '$.Term') as Term
    ,JSON_Value (c.value, '$.AnnualMileage') as AnnualMileage
    ,JSON_Value (c.value, '$.Deposits') as Deposits
    ,JSON_Value (p.value, '$.Key') as [Key]
    ,JSON_Value (p.value, '$.Payment') as Payment
    --,c.value
FROM @mydata f
CROSS APPLY OPENJSON(f.jsondata)
    WITH (j_Id nvarchar(100) '$.Id')
CROSS APPLY OPENJSON(f.jsondata, '$.FinanceProductResults') AS c
CROSS APPLY OPENJSON(c.value, '$."ProductResults"') AS p
WHERE
    ID = 789
    AND JSON_Value (p.value, '$.Payment') = '460.28'
I'm using these MS docs to guide me:
How to create an index
How to get data
Update
I was able to improve performance slightly using the "with" method:
SELECT
j_Id,
FinanceDetails.Term,
FinanceDetails.AnnualMileage,
FinanceDetails.Deposits,
Payments.Payment
FROM @mydata f
CROSS APPLY OPENJSON(f.jsondata)
WITH (j_Id nvarchar(100) '$.Id')
OUTER APPLY OPENJSON (f.jsondata, '$.FinanceProductResults' )
WITH (
Term INT '$.Term',
AnnualMileage INT '$.AnnualMileage',
Deposits INT '$.Deposits',
ProductResults NVARCHAR(MAX) '$.ProductResults' AS JSON
) AS FinanceDetails
OUTER APPLY OPENJSON(ProductResults, '$')
WITH (
Payment DECIMAL(19, 4) '$.Payment'
) AS Payments
WHERE
Payments.Payment = 460.28
but I would still like to add an index on the sub-array data to help improve performance.
Currently, you cannot index nested properties.
Is Full-text Search a possible option? You might create an FTS index on the JSON column and add a predicate:
WHERE ....
AND CONTAINS(jsondata, 'NEAR((Payment, 460), 1)')
Since JSON is text, this predicate will filter out all records that don't have something like "Payment" and 460 near each other (this will identify key:value pairs), and you can apply CROSS APPLY on the reduced set of rows.
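For completeness, a minimal sketch of the full-text setup that predicate relies on, assuming the JSON lives in a real table named mytable with a unique key index named PK_mytable (full-text indexes cannot be created on table variables, and the names here are hypothetical):

CREATE FULLTEXT CATALOG jsonCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON mytable (jsondata)
KEY INDEX PK_mytable;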