Extract complex nested JSON array in Presto

Extract complex nested JSON array in Presto - sql

I have a complex JSON object like this:
{
"item_detail": [
{
"itemid": "4702385896",
"modelid": "8307307322",
"quantity": "1"
},
{
"itemid": "3902478595",
"modelid": "8306561848",
"quantity": "1"
},
{
"itemid": "3409528897",
"modelid": "10922686275",
"quantity": "1"
},
{
"itemid": "4702385896",
"modelid": "8307307323",
"quantity": "1"
}
],
"shopid": "128449080"
},
{
"item_detail": [
{
"itemid": "7906381345",
"modelid": "9745718882",
"quantity": "1"
},
{
"itemid": "6710792892",
"modelid": "11474621623",
"quantity": "1"
}
],
"shopid": "36121097"
}
]
I am struggling in extracting all (itemid, shopid) into rows in Presto. My wanted outcomes are:
itemid | shopid
-----------+-------
4702385896 | 128449080
3902478595 | 128449080
3409528897 | 128449080
4702385896 | 128449080
7906381345 | 36121097
6710792892 | 36121097
I have used CROSS JOIN UNNEST and TRANSFORM to get the result with no luck. Does anyone have a solution for this?
So many thanks in advance!

Use json_extract with cast to array and unnest, like this:
presto:default> SELECT
-> json_extract_scalar(item_detail, '$.itemid') itemid,
-> json_extract_scalar(shopping, '$.shopid') shopid
-> FROM t
-> CROSS JOIN UNNEST(CAST(my_json AS array(json))) AS x(shopping)
-> CROSS JOIN UNNEST(CAST(json_extract(shopping, '$.item_detail') AS array(json))) AS y(item_detail)
-> ;
->
itemid | shopid
------------+-----------
4702385896 | 128449080
3902478595 | 128449080
3409528897 | 128449080
4702385896 | 128449080
7906381345 | 36121097
6710792892 | 36121097
(6 rows)
(verified on Presto 327)
BTW if any of the arrays may be empty or missing, I recommend using LEFT JOIN UNNEST ... ON true instead of CROSS JOIN UNNEST (this requires a decent Presto version):
SELECT
json_extract_scalar(item_detail, '$.itemid') itemid,
json_extract_scalar(shopping, '$.shopid') shopid
FROM t
LEFT JOIN UNNEST(CAST(my_json AS array(json))) AS x(shopping) ON true
LEFT JOIN UNNEST(CAST(json_extract(shopping, '$.item_detail') AS array(json))) AS y(item_detail) ON true;

Related

is there a way to extract duplicated row value in sql as the key/grouping value?

I have following two tables
users
id | name
1 | john
2 | ada
events
id | content | userId
1 | 'applied' | 1
2 | 'interviewed| 1
What would be the query that returns data in the following shape:
[
{name:'john', events:[{id:1, content:'applied'},{id:2, content:'interviewed'}]}
]
I have tried to run following queries
attempt 1
select events.id, content, users.name
from events
left join users
on users.id=events.userId
where events.userId = ?
but it return duplicated value for the name as following
[
{
"id": 1,
"content": "ronaldo",
"name": "Norman Zboncak"
},
{
"id": 2,
"content": "messi",
"name": "Norman Zboncak"
},
{
"id": 3,
"content": "messi",
"name": "Norman Zboncak"
}
]
attempt 2
I tried to use group_concat but apparently you cannot pas multiple arguments into it so couldn't get the result in the desired shape

You must do a LEFT join of users to events and aggregate with SQLite's JSON Functions:
SELECT json_object(
'name', u.name,
'events', json_group_array(json_object('id', e.id, 'content', e.content))
) result
FROM users u LEFT JOIN events e
ON e.userId = u.id
WHERE u.id = 1 -- remove this line to get results for all users
GROUP BY u.id;
See the demo.

SQL select from array in JSON

I have a table with a json field with the following json:
[
{
productId: '1',
other : [
otherId: '2'
]
},
{
productId: '3',
other : [
otherId: '4'
]
}
]
I am trying to select the productId and otherId for every array element like this:
select JSON_EXTRACT(items, $.items[].productId) from order;
But this is completely wrong since it takes only the first element in the array
Do I need to write a loop or something?

First of all, the data you show is not valid JSON. It has multiple mistakes that make it invalid.
Here's a demo using valid JSON:
mysql> create table orders ( items json );
mysql> insert into orders set items = '[ { "productId": "1", "other": { "otherId": "2" } }, { "productId": "3", "other" : { "otherId": "4" } } ]'
mysql> SELECT JSON_EXTRACT(items, '$[*].productId') AS productIds FROM orders;
+------------+
| productIds |
+------------+
| ["1", "3"] |
+------------+
If you want each productId on a row by itself as a scalar value instead of a JSON array, you'd have to use JSON_TABLE() in MySQL 8.0:
mysql> SELECT j.* FROM orders CROSS JOIN JSON_TABLE(items, '$[*]' COLUMNS(productId INT PATH '$.productId')) AS j;
+-----------+
| productId |
+-----------+
| 1 |
| 3 |
+-----------+
This is tested in MySQL 8.0.23.
You also tagged your question MariaDB. I don't use MariaDB, and MariaDB has its own incompatible implementation of JSON support, so I can't predict how it will work.

How to Get First_Value(), Last_Value() and previous Date action for an Array object inside a VARIANT column SnowflakeSQL

I have a VARIANT column call 'REQUEST' in the table 'QWERTY' that contains an Array object inside a JSON like
{
"ID": "123123",
"workflowHistory": [
{
"id": "666",
"workflowType": "CCC",
"entityId": "123123",
"creator": {
"id": "503081",
"displayName": "AGENT2",
"email": "AGENT2#SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:58:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "555",
"workflowType": "AAA",
"entityId": "123123",
"creator": {
"id": "503080",
"displayName": "AGENT1",
"email": "AGENT1#SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:55:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "444",
"workflowType": "xyz",
"entityId": "123123",
"creator": {
"id": "503080",
"displayName": "AGENT1",
"email": "AGENT1#SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:19:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "333",
"workflowType": "BBB",
"entityId": "123123",
"creator": {
"id": "503079",
"displayName": "AGENT0",
"email": "AGENT0#SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:10:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "222",
"workflowType": "ZZZ",
"entityId": "123123",
"creator": {
"id": "503079",
"displayName": "AGENT0",
"email": "AGENT0#SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:08:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
}
]
}
Also, 'QWERTY' table has HAVERST_DATE and the PK ARTICLE_ID (the same as REQUEST:workflowHistory.ID), I am trying to get an output with the following columns:
ID
Last createdDate for an AGENTn
First createdDate for an AGENTn
the previous createdDate that is made BY AGENTn-1
the next createdDate that is made BY AGENTn+1
I would like an output like:
OUTPUT
For this I'm building A query as follows:
WITH WorkFlow_Parsed AS(
SELECT ARTICLE_ID,
HARVEST_DATE,
value:createdDate::timestamp_tz AS create_date,
value:creator:email AS email,
value:workflowType AS workflowType,
value:value AS value
FROM 'QWERTY', lateral flatten( input => REQUEST:workflowHistory )
),
lag_Agent_timing AS
(SELECT
WorkFlow_Parsed.ARTICLE_ID AS ARTICLE_ID,WorkFlow_Parsed.email,LAG(WorkFlow_Parsed.create_date) IGNORE NULLS over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS lag_date_value
FROM WorkFlow_Parsed),
lead_agent_timing AS
(SELECT
WorkFlow_Parsed.ARTICLE_ID AS ARTICLE_ID,WorkFlow_Parsed.email,LEAD(WorkFlow_Parsed.create_date) IGNORE NULLS over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS lead_date_value
FROM WorkFlow_Parsed)
SELECT
DISTINCT
WorkFlow_Parsed.ARTICLE_ID AS _ARTICLE_ID,
WorkFlow_Parsed.email AS _email,
last_value(WorkFlow_Parsed.create_date) over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS last_date_value,
first_value(WorkFlow_Parsed.create_date) over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS first_date_value,
MAX(lag_Agent_timing.lag_date_value),
MIN(lead_agent_timing.lead_date_value)
FROM WorkFlow_Parsed
JOIN lag_Agent_timing ON WorkFlow_Parsed.ARTICLE_ID=lag_Agent_timing.ARTICLE_ID AND lag_Agent_timing.email=WorkFlow_Parsed.email
JOIN lead_agent_timing ON WorkFlow_Parsed.ARTICLE_ID=lead_agent_timing.ARTICLE_ID AND lead_agent_timing.email=WorkFlow_Parsed.email
GROUP BY _ARTICLE_ID,_email
But I Got the error: "[SYS_VW.CREATE_DATE_1] is not a valid group by expression"`
How could I Fix it?

[SYS_VW.CREATE_DATE_1] is not a valid group by expression
The error is coming from your use of GROUP BY in the final SELECT query. It is pointing out that you are referencing/using Workflow_Parsed.create_date in the query as a non-group column but it isn't part of the GROUP BY _ARTICLE_ID, _email expression, i.e. it is the same as [Workflow_Parsed.create_date] is not a valid group by expression that you will receive if you simplify the query a bit.
Snowflake does not permit aggregating over a window function expression and if you'd like to mix a GROUP BY with a window function, try nesting the query in an structure such as SELECT cols, aggregate(cols) FROM (SELECT cols, window(cols)) GROUP BY cols to separate the two (i.e. apply window functions over all rows first, then group the entire result it produces).
I'm unsure what the window functions are attempting in your sample query because I do not see the agent's n ± 1 relations anywhere in them, but going by your described requirement and the sample output included, the following should work (it just uses scalar subqueries, no window functions):
WITH workflows AS (
SELECT PARSE_JSON('{"ID":"123123","workflowHistory":[{"id":"666","workflowType":"CCC","entityId":"123123","creator":{"id":"503081","displayName":"AGENT2","email":"AGENT2#SOMETHING.com","userAvatarUrl":"XXXXXXX"},"createdDate":"2020-04-30T21:58:09Z","deletor":null,"deletedDate":null,"clientId":"000000000","value":"00000000"},{"id":"555","workflowType":"AAA","entityId":"123123","creator":{"id":"503080","displayName":"AGENT1","email":"AGENT1#SOMETHING.com","userAvatarUrl":"XXXXXXX"},"createdDate":"2020-04-30T21:55:09Z","deletor":null,"deletedDate":null,"clientId":"000000000","value":"00000000"},{"id":"444","workflowType":"xyz","entityId":"123123","creator":{"id":"503080","displayName":"AGENT1","email":"AGENT1#SOMETHING.com","userAvatarUrl":"XXXXXXX"},"createdDate":"2020-04-30T21:19:09Z","deletor":null,"deletedDate":null,"clientId":"000000000","value":"00000000"},{"id":"333","workflowType":"BBB","entityId":"123123","creator":{"id":"503079","displayName":"AGENT0","email":"AGENT0#SOMETHING.com","userAvatarUrl":"XXXXXXX"},"createdDate":"2020-04-30T21:10:09Z","deletor":null,"deletedDate":null,"clientId":"000000000","value":"00000000"},{"id":"222","workflowType":"ZZZ","entityId":"123123","creator":{"id":"503079","displayName":"AGENT0","email":"AGENT0#SOMETHING.com","userAvatarUrl":"XXXXXXX"},"createdDate":"2020-04-30T21:08:09Z","deletor":null,"deletedDate":null,"clientId":"000000000","value":"00000000"}]}') AS request
), workflow_rows AS (
SELECT
w.request:ID::varchar AS article_id,
lf.value:createdDate::timestamp_tz AS created_date,
lf.value:creator.id::integer AS creator_id,
lf.value:creator.email::varchar AS creator_email,
lf.value:workflowType::varchar AS workflow_type,
lf.value:value::varchar AS workflow_value
FROM workflows w, LATERAL FLATTEN(REQUEST:workflowHistory) lf
), article_workflow_creators AS (
SELECT DISTINCT
article_id,
creator_id,
creator_email
FROM workflow_rows
)
SELECT
awc.article_id,
awc.creator_id,
awc.creator_email,
(SELECT MAX(wr.created_date) FROM workflow_rows wr WHERE wr.article_id = awc.article_id AND wr.creator_id = awc.creator_id) AS last_date_value,
(SELECT MIN(wr.created_date) FROM workflow_rows wr WHERE wr.article_id = awc.article_id AND wr.creator_id = awc.creator_id) AS first_date_value,
(SELECT MAX(wr.created_date) FROM workflow_rows wr WHERE wr.article_id = awc.article_id AND wr.creator_id = awc.creator_id - 1) AS previous_date,
(SELECT MAX(wr.created_date) FROM workflow_rows wr WHERE wr.article_id = awc.article_id AND wr.creator_id = awc.creator_id + 1) AS next_date
FROM article_workflow_creators awc;
For the single JSON row input included in the question, this produces:
+------------+------------+----------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
| ARTICLE_ID | CREATOR_ID | CREATOR_EMAIL | LAST_DATE_VALUE | FIRST_DATE_VALUE | PREVIOUS_DATE | NEXT_DATE |
|------------+------------+----------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------|
| 123123 | 503081 | AGENT2#SOMETHING.com | 2020-04-30 21:58:09.000 +0000 | 2020-04-30 21:58:09.000 +0000 | 2020-04-30 21:55:09.000 +0000 | NULL |
| 123123 | 503080 | AGENT1#SOMETHING.com | 2020-04-30 21:55:09.000 +0000 | 2020-04-30 21:19:09.000 +0000 | 2020-04-30 21:10:09.000 +0000 | 2020-04-30 21:58:09.000 +0000 |
| 123123 | 503079 | AGENT0#SOMETHING.com | 2020-04-30 21:10:09.000 +0000 | 2020-04-30 21:08:09.000 +0000 | NULL | 2020-04-30 21:55:09.000 +0000 |
+------------+------------+----------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+

I share the code of how to use the recommended syntax
WITH WorkFlow_Parsed AS(
SELECT ARTICLE_ID,
HARVEST_DATE,
value:createdDate::timestamp_tz AS create_date,
value:creator:email AS email,
value:workflowType AS workflowType,
value:value AS value
FROM 'QWERTY', lateral flatten( input => REQUEST:workflowHistory )
)
SELECT _ARTICLE_ID, _email, last_date_value,first_date_value,
MIN(lag_value),
MAX(lead_value)
FROM (
SELECT
DISTINCT
WorkFlow_Parsed.ARTICLE_ID AS _ARTICLE_ID,
WorkFlow_Parsed.email AS _email,
last_value(WorkFlow_Parsed.create_date) over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS last_date_value,
first_value(WorkFlow_Parsed.create_date) over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS first_date_value,
COALESCE(LAG(WorkFlow_Parsed.create_date) IGNORE NULLS over (partition by WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date),'1900-01-01 00:00:00') AS lag_value,
COALESCE(LEAD(WorkFlow_Parsed.create_date) IGNORE NULLS over (partition by WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date),'2100-01-01 00:00:00') AS lead_value
FROM WorkFlow_Parsed) GROUP BY _ARTICLE_ID,_email,last_date_value,first_date_value

How to query nested arrays in a postgres json column?

I have some json similar to the json below stored in a postgres json column. I'm trying query it to identify some incorrectly entered data. I'm basically looking for addresses where the house description is the same as the house number. I can't quite work out how to do it.
{
"timestamp": "2014-10-23T16:15:28+01:00",
"schools": [
{
"school_id": "1",
"addresses": [
{
"town": "Birmingham",
"house_description": "1",
"street_name": "Parklands",
"addr_id": "4",
"postcode": "B5 8KL",
"house_no": "1",
"address_type": "UK"
},
{
"town": "Plymouth",
"house_description": "Flat a",
"street_name": "Fore Street",
"addr_id": "2",
"postcode": "PL9 8AY",
"house_no": "15",
"address_type": "UK"
}
]
},
{
"school_id": "2",
"addresses": [
{
"town": "Coventry",
"street_name": "Shipley Way",
"addr_id": "19",
"postcode": "CV8 3DL",
"house_no": "662",
"address_type": "UK"
}
]
}
]
}
I have written this sql which will find where the data matches:
select *
FROM title_register_data
where address_data->'schools'->0->'addresses'->0->>'house_description'=
address_data->'schools'->0->'addresses'->0->>'house_no'
This obviously only works on the first address on the first school. Is there a way of querying all of the addresses of every school?

Use jsonb_array_elements() in lateral, join as many times as the depth of a json array which elements you want to compare:
select
schools->>'school_id' school_id,
addresses->>'addr_id' addr_id,
addresses->>'house_description' house_description,
addresses->>'house_no' house_no
from title_register_data,
jsonb_array_elements(address_data->'schools') schools,
jsonb_array_elements(schools->'addresses') addresses
where addresses->>'house_description' = addresses->>'house_no';
school_id | addr_id | house_description | house_no
-----------+---------+-------------------+----------
1 | 4 | 1 | 1
(1 row)

Linq to XML query to SQL

UPDATE:
I've turned my xml into a query table in coldfusion, so this may help to solve this.
So my data is:
[id] | [code] | [desc] | [supplier] | [name] | [price]
------------------------------------------------------
1 | ABCDEF | "Tst0" | "XYZ" | "Test" | 123.00
2 | ABCDXY | "Tst1" | "XYZ" | "Test" | 130.00
3 | DCBAZY | "Tst2" | "XYZ" | "Tst2" | 150.00
Now what I need is what the linq to xml query outputs below. Output should be something like (i'll write it in JSON so it's easier for me to type) this:
[{
"code": "ABCD",
"name": "Test",
"products":
{
"id": 1,
"code": "ABCDEF",
"desc": "Tst0",
"price": 123.00
},
{
"id": 2,
"code": "ABCDXY",
"desc": "Tst1",
"price": 130.00
}
},
{
"code": "DCBA",
"name": "Tst2",
"products":
{
"id": 3,
"code": "DCBAZY",
"desc": "Tst2",
"price": 150.00
}
}]
As you can see, Group by the first 4 characters of 'CODE' and 'Supplier' code.
Thanks
How would i convert the following LINQ to XML query to SQL?
from q in query
group q by new { Code = q.code.Substring(0, 4), Supplier = q.supplier } into g
select new
{
code = g.Key.Code,
fullcode = g.FirstOrDefault().code,
supplier = g.Key.Supplier,
name = g.FirstOrDefault().name,
products = g.Select(x => new Product { id = x.id, c = x.code, desc = string.IsNullOrEmpty(x.desc) ? "Description" : x.desc, price = x.price })
}
Best i could come up with:
SELECT c, supplier, n
FROM products
GROUP BY C, supplier, n
Not sure how to get the subquery in there or get the substring of code.
ps: this is for coldfusion, so I guess their version of sql might be different to ms sql..

The easiest way is to attache a profiler to you database and see what query is generate by the linq-to-SQL engine.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Extract complex nested JSON array in Presto - sql

Related

is there a way to extract duplicated row value in sql as the key/grouping value?

SQL select from array in JSON

How to Get First_Value(), Last_Value() and previous Date action for an Array object inside a VARIANT column SnowflakeSQL

How to query nested arrays in a postgres json column?

Linq to XML query to SQL

Categories

Resources