Query to extract ids from a deeply nested json array object in Presto - sql

I'm using Presto and trying to extract all 'id' from 'source'='dd' from a nested json structure as following.
{
"results": [
{
"docs": [
{
"id": "apple1",
"source": "dd"
},
{
"id": "apple2",
"source": "aa"
},
{
"id": "apple3",
"source": "dd"
}
],
"group": 99806
}
]
}
expected to extract the ids [apple1, apple3] into a column in Presto
I am wondering what is the right way to achieve this in Presto Query?

If your data has a regular structure as in the example you posted, you can use a combination of parsing the value as JSON, casting it to a structured SQL type (array/map/row) and the using array processing functions to filter, transform and extract the elements you want:
WITH data(value) AS (VALUES '{
"results": [
{
"docs": [
{
"id": "apple1",
"source": "dd"
},
{
"id": "apple2",
"source": "aa"
},
{
"id": "apple3",
"source": "dd"
}
],
"group": 99806
}
]
}'),
parsed(value) AS (
SELECT cast(json_parse(value) AS row(results array(row(docs array(row(id varchar, source varchar)), "group" bigint))))
FROM data
)
SELECT
transform( -- extract the id from the resulting docs
filter( -- filter docs with source = 'dd'
flatten( -- flatten all docs arrays into a single doc array
transform(value.results, r -> r.docs) -- extract the docs arrays from the result array
),
doc -> doc.source = 'dd'),
doc -> doc.id)
FROM parsed
The query above produces:
_col0
------------------
[apple1, apple3]
(1 row)

Related

Transform JSON: select one row from array of json objects

I can't get a specific row from this JSON array.
So I want to get the object where filed 'type' is equal to 'No-Data'
Are there exist any functions in SQL to take the row or some expressions?
"metadata": { "value": "JABC" },
"force": false
"users": [
{ "id": "111", "comment": "aaa", type: "Data" },
{ "id": "222", "comment": "bbb" , type:"No-Data"},
{ "id": "333", "comment": "ccc", type:"Data" }
]
You can use a JSON path query:
select jsonb_path_query_first(the_column, '$.users[*] ? (#.type == "No-Data")')
from the_table
This assumes that the column is defined as jsonb (which it should be). If it's not you have to cast it: the_column::jsonb
Online example

I want to combine json rows in t-sql into single json row

I have a table
id
json
1
{"url":"url2"}
2
{"url":"url2"}
I want to combine these into a single statement where the output is :
{
"graphs": [
{
"id": "1",
"json": [
{
"url": "url1"
}
]
},
{
"id": "2",
"json": [
{
"url": "url2"
}
]
}
]
}
I am using T-SQL, I've notice there is some stuff in postgres but can't find much on tsql.
Any help would be greatly appreciated..
You need to use JSON_QUERY on the json column to ensure it is not escaped.
SELECT
id,
JSON_QUERY('[' + json + ']')
FROM YourTable t
FOR JSON PATH, ROOT('graphs');
db<>fiddle

PySpark / Spark SQL DataFrame - Error while parsing Struct Type when data is null

I am trying to parse a JSON file, selectively read only 50+ data elements (out of 800+) into DataFrame in PySpark. One of the data elements (issues.customfield_666) is a Struct Type (with 3 fields Id/Name/Tag under it). Sometimes data in this Struct field comes as null. When that happens, spark job execution fails with the below error. How to ignore/suppress this error for null values?
Error is happening only when parsing JSON file #1 (where customfield_66 is coming as null).
AnalysisException: Can't extract value from issues.customfield_666: need struct type but got string
JSON File 1 (Where customfield_666 has only null)
{
"startAt": 0,
"total": 1,
"issues": [
{
"id": "1",
"key": "BSE-444",
"issuetype": {
"id": "30",
"name": "Epic1",
},
"customfield_666": null
}
]
}
JSON File 2 (Where customfield_666 has both null and struct values)
{
"startAt": 0,
"total": 2,
"issues": [
{
"id": "1",
"key": "BSE-444",
"issuetype": {
"id": "30",
"name": "Epic1",
},
"customfield_666": null
},
{
"id": "2",
"key": "BSE-555",
"issuetype": {
"id": "40",
"name": "Epic2",
},
"customfield_666":
{
"tag": "Smoke Testing",
"id": "666-01",
},
}
]
}
Below is the PySpark code used to parse above JSON data.
from pyspark.sql.functions import *
rawDF = spark.read.json("abfss://users#mydlsgen2rk.dfs.core.windows.net/raw/MyData.json", multiLine = "true")
DF = rawDF.select(explode("issues").alias("issues")) \
.select(
col("issues.id").alias("IssueId"),
col("issues.key").alias("IssueKey"),
col("issues.fields").alias("IssueFields"),
col("issues.issuetype.name").alias("IssueTypeName"),
col("issues.customfield_666.tag").alias("IssueCust666Tag")
)
You may check if it is null first
from pyspark.sql import functions as F
DF = rawDF.select(F.explode("issues").alias("issues")) \
.select(
F.col("issues.id").alias("IssueId"),
F.col("issues.key").alias("IssueKey"),
F.col("issues.fields").alias("IssueFields"),
F.col("issues.issuetype.name").alias("IssueTypeName"),
F.when(
F.col("issues.customfield_666").isNull() | (F.trim(F.col("issues.customfield_666").cast("string"))==""), None
).otherwise(
F.col("issues.customfield_666.tag")
).alias("IssueCust666Tag")
)
Let me know if this works for you

Trying to construct PostgreSQL Query to extract from JSON a text value in an object, in an array, in an object, in an array, in an object

I am constructing an interface between a PostgreSQL system and a SQL Server system and am attempting to "flatten" the structure of the JSON data to facilitate this. I'm very experienced in SQL Server but I'm new to both PostgreSQL and JSON.
The JSON contains essentially two types of structure: those of type "text" or "textarea" where the value I want is in an object named value (the first two cases below) and those of type "select" where the value object points to an id object in a lower-level options array (the third case below).
{
"baseGroupId": {
"fields": [
{
"id": "1f53",
"name": "Location",
"type": "text",
"options": [],
"value": "Over the rainbow"
},
{
"id": "b547",
"name": "Description",
"type": "textarea",
"options": [],
"value": "A place of wonderful discovery"
},
{
"id": "c12f",
"name": "Assessment",
"type": "select",
"options": [
{
"id": "e5fd",
"name": "0"
},
{
"id": "e970",
"name": "1"
},
{
"id": "0ff4",
"name": "2"
},
{
"id": "2db3",
"name": "3"
},
{
"id": "241f",
"name": "4"
},
{
"id": "3f52",
"name": "5"
}
],
"value": "241f"
}
]
}
}
Those with a sharp eye will see that the value of the last value object "241f" can also be seen within the options array against one of the id objects. When nested like this I need to extract the value of the corresponding name, in this case "4".
The JSON-formatted information is in table customfield field textvalue. It's datatype is text but I'm coercing it to json. I was originally getting array set errors when trying to apply the criteria in a WHERE clause and then I read about using a LATERAL subquery instead. It now runs but returns all the options, not just the one matching the value.
I'm afraid I couldn't get an SQL Fiddle working to reproduce my results, but I would really appreciate an examination of my query to see if the problem can be spotted.
with cte_custombundledfields as
(
select
textvalue
, cfname
, json_array_elements(textvalue::json -> 'baseGroupId'->'fields') ->> 'name' as name
, json_array_elements(textvalue::json -> 'baseGroupId'->'fields') ->> 'value' as value
, json_array_elements(textvalue::json -> 'baseGroupId'->'fields') ->> 'type' as type
from
customfield
)
, cte_custombundledfieldsoptions as
(
select *
, json_array_elements(json_array_elements(textvalue::json -> 'baseGroupId'->'fields') -> 'options') ->> 'name' as value2
from
cte_custombundledfields x
, LATERAL json_array_elements(x.textvalue::json -> 'baseGroupId'->'fields') y
, LATERAL json_array_elements(y -> 'options') z
where
type = 'select'
and z ->> 'id' = x.value
)
select *
from
cte_custombundledfieldsoptions
I posted a much-simplified rewrite of this question which was answered by Bergi.
How do I query a string from JSON based on another string within the JSON in PostgreSQL?

How to select field values from array of objects?

I have a JSON column with following JSON
{
"metadata": { "value": "JABC" },
"force": false,
"users": [
{ "id": "111", "comment": "abc" },
{ "id": "222", "comment": "abc" },
{ "id": "333" }
]
}
I am expecting list of IDs from the query output ["111","222", "333"]. I tried following query but getting null value.
select colName->'users'->>'id' ids from tableName
How to get this specific field value from the array of object?
You need to extract the array as rows and then get the id:
select json_array_elements(colName->'users')->>'id' ids from tableName;
If you're using jsonb rather than json, the function is jsonb_array_elements.