SQL Server: parsing a JSON column with irregular values

I have a SQL Server 2016 (v13) installation where I am trying to parse a column with JSON data. The data in the column RequestData is in the following format:
[
{ "Name": "SourceSystem", "Value": "SSValue" },
{ "Name": "SourceSystemId", "Value": "XYZ" }
]
[
{ "Name": "SourceSystemId", "Value": "SSID" },
{ "Name": "SourceSystem", "Value": "SSVALUE2" }
]
What I need are the values of the SourceSystem element of the JSON object in each row. Here is my SELECT statement:
SELECT TOP 2
JSON_VALUE(RequestData, '$[0].Value') AS SourceSystem
FROM
RequestDetail
But, due to the order of the JSON elements in the column's data, the values returned for the SourceSystem column are not correct:
SSValue, SSID
Please note, I need to parse the JSON elements so that the SourceSystem column has the correct values, i.e. SSValue and SSVALUE2.
I have also tried JSON_QUERY using some online examples, but no luck so far.
Thank you!
Edit
The question was modified by someone after I posted it, so I am adding this clarification: each row of data, as shown above, has several 'Name' elements, and those Name elements can be SourceSystem or SourceSystemId. The question shows data from two rows of the table's column, but, as you can see, the SourceSystem and SourceSystemId elements are not in the same order in the first and second row. I simply need to parse the SourceSystem element per row.

Using OPENJSON you can get all the data in columns and use it as any other table:
SELECT
Value
FROM RequestDetail
CROSS APPLY OPENJSON(RequestDetail.RequestData)
WITH (Name nvarchar(20),
Value nvarchar(20))
WHERE Name = 'SourceSystem';
Value
SSValue
SSVALUE2
fiddle
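Outside the database, the shredding that OPENJSON performs here can be sketched in Python (a hypothetical illustration of the logic, not part of the T-SQL answer):

```python
import json

# The two rows from the question's RequestData column.
rows = [
    '[{"Name": "SourceSystem", "Value": "SSValue"}, {"Name": "SourceSystemId", "Value": "XYZ"}]',
    '[{"Name": "SourceSystemId", "Value": "SSID"}, {"Name": "SourceSystem", "Value": "SSVALUE2"}]',
]

def source_system(request_data):
    # Mirrors OPENJSON(...) WITH (...) WHERE Name = 'SourceSystem':
    # scan the array and return the Value of the object whose Name is
    # SourceSystem, regardless of its position in the array.
    for obj in json.loads(request_data):
        if obj["Name"] == "SourceSystem":
            return obj["Value"]
    return None

print([source_system(r) for r in rows])  # ['SSValue', 'SSVALUE2']
```

The point is the same as in the query: filter on the Name member rather than relying on array position.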

Presumably you need OPENJSON here, not JSON_VALUE:
SELECT *
FROM (VALUES(N'[{"Name":"SourceSystem","Value":"SSValue"},{"Name":"SourceSystemId","Value":"XYZ"}]'),
(N'[{"Name":"SourceSystemId","Value":"SSID"},{"Name":"SourceSystem","Value":"SSVALUE2"}]'))V(YourJSON)
CROSS APPLY OPENJSON(V.YourJSON)
WITH (Name nvarchar(20),
Value nvarchar(20))
WHERE Name = 'SourceSystem';

When you want to use JSON_VALUE, just select the correct (needed) values:
SELECT
JSON_VALUE(RequestData, '$[0].Value') AS SourceSystem
FROM RequestDetail
UNION ALL
SELECT
JSON_VALUE(RequestData, '$[1].Value') AS SourceSystem
FROM RequestDetail
output:
SourceSystem
SSValue
SSID
XYZ
SSVALUE2
When you only need values from "SourceSystem", you can always do:
SELECT SourceSystem
FROM (
SELECT
JSON_VALUE(RequestData, '$[0].Name') AS Name,
JSON_VALUE(RequestData, '$[0].Value') AS SourceSystem
FROM RequestDetail
UNION ALL
SELECT
JSON_VALUE(RequestData, '$[1].Name') AS Name,
JSON_VALUE(RequestData, '$[1].Value') AS SourceSystem
FROM RequestDetail )x
WHERE Name='SourceSystem';
output:
SourceSystem
SSValue
SSVALUE2
see: DBFIDDLE
EDIT:
SELECT
x,
MIN(CASE WHEN Name='SourceSystem' THEN SourceSystem END) as SourceSystem,
MIN(CASE WHEN Name='SourceSystemId' THEN SourceSystem END) as SourceSystemId
FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY RequestData) as x,
JSON_VALUE(RequestData, '$[0].Name') AS Name,
JSON_VALUE(RequestData, '$[0].Value') AS SourceSystem
FROM RequestDetail
UNION ALL
SELECT
ROW_NUMBER() OVER (ORDER BY RequestData) as x,
JSON_VALUE(RequestData, '$[1].Name') AS Name,
JSON_VALUE(RequestData, '$[1].Value') AS SourceSystem
FROM RequestDetail
)x
GROUP BY x
;
This will give:
x  SourceSystem  SourceSystemId
1  SSValue       XYZ
2  SSVALUE2      SSID
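The conditional-aggregation pivot in that last query can be sketched outside SQL Server in Python (hypothetical illustration; a per-row dict plays the role of the grouped Name/Value pairs):

```python
import json

rows = [
    '[{"Name": "SourceSystem", "Value": "SSValue"}, {"Name": "SourceSystemId", "Value": "XYZ"}]',
    '[{"Name": "SourceSystemId", "Value": "SSID"}, {"Name": "SourceSystem", "Value": "SSVALUE2"}]',
]

pivoted = []
for x, request_data in enumerate(rows, start=1):
    # Collapse one row's Name/Value objects into a mapping; this is the
    # same grouping the MIN(CASE WHEN Name=... THEN ... END) pattern
    # performs per row number x.
    pairs = {obj["Name"]: obj["Value"] for obj in json.loads(request_data)}
    pivoted.append((x, pairs.get("SourceSystem"), pairs.get("SourceSystemId")))

print(pivoted)  # [(1, 'SSValue', 'XYZ'), (2, 'SSVALUE2', 'SSID')]
```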

Related

BigQuery Standard SQL, get max value from json array

I have a BigQuery column which contains STRING values like
col1
[{"a":1,"b":2},{"a":2,"b":3}]
[{"a":3,"b":4},{"a":5,"b":6}]
Now, when doing a SELECT, I want to get just the max value of "a" in each JSON array. For example, here I would want the output of the SELECT on the table to be:
2
5
Any ideas please? Thanks!
Use JSON_QUERY_ARRAY() to retrieve each array element, then JSON_VALUE():
with t as (
select '[{"a":1,"b":2},{"a":2,"b":3}]' as col union all
select '[{"a":3,"b":4},{"a":5,"b":6}]'
)
select t.*,
(select max(cast(json_value(el, '$.a') as int64))
from unnest(JSON_QUERY_ARRAY(col, '$')) el
)
from t;
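The per-array maximum can be checked with a small Python sketch (hypothetical illustration; note that in BigQuery JSON_VALUE returns strings, so a numeric MAX needs a cast to avoid lexicographic comparison):

```python
import json

col1 = [
    '[{"a":1,"b":2},{"a":2,"b":3}]',
    '[{"a":3,"b":4},{"a":5,"b":6}]',
]

# For each array, parse it and take the numeric maximum of "a",
# the same thing the correlated subquery does per row.
maxima = [max(int(el["a"]) for el in json.loads(c)) for c in col1]
print(maxima)  # [2, 5]
```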

PostgreSQL Output to JSON format

I have data that needs to end up in JSON format. I am using the following code:
WITH cte AS (
SELECT 'docker' AS type, '564df5a6sdf4654f6da4sf56a' AS id, 1 AS segment, 1 AS value
UNION ALL
SELECT 'docker' AS type, '564df5a6sdf4654f6da4sf56a' AS id, 2 AS segment, 100 AS value
)
SELECT type
, id
, json_agg(json_build_object(segment, value)) AS json_result
FROM cte
GROUP BY type
, id
Result for json_result column is: [{"1" : 1}, {"2" : 100}]
But the desired result is: {"1" : 1, "2" : 100}
How to adjust the query so it returns the desired output?
Use json_object_agg():
SELECT type
, id
, json_object_agg(segment, value) AS json_result
FROM cte
GROUP BY type
, id
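The difference between the two aggregates can be sketched in Python (hypothetical illustration): json_agg of one-key objects yields a list of dicts, while json_object_agg folds all the pairs into a single object.

```python
import json

pairs = [(1, 1), (2, 100)]  # (segment, value) rows from the CTE

# json_agg(json_build_object(segment, value)) -> a list of one-key objects
as_list = [{str(seg): val} for seg, val in pairs]

# json_object_agg(segment, value) -> one object keyed by segment
as_object = {str(seg): val for seg, val in pairs}

print(json.dumps(as_list))    # [{"1": 1}, {"2": 100}]
print(json.dumps(as_object))  # {"1": 1, "2": 100}
```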

how to convert jsonarray to multi column from hive

example:
there is a JSON array column (type: string) in a Hive table like:
"[{"filed":"name", "value":"alice"}, {"filed":"age", "value":"14"}......]"
how to convert it into:
name age
alice 14
with Hive SQL?
I've tried lateral view explode, but it's not working.
Thanks a lot!
This is a working example of how it can be parsed in Hive. Customize it and debug on real data; see the comments in the code:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field_map['field'] = 'name' then field_map['value'] end) as name,
max(case when field_map['field'] = 'age' then field_map['value'] end) as age --do the same for all fields
from
(
select t.id,
t.str as original_string,
str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) field_map --remove extra characters and convert to map
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with '}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
One more approach using get_json_object:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field = 'name' then value end) as name,
max(case when field = 'age' then value end) as age --do the same for all fields
from
(
select t.id,
get_json_object(trim(a.field),'$.field') field,
get_json_object(trim(a.field),'$.value') value
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with '}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
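The string surgery both Hive queries rely on (strip the brackets, turn "}," into "}|", split on "|", then pull field/value out of each piece) can be sketched in Python to see what each step produces (hypothetical illustration only):

```python
import re

s = ('[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, '
     '{"field":"something_else", "value":"somevalue"}]')

# Same transformation as the Hive query: remove [ and ], replace "},"
# with "}|" so the array can be split into one element per piece.
body = re.sub(r'\[|\]', '', s)
body = re.sub(r'\},', '}|', body)
pieces = body.split('|')

# Extract the field/value pair from each piece, as get_json_object
# (or str_to_map) does per exploded row.
row = {}
for p in pieces:
    m = re.search(r'"field"\s*:\s*"([^"]*)"\s*,\s*"value"\s*:\s*"([^"]*)"', p)
    if m:
        row[m.group(1)] = m.group(2)

print(row["name"], row["age"])  # alice 14
```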

Why is Snowflake changing the order of JSON values when converting into a flattened list?

I have JSON objects stored in the table and I am trying to write a query to get the first element from that JSON.
Replication Script
create table staging.par.test_json (id int, val varchar(2000));
insert into staging.par.test_json values (1, '{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}');
insert into staging.par.test_json values (2,'
{
"list": [
{
"element": "Wholesale jeweler"
},
{
"element": "Fashion"
},
{
"element": "Industry"
},
{
"element": "Jewelry store"
},
{
"element": "Business service"
},
{
"element": "Corporate office"
}
]
}');
with cte_get_cats AS
(
select id,
val as category_list
from staging.par.test_json
),
cats_parse AS
(
select id,
parse_json(category_list) as c
from cte_get_cats
),
distinct_cats as
(
select id,
INDEX,
UPPER(cast(value:element AS varchar)) As c
from
cats_parse,
LATERAL flatten(INPUT => c:"list")
order by 1,2
) ,
cat_array AS
(
SELECT
id,
array_agg(DISTINCT c) AS sds_categories
FROM
distinct_cats
GROUP BY 1
),
sds_cats AS
(
select id,
cast(sds_categories[0] AS varchar) as sds_primary_category
from cat_array
)
select * from sds_cats;
Values: Categories
{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}
Flattening it to a list gives me
["Plumber","Craft","Plumbing","Electrics","Electrical","Tradesperson","Home services","Housekeepings","Electrical Goods"]
Issue:
The order of this is not always the same: Snowflake sometimes reorders the values, apparently alphabetically.
How can I make this order stable? I do not want the order to change.
The problem is the way you're using ARRAY_AGG:
array_agg(DISTINCT c) AS sds_categories
Specifying it like that gives Snowflake no guideline on how the contents of the array should be arranged. You should not assume that the arrays will be created in the same order as their input records; they might be, but it's not guaranteed. So you probably want to do:
array_agg(DISTINCT c) within group (order by index) AS sds_categories
But that won't work: if you use DISTINCT c, the value of index for each c is unknown. Perhaps you don't need DISTINCT; then this will work:
array_agg(c) within group (order by index) AS sds_categories
If you do need DISTINCT, you need to somehow associate an index with each distinct c value. One way is to apply the MIN function to index in the input. Here's the full query:
with cte_get_cats AS
(
select id,
val as category_list
from staging.par.test_json
),
cats_parse AS
(
select id,
parse_json(category_list) as c
from cte_get_cats
),
distinct_cats as
(
select id,
MIN(INDEX) AS index,
UPPER(cast(value:element AS varchar)) As c
from
cats_parse,
LATERAL flatten(INPUT => c:"list")
group by 1,3
) ,
cat_array AS
(
SELECT
id,
array_agg(c) within group (order by index) AS sds_categories
FROM
distinct_cats
GROUP BY 1
),
sds_cats AS
(
select id,
cast(sds_categories[0] AS varchar) as sds_primary_category
from cat_array
)
select * from sds_cats;
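The MIN(INDEX)-per-distinct-value trick can be sketched outside Snowflake in Python (hypothetical data with duplicates, to show why the first occurrence's index is the one kept):

```python
elements = ["Plumber", "Craft", "Plumbing", "craft", "Plumber"]

# For every distinct (upper-cased) value keep the smallest index it
# occurs at, then order by that index: the result follows the original
# input order instead of being arbitrary or alphabetical.
first_index = {}
for idx, el in enumerate(elements):
    v = el.upper()
    if v not in first_index:
        first_index[v] = idx

ordered = sorted(first_index, key=first_index.get)
print(ordered)  # ['PLUMBER', 'CRAFT', 'PLUMBING']
```

This is what GROUP BY id, c with MIN(INDEX), followed by ARRAY_AGG(c) WITHIN GROUP (ORDER BY index), achieves in the corrected query.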

How to parse JSON in Standard SQL BigQuery?

After streaming some json data into BQ, we have a record that looks like:
"{\"Type\": \"Some_type\", \"Identification\": {\"Name\": \"First Last\"}}"
How would I extract the type from this? E.g. I would like to get Some_type.
I tried all the combinations shown in https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions without success. Namely, I thought:
SELECT JSON_EXTRACT_SCALAR(raw_json , "$[\"Type\"]") as parsed_type FROM `table` LIMIT 1000
is what I need. However, I get:
Invalid token in JSONPath at: ["Type"]
The example below is for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, "{\"Type\": \"Some_type\", \"Identification\": {\"Name\": \"First Last\"}}" raw_json UNION ALL
SELECT 2, '{"Type": "Some_type", "Identification": {"Name": "First Last"}}'
)
SELECT id, JSON_EXTRACT_SCALAR(raw_json , "$.Type") AS parsed_type
FROM `project.dataset.table`
with result
Row  id  parsed_type
1    1   Some_type
2    2   Some_type
See the updated example below; take a look at the third record, which I think mimics your case:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, "{\"Type\": \"Some_type\", \"Identification\": {\"Name\": \"First Last\"}}" raw_json UNION ALL
SELECT 2, '''{"Type": "Some_type", "Identification": {"Name": "First Last"}}''' UNION ALL
SELECT 3, '''"{\"Type\": \"
null1\"}"
'''
)
SELECT id,
JSON_EXTRACT_SCALAR(REGEXP_REPLACE(raw_json, r'^"|"$', '') , "$.Type") AS parsed_type
FROM `project.dataset.table`
with result
Row  id  parsed_type
1    1   Some_type
2    2   Some_type
3    3   null1
Note: I used null1 instead of null so you can easily see that it is not a NULL but rather the string null1.
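The double-encoded record can be illustrated in Python: the stored value is a JSON string whose content is itself JSON, so it has to be decoded twice, which is what stripping the outer quotes with REGEXP_REPLACE achieves before JSON_EXTRACT_SCALAR runs (hypothetical sketch, not BigQuery code):

```python
import json

# The raw value as stored: a JSON-encoded string containing JSON.
raw_json = '"{\\"Type\\": \\"Some_type\\", \\"Identification\\": {\\"Name\\": \\"First Last\\"}}"'

inner = json.loads(raw_json)  # first decode -> a plain string
obj = json.loads(inner)       # second decode -> the actual object
print(obj["Type"])  # Some_type
```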