AWS Athena Extract Array in JSON - sql

I have a JSON column, and inside this JSON column is an array structure. I couldn't figure out how to get the array out.
I tried this:
cast(json_extract(ise, '$.userservice') as varchar) as ise_user_service
Example JSON (column ise):
{
"userpa":"****",
"userlo":"*****",
"sessi":"******",
"cl":{
"name":"****",
"id":"*****"
},
"usermains":"******",
"userservice":[
"1****",
"23***4**",
"124****",
"034****"
],
"usergeoloc":"********",
"userparty":"*******"
}

You can either use json_format to turn it into a string or cast it to ARRAY(VARCHAR), depending on the use case:
WITH dataset AS (
SELECT * FROM (VALUES
( JSON ' {
"userpa":"****",
"userlo":"*****",
"sessi":"******",
"cl":{
"name":"****",
"id":"*****"
},
"usermains":"******",
"userservice":[
"1****",
"23***4**",
"124****",
"034****"
],
"usergeoloc":"********",
"userparty":"*******"
}')
) AS t (ise))
SELECT json_format(json_extract(ise, '$.userservice')) as ise_string,
cast(json_extract(ise, '$.userservice') as ARRAY(VARCHAR)) as ise_array
FROM dataset
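If you instead need one row per array element, a common follow-up (a minimal sketch, reusing the dataset CTE above) is to CROSS JOIN UNNEST the casted array:
SELECT service
FROM dataset
CROSS JOIN UNNEST(cast(json_extract(ise, '$.userservice') as ARRAY(VARCHAR))) AS t(service)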

Related

How to extract a field from an array of JSON objects in AWS Athena?

I have the following JSON data structure in a column in AWS Athena:
[
{
"event_type": "application_state_transition",
"data": {
"event_id": "-3368023833341021830"
}
},
{
"event_type": "application_state_transition",
"data": {
"event_id": "5692882176024811076"
}
}
]
I would like to somehow extract the values of the event_id field, e.g. in the form of a list:
["-3368023833341021830", "5692882176024811076"]
(Though I don't insist on exactly this as long as I can get my event IDs.)
I wanted to use the JSON_EXTRACT function and thought it uses the very same syntax as jq. In jq, I can easily get what I want using the following query syntax:
.[].data.event_id
However, in AWS Athena this results in an error, as apparently the syntax is not entirely compatible with jq. Is there an alternative way to achieve the result I want?
JSON_EXTRACT supports quite a limited set of JSON paths. Depending on the Athena engine version, you can either process the column by casting it to an array of maps and processing that array via array functions:
-- sample data
with dataset(json_col) as (
values ('[
{
"event_type": "application_state_transition",
"data": {
"event_id": "-3368023833341021830"
}
},
{
"event_type": "application_state_transition",
"data": {
"event_id": "5692882176024811076"
}
}
]')
)
-- query
select transform(
cast(json_parse(json_col) as array(map(varchar, json))),
m -> json_extract(m['data'], '$.event_id'))
from dataset;
Output:
_col0
["-3368023833341021830", "5692882176024811076"]
Or, on Athena engine version 3, you can try using Trino's json_query:
-- query
select JSON_QUERY(json_col, 'lax $[*].data.event_id' WITH ARRAY WRAPPER)
from dataset;
Note that the return types of the two differ: in the first case you get array(json), and in the second just varchar.
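If you want plain strings from the first approach as well, a minimal sketch (same sample data) is to swap json_extract for json_extract_scalar, which returns varchar and so yields array(varchar):
-- query
select transform(
cast(json_parse(json_col) as array(map(varchar, json))),
m -> json_extract_scalar(m['data'], '$.event_id'))
from dataset;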

Handle partially stringified key-value JSON VARIANT in SQL

I have a variant object like the below:
{
"build_num": "111",
"city_name": "Paris",
"rawData": "\"sku\":\"AAA\",\"price\":19.98,\"currency\":\"USD\",\"quantity\":1,\"size\":\"\"}",
"country": "France"
}
So you can see that parts of it are regular key-value pairs, like build_num and city_name, but the value of rawData is a stringified version of a JSON object.
I would like to create a variant from this that will look like:
{
"build_num": "111",
"city_name": "Paris",
"rawData": {
"sku":"AAA",
"price":19.98,
"currency":"USD",
"quantity":1}
"country": "France"
}
AND I would like to do this all in SQL (Snowflake) - is that possible?
To poke the data in, I have used a CTE and escaped the sub-JSON, so it is in the DB as you describe. I also had to add the missing start-of-object token { to make rawData valid.
WITH data AS (
SELECT parse_json(json) as json
FROM VALUES
('{"build_num": "111","city_name": "Paris","country": "France","rawData": "{\\"sku\\":\\"AAA\\",\\"price\\":19.98,\\"currency\\":\\"USD\\",\\"quantity\\":1,\\"size\\":\\"\\"}"}')
v( json)
)
SELECT
json,
json:rawData as raw_data,
parse_json(raw_data) as sub_json,
OBJECT_INSERT(json, 'rawData', sub_json, true) as all_json
FROM data;
This shows, step by step, transforming the data: parsing it via PARSE_JSON and re-injecting the result into the original object via OBJECT_INSERT.
WITH data AS (
SELECT parse_json(json) as json
FROM VALUES
('{"build_num": "111","city_name": "Paris","country": "France","rawData": "{\\"sku\\":\\"AAA\\",\\"price\\":19.98,\\"currency\\":\\"USD\\",\\"quantity\\":1,\\"size\\":\\"\\"}"}')
v( json)
)
SELECT
OBJECT_INSERT(json, 'rawData', parse_json(json:rawData), true) as all_json
FROM data;
TRY_PARSE_JSON will clear the additional backslashes off the JSON element; below is a reference:
select
column1 as n,
try_parse_json(column1) as v
from
values
(
'{
"build_num": "111",
"city_name": "Paris",
"rawData": "{\"sku\":\"AAA\",\"price\":19.98,\"currency\":\"USD\",\"quantity\":1,\"size\":\"\"}",
"country": "France"}'
) as vals;
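The two techniques combine naturally: feed TRY_PARSE_JSON's output straight into OBJECT_INSERT, so a malformed rawData yields NULL instead of an error. A minimal sketch, assuming the same escaped input as above:
WITH data AS (
SELECT parse_json(json) as json
FROM VALUES
('{"build_num": "111","city_name": "Paris","country": "France","rawData": "{\\"sku\\":\\"AAA\\",\\"price\\":19.98,\\"currency\\":\\"USD\\",\\"quantity\\":1,\\"size\\":\\"\\"}"}')
v( json)
)
SELECT
-- TRY_PARSE_JSON returns NULL rather than failing on malformed rawData
OBJECT_INSERT(json, 'rawData', try_parse_json(json:rawData), true) as all_json
FROM data;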

Parsing JSON in Snowflake

I'm trying to parse the below nested JSON in Snowflake using the lateral flatten function, but I want each nested entry in "GoalTime" to show up as a column. For example,
GoalTime_InDoorOpen        2020-03-26T12:58:00-04:00
GoalTime_InLastOff         null
GoalTime_OutStartBoarding  2020-03-27T14:00:00-04:00
"GoalTime": [
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
},
Or, if you have many rows (what appear to be flights) and thus need columns per flight, this code may be what you are after:
with data as (
select flight_code, parse_json(json) as json from values ('nz101','{"GoalTime":[{"GoalName": "GoalA", "GoalTime": "2020-03-26T12:58:00-04:00"}, {"GoalName": "GoalB"}]}'),
('nz201','{"GoalTime":[{"GoalName": "GoalA"}, {"GoalName": "GoalB", "GoalTime": "2020-03-26T12:58:00-02:00"}]}')
j(flight_code, json)
), unrolled as (
select d.flight_code, f.value:GoalName as goal_name, f.value:GoalTime as goal_time
from data d,
lateral flatten (input => json:GoalTime) f
)
select *
from unrolled
pivot(min(goal_time) for goal_name in ('GoalA', 'GoalB'))
order by flight_code;
it gives the results:
FLIGHT_CODE 'GoalA' 'GoalB'
nz101 "2020-03-26T12:58:00-04:00" null
nz201 null "2020-03-26T12:58:00-02:00"
create or replace function JSON_STRING()
returns string
language javascript
as
$$
return `
[
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
}
]
`;
$$;
select value:GoalName::string as GoalName, value:GoalTime::timestamp as GoalTime
from lateral flatten(input => parse_json(JSON_STRING()));
-- See how the lateral flatten combination works on a JSON variant:
select * from lateral flatten(input => parse_json(JSON_STRING()));
I wrote this to run in any Snowflake worksheet, no tables needed. The function on top simply allows the JSON to be written as a multi-line string in the SQL statement below it. It has no other use than representing a string holding your JSON.
Step 1 is to PARSE_JSON, which converts a string into a variant data type formatted as a JSON object.
Step 2 is the lateral flatten. If you do a select star on that, it will return a number of columns. One of them is "value".
Step 3 is to extract the properties you want using single : notation for the property name and dots to traverse down the nodes from there (if there are any).
Step 4 is to cast the property to the data type you want using double :: notation. This is especially important if you're doing comparisons on the column particularly in join keys.
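For example, a minimal sketch reusing the JSON_STRING() helper above, with the casts applied before filtering so the comparison is VARCHAR against VARCHAR rather than VARIANT against VARCHAR:
select f.value:GoalName::string as goal_name,
f.value:GoalTime::timestamp as goal_time
from lateral flatten(input => parse_json(JSON_STRING())) f
where f.value:GoalName::string = 'GoalTime_InDoorOpen';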
Note that there was a slightly invalid part of the JSON that kept it from parsing: at the top level, the array was preceded by a property name, which did not parse. I removed that to allow parsing.
Probably closest to what you seek is a standard SQL UNION statement.
Given the following are true, to recreate the solution:
You created a table JSON_GOALS with one column for the raw JSON, called GOALS_RAW
You have loaded JSON data into that table as raw JSON, with compliant JSON object-array syntax and a parent key, GoalTimeGroup, ex: {[{}]}, so:
{
"GoalTimeGroup": [{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
}
]
}
Doing so allows you to write a fairly standard JSON retrieval in Snowflake with the following syntax:
SELECT GOALS_RAW:GoalTimeGroup[0].GoalName, GOALS_RAW:GoalTimeGroup[1].GoalName, GOALS_RAW:GoalTimeGroup[2].GoalName
FROM JSON_GOALS
UNION
SELECT GOALS_RAW:GoalTimeGroup[0].GoalTime, GOALS_RAW:GoalTimeGroup[1].GoalTime, GOALS_RAW:GoalTimeGroup[2].GoalTime
FROM JSON_GOALS
;
This gets you closer to the answer you are looking for and seems to provide a simpler solution. You can also control how many rows you want based on the JSON object attributes for each GOAL object.
A recommendation to enhance this would be to create a function that detects the depth of each nested element and perhaps auto-generates the indexes for 'n' columns.
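As a starting point for that, ARRAY_SIZE can report how many index positions each row would need; a minimal sketch against the JSON_GOALS table described above:
SELECT ARRAY_SIZE(GOALS_RAW:GoalTimeGroup) AS goal_count
FROM JSON_GOALS;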
The library below provides a method called ExecuteAll, one of whose params is tags; if you provide an array of tags and values, all of them will be parsed and validated, while keeping Snowflake's SQL injection protection features.
snowflake-multisql

Convert string with pipe delimited values to a JSON value in SQL Server

I have a Parameters_A column in table A which has data like this:
Number,3771|ScheduleTime,0.00:00:00|LastData,|DP_AddPaymentDetails_URL,NULL|DP_URL,https://facebook.com
I need to move it into the Parameters_B column in table B, with a foreign key (ID_A) referencing table A, like this:
[
{
"Name":"Number",
"Value":"3771"
},
{
"Name":"ScheduleTime",
"Value":"0.00:00:00"
},
{
"Name":"LastData",
"Value":""
},
{
"Name":"DP_AddPaymentDetails_URL",
"Value":"NULL"
},
{
"Name":"DP_URL",
"Value":"https://facebook.com"
}
]
JSON is just a string. Assuming the input has no single or double quotes, you can convert it to a JSON string in any SQL Server version. To do so you need to replace:
, with ", "Value":"
| with "},{"Name":"
Add [{"Name":" at the front and
"}] at the end
to get a proper JSON string.
This query:
select
'[{"Name":"' +
replace(
replace('Number,3771|ScheduleTime,0.00:00:00|LastData,|DP_AddPaymentDetails_URL,NULL|DP_URL,https://facebook.com',',','", "Value":"')
,'|','"},{"Name":"')
+'"}]'
Produces:
[{"Name":"Number", "Value":"3771"},{"Name":"ScheduleTime", "Value":"0.00:00:00"},{"Name":"LastData", "Value":""},{"Name":"DP_AddPaymentDetails_URL", "Value":"NULL"},{"Name":"DP_URL", "Value":"https://facebook.com"}]
After formatting:
[
{"Name":"Number", "Value":"3771"},
{"Name":"ScheduleTime", "Value":"0.00:00:00"},
{"Name":"LastData", "Value":""},
{"Name":"DP_AddPaymentDetails_URL", "Value":"NULL"},
{"Name":"DP_URL", "Value":"https://facebook.com"}
]
Parsing this string to extract the individual values, though, requires SQL Server 2016 or later.
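For illustration, a minimal sketch of doing that on SQL Server 2016+ with OPENJSON, fed the literal produced above:
SELECT j.[Name], j.[Value]
FROM OPENJSON('[{"Name":"Number", "Value":"3771"},{"Name":"ScheduleTime", "Value":"0.00:00:00"},{"Name":"LastData", "Value":""},{"Name":"DP_AddPaymentDetails_URL", "Value":"NULL"},{"Name":"DP_URL", "Value":"https://facebook.com"}]')
WITH ([Name] nvarchar(100) '$.Name', [Value] nvarchar(max) '$.Value') AS j;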

SQL Server - "for json path" statement does not return more than 2984 lines of JSON string

I'm trying to generate a huge amount of data in a complex, nested JSON string using the "for json path" statement, and I'm using multiple functions to create the different parts of this JSON string, as follows:
declare @queue nvarchar(max)
select @queue = (
select x.ID as layoutID
, l.Title as layoutName
, JSON_QUERY(queue_objects (@productID, x.ID)) as [objects]
from Layouts x
inner join LayoutLanguages l on l.LayoutID = x.ID
where x.ID = @layoutid
group by x.ID, l.Title
for json path
)
select @queue as JSON
Thus far, the JSON would be:
{
"root": [{
"layouts": [{
"layoutID": 5
, "layoutName": "foo"
, "objects": []
}]
}]
}
and the "queue_objects" function then would be called to fill out 'objects' array:
queue_objects
select 0 as objectID
, case when (select inherited_counter(@layoutID,0)) > 0 then 'false' else 'true' end as editable
, JSON_QUERY(queue_properties (p.Table2ID)) as propertyObjects
, JSON_QUERY('[]') as inherited
from productList p
where p.Table1ID = @productID
group by p.Table2ID
for json path
And then the JSON would be:
{
"root": [{
"layouts": [{
"layoutID": 5
, "layoutName": "foo"
, "objects": [{
"objectID": 1000
, "editable": "true"
, "propertyObjects": []
, "inherited": []
}, {
"objectID": 2000
, "editable": "false"
, "propertyObjects": []
, "inherited": []
}]
}]
}]
}
Also "inherited_counter" and "queue_properties" functions would be called to fill corresponding keys.
This is just a sample; the code won't run as-is, since I'm not including the functions here.
But my question is: is it the functions calling one another that makes the server return a broken JSON string, or is it the server itself that can't handle a JSON string of more than 2984 lines?
EDIT: what I mean by 2984 lines is that I run the JSON through a beautifier. The server doesn't return the string line by line; it returns the JSON broken, and after beautifying it happens to be 2984 lines of string.
As I wrote in my comment to the OP, this is probably because SSMS has a limit on how many characters it displays in a column in the results grid. It has no impact on the actual result, i.e. the result contains all the data; it is just that SSMS doesn't display it all.
To fix this you could increase the number of characters SSMS retrieves, but I would not recommend that ("how long is a piece of string?"). Instead, select the result into an nvarchar(max) variable and PRINT that variable. That should give you the whole text.
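A minimal sketch of that approach, reusing the query shape from the question:
declare @json nvarchar(max)
select @json = (
select x.ID as layoutID
, l.Title as layoutName
from Layouts x
inner join LayoutLanguages l on l.LayoutID = x.ID
for json path
)
-- read the full value via PRINT rather than from the results grid
print @json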
Hope this helps!