How do I extract all of the values from a JSON element? - sql

How can I extract all of the values for the element account_code? The SELECT statement below lets me extract any single value associated with index [x], but I want to extract all the values (each in its own row) so that the output is:
account_codes
------------
1
2
3
SELECT
JSON_EXTRACT_SCALAR(v, '$.accounting[0].account_code') AS account_codes
FROM (VALUES JSON '
{"accounting":
[
{"account_code": "1", "account_name": "Travel"},
{"account_code": "2", "account_name": "Salary"},
{"account_code": "3", "account_name": "Equipment"},
]
}'
) AS t(v)

The operator you need is UNNEST, which flattens the array and fetches all the column values. Below are the DDL (in the Hive catalog) I used to create the table and the query that fetches all account codes.
DDL :
CREATE EXTERNAL TABLE `sf_73515497`(
`accounting` array<struct<account_code:string,account_name:string>> COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'paths'='accounting')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://path-to-json-prefix/'
SQL with unnest:
WITH dataset AS (
SELECT accounting from "sf_73515497"
)
SELECT t.accounts.account_code FROM dataset
CROSS JOIN UNNEST(accounting) as t(accounts)
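
For completeness, the same flattening also works directly on the JSON literal from the question, without creating a table first. A minimal sketch, assuming the Athena/Presto engine: cast the accounting array to ARRAY(JSON) and unnest it.
WITH t(v) AS (
VALUES JSON '
{"accounting":
[
{"account_code": "1", "account_name": "Travel"},
{"account_code": "2", "account_name": "Salary"},
{"account_code": "3", "account_name": "Equipment"}
]
}'
)
SELECT
JSON_EXTRACT_SCALAR(elem, '$.account_code') AS account_codes --one row per array element
FROM t
CROSS JOIN UNNEST(CAST(JSON_EXTRACT(v, '$.accounting') AS ARRAY(JSON))) AS u(elem)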

Related

How would I convert this JSON stored in a database column into a table with columns and rows?

Given the below JSON data and table that holds the data, how can I query the values and write to a new table with rows and columns split?
Basic table that contains the JSON:
CREATE TABLE BI.DataTable
(
JsonDataText VARCHAR(MAX)
);
JSON data:
{
"datatable": {
"data": [
[
"ZSFH",
"99999",
"2022-08-31",
571106
],
[
"ZSFH",
"99999",
"2022-07-31",
578530
],
[
"ZSFH",
"99999",
"2022-06-30",
582233
],
[
"ZSFH",
"99999",
"2022-05-31",
581718
]
]
}
}
When I use the JSON_VALUE function, I get a null for each column.
SELECT
JSON_VALUE (JsonDataText, '$.data[0]') AS MetricCode,
JSON_VALUE (JsonDataText, '$.data[1]') AS RegionID,
JSON_VALUE (JsonDataText, '$.data[2]') AS ReportingPeriod,
JSON_VALUE (JsonDataText, '$.data[3]') AS MetricValue
FROM BI.DataTable
WHERE ISJSON(JsonDataText) > 0
Using OPENJSON() with the appropriate column definitions and an additional APPLY operator is a possible approach:
SELECT j.*
FROM DataTable d
OUTER APPLY OPENJSON(d.JsonDataText, '$.datatable.data') WITH (
MetricCode varchar(4) '$[0]',
RegionID varchar(5) '$[1]',
ReportingPeriod varchar(10) '$[2]',
MetricValue int '$[3]'
) j
WHERE ISJSON(d.JsonDataText) = 1
Result:
MetricCode  RegionID  ReportingPeriod  MetricValue
ZSFH        99999     2022-08-31       571106
ZSFH        99999     2022-07-31       578530
ZSFH        99999     2022-06-30       582233
ZSFH        99999     2022-05-31       581718
Note that the reasons for the NULL values returned by your current statement are:
JSON_VALUE() extracts a scalar value from a JSON string, but the $.datatable.data part of the stored JSON is a JSON array with JSON arrays as items, so in this situation you need to use JSON_QUERY().
The $.data[0] path expression is wrong.
The following example (using JSON_QUERY() and the correct path) extracts individual items from the $.datatable.data JSON array:
SELECT
JSON_QUERY(JsonDataText, '$.datatable.data[0]') AS MetricCode,
JSON_QUERY(JsonDataText, '$.datatable.data[1]') AS RegionID,
JSON_QUERY(JsonDataText, '$.datatable.data[2]') AS ReportingPeriod,
JSON_QUERY(JsonDataText, '$.datatable.data[3]') AS MetricValue
FROM DataTable
WHERE ISJSON(JsonDataText) > 0
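
Finally, to write the shredded rows into a new table (the original goal), the OPENJSON() query above can feed an INSERT. A minimal sketch, assuming a hypothetical pre-created target table BI.MetricsTable with matching columns:
INSERT INTO BI.MetricsTable (MetricCode, RegionID, ReportingPeriod, MetricValue)
SELECT j.MetricCode, j.RegionID, j.ReportingPeriod, j.MetricValue
FROM BI.DataTable d
CROSS APPLY OPENJSON(d.JsonDataText, '$.datatable.data') WITH (
MetricCode varchar(4) '$[0]',
RegionID varchar(5) '$[1]',
ReportingPeriod varchar(10) '$[2]',
MetricValue int '$[3]'
) j
WHERE ISJSON(d.JsonDataText) = 1;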

Json Arrays of objects PostgreSQL Table format

I have a JSON file (an array of objects) which I have to convert into table format using a PostgreSQL query.
See the sample data below.
"b", "c", "d", "e" are to be extracted as separate tables, since each is an array containing objects.
I have tried using json_populate_recordset() but it only works if I have a single array.
[{a:"1",b:"2"},{a:"10",b:"20"}]
I have referred to some links and code samples:
jsonb_array_element example
postgreSQL functions
Sample Data:
{
"b":[
{columnB1:value, columnB2:value},
{columnB1:value, columnB2:value},
],
"c":[
{columnC1:value, columnC2:value, columnC3:value},
{columnC1:value, columnC2:value, columnC3:value},
{columnC1:value, columnC2:value, columnC3:value}
],
"d":[
{columnD1:value, columnD2:value},
{columnD1:value, columnD2:value},
],
"e":[
{columnE1:value, columnE2:value},
]
}
Expected output:
b should be one table in which columnB1 and columnB2 are displayed with their values.
Similarly tables c, d, e with their respective columns and values.
You can use jsonb_to_recordset(), but you need to unnest your JSON first. You need to do this inline, as this is a JSON processing function which cannot use derived values.
I am using validated JSON, as simplified and formatted at the end of this answer.
To unnest your JSON, use the notation below, which extracts a JSON object field with the given key.
--one level
select '{"a":1}'::json->'a'
result : 1
--two levels
select '{"a":{"b":[2]}}'::json->'a'->'b'
result : [2]
We now expand this to include json_to_recordset()
select * from
json_to_recordset(
'{"a":{"b":[{"f1":2,"f2":4},{"f1":3,"f2":6}]}}'::json->'a'->'b' --inner table b
)
as x("f1" int, "f2" int); --fields from table b
or use json_array_elements(). Either way we need to list our fields. With the second solution the type will be json, not int, so you can't sum etc.
with b as (select json_array_elements('{"a":{"b":[{"f1":2,"f2":4},{"f1":3,"f2":6}]}}'::json->'a'->'b') as jx)
select jx->'f1' as f1, jx->'f2' as f2 from b;
Output
f1 f2
2 4
3 6
We now use your data structure in jsonb_to_recordset()
select * from
jsonb_to_recordset(
'{"a":{"b":[{"columnname1b":"value1b","columnname2b":"value2b"},{"columnname1b":"value","columnname2b":"value"}],"c":[{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"}]}}'::jsonb->'a'->'b' --inner table b
)
as x(columnname1b text, columnname2b text);
Output:
columnname1b columnname2b
value1b value2b
value value
For table c
select * from
jsonb_to_recordset(
'{"a":{"b":[{"columnname1b":"value1b","columnname2b":"value2b"},{"columnname1b":"value","columnname2b":"value"}],"c":[{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"}]}}'::jsonb->'a'->'c' --inner table c
)
as x(columnname1 text, columnname2 text);
Output
columnname1 columnname2
value value
value value
value value
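
The same pattern extends to the remaining arrays. A sketch for table d, using the key names from the question's sample data (the values and the JSON snippet here are hypothetical):
select * from
jsonb_to_recordset(
'{"d":[{"columnD1":"x1","columnD2":"y1"},{"columnD1":"x2","columnD2":"y2"}]}'::jsonb->'d' --inner table d
)
as x("columnD1" text, "columnD2" text); --quoted so the mixed-case names match the JSON keys exactly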
Sample JSON
{
"a": {
"b": [
{
"columnname1b": "value1b",
"columnname2b": "value2b"
},
{
"columnname1b": "value",
"columnname2b": "value"
}
],
"c": [
{
"columnname1": "value",
"columnname2": "value"
},
{
"columnname1": "value",
"columnname2": "value"
},
{
"columnname1": "value",
"columnname2": "value"
}
]
}
}
Well, I came up with some ideas, here is one that worked. I was able to get one table at a time.
https://www.postgresql.org/docs/9.5/functions-json.html
I am using json_populate_recordset.
The column used in the first SELECT statement comes from a table whose column is of JSON type, which we are trying to extract into a table.
The 'tablename from column' key passed to json_populate_recordset() is the array we are trying to extract, followed by b with its column names and data types.
WITH input AS(
SELECT cast(column as json) as a
FROM tablename
)
SELECT b.*
FROM input c,
json_populate_recordset(NULL::record,c.a->'tablename from column') as b(columnname1 datatype, columnname2 datatype)
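
For instance, applied to the b array from the question, the pattern would look like this (my_json_table and its doc column are hypothetical names standing in for your table and JSON column):
WITH input AS(
SELECT cast(doc as json) as a
FROM my_json_table
)
SELECT b.*
FROM input c,
json_populate_recordset(NULL::record,c.a->'b') as b("columnB1" text, "columnB2" text)
--column names are quoted so they match the mixed-case JSON keys exactly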

Amazon Athena: create request with partitions

I created a table with partitions: first by year, then month, then day.
Question: I want to get the data for 12/2017 and 03/2018; how can I do this?
What I think I should do:
where (year='2017' and month='12') and ( year ='2018' and month='03')
Is it correct? Won't there be confusion because of the and operator, so that Amazon Athena returns data for:
12/2017 and 03/2018 and 03/2017 and 12/2018?
PS: I can't test it; I only have a free account.
Thanks.
Anyway, I tried it on a small data set and found that Amazon Athena does take the parentheses into account.
My test was as follows.
The DDL of the table as generated:
CREATE EXTERNAL TABLE `manyands`(
`years` int COMMENT 'from deserializer',
`months` int COMMENT 'from deserializer',
`days` int COMMENT 'from deserializer')
PARTITIONED BY (
`year` string,
`month` string)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://mybucket/'
My tests:
1- SELECT * FROM "atlasdatabase"."manyands" where month='1';
I got, in CSV format:
"years","months","days","year","month"
"2017","1","21","2017","1"
"2018","1","81","2018","1"
2- SELECT * FROM "atlasdatabase"."manyands" where month='1' and year='2017';
"years","months","days","year","month"
"2017","1","21","2017","1"
3- SELECT * FROM "atlasdatabase"."manyands" where (month='1' and year='2018') and (month='3' and year='2017') ;
empty (zero records returned)
4- SELECT * FROM "atlasdatabase"."manyands" where (month='1' and year='2018') or (month='3' ) ;
"years","months","days","year","month"
"2018","1","81","2018","1"
"2017","3","73","2017","3"
"2018","3","73","2018","3"
Conclusion: add the OR operator between the different partition predicates.
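Applied to the original question, the filter would therefore be written like this (a sketch against the year and month partition keys from the DDL above):
SELECT *
FROM "atlasdatabase"."manyands"
WHERE (year = '2017' AND month = '12')
   OR (year = '2018' AND month = '03');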

How to query data from Amazon S3

I am looking to create a Tableau dashboard with data originating in Amazon DynamoDB. Right now I am sending the data to an Amazon S3 bucket using AWS Lambda, and I am getting this file in the S3 bucket:
{
"Items": [
{
"payload": {
"phase": "T",
"tms_event": "2017-03-16 18:19:50",
"id_UM": 0,
"num_severity_level": 0,
"event_value": 1,
"int_status": 0
},
"deviceId": 6,
"tms_event": "2017-03-16 18:19:50"
}
]
}
I am trying to use Amazon Athena to create a connection with Tableau, but the payload attribute is giving me problems and I am not getting any results when I run the SELECT query.
This is the Athena table:
CREATE EXTERNAL TABLE IF NOT EXISTS default.iot_table_test (
`payload` map<string,string>,
`deviceId` int,
`tms_event` string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
) LOCATION 's3://iot-logging/'
TBLPROPERTIES ('has_encrypted_data'='false')
Thanks,
Alejandro
Your table does not look like it matches your data, because your data has a top-level Items array. Without restructuring the JSON data files, I think you would need a table definition like this:
CREATE EXTERNAL TABLE IF NOT EXISTS default.iot_table_test_items (
`Items` ARRAY<
STRUCT<
`payload`: MAP<string, string>,
`deviceId`: int,
`tms_event`: string
>
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
) LOCATION 's3://iot-logging/'
TBLPROPERTIES ('has_encrypted_data'='false')
and then query it by unnesting the Items array:
SELECT
item.deviceId,
item.tms_event,
item.payload
FROM
default.iot_table_test_items
CROSS JOIN UNNEST (Items) AS i (item)
LIMIT 10;
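
Since payload is declared as map<string,string>, individual fields can then be pulled out with the map subscript operator. A small sketch along the same lines (key names taken from the sample document above):
SELECT
item.deviceId,
item.payload['phase'] AS phase,
item.payload['event_value'] AS event_value
FROM
default.iot_table_test_items
CROSS JOIN UNNEST (Items) AS i (item)
LIMIT 10;
Note that subscripting fails on missing keys; element_at(item.payload, 'phase') returns NULL instead, which is safer if some documents omit a field.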

Store multiple elements in json files in AWS Athena

I have some JSON files stored in an S3 bucket, where each file has multiple elements of the same structure. For example,
[{"eventId":"1","eventName":"INSERT","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"New item!","Id":101}},{"eventId":"2","eventName":"MODIFY","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}},{"eventId":"3","eventName":"REMOVE","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}]
I want to create a table in Athena corresponding to above data.
The query I wrote for creating the table:
CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.elb_logs2 (
`eventId` string,
`eventName` string,
`eventVersion` string,
`eventSource` string,
`awsRegion` string,
`image` map<string,string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'field.delim' = ' '
) LOCATION 's3://<bucketname>/';
But if I do a SELECT query as follows,
SELECT * FROM sampledb.elb_logs4;
I get the following result:
1 {"eventid":"1","eventversion":"1.0","image":{"id":"101","message":"New item!"},"eventsource":"aws:dynamodb","eventname":"INSERT","awsregion":"us-west-2"} {"eventid":"2","eventversion":"1.0","image":{"id":"101","message":"This item has changed"},"eventsource":"aws:dynamodb","eventname":"MODIFY","awsregion":"us-west-2"} {"eventid":"3","eventversion":"1.0","image":{"id":"101","message":"This item has changed"},"eventsource":"aws:dynamodb","eventname":"REMOVE","awsregion":"us-west-2"}
The entire content of the JSON file is picked up as one entry here.
How can I read each element of the JSON file as one entry?
Edit: How can I read each subcolumn of image, i.e., each element of the map?
Thanks.
Question 1: Store multiple elements in JSON files for AWS Athena
I need to rewrite my JSON file as:
{"eventId":"1","eventName":"INSERT","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"New item!","Id":101}}, {"eventId":"2","eventName":"MODIFY","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}, {"eventId":"3","eventName":"REMOVE","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}
That means
Remove the square brackets [ ] Keep each element in one line
{.....................}
{.....................}
{.....................}
Question 2: Access nested JSON attributes
CREATE EXTERNAL TABLE IF NOT EXISTS <tablename> (
`eventId` string,
`eventName` string,
`eventVersion` string,
`eventSource` string,
`awsRegion` string,
`image` struct <`Id` : string,
`Message` : string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
"dots.in.keys" = "true"
) LOCATION 's3://exampletablewithstream-us-west-2/';
Query:
select image.Id, image.message from <tablename>;
Ref:
http://engineering.skybettingandgaming.com/2015/01/20/parsing-json-in-hive/
https://github.com/rcongiu/Hive-JSON-Serde#mapping-hive-keywords