Nested JSON data not captured by JSON_TABLE in Oracle SQL

I'm using Oracle 12c (12.2) to read JSON data in a table.
SELECT jt.name,
       jt.employee_id,
       jt.company
FROM JSON_TABLE ( BFILENAME ('DB_DIR', 'vv.json')
The JSON output contains nested data. For one record the nested key has a plain string value,
"past_work": "N/A"
while many records below it have actual values like
"past_work": [{ "company": "XXXXX", "title": "XXXX"}]
Because that first record has no value inside opening and closing brackets [], Oracle is not capturing the nested values of the records below it.
Any idea how to capture those records?
Example: the actual data looks like this:
SELECT
    jt.company,
    jt.title
FROM
    JSON_TABLE(
        '{
          "employee_data": [
            { "employee_id": "111",
              "past_work": "N/A"
            },
            { "employee_id": "222",
              "past_work": [
                {"company": "XXXXX", "title": "XXXX"},
                {"company": "YYYYY", "title": "YYYY"}
              ]
            },
            { "employee_id": "333",
              "past_work": [
                {"company": "XXXXX", "title": "XXXX"},
                {"company": "YYYYY", "title": "YYYY"}
              ]
            }
          ]
        }',
        '$.past_work[*]'
        COLUMNS (
            company VARCHAR2(100) PATH '$.company',
            title   VARCHAR2(100) PATH '$.title'
        )
    )
    AS jt
Now when I execute the above statement, I'm getting NULL for the company values for employee_id 333 and below.
Thanks

If past_work is supposed to be an array of past (company, title) pairs, then the proper way to encode "no history" is not to use a string value like "N/A", but instead you should use an empty array, as I show in the code below. If you do it your way, you can still extract the data, but it will be exceptionally messy. If you use JSON, use it correctly.
Also, you said you want to extract company and title. Just those? That makes no sense. Rather, you probably want to extract the employee id for each employee, along with the work history. In the work history, I add a column "for ordinality" (to show which company was first, which was second, etc.) If you don't need it, just leave it out.
To access nested columns, you must use the nested clause in the columns specification.
select employee_id, ord, company, title
from json_table(
       '{
         "employee_data": [
           { "employee_id": "111",
             "past_work": [ ]
           },
           { "employee_id": "222",
             "past_work": [
               {"company": "XXXXX", "title": "XXXX"},
               {"company": "YYYYY", "title": "YYYY"}
             ]
           },
           { "employee_id": "333",
             "past_work": [
               {"company": "XXXXX", "title": "XXXX"},
               {"company": "YYYYY", "title": "YYYY"}
             ]
           }
         ]
       }', '$.employee_data[*]'
       columns (
         employee_id varchar2(10) path '$.employee_id',
         nested path '$.past_work[*]'
           columns (
             ord     for ordinality,
             company varchar2(10) path '$.company',
             title   varchar2(10) path '$.title'
           )
       )
     ) jt
order by employee_id, ord;
Output:
EMPLOYEE_ID ORD COMPANY TITLE
----------- --- ------- -----
111
222 1 XXXXX XXXX
222 2 YYYYY YYYY
333 1 XXXXX XXXX
333 2 YYYYY YYYY

First, the JSON snippet is malformed; it MUST be surrounded by {} in order to be parsable as a JSON object...
{"past_work": [{ "company": "XXXXX", "title": "XXXX"}]}
Then, you can tell the JSON parser that you want to pull the rows from the past_work element...
JSON_TABLE(<yourJsonString>, '$.past_work[*]')
The [*] tells the parser that past_work is an array, and to process that array into rows of JSON objects, rather than just return the whole array as a single JSON object.
That gives something like...
SELECT
    jt.company,
    jt.title
FROM
    JSON_TABLE(
        '{
          "past_work": [
            {"company": "XXXXX", "title": "XXXX"},
            {"company": "YYYYY", "title": "YYYY"}
          ]
        }',
        '$.past_work[*]'
        COLUMNS (
            company VARCHAR2(100) PATH '$.company',
            title   VARCHAR2(100) PATH '$.title'
        )
    )
    AS jt
db<>fiddle demo
For more details, I recommend reading the docs:
https://docs.oracle.com/database/121/SQLRF/functions092.htm#SQLRF56973
EDIT: Updated example, almost a copy and paste from the docs
Please Read The Docs!
SELECT
    jt.*
FROM
    JSON_TABLE(
        '{
          "XX_data": [
            {
              "employee_id": "E1",
              "full_name": "E1 Admin",
              "past_work": "N/A"
            },
            {
              "employee_id": "E2",
              "full_name": "E2 Admin",
              "past_work": [
                {"company": "E2 PW1 C", "title": "E2 PW1 T"},
                {"company": "E2 PW2 C", "title": "E2 PW2 T"}
              ]
            }
          ]
        }',
        '$.XX_data[*]'
        COLUMNS (
            employee_id VARCHAR2(100) PATH '$.employee_id',
            full_name   VARCHAR2(100) PATH '$.full_name',
            past_work   VARCHAR2(100) PATH '$.past_work',
            NESTED PATH '$.past_work[*]'
                COLUMNS (
                    past_work_company VARCHAR2(100) PATH '$.company',
                    past_work_title   VARCHAR2(100) PATH '$.title'
                )
        )
    )
    AS jt
Another db<>fiddle demo

Thanks all for the comments. I have asked the product team to provide the data in the correct format.

Related

How to Add Multiple JSON_BUILD_OBJECT entries to a JSON_AGG

I am using PostgreSQL 9.4 and I have a requirement to have an 'addresses' array which contains closed JSON objects for different types of address (residential and correspondence). The structure should look like this:
[
{
"addresses": [
{
"addressLine1": "string",
"type": "residential"
},
{
"addressLine1": "string",
"type": "correspondence"
}
],
"lastName": "string"
}
]
...and here's some example data to illustrate the desired result:
[
{
"addresses": [
{
"addressLine1": "54 ASHFIELD PADDOCK",
"type": "residential"
},
{
"addressLine1": "135 MERRION HILL",
"type": "correspondence"
}
],
"lastName": "WRIGHT"
},
{
"addresses": [
{
"addressLine1": "13 BOAKES GROVE",
"type": "residential"
},
{
"addressLine1": "46 BEACONSFIELD GRANGE",
"type": "correspondence"
}
],
"lastName": "DOHERTY"
}
]
This is where I've gotten to with my SQL:
SELECT
json_agg(
json_build_object('addresses',(SELECT json_agg(json_build_object('addressLine1',c2.address_line_1,
'type',c2.address_type
)
)
FROM my_customer_table c2
WHERE c2.person_id=c.person_id
),
'addresses',(SELECT json_agg(json_build_object('addressLine1',c2.corr_address_line_1,
'type',c2.corr_address_type
)
)
FROM my_customer_table c2
WHERE c2.person_id=c.person_id
),
'lastName',c.surname
)
) AS customer_json
FROM
my_customer_table c
WHERE
c.corr_address_type IS NOT NULL /*exclude customers without correspondence addresses*/
...and this runs; however, it repeats the 'addresses' key twice and builds a separate array for each address variant instead of one array containing both addresses.
What I stupidly thought would work is the following:
SELECT
json_agg(
json_build_object('addresses',(SELECT json_agg(json_build_object('addressLine1',c2.address_line_1,
'type',c2.address_type
),
json_build_object('addressLine1',c2.corr_address_line_1,
'type',c2.corr_address_type
)
)
FROM my_customer_table c2
WHERE c2.person_id=c.person_id
),
'lastName',c.surname
)
) AS customer_json
FROM
my_customer_table c
WHERE
c.corr_address_type IS NOT NULL /*exclude customers without correspondence addresses*/
...however this throws an error:
"ERROR: function json_agg(json, json) does not exist.
LINE 3: json_build_object('addresses',(SELECT json_agg(json_build_o...
HINT: No function matches the given name and argument types. You might need to add explicit type casts."
I've Googled this, however no posts found seem to relate to the same kind of result I'm trying to get.
Does anyone know if it's possible to have multiple JSON_BUILD_OBJECT entries inside of an array?
A colleague has found a solution to this. It's totally different to the way I was approaching it, but works nicely. Here's the working code:
SELECT
json_agg(subquery.customer_json) AS customer_json
FROM
(
SELECT
row_to_json(t) AS customer_json
FROM (
SELECT
(
SELECT array_to_json(array_agg(addresses_union))
FROM (
SELECT
c.address_line_1 AS "addressLine1",
c.address_type as type
UNION ALL
SELECT
c.corr_address_Line_1 AS "addressLine1",
c.corr_address_type as type
) AS addresses_union
) as addresses,
c.surname AS "lastName"
FROM
my_customer_table c
WHERE
c.corr_address_type IS NOT NULL /*exclude customers without correspondence address*/
) t
) subquery
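As an aside: if the goal is simply to place two json_build_object results into one array per row, json_build_array (also available in 9.4) can build that array directly, without the UNION ALL / array_agg step. A minimal sketch, assuming the same my_customer_table columns used above and one row per person_id (not tested against your data):
SELECT
  json_agg(
    json_build_object(
      'addresses', json_build_array(
                     json_build_object('addressLine1', c.address_line_1,
                                       'type',         c.address_type),
                     json_build_object('addressLine1', c.corr_address_line_1,
                                       'type',         c.corr_address_type)
                   ),
      'lastName',  c.surname
    )
  ) AS customer_json
FROM
  my_customer_table c
WHERE
  c.corr_address_type IS NOT NULL /*exclude customers without correspondence addresses*/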

Joining tables with filter condition on multiple columns in Oracle DBMS

{
"description": "test",
"id": "1",
"name": "test",
"prod": [
{
"id": "1",
"name": "name",
"re": [
{
"name": "name1",
"value": "1"
},
{
"name": "name2",
"value": "1"
},
{
"name": "name3",
"value": "0"
},
{
"name": "name4",
"value": "0"
}
]
}
]
}
Here is the best I can do with your JSON input and your sample output.
Note that your document has a unique "id" and "name" ("1" and "test" in your example). Then it has an array named "productSpecificationRelationship". Each element of this array is an object with its own "id" - in the query, I show this id with the column name PSR_ID (PSR for Product Specification Relationship). Also, each object in this first-level array contains a sub-array (second level), with objects with "name" ("name" again!) and "value" keys. (This looks very much like an entity-attribute-value model - very poor practice.) In the intermediate step in my query (before pivoting), I call these RC_NAME and RC_VALUE (RC for Relationship Characteristic).
In your sample output you have more than one value in the ID and NAME columns. I don't see how that is possible; perhaps from unpacking more than one document? The JSON document you shared with us has "id" and "name" as top-level attributes.
In the output, I understand (or rather, assume, since I didn't understand too much from your question) that you should also include the PSR_ID - there is only one in your document, with value "10499", but in principle there may be more than one, and the output will have one row per such id.
Also, I assume the "name" values are limited to the four you mentioned (or, if there can be more, you are only interested in those four in the output).
With all that said, here is the query. Note that I called the table ES for simplicity. Also, you will see that I had to go to nested path twice (since your document includes an array of arrays, and I wanted to pick up the PSR_ID from the outer array and the tokens from the nested arrays).
TABLE SETUP
create table es (payloadentityspecification clob
check (payloadentityspecification is json) );
insert into es (payloadentityspecification) values (
'{
"description": "test",
"id": "1",
"name": "test",
"productSpecificationRelationship": [
{
"id": "10499",
"relationshipType": "channelRelation",
"relationshipCharacteristic": [
{
"name": "out_of_home",
"value": "1"
},
{
"name": "out_of_home_ios",
"value": "1"
},
{
"name": "out_of_home_android",
"value": "0"
},
{
"name": "out_of_home_web",
"value": "0"
}
]
}
]
}');
commit;
QUERY
with
  prep (id, name, psr_id, rc_name, rc_value) as (
    select id, name, psr_id, rc_name, rc_value
    from es,
         json_table(payloadentityspecification, '$'
           columns (
             id   varchar2(10) path '$.id',
             name varchar2(40) path '$.name',
             nested path '$.productSpecificationRelationship[*]'
               columns (
                 psr_id varchar2(10) path '$.id',
                 nested path '$.relationshipCharacteristic[*]'
                   columns (
                     rc_name  varchar2(50) path '$.name',
                     rc_value varchar2(50) path '$.value'
                   )
               )
           )
         )
  )
select id, name, psr_id, ooh, ooh_android, ooh_ios, ooh_web
from prep
pivot ( min(case rc_value when '1' then 'TRUE'
                          when '0' then 'FALSE' else 'UNDEFINED' end)
        for rc_name in ( 'out_of_home'         as ooh,
                         'out_of_home_android' as ooh_android,
                         'out_of_home_ios'     as ooh_ios,
                         'out_of_home_web'     as ooh_web
                       )
      )
;
OUTPUT
ID NAME PSR_ID OOH   OOH_ANDROID OOH_IOS OOH_WEB
-- ---- ------ ----- ----------- ------- -------
1  test 10499  TRUE  FALSE       TRUE    FALSE
Conditional aggregation might be used to pivot the result set after extracting the values with the JSON_TABLE() and JSON_VALUE() functions, such as:
SELECT JSON_VALUE(payloadentityspecification, '$.name') AS channel_map_name,
MAX(CASE WHEN name = 'out_of_home' THEN
DECODE(value,1,'TRUE',0,'FALSE','UNDEFINED')
END) AS ooh,
MAX(CASE WHEN name = 'out_of_home_android' THEN
DECODE(value,1,'TRUE',0,'FALSE','UNDEFINED')
END) AS ooh_android,
MAX(CASE WHEN name = 'out_of_home_ios' THEN
DECODE(value,1,'TRUE',0,'FALSE','UNDEFINED')
END) AS ooh_ios,
MAX(CASE WHEN name = 'out_of_home_web' THEN
DECODE(value,1,'TRUE',0,'FALSE','UNDEFINED')
END) AS ooh_web
FROM EntitySpecification ES,
JSON_TABLE (payloadentityspecification, '$.productSpecificationRelationship[*]'
COLUMNS ( NESTED PATH '$.relationshipCharacteristic[*]'
COLUMNS (
description VARCHAR2(250) PATH '$.description',
name VARCHAR2(250) PATH '$.name',
value VARCHAR2(250) PATH '$.value'
)
)) jt
WHERE payloadentityspecification IS JSON
GROUP BY JSON_VALUE(payloadentityspecification, '$.name')
Demo

JSON Parsing in Snowflake - Square Brackets At Start

I'm trying to parse out some JSON files in Snowflake. In this case, I'd like to extract the "gift card" from the line that has "fulfillment_service": "gift_card". I've had success querying one-dimensional JSON data, but this - with the square brackets - is confounding me.
Here's my simple query - I've created a small table called "TEST_WEEK"
select line_items:fulfillment_service
from TEST_WEEK
, lateral flatten(FULFILLMENTS:line_items) line_items;
Hopefully this isn't too basic a question. I'm very new to parsing JSON.
Thanks in advance!
Here's the start of the FULFILLMENTS field with the info I want to get at.
[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA",
Maybe you can use two lateral flattens to process the values in the line_items array:
Sample table:
create table TEST_WEEK( FULFILLMENTS variant ) as
select parse_json(
'[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA"
}
}
]
}
]');
Sample query:
select s.VALUE:fulfillment_service
from TEST_WEEK,
lateral flatten( FULFILLMENTS ) f,
lateral flatten( f.VALUE:line_items ) s;
The output:
+-----------------------------+
| S.VALUE:FULFILLMENT_SERVICE |
+-----------------------------+
| "gift_card" |
+-----------------------------+
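Note that the value is printed with surrounding double quotes because the selected expression is still a VARIANT; casting it to a string strips them. A small variation of the query above:
select s.VALUE:fulfillment_service::string as fulfillment_service
from TEST_WEEK,
lateral flatten( FULFILLMENTS ) f,
lateral flatten( f.VALUE:line_items ) s;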
Those square brackets indicate that you have an array of JSON objects in your FULFILLMENTS field. Unless there is a real need to have an array of objects in one field, you should have a look at the STRIP_OUTER_ARRAY option of the COPY command. An example can be found here in the Snowflake documentation:
copy into <table>
from @~/<file>.json
file_format = (type = 'JSON' strip_outer_array = true);
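If the data is loaded that way, each table row holds a single fulfillment object rather than the whole array, so a single flatten over its line_items should be enough. A sketch, assuming the column is still named FULFILLMENTS:
select f.VALUE:fulfillment_service::string
from TEST_WEEK,
lateral flatten( FULFILLMENTS:line_items ) f;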
In case others are stuck with the same data issue (all the JSON data in one array), I have this solution:
select f.VALUE:fulfillment_service::string
from TEST_WEEK,
lateral flatten( FULFILLMENTS[0].line_items ) f;
With this, you just grab the first element of the array (which is the only element).
If you have nested array elements, just add this to the lateral flatten:
, RECURSIVE => TRUE, mode => 'array'
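Put together, the call would look roughly like this (using the named input => form alongside the other named arguments):
select f.VALUE:fulfillment_service::string
from TEST_WEEK,
lateral flatten( input => FULFILLMENTS[0].line_items, recursive => true, mode => 'array' ) f;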

I am trying to access data stored in a Snowflake table using Python SQL. Below are the columns I want to access.

Below is the data sample, and I want to access the columns value and start. I dumped this data into one column (DN) of a table (stg).
{
"ok": true,
"metrics": [
{
"name": "t_in",
"data": [{"value": 0, "group": {"start": "00:00"}}]
},
{
"name": "t_out",
"data": [{"value": 0,"group": {"start": "00:00"}}]
}
]
}
## consider many such documents stored in the same column in different rows
The query below only fetches the data for name. I want to access the other columns' values also. This query is part of a Python script.
select
replace(DN : metrics[0].name , '"' , '')as metrics_name, #able to get
replace(DN : metrics[2].data , '"' , '')as metrics_data_value,##suggestion needed
replace(DN : metrics.data.start, '"','') as metrics_start, ##suggestion needed
replace(DN : metrics.data.group.finish, '"','') as metrics_finish, ##suggestion needed
from stg
Do I need to iterate over data and group? If yes, please suggest the code.
Here is an example of how to query that data.
Set up sample data:
create or replace transient table test_db.public.stg (DN variant);
insert overwrite into test_db.public.stg (DN)
select parse_json('{
"ok": true,
"metrics": [
{
"name": "t_in",
"data": [
{"value": 0, "group": {"start": "00:00"}}
]
},
{
"name": "t_out",
"data": [
{"value": 0,"group": {"start": "00:00"}}
]
}
]
}');
Select statement example:
select
DN:metrics[0].name::STRING,
DN:metrics[1].data,
DN:metrics[1].data[0].group.start::TIME,
DN:metrics[1].data[0].group.finish::TIME
from test_db.public.stg;
Instead of querying individual indexes of the JSON arrays, I think you'll want to use the flatten function which is documented here.
Here is how you do it with flatten, which is what I am guessing you want:
select
mtr.value:name::string,
dta.value,
dta.value:group.start::string,
dta.value:group.finish::string
from test_db.public.stg stg,
lateral flatten(input => stg.DN:metrics) mtr,
lateral flatten(input => mtr.value:data) dta
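If you only need one of the metrics, or want friendlier column names, the same flatten query can be aliased and filtered; a small sketch based on the sample table above:
select
mtr.value:name::string as metric_name,
dta.value:value::number as metric_value,
dta.value:group.start::string as metric_start
from test_db.public.stg stg,
lateral flatten(input => stg.DN:metrics) mtr,
lateral flatten(input => mtr.value:data) dta
where mtr.value:name::string = 't_in';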

How to query nested arrays in a postgres json column?

I have some JSON similar to the JSON below stored in a Postgres json column. I'm trying to query it to identify some incorrectly entered data. I'm basically looking for addresses where the house description is the same as the house number. I can't quite work out how to do it.
{
"timestamp": "2014-10-23T16:15:28+01:00",
"schools": [
{
"school_id": "1",
"addresses": [
{
"town": "Birmingham",
"house_description": "1",
"street_name": "Parklands",
"addr_id": "4",
"postcode": "B5 8KL",
"house_no": "1",
"address_type": "UK"
},
{
"town": "Plymouth",
"house_description": "Flat a",
"street_name": "Fore Street",
"addr_id": "2",
"postcode": "PL9 8AY",
"house_no": "15",
"address_type": "UK"
}
]
},
{
"school_id": "2",
"addresses": [
{
"town": "Coventry",
"street_name": "Shipley Way",
"addr_id": "19",
"postcode": "CV8 3DL",
"house_no": "662",
"address_type": "UK"
}
]
}
]
}
I have written this SQL, which will find where the data matches:
select *
FROM title_register_data
where address_data->'schools'->0->'addresses'->0->>'house_description'=
address_data->'schools'->0->'addresses'->0->>'house_no'
This obviously only works on the first address of the first school. Is there a way of querying all of the addresses of every school?
Use jsonb_array_elements() in a lateral join, as many times as the depth of the JSON array whose elements you want to compare:
select
schools->>'school_id' school_id,
addresses->>'addr_id' addr_id,
addresses->>'house_description' house_description,
addresses->>'house_no' house_no
from title_register_data,
jsonb_array_elements(address_data->'schools') schools,
jsonb_array_elements(schools->'addresses') addresses
where addresses->>'house_description' = addresses->>'house_no';
school_id | addr_id | house_description | house_no
-----------+---------+-------------------+----------
1 | 4 | 1 | 1
(1 row)
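Note that the example above assumes a jsonb column; if address_data is a plain json column, as the question suggests, the json counterparts work the same way (or you can cast the column to jsonb). A sketch:
select
  schools->>'school_id' school_id,
  addresses->>'addr_id' addr_id,
  addresses->>'house_description' house_description,
  addresses->>'house_no' house_no
from title_register_data,
  json_array_elements(address_data->'schools') schools,
  json_array_elements(schools->'addresses') addresses
where addresses->>'house_description' = addresses->>'house_no';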