JSON Parsing in Snowflake - Square Brackets At Start - sql

I'm trying to parse out some JSON files in snowflake. In this case, I'd like to extract the "gift card" from the line that has "fulfillment_service": "gift_card". I've had success querying one dimensional JSON data, but this - with the square brackets - is confounding me.
Here's my simple query - I've created a small table called "TEST_WEEK"
select line_items:fulfillment_service
from TEST_WEEK
, lateral flatten(FULFILLMENTS:line_items) line_items;
Hopefully this isn't too basic a question. I'm very new with parsing JSON.
Thanks in advance!
Here's the start of the FULLFILLMENTS field with the info I want to get at.
[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA",

Maybe you can use two lateral flatten to process values in line_items array:
Sample table:
create table TEST_WEEK( FULFILLMENTS variant ) as
select parse_json(
'[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA"
}
}
]
}
]');
Sample query:
select s.VALUE:fulfillment_service
from TEST_WEEK,
lateral flatten( FULFILLMENTS ) f,
lateral flatten( f.VALUE:line_items ) s;
The output:
+-----------------------------+
| S.VALUE:FULFILLMENT_SERVICE |
+-----------------------------+
| "gift_card" |
+-----------------------------+

Those square brackets indicate that you have an array of JSON objects in your FULLFILLMENTS field. Unless there is a real need to have an array of objects in one field you should have a look at the STRIP_OUTER_ARRAY property of the COPY command. An example can be found here in the Snowflake documentation:
copy into <table>
from #~/<file>.json
file_format = (type = 'JSON' strip_outer_array = true);

In case others are stuck with same data issue (all json data in one array), I have this solution:
select f.VALUE:fulfillment_service::string
from TEST_WEEK,
lateral flatten( FULFILLMENTS[0].line_items ) f;
With this, you just grab the first element of the array (which is the only element).
If you have nested array elements, just add this to the lateral flatten:
, RECURSIVE => TRUE, mode => 'array'

Related

Looping JSON Array in JSONB field

I want to loop over JSONB column and get certain values (price, discount_price, and currency) of relevant JSON objects to my filter. But I get this error:
syntax error at or near "FOR"
Value of the parts column which is JSONB:
[
{
"item_tags": ["black", "optional"],
"name": "Keyboard",
"price": 50,
"currency": "USD",
"discount_price": 40
},
{
"item_tags": ["white", "optional"],
"name": "Mouse",
"price": 40,
"currency": "USD",
"discount_price": 30
}
]
My query ($1 is the user input. Can be 'optional' or 'required'):
SELECT
id,
title,
FOR element IN SELECT * FROM jsonb_array_elements(parts)
LOOP
CASE
WHEN element->'item_tags' #> $1
THEN SELECT element->>'discount_price' AS price, element->>'currency' AS currency
ELSE SELECT element->>'price' AS price, element->>'currency' AS currency
END
END LOOP
FROM items;
This is the output I want to get if $1 is equal to 'optional':
{
"id": 1,
"title": "example title",
"parts": [
{
"name": "Keyboard",
"discount_price": 40,
"currency": "USD"
},
{
"name": "Mouse",
"discount_price": 30,
"currency": "USD"
}
]
}
Any help is highly appreciated. I follow official docs but it is not beginner-friendly. I use PostgreSQL 13.
You need to unnest the array, filter out the unwanted parts, remove the unwanted key, then aggregate the changed parts back into a JSON array.
This can be done using a scalar sub-query:
select id, title,
(select jsonb_agg(x.part - 'item_tags')
from jsonb_array_elements(i.parts) as x(part)
where (x.part -> 'item_tags') ? 'optional')
from items i;
The expression x.part - 'item_tags' removes the item_tags key from the JSON object. The ? operator tests if the item_tags array contains the string on the right hand side. And jsonb_agg() then aggregates those JSON values back into an array.
You can pass your parameter in the place of the 'optional' string.

Nested json data not captured by JSON_TABLE in oracle sql

I'm using Oracle 12c(12.2) to read json data in a table.
SELECT jt.name,
jt.employee_id,
jt.company
FROM JSON_TABLE ( BFILENAME ('DB_DIR', 'vv.json')
i've nested data in json output. The key:value in nested data start with a value
"past_work": "N.A" for a record.
for other many records below it, have actual values like
"past_work": [{ "company": "XXXXX", "title": "XXXX"}]
but because first record done have value and start and end brackets [], oracle not capturing below records nested values.
any idea how to capture below records?
Example: Actual data like below
SELECT
jt.company,
jt.title
FROM
JSON_TABLE(
'{
"employee_data": [
{ "employee_id": "111",
"past_work": "N/A"
},
{ "employee_id": "222",
"past_work": [
{"company": "XXXXX", "title": "XXXX"},
{"company": "YYYYY", "title": "YYYY"}
]
},
{ "employee_id": "333",
"past_work": [
{"company": "XXXXX", "title": "XXXX"},
{"company": "YYYYY", "title": "YYYY"}
]
}
]
}',
'$.past_work[*]'
COLUMNS (
company VARCHAR2(100) PATH '$.company',
title VARCHAR2(100) PATH '$.title'
)
)
AS jt
now when i execute above statment, i'm getting null for company values for emplyee_id 333 and below.
Thanks
If past_work is supposed to be an array of past (company, title) pairs, then the proper way to encode "no history" is not to use a string value like "N/A", but instead you should use an empty array, as I show in the code below. If you do it your way, you can still extract the data, but it will be exceptionally messy. If you use JSON, use it correctly.
Also, you said you want to extract company and title. Just those? That makes no sense. Rather, you probably want to extract the employee id for each employee, along with the work history. In the work history, I add a column "for ordinality" (to show which company was first, which was second, etc.) If you don't need it, just leave it out.
To access nested columns, you must use the nested clause in the columns specification.
select employee_id, ord, company, title
from json_table(
'{
"employee_data": [
{ "employee_id": "111",
"past_work": [ ]
},
{ "employee_id": "222",
"past_work": [
{"company": "XXXXX", "title": "XXXX"},
{"company": "YYYYY", "title": "YYYY"}
]
},
{ "employee_id": "333",
"past_work": [
{"company": "XXXXX", "title": "XXXX"},
{"company": "YYYYY", "title": "YYYY"}
]
}
]
}', '$.employee_data[*]'
columns (
employee_id varchar2(10) path '$.employee_id',
nested path '$.past_work[*]'
columns (
ord for ordinality,
company varchar2(10) path '$.company',
title varchar2(10) path '$.title'
)
)
) jt
order by employee_id, ord;
Output:
EMPLOYEE_ID ORD COMPANY TITLE
----------- --- ------- -----
111
222 1 XXXXX XXXX
222 2 YYYYY YYYY
333 1 XXXXX XXXX
333 2 YYYYY YYYY
First, the json snippet is malformed, it MUST be surrounded by {} in order to be parsable as a json object...
{"past_work": [{ "company": "XXXXX", "title": "XXXX"}]}
Then, you can tell the json parser that you want to pull the rows from the past_work element...
JSON_TABLE(<yourJsonString>, '$.past_work[*]')
The [*] tells the parser that past_work is an array, and to process that array in to rows of json objects, rather than just return the whole array as a single json object.
That gives something like...
SELECT
jt.company,
jt.title
FROM
JSON_TABLE(
'{
"past_work": [
{"company": "XXXXX", "title": "XXXX"},
{"company": "YYYYY", "title": "YYYY"}
]
}',
'$.past_work[*]'
COLUMNS (
company VARCHAR2(100) PATH '$.company',
title VARCHAR2(100) PATH '$.title'
)
)
AS jt
db<>fiddle demo
For more details, I recommend reading the docs:
https://docs.oracle.com/database/121/SQLRF/functions092.htm#SQLRF56973
EDIT: Updated example, almost a copy and paste from the docs
Please Read The Docs!
SELECT
jt.*
FROM
JSON_TABLE(
'{
"XX_data":[
{
"employee_id": "E1",
"full_name": "E1 Admin",
"past_work": "N/A"
},
{
"employee_id": "E2",
"full_name": "E2 Admin",
"past_work": [
{"company": "E2 PW1 C", "title": "E2 PW1 T"},
{"company": "E2 PW2 C", "title": "E2 PW2 T"},
]
},
]
}',
'$.XX_data[*]'
COLUMNS (
employee_id VARCHAR2(100) PATH '$.employee_id',
full_name VARCHAR2(100) PATH '$.full_name',
past_work VARCHAR2(100) PATH '$.past_work',
NESTED PATH '$.past_work[*]'
COLUMNS (
past_work_company VARCHAR2(100) PATH '$.company',
past_work_title VARCHAR2(100) PATH '$.title'
)
)
)
AS jt
Another db<>fiddle demo
Thanks all for the Comments. Have asked product team to provide data in correct format.

JSON database table query

I have JSON table with some objects and I am trying to query the amount value in the object
{
"authorizations": [
{
"id": "d50",
"type": "passed",
"amount": 100,
"fortId": 5050,
"status": "GENERATED",
"voided": false,
"cardNumber": 3973,
"expireDate": null,
"description": "Success",
"customerCode": "858585",
"paymentMethod": "cash",
"changeDatetime": null,
"createDatetime": 000000000,
"reservationCode": "202020DD",
"authorizationCode": "D8787"
},
{
"id": "d50",
"type": "passed",
"amount": 100,
"fortId": 5050,
"status": "GENERATED",
"voided": false,
"cardNumber": 3973,
"expireDate": null,
"description": "Success",
"customerCode": "858585",
"paymentMethod": "cash",
"changeDatetime": null,
"createDatetime": 000000000,
"reservationCode": "202020DD",
"authorizationCode": "D8787"
}
],
}
I have tried the following four options, but none of these give me the value of the object:
SELECT info #> 'authorizations:[{amount}]'
FROM idv.reservations;
SELECT info -> 'authorizations:[{amount}]'
FROM idv.reservations;
info -> ''authorizations' ->> 'amount'
FROM idv.reservations
select (json_array_elements(info->'authorizations')->'amount')::int from idv.reservations
note I am using DBeaver
If you want one row per object contained in the "authorizations" JSON array, with the corresponding amount, you can use a lateral join and jsonb_array_elements():
select r.*, (x.obj ->> 'amount')::int as amount
from reservations r
cross join lateral jsonb_array_elements(r.info -> 'authorizations') x(obj)
We can also extract all amounts at once and put them in an array, like so:
select r.*,
jsonb_path_query_array(r.info, '$.authorizations[*].amount') as amounts
from reservations r
Demo on DB Fiddlde

I am trying to access the data stored in a snowflake table using python sql. Below is the columns given below i want to access

Below is the data-sample and i want to access columns value,start. This data i dumped in one column(DN) of a table (stg)
{
"ok": true,
"metrics": [
{
"name": "t_in",
"data": [{"value": 0, "group": {"start": "00:00"}}]
},
{
"name": "t_out",
"data": [{"value": 0,"group": {"start": "00:00"}}]
}
]
}
##consider many lines stored in same column in different rows.
Below query only fetched data for name. I want to access other columns value also. This query is a part of python script.
select
replace(DN : metrics[0].name , '"' , '')as metrics_name, #able to get
replace(DN : metrics[2].data , '"' , '')as metrics_data_value,##suggestion needed
replace(DN : metrics.data.start, '"','') as metrics_start, ##suggestion needed
replace(DN : metrics.data.group.finish, '"','') as metrics_finish, ##suggestion needed
from stg
Do i need to iterate over data and group? If yes, please suggest the code.
Here is an example of how to query that data.
Set up sample data:
create or replace transient table test_db.public.stg (DN variant);
insert overwrite into test_db.public.stg (DN)
select parse_json('{
"ok": true,
"metrics": [
{
"name": "t_in",
"data": [
{"value": 0, "group": {"start": "00:00"}}
]
},
{
"name": "t_out",
"data": [
{"value": 0,"group": {"start": "00:00"}}
]
}
]
}');
Select statement example:
select
DN:metrics[0].name::STRING,
DN:metrics[1].data,
DN:metrics[1].data[0].group.start::TIME,
DN:metrics[1].data[0].group.finish::TIME
from test_db.public.stg;
Instead of querying individual indexes of the JSON arrays, I think you'll want to use the flatten function which is documented here.
Here is how you do it with the flatten which is what I am guessing you want:
select
mtr.value:name::string,
dta.value,
dta.value:group.start::string,
dta.value:group.finish::string
from test_db.public.stg stg,
lateral flatten(input => stg.DN:metrics) mtr,
lateral flatten(input => mtr.value:data) dta

How to "zip" multiple nested JSON arrays without using id key?

I'm trying to merge some nested JSON arrays without looking at the id. Currently I'm getting this when I make a GET request to /surveyresponses:
{
"surveys": [
{
"id": 1,
"name": "survey 1",
"isGuest": true,
"house_id": 1
},
{
"id": 2,
"name": "survey 2",
"isGuest": false,
"house_id": 1
},
{
"id": 3,
"name": "survey 3",
"isGuest": true,
"house_id": 2
}
],
"responses": [
{
"question": "what is this anyways?",
"answer": "test 1"
},
{
"question": "why?",
"answer": "test 2"
},
{
"question": "testy?",
"answer": "test 3"
}
]
}
But I would like to get it where each survey has its own question and answers so something like this:
{
"surveys": [
{
"id": 1,
"name": "survey 1",
"isGuest": true,
"house_id": 1
"question": "what is this anyways?",
"answer": "test 1"
}
]
}
Because I'm not going to a specific id I'm not sure how to make the relationship work. This is the current query I have that's producing those results.
export function getSurveyResponse(id: number): QueryBuilder {
return db('surveys')
.join('questions', 'questions.survey_id', '=', 'surveys.id')
.join('questionAnswers', 'questionAnswers.question_id', '=', 'questions.id')
.select('surveys.name', 'questions.question', 'questions.question', 'questionAnswers.answer')
.where({ survey_id: id, question_id: id })
}
Assuming jsonb in current Postgres 10 or 11, this query does the job:
SELECT t.data, to_jsonb(s) AS new_data
FROM t
LEFT JOIN LATERAL (
SELECT jsonb_agg(s || r) AS surveys
FROM (
SELECT jsonb_array_elements(t.data->'surveys') s
, jsonb_array_elements(t.data->'responses') r
) sub
) s ON true;
db<>fiddle here
I unnest both nested JSON arrays in parallel to get the desired behavior of "zipping" both directly. The number of elements in both nested JSON arrays has to match or you need to do more (else you lose data).
This builds on implementation details of how Postgres deals with multiple set-returning functions in a SELECT list to make it short and fast. See:
What is the expected behaviour for multiple set-returning functions in select clause?
One could be more explicit with a ROWS FROM expression, which works properly since Postgres 9.4:
SELECT t.data
, to_jsonb(s) AS new_data
FROM tbl t
LEFT JOIN LATERAL (
SELECT jsonb_agg(s || r) AS surveys
FROM ROWS FROM (jsonb_array_elements(t.data->'surveys')
, jsonb_array_elements(t.data->'responses')) sub(s,r)
) s ON true;
The manual about combining multiple table functions.
Or you could use WITH ORDINALITY to get original order of elements and combine as you wish:
PostgreSQL unnest() with element number