I have some JSON similar to the JSON below stored in a Postgres json column. I'm trying to query it to identify some incorrectly entered data. I'm basically looking for addresses where the house description is the same as the house number, but I can't quite work out how to do it.
{
"timestamp": "2014-10-23T16:15:28+01:00",
"schools": [
{
"school_id": "1",
"addresses": [
{
"town": "Birmingham",
"house_description": "1",
"street_name": "Parklands",
"addr_id": "4",
"postcode": "B5 8KL",
"house_no": "1",
"address_type": "UK"
},
{
"town": "Plymouth",
"house_description": "Flat a",
"street_name": "Fore Street",
"addr_id": "2",
"postcode": "PL9 8AY",
"house_no": "15",
"address_type": "UK"
}
]
},
{
"school_id": "2",
"addresses": [
{
"town": "Coventry",
"street_name": "Shipley Way",
"addr_id": "19",
"postcode": "CV8 3DL",
"house_no": "662",
"address_type": "UK"
}
]
}
]
}
I have written this sql which will find where the data matches:
select *
FROM title_register_data
where address_data->'schools'->0->'addresses'->0->>'house_description'=
address_data->'schools'->0->'addresses'->0->>'house_no'
This obviously only works on the first address on the first school. Is there a way of querying all of the addresses of every school?
Use jsonb_array_elements() in a lateral join, once for each level of array nesting that you need to unpack (if the column is of type json rather than jsonb, use json_array_elements() instead):
select
schools->>'school_id' school_id,
addresses->>'addr_id' addr_id,
addresses->>'house_description' house_description,
addresses->>'house_no' house_no
from title_register_data,
jsonb_array_elements(address_data->'schools') schools,
jsonb_array_elements(schools->'addresses') addresses
where addresses->>'house_description' = addresses->>'house_no';
school_id | addr_id | house_description | house_no
-----------+---------+-------------------+----------
1 | 4 | 1 | 1
(1 row)
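As a sanity check of what the double lateral join does, here is a minimal Python sketch of the same logic, run against the sample document from the question (trimmed to the keys the query uses). The function name find_duplicated_house_numbers is made up for illustration and is not part of any library. Each loop level plays the role of one jsonb_array_elements() call.

```python
import json

# Sample document from the question, trimmed to the keys the query uses.
DOC = json.loads("""
{
  "schools": [
    {"school_id": "1", "addresses": [
      {"addr_id": "4", "house_description": "1", "house_no": "1"},
      {"addr_id": "2", "house_description": "Flat a", "house_no": "15"}
    ]},
    {"school_id": "2", "addresses": [
      {"addr_id": "19", "house_no": "662"}
    ]}
  ]
}
""")


def find_duplicated_house_numbers(doc):
    """Return (school_id, addr_id, house_description, house_no) tuples
    for every address whose house_description equals its house_no.

    Hypothetical helper, written only to mirror the SQL query above.
    """
    rows = []
    for school in doc.get("schools", []):            # first lateral join
        for address in school.get("addresses", []):  # second lateral join
            description = address.get("house_description")
            number = address.get("house_no")
            # Like SQL NULL, a missing value never compares equal.
            if description is not None and description == number:
                rows.append(
                    (school["school_id"], address["addr_id"], description, number)
                )
    return rows


print(find_duplicated_house_numbers(DOC))
```

The Coventry address has no house_description, so it is skipped, just as the SQL comparison yields NULL for it and the row is filtered out.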
I'm using Oracle 12c(12.2) to read json data in a table.
SELECT jt.name,
jt.employee_id,
jt.company
FROM JSON_TABLE ( BFILENAME ('DB_DIR', 'vv.json')
I have nested data in the JSON. For the first record, the nested key holds a plain string value:
"past_work": "N/A"
Many of the records below it have actual values, like
"past_work": [{ "company": "XXXXX", "title": "XXXX"}]
But because the first record doesn't have a value wrapped in start and end brackets [], Oracle is not capturing the nested values of the records below it.
Any idea how to capture the records below?
Example: the actual data looks like this
SELECT
jt.company,
jt.title
FROM
JSON_TABLE(
'{
"employee_data": [
{ "employee_id": "111",
"past_work": "N/A"
},
{ "employee_id": "222",
"past_work": [
{"company": "XXXXX", "title": "XXXX"},
{"company": "YYYYY", "title": "YYYY"}
]
},
{ "employee_id": "333",
"past_work": [
{"company": "XXXXX", "title": "XXXX"},
{"company": "YYYYY", "title": "YYYY"}
]
}
]
}',
'$.past_work[*]'
COLUMNS (
company VARCHAR2(100) PATH '$.company',
title VARCHAR2(100) PATH '$.title'
)
)
AS jt
Now when I execute the above statement, I'm getting null for the company values for employee_id 333 and below.
Thanks
If past_work is supposed to be an array of past (company, title) pairs, then the proper way to encode "no history" is not to use a string value like "N/A", but instead you should use an empty array, as I show in the code below. If you do it your way, you can still extract the data, but it will be exceptionally messy. If you use JSON, use it correctly.
Also, you said you want to extract company and title. Just those? That makes no sense. Rather, you probably want to extract the employee id for each employee, along with the work history. In the work history, I add a column "for ordinality" (to show which company was first, which was second, etc.) If you don't need it, just leave it out.
To access nested columns, you must use the nested clause in the columns specification.
select employee_id, ord, company, title
from json_table(
'{
"employee_data": [
{ "employee_id": "111",
"past_work": [ ]
},
{ "employee_id": "222",
"past_work": [
{"company": "XXXXX", "title": "XXXX"},
{"company": "YYYYY", "title": "YYYY"}
]
},
{ "employee_id": "333",
"past_work": [
{"company": "XXXXX", "title": "XXXX"},
{"company": "YYYYY", "title": "YYYY"}
]
}
]
}', '$.employee_data[*]'
columns (
employee_id varchar2(10) path '$.employee_id',
nested path '$.past_work[*]'
columns (
ord for ordinality,
company varchar2(10) path '$.company',
title varchar2(10) path '$.title'
)
)
) jt
order by employee_id, ord;
Output:
EMPLOYEE_ID ORD COMPANY TITLE
----------- --- ------- -----
111
222 1 XXXXX XXXX
222 2 YYYYY YYYY
333 1 XXXXX XXXX
333 2 YYYYY YYYY
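If the data cannot be fixed at the source, one option is to normalize it before it reaches json_table. Below is a minimal Python sketch, assuming the document has the employee_data shape shown in the question. The helper name normalize_past_work is hypothetical, and treating every non-array past_work (including a missing one) as "no history" is my assumption, not something the question states.

```python
import json


def normalize_past_work(raw_json):
    """Return the document as a JSON string in which every past_work
    that is not an array (for example the string "N/A", or a missing
    key) is replaced by an empty array."""
    doc = json.loads(raw_json)
    for employee in doc.get("employee_data", []):
        if not isinstance(employee.get("past_work"), list):
            employee["past_work"] = []
    return json.dumps(doc)


raw = """
{"employee_data": [
  {"employee_id": "111", "past_work": "N/A"},
  {"employee_id": "222", "past_work": [{"company": "XXXXX", "title": "XXXX"}]}
]}
"""

cleaned = json.loads(normalize_past_work(raw))
print(cleaned["employee_data"][0]["past_work"])
print(cleaned["employee_data"][1]["past_work"])
```

After this step every past_work is an array, so the nested path query above can be used unchanged.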
First, the JSON snippet is malformed. It MUST be surrounded by {} in order to be parsable as a JSON object...
{"past_work": [{ "company": "XXXXX", "title": "XXXX"}]}
Then, you can tell the json parser that you want to pull the rows from the past_work element...
JSON_TABLE(<yourJsonString>, '$.past_work[*]')
The [*] tells the parser that past_work is an array, and to process that array in to rows of json objects, rather than just return the whole array as a single json object.
That gives something like...
SELECT
jt.company,
jt.title
FROM
JSON_TABLE(
'{
"past_work": [
{"company": "XXXXX", "title": "XXXX"},
{"company": "YYYYY", "title": "YYYY"}
]
}',
'$.past_work[*]'
COLUMNS (
company VARCHAR2(100) PATH '$.company',
title VARCHAR2(100) PATH '$.title'
)
)
AS jt
db<>fiddle demo
For more details, I recommend reading the docs:
https://docs.oracle.com/database/121/SQLRF/functions092.htm#SQLRF56973
EDIT: Updated example, almost a copy and paste from the docs
Please Read The Docs!
SELECT
jt.*
FROM
JSON_TABLE(
'{
"XX_data":[
{
"employee_id": "E1",
"full_name": "E1 Admin",
"past_work": "N/A"
},
{
"employee_id": "E2",
"full_name": "E2 Admin",
"past_work": [
{"company": "E2 PW1 C", "title": "E2 PW1 T"},
{"company": "E2 PW2 C", "title": "E2 PW2 T"}
]
}
]
}',
'$.XX_data[*]'
COLUMNS (
employee_id VARCHAR2(100) PATH '$.employee_id',
full_name VARCHAR2(100) PATH '$.full_name',
past_work VARCHAR2(100) PATH '$.past_work',
NESTED PATH '$.past_work[*]'
COLUMNS (
past_work_company VARCHAR2(100) PATH '$.company',
past_work_title VARCHAR2(100) PATH '$.title'
)
)
)
AS jt
Another db<>fiddle demo
Thanks, all, for the comments. I have asked the product team to provide the data in the correct format.
I'm trying to parse out some JSON files in snowflake. In this case, I'd like to extract the "gift card" from the line that has "fulfillment_service": "gift_card". I've had success querying one dimensional JSON data, but this - with the square brackets - is confounding me.
Here's my simple query - I've created a small table called "TEST_WEEK"
select line_items:fulfillment_service
from TEST_WEEK
, lateral flatten(FULFILLMENTS:line_items) line_items;
Hopefully this isn't too basic a question. I'm very new with parsing JSON.
Thanks in advance!
Here's the start of the FULFILLMENTS field with the info I want to get at.
[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA",
You can use two lateral flatten calls to process the values in the line_items array:
Sample table:
create table TEST_WEEK( FULFILLMENTS variant ) as
select parse_json(
'[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA"
}
}
]
}
]');
Sample query:
select s.VALUE:fulfillment_service
from TEST_WEEK,
lateral flatten( FULFILLMENTS ) f,
lateral flatten( f.VALUE:line_items ) s;
The output:
+-----------------------------+
| S.VALUE:FULFILLMENT_SERVICE |
+-----------------------------+
| "gift_card" |
+-----------------------------+
Those square brackets indicate that you have an array of JSON objects in your FULFILLMENTS field. Unless there is a real need to have an array of objects in one field, you should have a look at the STRIP_OUTER_ARRAY property of the COPY command. An example can be found in the Snowflake documentation:
copy into <table>
from @~/<file>.json
file_format = (type = 'JSON' strip_outer_array = true);
In case others are stuck with same data issue (all json data in one array), I have this solution:
select f.VALUE:fulfillment_service::string
from TEST_WEEK,
lateral flatten( FULFILLMENTS[0].line_items ) f;
With this, you just grab the first element of the array (which is the only element).
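If you have nested array elements, call flatten with named arguments so that you can add the recursive and mode options:
lateral flatten( input => FULFILLMENTS[0].line_items, recursive => true, mode => 'array' ) f;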
Hi, I have analytics events data exported from Firebase to BigQuery and need to create a visualization in Power BI using that BigQuery dataset. I'm able to access the dataset in Power BI, but some fields are of array type. I generally use UNNEST when querying in the console, but how do I run such a query inside Power BI? Is there any other option available? Thanks.
Table In BigQuery
What we did until the driver fully supports arrays is to flatten in a view: create a view in bigquery with UNNEST() and query that in PBI instead.
You might need to transform your specific column (in your case event_params) by parsing the JSON into columns/rows.
I have the JSON below as an example for you.
{
"quiz": {
"sport": {
"q1": {
"question": "Which one is correct team name in NBA?",
"options": [
"New York Bulls",
"Los Angeles Kings",
"Golden State Warriros",
"Huston Rocket"
],
"answer": "Huston Rocket"
}
},
"maths": {
"q1": {
"question": "5 + 7 = ?",
"options": [
"10",
"11",
"12",
"13"
],
"answer": "12"
},
"q2": {
"question": "12 - 8 = ?",
"options": [
"1",
"2",
"3",
"4"
],
"answer": "4"
}
}
}
}
I added this JSON to my table. Currently it has only one column.
Go to Edit Queries and open the Transform tab, where you will find Parse. In my case I chose JSON.
When you parse the column as JSON, you get an expandable column.
Click to expand it. Sometimes it asks whether to expand to new rows.
Finally you will have a table with the expanded columns.
I have a CSV file with several million rows, and want to load it as a PostgreSQL table. One of the rows in the column 'json_doc' as an example contains:
{"id": <>,
"base":
{"ateco":
[
{
"code": "<>",
"rootCode": "<>",
"description": "<>"
}
],
"founded": "<>",
"legalName": "<>",
"legalForms":
[
{
"name": "<>",
"level": <>
},
{
"name": "<>",
"level": <>
}
]
},
"name": "<>",
"people":
{
"items":
[
{
"name": "<>",
"givenName": "<>",
"familyName": "<>"
}
]
},
"country": "<>",
"locations": {}
}
As you can see, it has many nested dictionaries, and there are several million of these.
I'd like to get this file into an SQL table with even the sub-dictionary values in their own columns. How can I do this? It would seem I have to use some sort of namespacing technique for the nested data, as there are some duplicate keys, e.g. 'name'.
The data will be analysed using Pandas, but I'd like to get this straight into Postgres if possible. Any assistance greatly appreciated.
The result will look like:
id | base_ateco_code | etc | base_ateco_legalForms_name | etc |
Unless there are any ideas about this - it's a pretty open project from my employer - I just need to be able to use this information as part of a JOIN with another table.
Many thanks.
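One possible way to get the namespaced column names described above is to flatten each document before loading it. This is only a minimal sketch under stated assumptions: the function name flatten is my own, keys are joined with "_", and list elements get their index in the column name (base_ateco_0_code rather than base_ateco_code), since a list such as legalForms can hold more than one element. The sample values are invented placeholders, because the original document only shows <>.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single dict.

    Keys are the path to each scalar joined by "_"; list elements
    contribute their index to the path. Empty dicts and empty lists
    produce no columns.
    """
    columns = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = ((str(index), child) for index, child in enumerate(value))
    else:
        # A scalar: this is a leaf, so record it under the path so far.
        columns[prefix] = value
        return columns
    for key, child in items:
        name = f"{prefix}_{key}" if prefix else key
        columns.update(flatten(child, name))
    return columns


# Invented placeholder values in the shape of the json_doc column.
sample = {
    "id": 1,
    "base": {
        "ateco": [{"code": "A", "rootCode": "B"}],
        "legalForms": [{"name": "x", "level": 1}, {"name": "y", "level": 2}],
    },
    "name": "n",
    "locations": {},
}

for column, cell in flatten(sample).items():
    print(column, "=", cell)
```

The flattened dictionaries could then be written back out as a CSV with one column per key and loaded with COPY. Alternatively, the JSON could be loaded unchanged into a jsonb column and unpacked in SQL only for the keys the JOIN needs.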
Can I create an inner join on Freebase that uses another query?
I tried to create a join between two queries.
The first selects all artists that share a genre with Nirvana, excluding Nirvana itself:
[{
"id": null,
"name": null,
"name!=": "nirvana",
"type": "/music/artist",
"genre": [{
"name|=": [
"Punk rock",
"Grunge",
"Alternative rock",
"Rock music",
"Hardcore punk"
]
}]
}]
The second selects all of Nirvana's genres:
[{
"name": "nirvana",
"type": "/music/musical_group",
"/music/artist/genre": []
}]
I want to create a query like this but it does not work.
[{
"id": null,
"name": null,
"name!=": "nirvana",
"type": "/music/artist",
"genre": [{
"name|=":
[{
"name": "nirvana",
"type": "/music/musical_group",
"/music/artist/genre": []
}]
}]
}]
I'm not sure why you need to do this all as one query. Since Nirvana's music spans a number of very popular genres your query will probably return thousands of results which will mean that you'll have to make multiple API requests to get all the results either way.
In any case, here's a MQL query that finds all the bands which have at least one music genre in common with Nirvana:
[{
"id": null,
"name": null,
"/music/artist/genre": [{
"id": null,
"name": null,
"!/music/artist/genre": {
"id": "/m/0b1zz",
"name": null
}
}]
}]
The exclamation mark in front of the second genre property means that we want the inverse relationship, i.e. "music artists with this genre" instead of "music genres for this artist".
Note that I've used the MID (/m/0b1zz) to represent Nirvana. You shouldn't be using names to identify topics in a query since they're not unique. You want results for the Nirvana started by Kurt Cobain in 1987, not just any band named Nirvana.