Presto Query Engine - Running MongoDB query to extract JSON-type data - Hive

I am trying to query a record in MongoDB with the below schema in Trino.
{
"_id": {
"$oid": "123456789010111213"
},
"table": "personaldatacollection",
"fields": [
{
"name": "eventString",
"type": "row(..)",
"hidden": false
},
{
"name": "personaldetailsmap",
"type": "JSON",
"hidden": false
}
]
}
"personaldetailsmap" is in JSON format and also it is an Array which can have Array of Arrays in side it. And there are more than 200 or more attributes inside "personaldetailsmap" which has to be to be represented as columns as shown in the below query. Is there any proper way to extract these fields without repetetively using json_extract_scalar(..) many times?
select _id as id,eventString,domaindetails,technicaldetails,processStages,personaldetailsmap,
json_extract_scalar(personaldetailsmap, '$.0.firtName.0') as firtName,
json_extract_scalar(personaldetailsmap, '$.0.middleName.0') as middleName,
json_extract_scalar(personaldetailsmap, '$.0.lastName.0') as lastName,
json_extract_scalar(personaldetailsmap, '$.0.initials.0') as initials,
json_extract_scalar(personaldetailsmap, '$.0.age.0') as age,
json_extract_scalar(personaldetailsmap, '$.0.birthMonth.0') as birthMonth,
json_extract_scalar(personaldetailsmap, '$.0.birthDate.0') as birthDate,
json_extract_scalar(personaldetailsmap, '$.0.birthYear.0') as birthYear,
...
from "test".db01.personaldatacollection;

I understand that you want to flatten the array fields using Trino. Instead of fetching these values one at a time, you can flatten them using `CROSS JOIN UNNEST`. Below is an example that flattens the `fields` array from the schema you shared:
SELECT field.name, field.hidden
FROM <table-name>
CROSS JOIN UNNEST(fields) AS t(field)
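For the personaldetailsmap column itself, the repeated json_extract_scalar(..) calls can be collapsed by casting the JSON to a MAP once and then reading each attribute out of the map. The following is only a sketch, not a tested solution: it assumes personaldetailsmap is a JSON array whose first element is an object mapping each attribute name to an array of scalar values, as the '$.0.<attribute>.0' paths in the question suggest.

WITH parsed AS (
  SELECT
    _id,
    -- Parse the first array element into a map once, instead of one
    -- json_extract_scalar(..) call per attribute. Assumes every attribute
    -- value is itself a JSON array, per the paths in the question.
    CAST(json_extract(personaldetailsmap, '$[0]') AS MAP(VARCHAR, ARRAY(JSON))) AS details
  FROM "test".db01.personaldatacollection
)
SELECT
  _id AS id,
  CAST(element_at(details, 'firtName')[1] AS VARCHAR) AS firtName,     -- Trino arrays are 1-based
  CAST(element_at(details, 'middleName')[1] AS VARCHAR) AS middleName,
  CAST(element_at(details, 'lastName')[1] AS VARCHAR) AS lastName
FROM parsed;

The map lookup still has to be written once per output column, but the JSON is parsed only once, and element_at returns NULL instead of failing when an attribute is missing.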

Related

Using PostgreSQL JSON function to obtain an array object from a JSON stored key

I have a table on AWS RDS PostgreSQL that stores JSON objects. For instance, I have this record:
{
"id": "87b05c62-4153-4341-9b58-e86bade25ffd",
"title": "Just Ok",
"rating": 2,
"gallery": [
{
"id": "1cb158af-0983-4bac-9e4f-0274b3836cdd",
"typeCode": "PHOTO"
},
{
"id": "aae64f19-22a8-4da7-b40a-fbbd8b2ef30b",
"typeCode": "PHOTO"
}
],
"reviewer": {
"memberId": "2acf2ea7-7a37-42d8-a019-3d9467cbdcd1",
},
"timestamp": {
"createdAt": "2011-03-30T09:52:36.000Z",
"updatedAt": "2011-03-30T09:52:36.000Z"
},
"isUserVerified": true,
}
And I would like to create a query for obtaining one of the gallery objects.
I have tried this but get both objects in the array:
SELECT jsonb_path_query(data->'gallery', '$[*]') AS content
FROM public.reviews
WHERE jsonb_path_query_first(data->'gallery', '$.id') ? '1cb158af-0983-4bac-9e4f-0274b3836cdd'
With this other query I get the first object:
SELECT jsonb_path_query_first(data->'gallery', '$[*]') AS content
FROM public.reviews
WHERE jsonb_path_query_first(data->'gallery', '$.id') ? '1cb158af-0983-4bac-9e4f-0274b3836cdd'
But filtering by the second array object id, I get no result:
SELECT jsonb_path_query_first(data->'gallery', '$[*]') AS content
FROM public.reviews
WHERE jsonb_path_query_first(data->'gallery', '$.id') ? 'aae64f19-22a8-4da7-b40a-fbbd8b2ef30b'
I have read the official documentation and tried other functions like jsonb_path_exists or jsonb_path_match on the where condition but was not able to make the query work.
Any help would be greatly appreciated. Thanks in advance.
I managed to get the query working as needed. Here is my proposal:
SELECT gallery
FROM public.reviews, jsonb_path_query(data->'gallery', '$[*]') as gallery
WHERE data->>'id' = '87b05c62-4153-4341-9b58-e86bade25ffd' and gallery->>'id' = 'aae64f19-22a8-4da7-b40a-fbbd8b2ef30b'
Hope it helps others.
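On PostgreSQL 12 or later, another option (sketched here against the same table, untested) is to push the id filter into the jsonpath expression itself, so only the matching gallery element is returned:
SELECT jsonb_path_query(
         data -> 'gallery',
         '$[*] ? (@.id == "aae64f19-22a8-4da7-b40a-fbbd8b2ef30b")'
       ) AS gallery_item
FROM public.reviews
WHERE data ->> 'id' = '87b05c62-4153-4341-9b58-e86bade25ffd';
The ? (...) part is a jsonpath filter: it keeps only the array elements whose id equals the given value.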

How to check if json array already contains a certain key?

Let's say I have this json in my jsonb column
{
"fields": [
{
"name": "firstName"
},
{
"name": "lastName"
},
...
]
}
How can I know if "firstName" already exists?
I've tried this so far
SELECT field->>'fields'
from person where (field->'name')::jsonb ? 'firstName';
Use the containment operator @>:
select field->>'fields'
from person
where field->'fields' @> '[{"name": "firstName"}]'
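The @> operator tests whether the jsonb value on the left contains the one on the right; because the right-hand side is an array holding a single object, the condition is true whenever any element of fields contains {"name": "firstName"}.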
You can use jsonb_array_elements to expand fields into individual elements, so you can filter on 'name':
SELECT field->>'fields', obj.*
FROM person, jsonb_array_elements(field->'fields') obj
WHERE obj->>'name' = 'firstName'
see dbfiddle
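On PostgreSQL 12 or later, a jsonpath predicate is another option; a sketch against the same table:
SELECT field->>'fields'
FROM person
WHERE jsonb_path_exists(field, '$.fields[*] ? (@.name == "firstName")');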

Get value from array in JSON in SQL Server

Let's say we have this JSON in our database table. I want to select a value from tags. I already know how to get the array from this data, but I don't know how to access its members. How do I get the first value from the array? Is there a function for this task?
{
"info": {
"type": 1,
"address": {
"town": "Bristol",
"county": "Avon",
"country": "England"
},
"tags": ["Sport", "Water polo"]
},
"type": "Basic"
}
Query I already have:
SELECT JSON_QUERY(MyTable.Data, '$.info.tags')
FROM MyTable
This returns me:
["Sport", "Water polo"]
How do I get
Sport
JSON_QUERY returns an object or array. You need JSON_VALUE to return a scalar value, e.g.:
SELECT JSON_VALUE(Data, '$.info.tags[0]')
from MyTable
Check the section "Compare JSON_VALUE and JSON_QUERY" in the docs for more examples.
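If you need every element rather than a single index, OPENJSON can shred the array into rows. A sketch, reusing the MyTable and Data names from the question:
SELECT t.[value] AS tag
FROM MyTable
CROSS APPLY OPENJSON(MyTable.Data, '$.info.tags') AS t;
Each array element comes back as one row in the value column.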

jsonb LIKE query on nested objects in an array

My JSON data looks like this:
[{
"id": 1,
"payload": {
"location": "NY",
"details": [{
"name": "cafe",
"cuisine": "mexican"
},
{
"name": "foody",
"cuisine": "italian"
}
]
}
}, {
"id": 2,
"payload": {
"location": "NY",
"details": [{
"name": "mbar",
"cuisine": "mexican"
},
{
"name": "fdy",
"cuisine": "italian"
}
]
}
}]
Given a text "foo", I want to return all the tuples that contain this substring, but I cannot figure out how to write the query.
I followed this related answer but cannot figure out how to do LIKE.
This is what I have working right now:
SELECT r.res->>'name' AS feature_name, d.details::text
FROM restaurants r
, LATERAL (SELECT ARRAY (
SELECT * FROM json_populate_recordset(null::foo, r.res #> '{payload, details}')
)
) AS d(details)
WHERE d.details #> '{cafe}';
Instead of passing the whole text 'cafe', I want to pass 'ca' and get the results that match that text.
Your solution can be simplified some more:
SELECT r.res->>'name' AS feature_name, d.name AS detail_name
FROM restaurants r
, jsonb_populate_recordset(null::foo, r.res #> '{payload, details}') d
WHERE d.name LIKE '%oh%';
Or simpler, yet, with jsonb_array_elements() since you don't actually need the row type (foo) at all in this example:
SELECT r.res->>'name' AS feature_name, d->>'name' AS detail_name
FROM restaurants r
, jsonb_array_elements(r.res #> '{payload, details}') d
WHERE d->>'name' LIKE '%oh%';
db<>fiddle here
But that's not what you asked exactly:
I want to return all the tuples that have this substring.
You are returning all JSON array elements (0-n per base table row), where one particular key ('{payload,details,*,name}') matches (case-sensitively).
And your original question had a nested JSON array on top of this. You removed the outer array for this solution - I did the same.
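If you do keep the outer array, a sketch that unnests both levels (assuming the whole document sits in a single jsonb column res, as above):
SELECT rec->>'id' AS record_id, d->>'name' AS detail_name
FROM restaurants r
     , jsonb_array_elements(r.res) rec
     , jsonb_array_elements(rec #> '{payload, details}') d
WHERE d->>'name' LIKE '%oh%';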
Depending on your actual requirements the new text search capability of Postgres 10 might be useful.
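A sketch of that Postgres 10+ capability; note that full text search matches whole words ('foody'), not arbitrary substrings ('oh'), so it complements rather than replaces LIKE:
SELECT r.res->>'name' AS feature_name
FROM restaurants r
WHERE to_tsvector('simple', r.res #> '{payload, details}') @@ to_tsquery('simple', 'foody');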
I ended up doing this (inspired by this answer - jsonb query with nested objects in an array):
SELECT r.res->>'name' AS feature_name, d.details::text
FROM restaurants r
, LATERAL (
SELECT * FROM json_populate_recordset(null::foo, r.res#>'{payload, details}')
) AS d(details)
WHERE d.details LIKE '%oh%';
Fiddle here - http://sqlfiddle.com/#!15/f2027/5

How to perform a SELECT on the results returned from a GROUP BY in Druid?

I am having a hard time converting this simple SQL query below into Druid:
SELECT country, city, Count(*)
FROM people_data
WHERE name = 'Mary'
GROUP BY country, city;
So I came up with this query so far:
{
"queryType": "groupBy",
"dataSource" : "people_data",
"granularity": "all",
"metric" : "num_of_pages",
"dimensions": ["country", "city"],
"filter" : {
"type" : "and",
"fields" : [
{
"type": "in",
"dimension": "name",
"values": ["Mary"]
},
{
"type" : "javascript",
"dimension" : "email",
"function" : "function(value) { return (value.length !== 0) }"
}
]
},
"aggregations": [
{ "type": "longSum", "name": "num_of_pages", "fieldName": "count" }
],
"intervals": [ "2016-07-20/2016-07-21" ]
}
The query above runs, but it doesn't seem like the groupBy on the Druid datasource is even being evaluated, since I see people in my output with names other than Mary. Does anyone have any input on how to make this work?
The simple answer is that you cannot select arbitrary dimensions in your groupBy queries.
Strictly speaking, even the SQL query does not make sense: if, for a given combination of country and city, there are many different values of name and street, how do you squeeze them into a single row? You have to aggregate them, e.g. by using the max function.
In that case you can include the same column in your data as both a dimension and a metric, e.g. name_dim and name_metric, and add the corresponding aggregation over the metric, max(name_metric).
Please note that if these columns (name etc.) have high-granularity values, that will kill Druid's roll-up feature.
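For reference, a hedged sketch (untested, reusing the field names from the question) of a groupBy query matching the SQL above: groupBy takes no "metric" key (that belongs to topN queries), and a plain selector filter restricts rows to Mary before grouping; summing the ingestion-time count metric approximates COUNT(*) under roll-up:
{
  "queryType": "groupBy",
  "dataSource": "people_data",
  "granularity": "all",
  "dimensions": ["country", "city"],
  "filter": { "type": "selector", "dimension": "name", "value": "Mary" },
  "aggregations": [
    { "type": "longSum", "name": "num_rows", "fieldName": "count" }
  ],
  "intervals": ["2016-07-20/2016-07-21"]
}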