Average of numeric values in Postgres JSON column

I have a jsonb column with the following structure:
{
"key1": {
"type": "...",
"label": "...",
"variables": [
{
"label": "Height",
"value": 131315.9289,
"variable": "myVar1"
},
{
"label": "Width",
"value": 61085.7525,
"variable": "myVar2"
}
]
},
}
I want to query for the average height across all rows. The top-level key values are unknown, so I have something like this:
select id,
avg((latVars ->> 'value')::numeric) as avg
from "MyTable",
jsonb_array_elements((my_json_field->jsonb_object_keys(my_json_field)->>'variables')::jsonb) as latVars
where my_json_field is not null
group by id;
It's throwing the following error:
ERROR: set-returning functions must appear at top level of FROM
Moving the jsonb_array_elements function above MyTable in the FROM clause doesn't work.
I'm following the basic advice found in this SO answer to no avail.
Any advice?

jsonb_array_elements only applies to a JSON array, and my_json_field is an object at the top level, so it is not the right tool here.
If you are on PostgreSQL 12 or later, you can instead use the jsonb_path_query function, which is based on the jsonpath language:
select id
, avg(v.value :: numeric) as avg
from "MyTable"
, jsonb_path_query(my_json_field, '$.*.variables[*] ? (@.label == "Height").value') AS v(value)
where my_json_field is not null
group by id;
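To sanity-check the numbers the query returns, here is a rough plain-Python sketch of what the path expression does for a single row. The function name average_height and the extra "key2" entry are made up for illustration; only the key1 structure comes from the question.

```python
from statistics import mean

def average_height(doc):
    """Average the "value" of every variable labelled "Height" in one document."""
    heights = [
        var["value"]
        for entry in doc.values()              # $.*  : any top-level key
        if isinstance(entry, dict)
        for var in entry.get("variables", [])  # .variables[*]
        if var.get("label") == "Height"        # ? (@.label == "Height")
    ]
    # Return None when the document holds no Height variable at all.
    return mean(heights) if heights else None

# "key2" is a hypothetical second entry, added only to show the averaging.
row = {
    "key1": {"variables": [
        {"label": "Height", "value": 131315.9289, "variable": "myVar1"},
        {"label": "Width", "value": 61085.7525, "variable": "myVar2"},
    ]},
    "key2": {"variables": [
        {"label": "Height", "value": 100.0, "variable": "myVar1"},
    ]},
}
print(average_height(row))
```

One difference to keep in mind: the SQL version simply produces no row for an id without any Height value, whereas this sketch returns None.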

Related

Querying over PostgreSQL JSONB column

I have a table "blobs" with a column "metadata" in jsonb data-type,
Example:
{
"total_count": 2,
"items": [
{
"name": "somename",
"metadata": {
"metas": [
{
"id": "11258",
"score": 6.1,
"status": "active",
"published_at": "2019-04-20T00:29:00",
"nvd_modified_at": "2022-04-06T18:07:00"
},
{
"id": "9251",
"score": 5.1,
"status": "active",
"published_at": "2018-01-18T23:29:00",
"nvd_modified_at": "2021-01-08T12:15:00"
}
]
}
]
}
I want to identify statuses in the "metas" array that match certain given strings. I have tried the following so far, but without results:
SELECT * FROM blobs
WHERE metadata is not null AND
(
SELECT count(*) FROM jsonb_array_elements(metadata->'metas') AS cn
WHERE cn->>'status' IN ('active','reported')
) > 0;
It would also be sufficient if I could compare the string with "status" in the first array object.
I am using PostgreSQL 9.6.24
For clarity, I usually break code into a series of WITH statements. My idea for your problem would be to use JSON path (https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-SQLJSON-PATH) and the jsonb_path_query function. Note that both are only available from PostgreSQL 12 onwards.
The code below gives a list of counts; I will leave the rest to you to get the final data.
I've added an id column just to have something to join on. Otherwise, join on metadata.
Also, note the additional double quotes (") in the WHERE condition. The left join in blob_ext is there just to produce a null value if metadata is not present or the path does not match.
with blob as (
select row_number() over()"id", * from (VALUES
(
'{
"total_count": 2,
"items": [
{
"name": "somename",
"metadata": {
"metas": [
{
"id": "11258",
"score": 6.1,
"status": "active",
"published_at": "2019-04-20T00:29:00",
"nvd_modified_at": "2022-04-06T18:07:00"
},
{
"id": "9251",
"score": 5.1,
"status": "active",
"published_at": "2018-01-18T23:29:00",
"nvd_modified_at": "2021-01-08T12:15:00"
}
]
}
}
]}'::jsonb),
(null::jsonb)) b(metadata)
)
, blob_ext as (
select bb.*, blob_sts.status
from blob bb
left join (
select
bb2.id,
jsonb_path_query (bb2.metadata::jsonb, '$.items[*].metadata.metas[*].status'::jsonpath)::character varying "status"
FROM blob bb2
) as blob_sts ON
blob_sts.id = bb.id
)
select bbe.id, count(*) cnt, bbe.metadata
from blob_ext bbe
where bbe.status in ('"active"', '"reported"')
group by bbe.id, bbe.metadata;
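In case the extra double quotes in the WHERE condition look odd: jsonb_path_query returns a jsonb value, and casting a jsonb string to character varying keeps its JSON quoting. A small Python analogy of the same effect, using only the standard json module:

```python
import json

status = "active"

# Serialising a string as JSON wraps it in double quotes,
# just like casting a jsonb string to text does in Postgres.
as_json_text = json.dumps(status)
print(as_json_text)  # prints "active" (with the quotes)

# Comparing against the bare word therefore fails...
print(as_json_text == "active")    # False
# ...while comparing against the quoted form succeeds.
print(as_json_text == '"active"')  # True
```

If you would rather compare against plain strings, extracting the value as text first (for example with the #>> '{}' operator) should avoid the quoting, though I have not tested that against the query above.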
Another way is to peel off one layer at a time with jsonb_extract_path() and jsonb_array_elements():
with cte_items as (
select id,
metadata,
jsonb_extract_path(jx.value,'metadata','metas') as metas
from blobs,
lateral jsonb_array_elements(jsonb_extract_path(metadata,'items')) as jx),
cte_metas as (
select id,
metadata,
jsonb_extract_path_text(s.value,'status') as status
from cte_items,
lateral jsonb_array_elements(metas) s)
select distinct
id,
metadata
from cte_metas
where status in ('active','reported');
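For comparison, the same layer-by-layer peeling can be sketched in plain Python, which may help when checking the query against a few sample rows. The function name matching_blobs and rows 2 and 3 are invented for illustration.

```python
def matching_blobs(rows, wanted):
    """Return the ids of rows whose metas contain at least one wanted status."""
    wanted = set(wanted)
    hits = []
    for row_id, metadata in rows:
        if metadata is None:          # same as: metadata is not null
            continue
        statuses = {
            meta.get("status")
            for item in metadata.get("items", [])                   # layer 1: items[*]
            for meta in item.get("metadata", {}).get("metas", [])   # layer 2: metas[*]
        }
        if statuses & wanted:
            hits.append(row_id)
    return hits

# Row 1 mirrors the question's sample; rows 2 and 3 are hypothetical.
rows = [
    (1, {"items": [{"name": "somename", "metadata": {"metas": [
        {"id": "11258", "status": "active"},
        {"id": "9251", "status": "active"},
    ]}}]}),
    (2, None),
    (3, {"items": [{"name": "other", "metadata": {"metas": [
        {"id": "1", "status": "closed"},
    ]}}]}),
]
print(matching_blobs(rows, ["active", "reported"]))  # [1]
```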

PostgreSQL how to query jsonb by a value?

SELECT * FROM some_table;
I can query to get the following results:
{
"sku0": {
"Id": "18418",
"Desc": "yes"
},
"sku1": {
"Id": "17636",
"Desc": "no"
},
"sku2": {
"Id": "206714",
"Desc": "yes"
},
"brand": "abc",
"displayName": "something"
}
First, the number of skus is not fixed. It may be sku0, sku1, sku2, sku3, sku4 ... but they all start with sku.
Then, I want to find the entry whose Id is 17636 and determine whether its Desc value is yes or no. Frustratingly, after reading the PostgreSQL JSON Functions and Operators documentation, I didn't find a good way.
I can convert the result into a Python dictionary and then easily achieve my requirements with Python's own methods.
If the requirements can also be achieved with PostgreSQL statements, which approach is recommended: that one or the Python dictionary?
I am not sure I completely understand what the result is you want. But if you want to filter on the Id, you need to unnest all the elements inside the JSON column:
select d.v ->> 'Desc' as description
from the_table t
cross join jsonb_each(t.data) as d(k,v)
where d.v ->> 'Id' = '17636'
You could use the new jsonpath notation of PostgreSQL v12:
SELECT data @@ '$.* ? (@.Id == "17636").Desc == "yes"'
FROM some_table;
That will start with the root of data ($), find any attribute in it (*), filter only those attributes that contain an Id with value "17636", get their Desc attribute and return TRUE only if that attribute is "yes".
Nice, isn't it?
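Since the question mentions the Python dictionary route, here is roughly what that path expression does, written as a plain-Python sketch. The function name desc_for_id is made up; the data is the sample from the question.

```python
def desc_for_id(data, wanted_id):
    """Return the Desc of the first top-level entry whose Id matches, else None."""
    for value in data.values():                  # $.*  : any attribute of the root
        if isinstance(value, dict) and value.get("Id") == wanted_id:  # ? (@.Id == ...)
            return value.get("Desc")             # .Desc
    return None

data = {
    "sku0": {"Id": "18418", "Desc": "yes"},
    "sku1": {"Id": "17636", "Desc": "no"},
    "sku2": {"Id": "206714", "Desc": "yes"},
    "brand": "abc",
    "displayName": "something",
}
print(desc_for_id(data, "17636"))           # no
print(desc_for_id(data, "17636") == "yes")  # False, like the SQL predicate
```

Doing the filtering in the database has the advantage that only matching rows leave the server, but for a single row that has already been fetched the dictionary version is just as reasonable.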
This will probably give you what you need.
select value->>'Desc' from jsonb_each('{
"sku0": {
"Id": "18418",
"Desc": "yes"
},
"sku1": {
"Id": "17636",
"Desc": "no"
},
"sku2": {
"Id": "206714",
"Desc": "yes"
},
"brand": "abc",
"displayName": "something"
}'::jsonb)
where key like 'sku%'
and value->>'Id'='17636'
Best regards,
Bjarni

How to "zip" multiple nested JSON arrays without using id key?

I'm trying to merge some nested JSON arrays without looking at the id. Currently I'm getting this when I make a GET request to /surveyresponses:
{
"surveys": [
{
"id": 1,
"name": "survey 1",
"isGuest": true,
"house_id": 1
},
{
"id": 2,
"name": "survey 2",
"isGuest": false,
"house_id": 1
},
{
"id": 3,
"name": "survey 3",
"isGuest": true,
"house_id": 2
}
],
"responses": [
{
"question": "what is this anyways?",
"answer": "test 1"
},
{
"question": "why?",
"answer": "test 2"
},
{
"question": "testy?",
"answer": "test 3"
}
]
}
But I would like to get it where each survey has its own question and answers so something like this:
{
"surveys": [
{
"id": 1,
"name": "survey 1",
"isGuest": true,
"house_id": 1,
"question": "what is this anyways?",
"answer": "test 1"
}
]
}
Because I'm not going to a specific id I'm not sure how to make the relationship work. This is the current query I have that's producing those results.
export function getSurveyResponse(id: number): QueryBuilder {
return db('surveys')
.join('questions', 'questions.survey_id', '=', 'surveys.id')
.join('questionAnswers', 'questionAnswers.question_id', '=', 'questions.id')
.select('surveys.name', 'questions.question', 'questions.question', 'questionAnswers.answer')
.where({ survey_id: id, question_id: id })
}
Assuming jsonb in current Postgres 10 or 11, this query does the job:
SELECT t.data, to_jsonb(s) AS new_data
FROM t
LEFT JOIN LATERAL (
SELECT jsonb_agg(s || r) AS surveys
FROM (
SELECT jsonb_array_elements(t.data->'surveys') s
, jsonb_array_elements(t.data->'responses') r
) sub
) s ON true;
I unnest both nested JSON arrays in parallel to get the desired behavior of "zipping" both directly. The number of elements in both nested JSON arrays has to match or you need to do more (else you lose data).
This builds on implementation details of how Postgres deals with multiple set-returning functions in a SELECT list to make it short and fast. See:
What is the expected behaviour for multiple set-returning functions in select clause?
One could be more explicit with a ROWS FROM expression, which works properly since Postgres 9.4:
SELECT t.data
, to_jsonb(s) AS new_data
FROM tbl t
LEFT JOIN LATERAL (
SELECT jsonb_agg(s || r) AS surveys
FROM ROWS FROM (jsonb_array_elements(t.data->'surveys')
, jsonb_array_elements(t.data->'responses')) sub(s,r)
) s ON true;
The manual about combining multiple table functions.
Or you could use WITH ORDINALITY to get original order of elements and combine as you wish:
PostgreSQL unnest() with element number
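To make the "zipping" behaviour concrete, here is a rough Python equivalent of what the queries above compute for one row. The function name zip_surveys is hypothetical, and raising an error on mismatched lengths is my own choice for the sketch, not something the SQL does.

```python
def zip_surveys(data):
    """Pair the n-th survey with the n-th response and merge each pair."""
    surveys = data.get("surveys", [])
    responses = data.get("responses", [])
    if len(surveys) != len(responses):
        # Assumption: fail loudly instead of silently losing data.
        raise ValueError("surveys and responses must have the same length")
    # {**s, **r} mirrors jsonb's s || r: on duplicate keys the right side wins.
    return {"surveys": [{**s, **r} for s, r in zip(surveys, responses)]}

data = {
    "surveys": [
        {"id": 1, "name": "survey 1", "isGuest": True, "house_id": 1},
        {"id": 2, "name": "survey 2", "isGuest": False, "house_id": 1},
    ],
    "responses": [
        {"question": "what is this anyways?", "answer": "test 1"},
        {"question": "why?", "answer": "test 2"},
    ],
}
print(zip_surveys(data)["surveys"][0])
```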

How to generate JSON array from multiple rows, then return with values of another table

I am trying to build a query which combines rows of one table into a JSON array. I then want that array to be part of the result.
I know how to do a simple query like
SELECT *
FROM public.template
WHERE id=1
And I have worked out how to produce the JSON array that I want
SELECT array_to_json(array_agg(to_json(fields)))
FROM (
SELECT id, name, format, data
FROM public.field
WHERE template_id = 1
) fields
However, I cannot work out how to combine the two, so that the result is a number of fields from public.template with the output of the second query being one of the returned fields.
I am using PostgreSQL 9.6.6.
Edit: as requested, here is more information: a definition of the field and template tables and a sample of each query's output.
Currently, I have a JSONB column on the template table which I am using to store an array of fields, but I want to move fields to their own table so that I can more easily enforce a schema on them.
Template table contains:
id
name
data
organisation_id
But I would like to remove data and replace it with the field table which contains:
id
name
format
data
template_id
At the moment the output of the first query is:
{
"id": 1,
"name": "Test Template",
"data": [
{
"id": "1",
"data": null,
"name": "Assigned User",
"format": "String"
},
{
"id": "2",
"data": null,
"name": "Office",
"format": "String"
},
{
"id": "3",
"data": null,
"name": "Department",
"format": "String"
}
],
"id_organisation": 1
}
This output is what I would like to recreate using one query and both tables. The second query outputs this, but I do not know how to merge it into a single query:
[{
"id": 1,
"name": "Assigned User",
"format": "String",
"data": null
},{
"id": 2,
"name": "Office",
"format": "String",
"data": null
},{
"id": 3,
"name": "Department",
"format": "String",
"data": null
}]
The feature you're looking for is JSON concatenation. You can do that with the || operator, which has been available for jsonb since PostgreSQL 9.5:
SELECT to_jsonb(template.*) || jsonb_build_object(
'data', (SELECT jsonb_agg(to_jsonb(field)) FROM field WHERE field.template_id = template.id)
)
FROM template;
Sorry for poorly phrasing what I was trying to achieve. After hours of Googling I have worked it out, and it was a lot simpler than I thought in my ignorance.
SELECT id, name, data
FROM public.template, (
SELECT array_to_json(array_agg(to_json(fields)))
FROM (
SELECT id, name, format, data
FROM public.field
WHERE template_id = 1
) fields
) as data
WHERE id = 1
I wanted the result of the subquery to be a column in the output rather than compiling the entire output table as JSON.
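For anyone checking the shape of the result, this is roughly what the combined output looks like when built by hand in Python. The function name template_with_fields is made up, and dropping template_id from each field is an assumption based on the column list in the subquery.

```python
def template_with_fields(template, fields):
    """Attach the fields belonging to a template as its "data" array."""
    own_fields = [
        # Assumption: template_id is left out, as in the subquery's column list.
        {k: v for k, v in field.items() if k != "template_id"}
        for field in fields
        if field["template_id"] == template["id"]
    ]
    return {**template, "data": own_fields}

template = {"id": 1, "name": "Test Template", "organisation_id": 1}
fields = [
    {"id": 1, "name": "Assigned User", "format": "String", "data": None, "template_id": 1},
    {"id": 2, "name": "Office", "format": "String", "data": None, "template_id": 1},
    {"id": 3, "name": "Other", "format": "String", "data": None, "template_id": 2},
]
print(template_with_fields(template, fields))
```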

How to perform a SELECT in the results returned from a GROUP BY Druid?

I am having a hard time converting this simple SQL Query below into Druid:
SELECT country, city, Count(*)
FROM people_data
WHERE name='Mary'
GROUP BY country, city;
So I came up with this query so far:
{
"queryType": "groupBy",
"dataSource" : "people_data",
"granularity": "all",
"metric" : "num_of_pages",
"dimensions": ["country", "city"],
"filter" : {
"type" : "and",
"fields" : [
{
"type": "in",
"dimension": "name",
"values": ["Mary"]
},
{
"type" : "javascript",
"dimension" : "email",
"function" : "function(value) { return (value.length !== 0) }"
}
]
},
"aggregations": [
{ "type": "longSum", "name": "num_of_pages", "fieldName": "count" }
],
"intervals": [ "2016-07-20/2016-07-21" ]
}
The query above runs but it doesn't seem like groupBy in the Druid datasource is even being evaluated since I see people in my output with names other than Mary. Does anyone have any input on how to make this work?
The simple answer is that you cannot select arbitrary dimensions in your groupBy queries.
Strictly speaking, even the SQL query does not make sense. If, for a given combination of country and city, there are many different values of name and street, how do you squeeze them into a single row? You have to aggregate them, e.g. by using the max function.
In this case you can include the same column in your data as both a dimension and a metric, e.g. name_dim and name_metric, and include the corresponding aggregation over your metric, max(name_metric).
Please note that if these columns (name etc.) have high-cardinality values, that will kill Druid's roll-up feature.
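To illustrate the point about aggregation, here is a plain-Python sketch of a group-by over (country, city) that counts rows and squeezes name into a single value per group with max. The sample rows and the function name group_people are invented, and this only mimics the semantics; it is not Druid code.

```python
from collections import defaultdict

def group_people(rows, name=None):
    """Group rows by (country, city); count them and keep the max name per group."""
    groups = defaultdict(lambda: {"count": 0, "max_name": None})
    for row in rows:
        if name is not None and row["name"] != name:   # the filter step
            continue
        group = groups[(row["country"], row["city"])]  # the dimensions
        group["count"] += 1                            # count aggregation
        if group["max_name"] is None or row["name"] > group["max_name"]:
            group["max_name"] = row["name"]            # max aggregation
    return dict(groups)

# Hypothetical sample data.
rows = [
    {"name": "Mary", "country": "US", "city": "Boston"},
    {"name": "Mary", "country": "US", "city": "Boston"},
    {"name": "Zed", "country": "US", "city": "Boston"},
    {"name": "Mary", "country": "FR", "city": "Paris"},
]
print(group_people(rows, name="Mary"))
print(group_people(rows))
```

Without the filter, the Boston group contains several names, and only an aggregate such as max can reduce them to one value per row.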