How to group results by values that are inside a JSON array in PostgreSQL

I have a column of type JSONB that has data like this:
column name: used_filters
row number 1 example:
{ "categories" : ["economic", "Social"], "tags": ["world" ,"eco-friendly"] }
row number 2 example:
{ "categories" : ["economic"], "tags": ["eco-friendly"] , "keywords" : ["2050"] }
I want to group the result to get the most frequent value for each one of the keys
something like this:
key      | most_freq
---------+--------------
category | economic
tags     | eco-friendly
keyword  | 2050
The keys are not constant and could be different from the ones in my example, but I know that they will be frequent.

You can extract the keys and their array values first by using jsonb_each, and then unnest those arrays with jsonb_array_elements_text. The rest is classical aggregation, sorting by the counts through a window function such as RANK():
SELECT key, value
FROM  (SELECT j.key, jj.value,
              RANK() OVER (PARTITION BY j.key ORDER BY COUNT(*) DESC)
         FROM t,
              LATERAL jsonb_each(js) AS j,
              LATERAL jsonb_array_elements_text(j.value) AS jj
        GROUP BY j.key, jj.value) AS q
WHERE rank = 1
Demo
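If you want to reproduce this, a minimal setup along the lines of the query's assumed names could look like this (the table t and the jsonb column js are placeholders taken from the answer; the question calls the column used_filters):

-- hypothetical test table; rename js to used_filters to match the question
CREATE TABLE t (js jsonb);

INSERT INTO t (js) VALUES
  ('{"categories": ["economic", "Social"], "tags": ["world", "eco-friendly"]}'),
  ('{"categories": ["economic"], "tags": ["eco-friendly"], "keywords": ["2050"]}');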


Postgres JSONB select json objects as columns

The table has id and step_data columns.
id | step_data
--------------
a1 | {...}
a2 | {...}
a3 | {...}
step_data is a nested structure as follows, where the actual metadata keys can be in any of the events objects.
{
  "events": [
    {
      "timestamp": "2021-04-07T17:46:13.739Z",
      "meta": [
        { "key": "length", "value": "0.898" },
        { "key": "height", "value": "607023104" },
        { "key": "weight", "value": "33509376" }
      ]
    },
    {
      "timestamp": "2021-04-07T17:46:13.781Z",
      "meta": [
        { "key": "color", "value": "0.007" },
        { "key": "count", "value": "641511424" },
        { "key": "age", "value": "0" }
      ]
    }
  ]
}
I can extract one field like length pretty easily.
select cast(metadata ->> 'value' as double precision) as length,
       id
from (
    select jsonb_array_elements(jsonb_array_elements(step_data #> '{events}') #> '{meta}') metadata,
           id
    from table
) as parsed_keys
where metadata @> '{"key": "length"}'::jsonb
id | length
---+--------
a1 | 0.898
a2 | 0.800
But what I really need is to extract the metadata as columns from a couple of known keys, like length and color. Not sure how to get another column efficiently once I split the array with jsonb_array_elements().
Is there an efficient way to do this without having to call jsonb_array_elements() again and do a join on every single one? For example such that the result set looks like this.
id | length | color | weight
---+--------+-------+----------
a1 | 0.898  | 0.007 | 33509376
a2 | 0.800  | 1.000 | 15812391
Using Postgres 11.7.
With Postgres 11, I can only think of unnesting both levels, then aggregating back into a key/value pair from which you can extract the desired keys:
select t.id,
(x.att ->> 'length')::numeric as length,
(x.att ->> 'color')::numeric as color,
(x.att ->> 'weight')::numeric as weight
from the_table t
cross join lateral (
select jsonb_object_agg(m.item ->> 'key', m.item -> 'value') as att
from jsonb_array_elements(t.step_data -> 'events') as e(event)
cross join jsonb_array_elements(e.event -> 'meta') as m(item)
where m.item ->> 'key' in ('color', 'length', 'weight')
) x
;
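If you want to try this and the following queries, a throwaway setup along these lines should work; the sample document is the one from the question, while the table name is only an assumption (this answer calls it the_table, the crosstab and FILTER answers below call it tbl):

-- hypothetical test table for experimenting with the queries in this thread
CREATE TABLE the_table (id text PRIMARY KEY, step_data jsonb);

INSERT INTO the_table (id, step_data) VALUES
('a1', '{"events": [
   {"timestamp": "2021-04-07T17:46:13.739Z",
    "meta": [{"key": "length", "value": "0.898"},
             {"key": "height", "value": "607023104"},
             {"key": "weight", "value": "33509376"}]},
   {"timestamp": "2021-04-07T17:46:13.781Z",
    "meta": [{"key": "color", "value": "0.007"},
             {"key": "count", "value": "641511424"},
             {"key": "age", "value": "0"}]}
]}');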
With Postgres 12 you could write it a bit simpler:
select t.id,
jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (@.key == "length").value') #>> '{}' as length,
jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (@.key == "color").value') #>> '{}' as color,
jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (@.key == "weight").value') #>> '{}' as weight
from the_table t
;
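Since #>> '{}' unwraps the jsonb value to text, you may still want a cast if you need numbers; a sketch against the same assumed table:

select t.id,
       (jsonb_path_query_first(t.step_data,
          '$.events[*].meta[*] ? (@.key == "length").value') #>> '{}')::numeric as length
from the_table t;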
crosstab()
For any Postgres version.
You can feed the query result into the crosstab() function to pivot it. You need the additional module tablefunc installed. If you are unfamiliar with it, read basic instructions here first:
PostgreSQL Crosstab Query
SELECT *
FROM crosstab(
$$
SELECT id, metadata->>'key', metadata ->>'value'
FROM (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data -> 'events') -> 'meta') metadata FROM tbl) AS parsed_keys
ORDER BY 1
$$
, $$VALUES ('length'), ('color'), ('weight')$$
) AS ct (id text, length float, color float, weight float);
db<>fiddle here (demonstrating all)
Should deliver best performance, especially for many columns.
Note how we need no explicit cast to double precision (float): crosstab() processes text input anyway, and the result is coerced to the types given in the column definition list.
If one of the keys should appear multiple times, the last row wins. (No error is raised, as you seem to prefer.) You can add a deterministic sort order to the query in $1 to sort the preferred row last. Example: to get the lowest value per key:
ORDER BY 1, 2, 3 DESC
Conditional aggregates with FILTER clause
For Postgres 9.4 or newer.
See:
Aggregate columns with additional (distinct) filters
SELECT id
, min(val) FILTER (WHERE key = 'length') AS length
, min(val) FILTER (WHERE key = 'color') AS color
, min(val) FILTER (WHERE key = 'weight') AS weight
FROM (
SELECT id, metadata->>'key' AS key, (metadata ->>'value')::float AS val
FROM (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data -> 'events') -> 'meta') metadata FROM tbl) AS parsed_keys
) sub
GROUP BY id;
In Postgres 12 or later I would compare performance with jsonb_path_query_first(). a_horse provided a solution.
This query is an option:
SELECT id,
MAX(CASE WHEN metadata->>'key' = 'length' THEN metadata->>'value' END) AS length,
MAX(CASE WHEN metadata->>'key' = 'color' THEN metadata->>'value' END) AS color,
MAX(CASE WHEN metadata->>'key' = 'weight' THEN metadata->>'value' END) AS weight
FROM (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data #> '{events}') #> '{meta}') as metadata
FROM table t) AS aux
GROUP BY id;

Error deleting json object from array in Postgres

I have a Postgres table timeline with two columns:
user_id (varchar)
items (json)
This is the structure of items json field:
[
{
itemId: "12345",
text: "blah blah"
},
//more items with itemId and text
]
I need to delete all the items where itemId equals a given value, e.g. 12345.
I have this working SQL:
UPDATE timeline
SET items = items::jsonb - cast((
    SELECT position - 1
    FROM timeline, jsonb_array_elements(items::jsonb)
    WITH ORDINALITY arr(item_object, position)
    WHERE item_object->>'itemId' = '12345') as int)
It works fine. It only fails when no items are returned by the subquery, i.e. when there are no items whose itemId equals '12345'. In those cases, I get this error:
null value in column "items" violates not-null constraint
How could I solve this?
Try this:
update timeline
set items=(select
json_agg(j)
from json_array_elements(items) j
where j->>'itemId' not in ( '12345')
);
DEMO
The problem is that when null is passed to the - operator, it results in null for the expression. That not only violates your not null constraint, but it is probably also not what you are expecting.
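You can see the effect in isolation with a quick check (just a sanity test, not part of the fix):

-- the jsonb "-" operator yields NULL when its right-hand index is NULL
SELECT '["a", "b"]'::jsonb - NULL::int;  -- returns NULL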
This is a hacky way of getting past it:
UPDATE timeline
SET items = items::jsonb - coalesce(
    cast((
        SELECT position - 1
        FROM timeline, jsonb_array_elements(items::jsonb)
        WITH ORDINALITY arr(item_object, position)
        WHERE item_object->>'itemId' = '12345') as int), 99999999)
A more correct way to do it would be to collect all of the indexes you want to delete with something like the below. If there is the possibility of more than one itemId: 12345 within a single user_id row, then this will either fail or mess up your items (I would have to test to see which), but at least it updates only the rows that contain a 12345 record.
WITH deletes AS (
SELECT t.user_id, e.rn - 1 as position
FROM timeline t
CROSS JOIN LATERAL JSONB_ARRAY_ELEMENTS(t.items)
WITH ORDINALITY as e(jobj, rn)
WHERE e.jobj->>'itemId' = '12345'
)
UPDATE timeline
SET items = items - d.position
FROM deletes d
WHERE d.user_id = timeline.user_id;
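To verify the behaviour, a hypothetical test setup could look like this; note that items is declared jsonb here (the question uses json, which is why the earlier snippets cast with items::jsonb):

CREATE TABLE timeline (
    user_id varchar PRIMARY KEY,
    items   jsonb NOT NULL
);

INSERT INTO timeline VALUES
  ('u1', '[{"itemId": "12345", "text": "blah blah"}, {"itemId": "67890", "text": "keep me"}]'),
  ('u2', '[{"itemId": "67890", "text": "nothing to delete here"}]');

With this data, the CTE update removes only the first element of u1's array and leaves u2 untouched.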

PostgreSQL: Looping through JSONB array and performing SELECTs

I have jsonb in one of my tables.
The jsonb looks like this:
my_data : [
{pid: 1, stock: 500},
{pid: 2, stock: 1000},
...
]
pid refers to the products table's id (which is pid).
EDIT: The table products has the following properties: pid (PK), name
I want to loop over my_data[] in my JSONB and fetch each pid's name from the products table.
I need the result to look something like this (including the product names from the second table):
my_data : [
{
product_name : "abc",
pid: 1,
stock : 500
},
...
]
How should I go about performing such a jsonb inner join?
Edit: I tried S-Man's solution and I'm getting this error:
"invalid reference to FROM-clause entry for table \"jc\""
here is the SQL query
step-by-step demo:db<>fiddle
SELECT
jsonb_build_object( -- 5
'my_data',
jsonb_agg( -- 4
elems || jsonb_build_object('product_name', mot.product_name) -- 3
)
)
FROM
mytable,
jsonb_array_elements(mydata -> 'my_data') as elems -- 1
JOIN
my_other_table mot ON (elems ->> 'pid')::int = mot.pid -- 2
1. Expand the JSON array into one row per array element.
2. Join the other table against the current one using the pid values (note the ::int cast, because otherwise it would be a text value).
3. The new columns from the second table can now be converted into a JSON object, which can be concatenated onto the original one using the || operator.
4. After that, recreate the array from the array elements again.
5. Put this array into a my_data element.
Another way is using jsonb_set() instead of step 5, to write the aggregated array back into the original object directly:
step-by-step demo:db<>fiddle
SELECT
jsonb_set(
mydata,
'{my_data}',
jsonb_agg(
elems || jsonb_build_object('product_name', mot.product_name)
)
)
FROM
mytable,
jsonb_array_elements(mydata -> 'my_data') as elems
JOIN
my_other_table mot ON (elems ->> 'pid')::int = mot.pid
GROUP BY mydata
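For reference, both queries assume a layout roughly like this; the names mytable, mydata and my_other_table come from the answer, while in the question the second table is products:

CREATE TABLE my_other_table (pid int PRIMARY KEY, product_name text);
CREATE TABLE mytable (mydata jsonb);

INSERT INTO my_other_table VALUES (1, 'abc'), (2, 'def');
INSERT INTO mytable VALUES
  ('{"my_data": [{"pid": 1, "stock": 500}, {"pid": 2, "stock": 1000}]}');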

Extracting data from an array of JSON objects for specific object values

In my table, there is a column of JSON type which contains an array of objects describing time offsets:
[
  {
    "type": "start",
    "time": 1.234
  },
  {
    "type": "end",
    "time": 50.403
  }
]
I know that I can extract these with JSON_EACH() and JSON_EXTRACT():
CREATE TEMPORARY TABLE Items(
id INTEGER PRIMARY KEY,
timings JSON
);
INSERT INTO Items(timings) VALUES
('[{"type": "start", "time": 12.345}, {"type": "end", "time": 67.891}]'),
('[{"type": "start", "time": 24.56}, {"type": "end", "time": 78.901}]');
SELECT
JSON_EXTRACT(Timings.value, '$.type'),
JSON_EXTRACT(Timings.value, '$.time')
FROM
Items,
JSON_EACH(timings) AS Timings;
This returns a table like:
start 12.345
end 67.891
start 24.56
end 78.901
What I really need though is to:
Find the timings of specific types. (Find the first object in the array that matches a condition.)
Take this data and select it as a column with the rest of the table.
In other words, I'm looking for a table that looks like this:
id start end
-----------------------------
0 12.345 67.891
1 24.56 78.901
I'm hoping for some sort of query like this:
SELECT
id,
JSON_EXTRACT(timings, '$.[type="start"].time'),
JSON_EXTRACT(timings, '$.[type="end"].time')
FROM Items;
Is there some way to use path in the JSON functions to select what I need? Or, some other way to pivot what I have in the first example to apply to the table?
One possibility:
WITH cte(id, json) AS
(SELECT Items.id
, json_group_object(json_extract(j.value, '$.type'), json_extract(j.value, '$.time'))
FROM Items
JOIN json_each(timings) AS j ON json_extract(j.value, '$.type') IN ('start', 'end')
GROUP BY Items.id)
SELECT id
, json_extract(json, '$.start') AS start
, json_extract(json, '$.end') AS "end"
FROM cte
ORDER BY id;
which gives
id start end
---------- ---------- ----------
1 12.345 67.891
2 24.56 78.901
Another one that uses the window functions added in SQLite 3.25 and avoids creating intermediate JSON objects:
SELECT DISTINCT Items.id
, max(json_extract(j.value, '$.time'))
FILTER (WHERE json_extract(j.value, '$.type') = 'start') OVER ids AS start
, max(json_extract(j.value, '$.time'))
FILTER (WHERE json_extract(j.value, '$.type') = 'end') OVER ids AS "end"
FROM Items
JOIN json_each(timings) AS j ON json_extract(j.value, '$.type') IN ('start', 'end')
WINDOW ids AS (PARTITION BY Items.id)
ORDER BY Items.id;
The key is using the ON clause of the JOIN to limit results to just the two objects in each array that you care about, and then merging those up to two rows for each Items.id into one with a couple of different approaches.

PostgreSQL: query empty array fields within jsonb column

device_id | device
-----------------------------
9809 | { "name" : "printer", "tags" : [] }
9810 | { "name" : "phone", "tags" : [{"count": 2, "price" : 77}, {"count": 3, "price" : 37} ] }
For the following Postgres SQL query on a jsonb column "device" that contains an array 'tags':
SELECT t.device_id, elem->>'count', elem->>'price'
FROM tbl t, jsonb_array_elements(t.device->'tags') elem
WHERE t.device_id = 9809
device_id is the primary key.
I have two issues that I don't know how to solve:
tags is an array field that may be empty, in which case I get 0 rows. I want output regardless of whether tags is empty or not. Dummy values are OK.
If tags contains multiple elements, I get multiple rows for the same device_id. How can I aggregate those multiple elements into one row?
Your first problem can be solved with a left outer join, which will substitute NULL values for missing matches on the right side.
The second problem can be solved with an aggregate function like json_agg, array_agg or string_agg, depending on the desired result type:
SELECT t.device_id,
jsonb_agg(elem->>'count'),
jsonb_agg(elem->>'price')
FROM tbl t
LEFT JOIN LATERAL jsonb_array_elements(t.device->'tags') elem
ON TRUE
GROUP BY t.device_id;
You will get a JSON array containing just null for those rows where the array is empty; I hope that is OK for you.
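For completeness, a minimal setup to run this against; device is assumed to be jsonb here, matching the jsonb_array_elements() call in the answer:

CREATE TABLE tbl (device_id int PRIMARY KEY, device jsonb);

INSERT INTO tbl VALUES
  (9809, '{"name": "printer", "tags": []}'),
  (9810, '{"name": "phone", "tags": [{"count": 2, "price": 77}, {"count": 3, "price": 37}]}');

With this data, device 9809 comes back with [null] aggregates and 9810 with the aggregated counts and prices.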