How to process json data and put it into arrays in postgresql - sql

I have a PostgreSQL table with two columns: name VARCHAR(255) and notes JSON. A sample dataset might look like this:
| name | notes |
|-----------|----------------------------------|
| 'anna' | {'link_to': ['bob']} |
| 'bob' | {'link_to': ['anna', 'claudia']} |
| 'claudia' | {'link_to': []} |
Now I want to do two things.
Put the list from the JSON at 'link_to' into another column called referrals_to (which must then be of a VARCHAR array type). From my example:
| name      | notes                            | referrals_to        |
|-----------|----------------------------------|---------------------|
| 'anna'    | {'link_to': ['bob']}             | ['bob']             |
| 'bob'     | {'link_to': ['anna', 'claudia']} | ['anna', 'claudia'] |
| 'claudia' | {'link_to': []}                  | []                  |
Create another column called referrals_from, where I want to store all names from which a name was referred. In my example:
| name      | notes                            | referrals_from |
|-----------|----------------------------------|----------------|
| 'anna'    | {'link_to': ['bob']}             | ['bob']        |
| 'bob'     | {'link_to': ['anna', 'claudia']} | ['anna']       |
| 'claudia' | {'link_to': []}                  | ['bob']        |
How do I do this using PostgreSQL queries? I could easily do it using Python, but I guess this would be slower than using PostgreSQL directly.

The referrals_to column can be done using the -> operator to extract the array. For the referrals_from column I would use a scalar sub-select to collect all referring names into a single column (a JSON array):
select name,
       notes,
       notes -> 'link_to' as referrals_to,
       (select jsonb_agg(name)
        from the_table t2
        where t2.name <> t1.name
          and t2.notes -> 'link_to' ? t1.name) as referrals_from
from the_table t1
;
The ? operator tests if a JSON array contains a specific string. In this case it tests if the link_to array contains the name from the outer query.
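If you want to see the operator in isolation, a quick check like the following works (note that ? is defined for jsonb, so a plain json column would need a ::jsonb cast first):
select '["anna", "claudia"]'::jsonb ? 'anna';   -- true
select '["anna", "claudia"]'::jsonb ? 'bob';    -- false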
Online Example
Assuming name is unique in your table, you can use that query to update the new columns in the table:
update the_table
set referrals_to = notes -> 'link_to',
    referrals_from = t.referrals_from
from (
    select t1.name,
           (select jsonb_agg(name)
            from the_table t2
            where t2.name <> t1.name
              and t2.notes -> 'link_to' ? t1.name) as referrals_from
    from the_table t1
) t
where t.name = the_table.name;
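The new columns have to exist before running that UPDATE. A minimal sketch, assuming you store both as jsonb (the exact column types are your choice and not part of the answer above):
alter table the_table
    add column referrals_to jsonb,
    add column referrals_from jsonb;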

More or less what you described in your steps:
WITH cte AS (
    SELECT
        name,
        BTRIM(value::text, '"') AS referrals
    FROM (
        SELECT
            name,
            foo.notes::json ->> 'link_to' AS link_to
        FROM foo
    ) a
    LEFT JOIN LATERAL json_array_elements(a.link_to::json) ON TRUE
)
SELECT
    a1.name,
    array_to_json(array_agg(a1.referrals))
FROM cte a1
JOIN cte a2 ON a1.name = a2.referrals
GROUP BY a1.name
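If you want to run that query as-is, here is a minimal setup it assumes: a table named foo (the name used in the query; the question itself did not name the table) holding the sample data from the question:
create table foo (name varchar(255), notes json);
insert into foo (name, notes) values
    ('anna',    '{"link_to": ["bob"]}'),
    ('bob',     '{"link_to": ["anna", "claudia"]}'),
    ('claudia', '{"link_to": []}');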


how to update array data in jsonb column on database postgresql?
For example, in table table1 I have a column attribute that has values like this:
| id | attribute                           |
|----|-------------------------------------|
| 1  | [{"task_customs": ["a", "b", "c"]}] |
| 2  | [{"task_customs": ["d", "e", "f"]}] |
For example, if I want to delete "b" from id 1, it should look like this in the attribute column:
| id | attribute                           |
|----|-------------------------------------|
| 1  | [{"task_customs": ["a", "c"]}]      |
| 2  | [{"task_customs": ["d", "e", "f"]}] |
I already did some research but didn't find what I need.
Try this:
(a) Delete 'b' according to its position in the array:
UPDATE table1
SET attribute = attribute #- array['0', 'task_customs', '1'] :: text[]
WHERE id = 1
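For reference, the #- operator deletes the element at the given path. A standalone sketch of what the statement above does to the sample value:
select '[{"task_customs": ["a", "b", "c"]}]'::jsonb #- '{0,task_customs,1}';
-- [{"task_customs": ["a", "c"]}]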
(b) Delete 'b' without knowing its position in the array:
WITH list AS
( SELECT id, to_jsonb(array[jsonb_build_object('task_customs', jsonb_agg(i.item ORDER BY item_id))]) AS new_attribute
FROM table1
CROSS JOIN LATERAL jsonb_array_elements_text(attribute#>'{0,task_customs}') WITH ORDINALITY AS i(item,item_id)
WHERE id = 1
AND i.item <> 'b'
GROUP BY id
)
UPDATE table1 AS t
SET attribute = l.new_attribute
FROM list AS l
WHERE t.id = l.id
see the test result in dbfiddle.
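In case WITH ORDINALITY is unfamiliar: it adds the element's position as an extra column, which is what the ORDER BY item_id inside the aggregate relies on. A standalone sketch:
select i.item, i.item_id
from jsonb_array_elements_text('["a", "b", "c"]'::jsonb) with ordinality as i(item, item_id);
-- item | item_id
-- a    | 1
-- b    | 2
-- c    | 3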
One option is to split the JSONB value by using jsonb_to_recordset, such as:
UPDATE table1 AS t
SET attribute =
(
SELECT json_build_array(
jsonb_build_object('task_customs',task_customs::JSONB - 'b')
)
FROM table1,
LATERAL jsonb_to_recordset(attribute) AS (task_customs TEXT)
WHERE id = t.id
)
WHERE id = 1
Demo
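To see what jsonb_to_recordset contributes here: it expands the array of objects into rows and exposes the requested key as a column of the declared type (TEXT here), so the nested array arrives as text and is cast back to JSONB before subtracting 'b'. A standalone sketch:
select task_customs
from jsonb_to_recordset('[{"task_customs": ["a", "b", "c"]}]'::jsonb) as (task_customs text);
-- task_customs
-- ["a", "b", "c"]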
Edit: If you need more elements, as mentioned in the comment, then you may prefer using:
UPDATE table1 AS t
SET attribute =
(
SELECT jsonb_agg(
jsonb_build_object(key,je.value::JSONB - 'b')
)
FROM table1,
LATERAL jsonb_array_elements_text(attribute) AS atr,
LATERAL jsonb_each_text(atr::JSONB) AS je
WHERE id = t.id
)
WHERE id = 1
Demo
I solved this issue by combining both answers from Edouard and Barbaros.
This is my final query:
UPDATE table1 AS t
SET attribute =
jsonb_set(
attribute,
'{0,task_customs}',
(
SELECT task_customs::JSONB - 'b'
FROM table1
CROSS JOIN LATERAL jsonb_to_recordset(attribute) AS (task_customs TEXT)
WHERE id = t.id
)
)
WHERE id = 1
see the test result in dbfiddle
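The jsonb_set call is what keeps the rest of the attribute value intact: it only replaces the value at the path {0,task_customs}. A standalone sketch of that part:
select jsonb_set('[{"task_customs": ["a", "b", "c"]}]'::jsonb,
                 '{0,task_customs}',
                 '["a", "c"]'::jsonb);
-- [{"task_customs": ["a", "c"]}]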

Sequelize.js postgres LATERAL usage

I have a Postgres table with a JSONB field.
The JSON contains an array of objects:
| id   | my_json_field                                    |
|------|--------------------------------------------------|
| 1234 | [{"id": 1, "type": "c"}, {"id": 2, "type": "v"}] |
| 1235 | [{"id": 1, "type": "e"}, {"id": 2, "type": "d"}] |
I need to sort/filter the table by the type key of the JSON field.
The server accepts an id, so if id=1 I need to sort by "c","e"; if id=2, by "v","d".
I have the following SQL:
LEFT JOIN LATERAL (
SELECT elem ->> 'type' AS my_value
FROM jsonb_array_elements(my_json_field) a(elem)
WHERE elem ->> 'id' = '1'
) a ON true
This adds a my_value field to the results, which I can use to sort/filter the table.
This works fine in the console, but I didn't find a way to add this using Sequelize.js.
I'm also open to any other solutions, thanks!
Edit, full query:
SELECT my_value FROM "main_table" AS "main_table"
LEFT OUTER JOIN ( "table2" AS "table2"
LEFT OUTER JOIN "form_table" AS "table2->form_table" ON "table2"."id" = "table2->form_table"."table2_id")
ON "main_table"."id" = "table2"."main_table_id"
LEFT JOIN LATERAL (
SELECT elem ->> 'type' AS my_value
FROM jsonb_array_elements("table2->form_table".structure) a(elem)
WHERE elem ->> 'id' = '1'
) a ON TRUE
ORDER BY "my_value" DESC;
You don't really need the keyword LATERAL as that is implied if you use a set returning function directly and not in a sub-select.
The following should do the same thing as your query and doesn't need the LATERAL keyword:
SELECT a.elem ->> 'type' as my_value
FROM "main_table"
LEFT JOIN "table2" ON "main_table"."id" = "table2"."main_table_id"
LEFT JOIN "form_table" AS "table2->form_table" ON "table2"."id" = "table2->form_table"."table2_id")
LEFT JOIN jsonb_array_elements("table2->form_table".structure) a(elem) on a.elem ->> 'id' = '1'
ORDER BY my_value DESC;
I also removed the useless parentheses around the outer joins and the aliases that don't give the table a new name to simplify the syntax.
Maybe that allows you to use the query with your ORM (aka "obfuscation layer")
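To illustrate the point about LATERAL being implied for set-returning functions, here is a self-contained sketch using inline sample data instead of your tables:
select t.id, a.elem ->> 'type' as my_value
from (values (1234, '[{"id": 1, "type": "c"}, {"id": 2, "type": "v"}]'::jsonb)) as t(id, my_json_field)
left join jsonb_array_elements(t.my_json_field) a(elem) on a.elem ->> 'id' = '1';
-- returns my_value = 'c' for id 1234, even though no LATERAL keyword is written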

How to group multiple columns into a single array or similar?

I would like my query to return a result structured like this, where tags is an array of arrays or similar:
| id | name | tags                                             |
|----|------|--------------------------------------------------|
| 1  | a    | [[1, "name1", "color1"], [2, "name2", "color2"]] |
| 2  | b    | [[1, "name1", "color1"], [3, "name3", "color3"]] |
I expected this query to work, but it gives me an error:
SELECT i.id, i.name, array_agg(t.tag_ids, t.tag_names, t.tag_colors) as tags
FROM items i
LEFT OUTER JOIN (
SELECT trm.target_record_id
, array_agg(tag_id) as tag_ids
, array_agg(t.tag_name) as tag_names
, array_agg(t.tag_color) as tag_colors
FROM tags_record_maps trm
INNER JOIN tags t on t.id = trm.tag_id
GROUP BY trm.target_record_id
) t on t.target_record_id = i.id;
Error:
PG::UndefinedFunction: ERROR: function array_agg(integer[], character varying[], character varying[]) does not exist
LINE 1: ..., action_c2, action_c3, action_name, action_desc, array_agg(...
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
This query works and produces similar results (but not quite what I want):
SELECT i.id, i.name, t.tag_ids, t.tag_names, t.tag_colors
FROM items i
LEFT OUTER JOIN (
SELECT trm.target_record_id, array_agg(tag_id) as tag_ids, array_agg(t.tag_name) as tag_names, array_agg(t.tag_color) as tag_colors
FROM tags_record_maps trm
INNER JOIN tags t on t.id = trm.tag_id
GROUP BY trm.target_record_id
) t on t.target_record_id = i.id;
Result:
| id | name | tag_ids | tag_names          | tag_colors           |
|----|------|---------|--------------------|----------------------|
| 1  | a    | [1, 2]  | ["name1", "name2"] | ["color1", "color2"] |
| 1  | a    | [1, 3]  | ["name1", "name3"] | ["color1", "color3"] |
Edit:
This query almost produces what I'm looking for, except it names the json keys f1, f2, f3. It would be perfect if I could name them id, name, color:
SELECT trm.target_record_id, json_agg( (t.id, t.tag_name, t.tag_color) )
FROM tags_record_maps trm
INNER JOIN tags t on t.site_id = trm.site_id and t.id = trm.tag_id
GROUP BY trm.target_record_id
having count(*) > 1;
Result:
[{"f1":1,"f2":"name1","f3":"color1"},{"f1":2,"f2":"name2","f3":"color2"}]
(t.id, t.tag_name, t.tag_color) is short syntax for ROW(t.id, t.tag_name, t.tag_color) - and a ROW constructor does not preserve nested attribute names. The manual:
By default, the value created by a ROW expression is of an anonymous record type. If necessary, it can be cast to a named composite type —
either the row type of a table, or a composite type created with
CREATE TYPE AS.
Bold emphasis mine. To also get proper key names in the result, cast to a registered composite type as advised in the quote, use a nested subselect, or simply use json_build_object() in Postgres 9.4 or newer (effectively avoiding the ROW constructor a priori):
SELECT trm.target_record_id
, json_agg(json_build_object('id', t.id
, 'tag_name', t.tag_name
, 'tag_color', t.tag_color)) AS tags
FROM tags_record_maps trm
JOIN tags t USING (site_id)
WHERE t.id = trm.tag_id
GROUP BY trm.target_record_id
HAVING count(*) > 1;
I use the original column names, but you can choose your key names freely. In your case:
json_agg(json_build_object('id', t.id
, 'name', t.tag_name
, 'color', t.tag_color)) AS tags
Detailed explanation:
Return multiple columns of the same row as JSON array of objects
array_agg() only takes a single argument, which it collects into an array. You could try to concatenate the values together:
array_agg(t.tag_ids || ':' || t.tag_names || ':' || t.tag_colors)
Or perhaps use a row constructor:
array_agg( (t.tag_ids, t.tag_names, t.tag_colors) )
Why not try a Json_Agg()?
SELECT
json_agg(tag_ids, tag_names, tag_colors)
FROM items
Etc...
DB fiddle
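For reference, json_agg() also takes a single argument, so to get one JSON value per tag you would feed it json_build_array(), which also yields the array-of-arrays shape from the question. A hypothetical sketch against the base tables:
select trm.target_record_id,
       json_agg(json_build_array(t.id, t.tag_name, t.tag_color)) as tags
from tags_record_maps trm
join tags t on t.id = trm.tag_id
group by trm.target_record_id;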
Let's play with a composite type.
create type tags as(tag_id bigint, tag_name text,tag_color text);
using array_agg:
select item_id,name, array_agg(row(trm.tag_id, tag_name, tag_color)::tags) as tags
from items i join tags_record_maps trm on i.item_id = trm.target_record_id
group by 1,2;
to json.
select item_id,name, to_json( array_agg(row(trm.tag_id, tag_name, tag_color)::tags)) as tags
from items i join tags_record_maps trm on i.item_id = trm.target_record_id
group by 1,2;
access individual/base element of composite type:
with a as(
select item_id,name, array_agg(row(trm.tag_id, tag_name, tag_color)::tags) as tags
from items i join tags_record_maps trm on i.item_id = trm.target_record_id
group by 1,2)
select a.item_id, a.tags[2].tag_id from a;

Join three rows if the same value in one column

There is a Postgres database and the table has three columns. The data structure is in an external system, so I cannot modify it.
Every object is represented by three rows (identified by the column element_id; rows with the same value in this column represent the same object), for example:
key     value            element_id
------------------------------------
status  active           1
name    exampleNameAAA   1
city    exampleCityAAA   1
status  inactive         2
name    exampleNameBBB   2
city    exampleCityBBB   2
status  inactive         3
name    exampleNameCCC   3
city    exampleCityCCC   3
I want to get all values describing every object (name, status and city).
For this example the output should be like:
exampleNameAAA | active | exampleCityAAA
exampleNameBBB | inactive | exampleCityBBB
exampleNameCCC | inactive | exampleCityCCC
I know how to join two rows:
select a.value as name,
b.value as status
from the_table a
join the_table b
on a.element_id = b.element_id
and b."key" = 'status'
where a."key" = 'name';
How is it possible to join three columns?
You can try the query below.
DEMO
select a.value as name,
b.value as status,c.value as city
from t1 a
join t1 b
on a.element_id = b.element_id and b."keys" = 'status'
join t1 c on a.element_id = c.element_id and c."keys" = 'city'
where a."keys" = 'name';
OUTPUT
name             status    city
exampleNameAAA   active    exampleCityAAA
exampleNameBBB   inactive  exampleCityBBB
exampleNameCCC   inactive  exampleCityCCC
One option is to simply add another join for each value you need (this is one of the big disadvantages of the EAV (anti-)pattern you are using):
select a.value as name,
b.value as status,
c.value as city
from the_table a
join the_table b on a.element_id = b.element_id and b."key" = 'status'
join the_table c on a.element_id = c.element_id and c."key" = 'city'
where a."key" = 'name';
Another option is to aggregate all key/value pairs for an element into a JSON then you can easily access each one without additional joins:
select t.element_id,
t.obj ->> 'city' as city,
t.obj ->> 'status' as status,
t.obj ->> 'name' as name
from (
select e.element_id, jsonb_object_agg("key", value) as obj
from element e
group by e.element_id
) t;
If the table is really big this might be a lot slower than the join version due to the aggregation step. If you limit the query to only some elements (e.g. by adding a where element_id = 1 or where element_id in (1,2,3)) then this should be quite fast.
It has the advantage that you always have all key/value pairs for each element_id available regardless on what you do. The inner select could be put into a view, to make things easier.
Online example: https://rextester.com/MSZOWU37182
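The "inner select in a view" idea from the last paragraph could look like this (the view name is just an illustration):
create view element_obj as
select e.element_id, jsonb_object_agg("key", value) as obj
from element e
group by e.element_id;

-- each lookup then stays one row per element:
select element_id,
       obj ->> 'name'   as name,
       obj ->> 'status' as status,
       obj ->> 'city'   as city
from element_obj
where element_id in (1, 2, 3);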
Seems like you want to PIVOT
One way to do that is via conditional aggregation.
select
-- t.element_id,
max(case when t.key = 'name' then t.value end) as name,
max(case when t.key = 'status' then t.value end) as status,
max(case when t.key = 'city' then t.value end) as city
from the_table t
group by t.element_id;
db<>fiddle here
Or use crosstab:
select
-- element_id,
name,
status,
city
from crosstab (
'select t.element_id, t.key, t.value
from the_table t'
) as ct (element_id int, name varchar(30), status varchar(30), city varchar(30));
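Note that crosstab() is not built in; it comes from the tablefunc extension. With this one-parameter form the input query should be ordered by the row identifier, and the values are filled into the output columns in the order they appear rather than matched by key name:
create extension if not exists tablefunc;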
But if you do like those joins, here's a way
select
-- el.element_id,
nm.value as name,
st.value as status,
ci.value as city
from
(
select distinct t.element_id
from the_table t
where t.key in ('name','status','city')
) as el
left join the_table as nm on (nm.element_id = el.element_id and nm.key = 'name')
left join the_table as st on (st.element_id = el.element_id and st.key = 'status')
left join the_table as ci on (ci.element_id = el.element_id and ci.key = 'city');

USING limit/offset in a JOIN query

I have 4 tables
A user account
user_id | username | password
---------+----------+----------
A projects table
project_id | project_name | category_id
------------+------------------------------+-------------
A user_projects table (many to many relationship)
accounts_projects_id | account_id | project_id
----------------------+------------+------------
A project_messages table (a project will have many messages)
message_id | project_id | message | username
-----------+------------+---------+----------
At login, I'm running a query where I fetch the number of projects a user belongs to and the messages for each project, using the query below:
SELECT account.user_id,account.username,
array_agg(json_build_object('message',project_messages.message,'username',project_messages.username)) AS messages,
project.project_name
FROM account
JOIN accounts_projects ON account.user_id = accounts_projects.account_id
JOIN project_messages ON accounts_projects.project_id = project_messages.project_id
JOIN project ON project.project_id = accounts_projects.project_id
WHERE account.username=$1
GROUP BY project.project_name,account.user_id
This gives me the below output:
userid, username, messages (JSON array object), project_name
87;"kannaj";"{"{\"message\" : \"saklep\", \"username\" : \"kannaj\"}"}";"Football with Javascript"
87;"kannaj";"{"{\"message\" : \"work\", \"username\" : \"kannaj\"}","{\"message\" : \"you've been down to long in the midnight sea\", \"username\" : \"kannaj\"}","{\"message\" : \"Yeaaaa\", \"username\" : \"house\"}"}";"Machine Learning with Python"
87;"kannaj";"{"{\"message\" : \"holyy DIVVEERRR\", \"username\" : \"kannaj\"}"}";"Beethoven with react"
Is there a way I can use the LIMIT/OFFSET function when retrieving the messages from the project_messages table?
To make our examples simpler, let's say we have two linked tables:
t1(id);
t2(id, t1_id);
And the query is:
select t1.id, array_agg(t2.id)
from t1 join t2 on (t1.id = t2.t1_id)
group by t1.id;
It is a very simplified variant of your large query, as you can see.
1) Arrays
select t1.id, (array_agg(t2.id order by t2.id desc))[3:5]
from t1 join t2 on (t1.id = t2.t1_id)
group by t1.id;
This query works just like the original, but returns only elements 3, 4 and 5 of the array, which is equivalent to offset 2 limit 3.
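The [3:5] part is ordinary Postgres array slicing, which you can try on its own:
select (array[10, 20, 30, 40, 50, 60])[3:5];
-- {30,40,50}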
2) Subquery and lateral
select
t1.id,
array_agg(t.x)
from
t1 join lateral
(select t2.id as x from t2 where t1.id = t2.t1_id order by t2.id desc offset 2 limit 3) t on (true)
group by t1.id;
Here the lateral keyword allows the subquery to use fields from other tables mentioned in the main from clause (t1.id).