Match two jsonb documents by order of elements in array - sql

I have a table of data jsonb documents in Postgres and a second table containing templates for the data.
I need to match each data jsonb row with its template jsonb row just by the order of the elements in the arrays, in an efficient way.
template jsonb document:
{
  "template": 1,
  "rows": [
    "first row",
    "second row",
    "third row"
  ]
}
data jsonb document:
{
  "template": 1,
  "data": [
    125,
    578,
    445
  ]
}
desired output:
| Desc | Amount |
| ----------- | ------ |
| first row | 125 |
| second row | 578 |
| third row | 445 |
template table:
| id | jsonb |
| -------- | ------------------------------------------------------ |
| 1 | {"template":1,"rows":["first row","second row","third row"]} |
| 2 | {"template":2,"rows":["first row","second row","third row"]} |
| 3 | {"template":3,"rows":["first row","second row","third row"]} |
data table:
| id | jsonb |
| -------- | ------------------------------------------- |
| 1 | {"template":1,"data":[125,578,445]} |
| 2 | {"template":1,"data":[125,578,445]} |
| 3 | {"template":2,"data":[125,578,445]} |
I have millions of data jsonb documents and hundreds of templates.
I could do it just by converting both to tables and then using the row_number() window function, but that does not seem like a very efficient approach to me.
Is there a better way of doing this?

You will have to normalize this mess "on-the-fly" to get the output you want.
You need to unnest each array using jsonb_array_elements_text() with the with ordinality option to get the array index, and you can join the two tables by extracting the value of the template key:
Assuming you want to return this for a specific row from the data table:
select td.val as "Desc", dt.val as "Amount"
from data
  -- unnest the data array, keeping each element's position (idx)
  cross join jsonb_array_elements_text(data.jsonb_column -> 'data')
             with ordinality as dt(val, idx)
  -- find the matching template by the value of the template key
  left join template tpl
         on tpl.jsonb_column ->> 'template' = data.jsonb_column ->> 'template'
  -- unnest the template rows and pair them up by array position
  left join jsonb_array_elements_text(tpl.jsonb_column -> 'rows')
            with ordinality as td(val, idx)
         on td.idx = dt.idx
where data.id = 1;
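Given the millions of rows you mention, an expression index on the extracted template key can help the join and lookups. A minimal sketch, assuming the jsonb column is really named jsonb_column as in the query above:
-- hypothetical expression indexes on the template key
create index on data ((jsonb_column ->> 'template'));
create index on template ((jsonb_column ->> 'template'));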

Related

Postgres jsonb. Heterogeneous json fields

If I have a table with a single jsonb column and the table has data like this:
[{"body": {"project-id": "111"}},
 {"body": {"my-org.project-id": "222"}},
 {"body": {"other-org.project-id": "333"}}]
Basically it stores project-id differently for different rows.
Now I need a query where the data->'body'->... values from different rows coalesce into a single field project-id. How can I do that?
e.g.: if I do something like this:
select data->'body'->'project-id' projectid from mytable
it will return something like:
| projectid |
| 111 |
But I also want the project-ids from the other rows too, without additional columns in the results. I.e., I want this:
| projectid |
| 111 |
| 222 |
| 333 |
I understand that each of your rows contains a json object, with a nested object whose key varies over rows, and whose value you want to acquire.
Assuming the 'body' always has a single key, you could do:
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
The lateral join on jsonb_object_keys() extracts all keys in the object as rows. Then we use jsonb_extract_path_text() to get the corresponding value.
Demo on DB Fiddle:
with t as (
    select '{"body": {"project-id": "111"}}'::jsonb js
    union all select '{"body": {"my-org.project-id": "222"}}'::jsonb
    union all select '{"body": {"other-org.project-id": "333"}}'::jsonb
)
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
| projectid |
| :--------- |
| 111 |
| 222 |
| 333 |
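Note that if 'body' ever contained a second key, the lateral join would emit one row per key. A sketch of a guard for that case, assuming all the variants end in project-id (the extra "note" key below is hypothetical):
with t as (
    select '{"body": {"project-id": "111", "note": "x"}}'::jsonb js
)
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
where x.k like '%project-id';  -- keep only the project-id variants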

PostgreSQL update field for each element in array

I wish to do the following task in SQL:
I have a table with columns:
uuid (uuid), word (text), wordList (text[]), uuidList (uuid[])
I have the wordList array, uuid and word columns populated. I wish to update and populate the uuidList like this:
foreach element in wordList:
    var x = select uuid where word = element;
    uuidList.append(x);
Example:
I have a table like this:
+---------+-------+--------------------+----------+
| uuid | word | wordList | uuidList |
+---------+-------+--------------------+----------+
| aaaa... | hello | NULL | NULL |
| bbbb... | world | NULL | NULL |
| cccc... | blah | {'hello', 'world'} | NULL |
+---------+-------+--------------------+----------+
I want it to become like this:
+---------+-------+--------------------+--------------------+
| uuid | word | wordList | uuidList |
+---------+-------+--------------------+--------------------+
| aaaa... | hello | NULL | NULL |
| bbbb... | world | NULL | NULL |
| cccc... | blah | {'hello', 'world'} | {aaaa..., bbbb...} |
+---------+-------+--------------------+--------------------+
I'm quite new to SQL and have gotten confused about how to do it. I don't think I can join a table to itself, and I don't know whether I should store intermediate results in a temporary table to achieve this (some related questions I read proposed that)...
Thanks!
You can aggregate all the needed UUIDs in a single statement:
select w1.uid, array_agg(w2.uid order by wl.idx) as uuidlist
from words w1
  cross join lateral unnest(w1.wordlist) with ordinality as wl(word, idx)
  join words w2 on w2.word = wl.word
where w1.wordlist is not null
  and w1.uuidlist is null -- optional
group by w1.uid;
The option with ordinality returns an additional column that indicates the position of the element in the original array. This is needed to aggregate the UUIDs in the correct order.
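As a quick standalone illustration of with ordinality (separate from the answer's tables):
-- each array element comes back with its 1-based position
select * from unnest(array['hello','world']) with ordinality as wl(word, idx);
--  word  | idx
--  hello |   1
--  world |   2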
The full query above returns the following result with your sample data:
uid | uuidlist
-----+------------
cccc | {aaaa,bbbb}
This can be used as the source of an update statement (assuming the column uid is unique):
update words
set uuidlist = t.uuidlist
from (
    select w1.uid, array_agg(w2.uid order by wl.idx) as uuidlist
    from words w1
      cross join lateral unnest(w1.wordlist) with ordinality as wl(word, idx)
      join words w2 on w2.word = wl.word
    where w1.wordlist is not null
      and w1.uuidlist is null -- optional
    group by w1.uid
) t
where t.uid = words.uid;
Online example: https://rextester.com/LZUYC57184
(note that the display of arrays is a bit weird in that example)

Update a field in a JSON column in PostgreSQL

I have a work_item table that has the following schema:
+----+------+-----------+
| id | data | data_type |
+----+------+-----------+
|    |      |           |
+----+------+-----------+
and a document_type table with the following schema:
+----+------+
| id | name |
+----+------+
|    |      |
+----+------+
The data column is a json column that has a Type field. This is a sample value:
{"Id":"5d35a41f-3e91-4eda-819d-0f2d7c2ba55e","WorkItem":"24efa9ea-4291-4b0a-9623-e6122201fe4a","Type":"Tax Document","Date":"4/16/2009"}
I need to update the data column of every row whose data_type column value is DocumentModel and whose Type field matches a value in the name column of the document_type table, replacing the Type field with a json object containing the document_type id and name. Something like this: {"id": "<doc_type_id>", "name": "<doc_type_name>"}.
I tried to do this by executing this query:
UPDATE wf.work_item wi
SET data = jsonb_set(data::jsonb, '{Type}', (
        SELECT jsonb_build_object('id', dt.id, 'name', dt.name)
        FROM wf.document_type AS dt
        WHERE wi.data ->> 'Type'::text = dt.name::text
    ), false)
WHERE wi.data_type = 'DocumentModel';
The above script runs without an error. However, it does something unwanted: it changes the data and data_type columns to null instead of updating the data column.
What is the issue with my script? Or can you suggest a better alternative to do the desired update?
The problem arises when the document type is missing from the document_type table. Then the subquery gives no result, i.e. null, and jsonb_set() is strict, so it returns null as well. A safer solution is to use the from clause in update:
update wf.work_item wi
set data = jsonb_set(
        data::jsonb,
        '{Type}',
        jsonb_build_object('id', dt.id, 'name', dt.name),
        false)
from wf.document_type as dt
where wi.data_type = 'DocumentModel'
  and wi.data ->> 'Type'::text = dt.name::text;
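With the from clause, rows without a matching document_type are simply left untouched. If you prefer to keep the subquery form, a sketch using coalesce to fall back to the existing value (same tables as above, not tested against your data):
update wf.work_item wi
set data = jsonb_set(
        data::jsonb,
        '{Type}',
        coalesce(
            (select jsonb_build_object('id', dt.id, 'name', dt.name)
             from wf.document_type dt
             where dt.name = wi.data ->> 'Type'),
            (wi.data::jsonb) -> 'Type'),  -- keep the original value when no match
        false)
where wi.data_type = 'DocumentModel';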

PostgreSQL: Efficiently split JSON array into rows

I have a table (Table A) that includes a text column containing JSON encoded data.
The JSON data is always an array of between one and a few thousand plain objects.
I have another table (Table B) with a few columns, including a column with a datatype of JSON.
I want to select all the rows from Table A, split the json array into its elements, and insert each element into Table B.
Bonus objective: Each object (almost) always has a key, x. I want to pull the value of x out into a column, and delete x from the original object (if it exists).
E.g.: Table A
| id | json_array (text) |
+----+--------------------------------+
| 1 | '[{"x": 1}, {"y": 8}]' |
| 2 | '[{"x": 2, "y": 3}, {"x": 1}]' |
| 3 | '[{"x": 8, "z": 2}, {"z": 3}]' |
| 4 | '[{"x": 5, "y": 2, "z": 3}]' |
...would become: Table B
| id | a_id | x | json (json) |
+----+------+------+--------------------+
| 0 | 1 | 1 | '{}' |
| 1 | 1 | NULL | '{"y": 8}' |
| 2 | 2 | 2 | '{"y": 3}' |
| 3 | 2 | 1 | '{}' |
| 4 | 3 | 8 | '{"y": 2}' |
| 5 | 3 | NULL | '{"z": 3}' |
| 6 | 4 | 5 | '{"y": 2, "z": 3}' |
This initially has to work on a few million rows, and would then need to be run at regular intervals, so making it efficient would be a priority.
Is it possible to do this without using a loop and PL/PgSQL? I haven't been making much progress.
The json data type is not particularly suitable (or intended) for modification at the database level. Extracting "x" objects from the JSON object is therefore cumbersome, although it can be done.
You should create your table B (with hopefully a more creative column name than "json"; I am using item here) and make the id column a serial that starts at 0. A pure json solution then looks like this:
INSERT INTO b (a_id, x, item)
SELECT sub.a_id, sub.x,
       ('{' ||
        string_agg(
            CASE WHEN i.k IS NULL THEN '' ELSE '"' || i.k || '":' || i.v END,
            ', ') ||
        '}')::json
FROM (
    SELECT a.id AS a_id, (j.items->>'x')::integer AS x, j.items
    FROM a, json_array_elements(json_array) j(items)
) sub
LEFT JOIN json_each(sub.items) i(k,v) ON i.k <> 'x'
GROUP BY sub.a_id, sub.x
ORDER BY sub.a_id;
The sub-query extracts the a_id and x values, as well as the JSON object itself. In the outer query the JSON object is broken into its individual pieces and the pairs with key x are thrown out (the LEFT JOIN ... ON i.k <> 'x'). In the select list the pieces are put back together again with string concatenation and grouped into compound objects.
This necessarily has to be like this because json has no built-in manipulation functions of any consequence. This works on PG versions 9.3+, i.e. since time immemorial insofar as JSON support is concerned.
If you are using PG9.5+, the solution is much simpler through a cast to jsonb:
INSERT INTO b (a_id, x, item)
SELECT a.id, (j.items->>'x')::integer, j.items #- '{x}'
FROM a, jsonb_array_elements(json_array::jsonb) j(items);
The #- operator on the jsonb data type does all the dirty work here. Obviously, there is a lot of work going on behind the scenes, converting json to jsonb, so if you find that you need to manipulate your JSON objects more frequently then you are better off using the jsonb type to begin with. In your case I suggest you do some benchmarking with EXPLAIN ANALYZE SELECT ... (you can safely forget about the INSERT while testing) on perhaps 10,000 rows to see which works best for your setup.
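For that benchmarking, something along these lines would do; the cap on a.id is just an assumed way to restrict the test to roughly 10,000 rows:
-- time the jsonb variant on a sample of table a (repeat with the json variant)
EXPLAIN ANALYZE
SELECT a.id, (j.items->>'x')::integer, j.items #- '{x}'
FROM a, jsonb_array_elements(json_array::jsonb) j(items)
WHERE a.id <= 10000;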

postgres - pivot query with array values

Suppose I have this table:
Content
+----+-------+
| id | title |
+----+-------+
| 1  | lorem |
+----+-------+
And this one:
Fields
+----+------------+----------+-------+
| id | id_content | name     | value |
+----+------------+----------+-------+
| 1  | 1          | subtitle | ipsum |
| 2  | 1          | tags     | tag1  |
| 3  | 1          | tags     | tag2  |
| 4  | 1          | tags     | tag3  |
+----+------------+----------+-------+
The thing is: I want to query the content, transforming all the rows from Fields into columns, having something like:
+----+-------+----------+------------------+
| id | title | subtitle | tags             |
+----+-------+----------+------------------+
| 1  | lorem | ipsum    | [tag1,tag2,tag3] |
+----+-------+----------+------------------+
Also, subtitle and tags are just examples. I can have as many fields as I want, whether they are arrays or not.
But I haven't found a way to convert the repeated name values into an array, let alone without turning subtitle into an array as well. If that's not possible, subtitle could also become an array and I could change it later in the code, but I need at least to group everything somehow. Any ideas?
You can use array_agg, e.g.
SELECT id_content, array_agg(value)
FROM fields
WHERE name = 'tags'
GROUP BY id_content
If you need the subtitle too, use a self-join. I have added a subselect to cope with contents that don't have any tags, without returning arrays filled with NULLs, i.e. {NULL}.
SELECT f1.id_content, f1.value, f2.value
FROM fields f1
LEFT JOIN (
    SELECT id_content, array_agg(value) AS value
    FROM fields
    WHERE name = 'tags'
    GROUP BY id_content
) f2 ON (f1.id_content = f2.id_content)
WHERE f1.name = 'subtitle';
See http://www.postgresql.org/docs/9.3/static/functions-aggregate.html for details.
If you have access to the tablefunc module, another option is to use crosstab as pointed out by Houari. You can make it return arrays and non-arrays with something like this:
SELECT id_content, unnest(subtitle), tags
FROM crosstab('
    SELECT id_content, name, array_agg(value)
    FROM fields
    GROUP BY id_content, name
    ORDER BY 1, 2
') AS ct(id_content integer, subtitle text[], tags text[]);
However, crosstab requires that the values always appear in the same order. For instance, if the first group (with the same id_content) doesn't have a subtitle and only has tags, the tags will be unnested and will appear in the same column with the subtitles.
See also http://www.postgresql.org/docs/9.3/static/tablefunc.html
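One way around that ordering pitfall, if you can enumerate the field names up front, is the two-argument form of crosstab, which takes a separate category query so that missing attributes come back as NULL instead of shifting into the wrong column. A sketch:
SELECT *
FROM crosstab(
    -- source query: one row per (row_name, category, value)
    $$SELECT id_content, name, array_agg(value)
      FROM fields
      GROUP BY id_content, name
      ORDER BY 1$$,
    -- category query: fixes the set and order of output columns
    $$VALUES ('subtitle'), ('tags')$$
) AS ct(id_content integer, subtitle text[], tags text[]);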
If the subtitle value is the only "constant" that you want to separate, you can do:
SELECT * FROM crosstab
(
    'SELECT content.id, name, array_to_string(array_agg(value), '','')::character varying
     FROM content
     INNER JOIN
     (
         SELECT * FROM fields WHERE fields.name = ''subtitle''
         UNION ALL
         SELECT * FROM fields WHERE fields.name <> ''subtitle''
     ) fields_ordered
         ON fields_ordered.id_content = content.id
     GROUP BY content.id, name'
)
AS
(
    id integer,
    content_name character varying,
    tags character varying
);