Error deleting json object from array in Postgres - sql

I have a Postgres table timeline with two columns:
user_id (varchar)
items (json)
This is the structure of items json field:
[
    {
        itemId: "12345",
        text: "blah blah"
    },
    // more items with itemId and text
]
I need to delete all the items where itemId equals a given value. e.g. 12345
I have this working SQL:
UPDATE timeline
SET items = items::jsonb - cast((
    SELECT position - 1
    FROM timeline, jsonb_array_elements(items::jsonb)
    WITH ORDINALITY arr(item_object, position)
    WHERE item_object->>'itemId' = '12345') as int)
It works fine. It only fails when no items are returned by the subquery i.e. when there are no items whose itemId equals '12345'. In those cases, I get this error:
null value in column "items" violates not-null constraint
How could I solve this?

Try this:
update timeline
set items = (select json_agg(j)
             from json_array_elements(items) j
             where j->>'itemId' not in ('12345')
            );
DEMO
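Note that json_agg over zero rows also yields NULL, so a row whose items all have itemId '12345' would trip the same not-null constraint. A hedged variant that falls back to an empty array (the added COALESCE is the only change) could look like:
update timeline
set items = (select coalesce(json_agg(j), '[]'::json)
             from json_array_elements(items) j
             where j->>'itemId' not in ('12345')
            );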

The problem is that when null is passed to the - operator, it results in null for the expression. That not only violates your not null constraint, but it is probably also not what you are expecting.
This is a hack way of getting past it:
UPDATE timeline
SET items = items::jsonb - coalesce(
    cast((
        SELECT position - 1
        FROM timeline, jsonb_array_elements(items::jsonb)
        WITH ORDINALITY arr(item_object, position)
        WHERE item_object->>'itemId' = '12345') as int), 99999999)
A more correct way to do it would be to collect all of the indexes you want to delete with something like the below. If there is the possibility of more than one itemId '12345' within a single user_id row, then this will either fail or mess up your items (I would have to test to see which), but at least it only updates the rows that actually contain a '12345' record.
WITH deletes AS (
    SELECT t.user_id, e.rn::int - 1 AS position
    FROM timeline t
    CROSS JOIN LATERAL jsonb_array_elements(t.items::jsonb)
    WITH ORDINALITY AS e(jobj, rn)
    WHERE e.jobj->>'itemId' = '12345'
)
UPDATE timeline
SET items = items::jsonb - d.position
FROM deletes d
WHERE d.user_id = timeline.user_id;
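For reference, a throwaway sketch of this approach (the table definition and sample rows are assumptions; the column is created as jsonb here, which is why no casts are needed):
CREATE TABLE timeline (user_id varchar PRIMARY KEY, items jsonb NOT NULL);

INSERT INTO timeline VALUES
    ('u1', '[{"itemId": "12345", "text": "blah blah"}, {"itemId": "67890", "text": "keep me"}]'),
    ('u2', '[{"itemId": "67890", "text": "no match here"}]');

WITH deletes AS (
    SELECT t.user_id, e.rn::int - 1 AS position
    FROM timeline t
    CROSS JOIN LATERAL jsonb_array_elements(t.items) WITH ORDINALITY AS e(jobj, rn)
    WHERE e.jobj->>'itemId' = '12345'
)
UPDATE timeline
SET items = items - d.position
FROM deletes d
WHERE d.user_id = timeline.user_id;
-- 'u1' loses its first element; 'u2' is untouched because it produces no row in deletes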

Related

Ecto Query: Count/1 of left_joined value multiplied by outer_join value count

I have an items postgres table that has_many bookmarks and has_many notes. I am querying these with {:ecto_sql, "~> 3.7"}.
I want a query that returns all finished items that have notes, and I want that query to also count that item's bookmarks.
When I left_join an item's bookmarks and select_merge count(bookmarks), I get the proper count, but when I add an inner_join of notes to the item, the count of bookmarks is multiplied by the number of notes, e.g. if an item has 2 bookmarks and 4 notes, bookmark_count will be 8 when it should be 2.
Here is my funky ecto query:
from item in Item,
  where: item.finished == true,
  left_join: bookmark in assoc(item, :bookmarks),
  on: bookmark.item_id == item.id and bookmark.deleted == false,
  select_merge: %{bookmark_count: count(bookmark)},
  inner_join: note in assoc(item, :notes),
  on: note.accepted == true
Many thanks in advance for feedback/guidance!
Basically: aggregate the N-side before joining to avoid multiplying rows from the main table. Faster, too. See:
Two SQL LEFT JOINS produce incorrect result
Use a semi-join for notes with EXISTS to only verify the existence of a related qualifying row. This also never multiplies rows.
This query should implement your objective:
SELECT i.*, COALESCE(b.ct, 0) AS bookmark_count
FROM   items i
LEFT   JOIN (
    SELECT b.item_id AS id, count(*) AS ct
    FROM   bookmarks b
    WHERE  NOT b.deleted
    GROUP  BY 1
    ) b USING (id)
WHERE  i.finished
AND    EXISTS (
    SELECT FROM notes n
    WHERE  n.item_id = i.id
    AND    n.accepted
    );
I slipped in a couple other minor improvements.
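To make the multiplication concrete, here is a minimal sketch with assumed table definitions matching the question; the naive double join counts 8, while the aggregate-then-join query above returns 2 for the same data:
CREATE TABLE items     (id int PRIMARY KEY, finished boolean);
CREATE TABLE bookmarks (id int PRIMARY KEY, item_id int REFERENCES items, deleted boolean);
CREATE TABLE notes     (id int PRIMARY KEY, item_id int REFERENCES items, accepted boolean);

INSERT INTO items     VALUES (1, true);
INSERT INTO bookmarks VALUES (1, 1, false), (2, 1, false);                            -- 2 bookmarks
INSERT INTO notes     VALUES (1, 1, true), (2, 1, true), (3, 1, true), (4, 1, true);  -- 4 notes

-- naive version: each bookmark row is repeated once per note, so the count is 2 x 4 = 8
SELECT i.id, count(b.id) AS bookmark_count
FROM items i
LEFT JOIN bookmarks b ON b.item_id = i.id AND NOT b.deleted
JOIN notes n ON n.item_id = i.id AND n.accepted
WHERE i.finished
GROUP BY i.id;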

How to group results by values that are inside json array in postgreSQL

I have a column of type JSONB that has data like this:
column name: used_filters
row number 1 example:
{ "categories" : ["economic", "Social"], "tags": ["world" ,"eco-friendly"] }
row number 2 example:
{ "categories" : ["economic"], "tags": ["eco-friendly"] , "keywords" : ["2050"] }
I want to group the results to get the most frequent value for each one of the keys, something like this:

key        most_freq
category   economic
tags       eco-friendly
keyword    2050

The keys are not constant and could be something other than in this example, but I know that they will be frequent.
You can extract keys and values as arrays first by using jsonb_each, and then unnest the generated arrays with jsonb_array_elements_text. The rest is classical aggregation, along with ranking by the count values through a window function such as RANK():
SELECT key, value
FROM  (SELECT j.key, jj.value,
              RANK() OVER (PARTITION BY j.key ORDER BY COUNT(*) DESC)
         FROM t,
              LATERAL jsonb_each(js) AS j,
              LATERAL jsonb_array_elements_text(j.value) AS jj
        GROUP BY j.key, jj.value) AS q
WHERE rank = 1
Demo
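A small worked example, assuming the data lives in a table t with a jsonb column js as the query expects:
CREATE TABLE t (js jsonb);

INSERT INTO t VALUES
    ('{"categories": ["economic", "Social"], "tags": ["world", "eco-friendly"]}'),
    ('{"categories": ["economic"], "tags": ["eco-friendly"], "keywords": ["2050"]}');

-- the query above then returns one row per key with its most frequent value:
--   categories | economic
--   keywords   | 2050
--   tags       | eco-friendly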

postgresql Looping through JSONB array and performing SELECTs

I have jsonb in one of my tables. The jsonb looks like this:
my_data : [
    {pid: 1, stock: 500},
    {pid: 2, stock: 1000},
    ...
]
pid refers to the products table's id (which is pid).
EDIT: The table products has the following properties: pid (PK), name
I want to loop over my_data[] in my JSONB and fetch each pid's name from the products table.
I need the result to look something like this (including the product names from the second table):
my_data : [
    {
        product_name : "abc",
        pid: 1,
        stock : 500
    },
    ...
]
How should I go about performing such a jsonb inner join?
Edit: I tried S-Man's solutions and I'm getting this error:
"invalid reference to FROM-clause entry for table \"jc\""
Here is the SQL QUERY.
step-by-step demo:db<>fiddle
SELECT
    jsonb_build_object(                                                     -- 5
        'my_data',
        jsonb_agg(                                                          -- 4
            elems || jsonb_build_object('product_name', mot.product_name)   -- 3
        )
    )
FROM
    mytable,
    jsonb_array_elements(mydata -> 'my_data') as elems                      -- 1
JOIN
    my_other_table mot ON (elems ->> 'pid')::int = mot.pid                  -- 2
1. Expand the JSON array into one row per array element.
2. Join the other table against the current one using the pid values (notice the ::int cast, because otherwise it would be a text value).
3. The new columns from the second table can now be converted into a JSON object, which can be concatenated onto the original one using the || operator.
4. After that, recreate the array from the array elements again.
5. Put this array into a my_data element.
Another way is using jsonb_set() instead of step 5 to put the rebuilt array back into the original object directly:
step-by-step demo:db<>fiddle
SELECT
    jsonb_set(
        mydata,
        '{my_data}',
        jsonb_agg(
            elems || jsonb_build_object('product_name', mot.product_name)
        )
    )
FROM
    mytable,
    jsonb_array_elements(mydata -> 'my_data') as elems
JOIN
    my_other_table mot ON (elems ->> 'pid')::int = mot.pid
GROUP BY mydata
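Adapted to the question's own table names, a sketch of the first query might look as follows; products(pid, name) comes from the question, while the table holding the JSONB column (called stock_table with column mydata here) is an assumption:
SELECT
    jsonb_build_object(
        'my_data',
        jsonb_agg(elems || jsonb_build_object('product_name', p.name))
    )
FROM
    stock_table,
    jsonb_array_elements(mydata -> 'my_data') as elems
JOIN
    products p ON (elems ->> 'pid')::int = p.pid
GROUP BY mydata   -- one result row per source JSONB value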

postgres - How to update the same record twice in a joined update query

I'm trying to write the following migration that drops the conversations_users table (which was a join table that included a read_up_to column) and copies the read_up_to information into the new messages.read_by_user_ids array. For every 1 message row there are at least 2 conversations_users rows, so this join is repeating messages. I expected the following expression to work, but it's only assigning one user_id to the read_by_user_ids array, and I'm guessing that's because the update isn't happening sequentially.
Result:
message_id: 1, read_by_user_ids: { 15 }
Desired result:
message_id: 1, read_by_user_ids: { 15, 19 }
UPDATE
    messages as m
SET
    read_by_user_ids = CASE
        WHEN cu.read_up_to >= m.created_at THEN array_append(
            COALESCE(m.read_by_user_ids, '{}'),
            cu.user_id
        )
        ELSE m.read_by_user_ids
    END
FROM
    conversations_users cu
WHERE
    cu.conversation_id = m.thread_id;
I'm on my phone, so apologies for untested typos.
As per my comment, aggregate the individual incoming user_ids into one array per conversation. Then use array_cat to combine the two arrays.
This way you only need to do one update per target row.
I also noticed that you only want to update rows based on a date comparison, so I added that to the sub query I proposed.
UPDATE
    messages as m
SET
    read_by_user_ids = array_cat(
        COALESCE(m.read_by_user_ids, '{}'),
        cu.user_id_array
    )
FROM
    (
        SELECT
            cu.conversation_id,
            array_agg(cu.user_id) AS user_id_array
        FROM
            messages m
        INNER JOIN
            conversations_users cu
            ON cu.conversation_id = m.thread_id
            AND cu.read_up_to >= m.created_at
        GROUP BY
            cu.conversation_id
    ) cu
WHERE
    cu.conversation_id = m.thread_id;
There are many other options on how to generate the array in the sub-query. Which is the most efficient will depend on the profile of your data, indexes, etc. But the principle remains the same; updating the same row multiple times in a single statement doesn't work, you need to update each row once, with an array as the input.
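A throwaway sketch of the scenario, with assumed column types, showing the single-pass update picking up both user ids:
CREATE TABLE messages (
    id int PRIMARY KEY,
    thread_id int,
    created_at timestamptz,
    read_by_user_ids int[]
);
CREATE TABLE conversations_users (
    conversation_id int,
    user_id int,
    read_up_to timestamptz
);

INSERT INTO messages VALUES (1, 10, '2021-01-01', NULL);
INSERT INTO conversations_users VALUES (10, 15, '2021-02-01'), (10, 19, '2021-03-01');

-- the sub-query aggregates to (conversation_id 10, user_id_array '{15,19}'),
-- so the UPDATE above sets message 1 to read_by_user_ids = '{15,19}' in one pass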

Help with a complex join query

Keep in mind I am using SQL 2000
I have two tables.
tblAutoPolicyList contains a field called PolicyIDList.
tblLossClaims contains two fields called LossPolicyID & PolicyReview.
I am writing a stored proc that will get the distinct PolicyID from PolicyIDList field, and loop through LossPolicyID field (if match is found, set PolicyReview to 'Y').
Sample table layout:

PolicyIDList   LossPolicyID                                              PolicyReview
9651XVB19      5021WWA85, 4421WWA20, 3314WWA31, 1121WAW11, 2221WLL99     Y
5021WWA85      3326WAC35, 1221AXA10, 9863AAA44, 5541RTY33, 9651XVB19     Y
0151ZVB19      4004WMN63, 1001WGA42, 8587ABA56, 8541RWW12, 9329KKB08     N
How would I go about writing the stored proc (looking for logic more than syntax)?
Keep in mind I am using SQL 2000.
Select LossPolicyID, * from tableName where charindex('PolicyID',LossPolicyID,1)>0
Basically, the idea is this:
1. 'Unroll' tblLossClaims and return two columns: a tblLossClaims key (you didn't mention any, so I guess it's going to be LossPolicyID) and Item = a single item from LossPolicyID.
2. Find matches of unrolled.Item in tblAutoPolicyList.PolicyIDList.
3. Find matches of distinct matched.LossPolicyID in tblLossClaims.LossPolicyID.
4. Update tblLossClaims.PolicyReview accordingly.
The main UPDATE can look like this:
UPDATE claims
SET PolicyReview = 'Y'
FROM tblLossClaims claims
JOIN (
    SELECT DISTINCT unrolled.LossPolicyID
    FROM (
        SELECT LossPolicyID, Item = itemof(LossPolicyID)
        FROM unrolling_join
    ) unrolled
    JOIN tblAutoPolicyList
      ON unrolled.Item = tblAutoPolicyList.PolicyIDList
) matched
ON matched.LossPolicyID = claims.LossPolicyID
You can take advantage of the fixed item width and the fixed list format and thus easily split LossPolicyID without a UDF. I can see this done with the help of a number table and SUBSTRING(). unrolling_join in the above query is actually tblLossClaims joined with the number table.
Here's the definition of unrolled 'zoomed in':
...
(
    SELECT LossPolicyID,
           Item = SUBSTRING(LossPolicyID,
                            (v.number - 1) * @ItemLength + 1,
                            @ItemLength)
    FROM tblLossClaims c
    JOIN master..spt_values v ON v.type = 'P'
     AND v.number BETWEEN 1 AND (LEN(c.LossPolicyID) + 2) / (@ItemLength + 2)
) unrolled
...
master..spt_values is a system table that is used here as the number table. Filter v.type = 'P' gives us a rowset with number values from 0 to 2047, which is narrowed down to the list of numbers from 1 to the number of items in LossPolicyID. Eventually v.number serves as an array index and is used to cut out single items.
@ItemLength is of course simply LEN(tblAutoPolicyList.PolicyIDList). I would probably also declare @ItemLength2 = @ItemLength + 2 so it isn't calculated every time when applying the filter.
Basically, that's it, if I haven't missed anything.
If the PolicyIDList field is a delimited list, you have to first separate the individual policy IDs and create a temporary table with all of the results. Next, use an update query on tblLossClaims with WHERE EXISTS (SELECT * FROM #temptable tt WHERE tt.PolicyID = LossPolicyID).
Depending on the size of the table/data, you might wish to add an index to your temporary table.
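For completeness, a hedged sketch that combines the two fragments above into one runnable statement (SQL Server 2000 syntax). The item length of 9 and the two-character ', ' separator are assumptions read off the sample data, which is why the SUBSTRING start position steps by @ItemLength + 2:
DECLARE @ItemLength int
SET @ItemLength = 9

UPDATE claims
SET PolicyReview = 'Y'
FROM tblLossClaims claims
JOIN (
    SELECT DISTINCT unrolled.LossPolicyID
    FROM (
        SELECT c.LossPolicyID,
               Item = SUBSTRING(c.LossPolicyID,
                                (v.number - 1) * (@ItemLength + 2) + 1,
                                @ItemLength)
        FROM tblLossClaims c
        JOIN master..spt_values v
          ON v.type = 'P'
         AND v.number BETWEEN 1 AND (LEN(c.LossPolicyID) + 2) / (@ItemLength + 2)
    ) unrolled
    JOIN tblAutoPolicyList
      ON unrolled.Item = tblAutoPolicyList.PolicyIDList
) matched
  ON matched.LossPolicyID = claims.LossPolicyID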