Bulk update in PostgreSQL by unnesting a JSON array - sql

I want to do a batch update in PostgreSQL in one go by passing in an array of JSON objects, but I am not sure how I should approach the problem.
An example:
[
{ "oldId": 25, "newId": 30 },
{ "oldId": 41, "newId": 53 }
]
should resolve as:
UPDATE table SET id = 30 WHERE id = 25 and UPDATE table SET id = 53 WHERE id = 41, in a single command, of course.

Use the function jsonb_array_elements() in the from clause:
update my_table
set id = (elem->'newId')::int
from jsonb_array_elements(
'[
{ "oldId": 25, "newId": 30 },
{ "oldId": 41, "newId": 53 }
]') as elem
where id = (elem->'oldId')::int
Note that if the column id is unique (primary key) the update may result in a duplication error depending on the data provided.
Db<>fiddle.
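To make the duplicate-key caveat concrete, here is a minimal Python sketch that checks a remapping for collisions in the final state before it is sent to the database. The function name `find_id_collisions` and the idea of loading the existing ids into a list first are assumptions made for illustration; they are not part of the answer above.

```python
def find_id_collisions(existing_ids, mapping):
    """Return the ids that would appear more than once after the remap.

    existing_ids: ids currently in the table (hypothetical input)
    mapping: list of {"oldId": ..., "newId": ...} dicts, as in the question
    """
    remap = {m["oldId"]: m["newId"] for m in mapping}
    seen = set()
    collisions = set()
    for current in existing_ids:
        # Rows without a mapping entry keep their current id.
        final = remap.get(current, current)
        if final in seen:
            collisions.add(final)
        seen.add(final)
    return sorted(collisions)


payload = [{"oldId": 25, "newId": 30}, {"oldId": 41, "newId": 53}]
print(find_id_collisions([25, 41, 99], payload))  # []
print(find_id_collisions([25, 30, 41], payload))  # [30]
```

This only inspects the final state. With an ordinary (non-deferrable) primary key, PostgreSQL checks uniqueness as each row is updated, so a remap whose final state is clean, such as swapping two ids, can still fail.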

You need to unnest the array and cast the elements to the proper data type:
update the_table
set id = (x.item ->> 'newId')::int
from jsonb_array_elements('[...]') x(item)
where the_table.id = (x.item ->> 'oldId')::int
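If the array is built in application code, it can be passed as a single parameter instead of being pasted into the statement. The following Python sketch only builds the statement text and the parameter tuple; the `%s` placeholder style (as used by drivers such as psycopg), the helper name `build_bulk_update`, and the table name are assumptions, and no database call is made here.

```python
import json

# Statement text mirrors the answer above; the_table and the %s placeholder
# style are assumptions for illustration.
BULK_UPDATE_SQL = """
update the_table
   set id = (x.item ->> 'newId')::int
  from jsonb_array_elements(%s::jsonb) x(item)
 where the_table.id = (x.item ->> 'oldId')::int
"""


def build_bulk_update(pairs):
    """Return (sql, params) for a list of (old_id, new_id) tuples."""
    payload = [{"oldId": int(old), "newId": int(new)} for old, new in pairs]
    # The whole array travels as one JSON string parameter.
    return BULK_UPDATE_SQL, (json.dumps(payload),)


sql, params = build_bulk_update([(25, 30), (41, 53)])
print(params[0])
```

The returned pair would then be handed to whatever `execute(sql, params)` call the driver provides.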

Related

How can I modify all values that match a condition inside a json array?

I have a table which has a JSON column called people like this:
Id
people
1
[{ "id": 6 }, { "id": 5 }, { "id": 3 }]
2
[{ "id": 2 }, { "id": 3 }, { "id": 1 }]
...and I need to update the people column and put a 0 in the path $[*].id where id = 3, so after executing the query, the table should end like this:
Id
people
1
[{ "id": 6 }, { "id": 5 }, { "id": 0 }]
2
[{ "id": 2 }, { "id": 0 }, { "id": 1 }]
There may be more than one match per row.
Honestly, I didn't try any query, since I cannot figure out how to loop inside a field, but my idea was something like this:
UPDATE mytable
SET people = JSON_SET(people, '$[*].id', 0)
WHERE /* ...something should go here */
This is my version
SELECT VERSION()
+-----------------+
| version() |
+-----------------+
| 10.4.22-MariaDB |
+-----------------+
If the id values in people are unique, you can use a combination of JSON_SEARCH and JSON_REPLACE to change the values:
UPDATE mytable
SET people = JSON_REPLACE(people, JSON_UNQUOTE(JSON_SEARCH(people, 'one', 3)), 0)
WHERE JSON_SEARCH(people, 'one', 3) IS NOT NULL
Note that the WHERE clause is necessary to prevent the query replacing values with NULL when the value is not found due to JSON_SEARCH returning NULL (which then causes JSON_REPLACE to return NULL as well).
If the id values are not unique, you will have to rely on string replacement, preferably using REGEXP_REPLACE to deal with possible differences in spacing in the values (and to avoid replacing the 3 in, for example, 23 or 34):
UPDATE mytable
SET people = REGEXP_REPLACE(people, '("id"\\s*:\\s*)3\\b', '\\10')
Demo on dbfiddle
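The effect of this kind of pattern can be checked outside the database. Below is a small Python sketch that uses the same idea (capture the `"id":` prefix, match the digit 3 followed by a word boundary) to turn id 3 into 0 as the question asks. It relies on Python's `re` module rather than MariaDB's regex engine, so it illustrates the pattern and is not a drop-in replacement; the sample row is made up.

```python
import json
import re

# Capture the "id": prefix, then match a 3 that ends at a word boundary.
ID_THREE = re.compile(r'("id"\s*:\s*)3\b')


def zero_out_id_three(people_json):
    # \g<1> is Python's unambiguous form of "group 1 followed by a digit".
    return ID_THREE.sub(r"\g<1>0", people_json)


row = '[{ "id": 3 }, { "id": 23 }, { "id": 34 }, { "id":3 }]'
updated = zero_out_id_three(row)
print(updated)  # [{ "id": 0 }, { "id": 23 }, { "id": 34 }, { "id":0 }]
```

Both occurrences of id 3 are replaced regardless of spacing, while 23 and 34 are left alone.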
As stated in the official documentation, MariaDB stores JSON as a plain string (the JSON type is an alias for LONGTEXT), so you can use either the JSON_SET function or any string function.
For your specific task, applying the REPLACE string function may suit your case:
UPDATE
mytable
SET
people = REPLACE(people, CONCAT('"id": ', 3, ' '), CONCAT('"id": ',0, ' '))
WHERE
....;

HiveQL: How to write a query to select and filter records based on nested JSON array values

In our logging database we store custom UI data as a serialized JSON string. I have been using lateral view json_tuple() to traverse the JSON object and extract nested values. However, I need to filter some of my query results based on whether an array of objects contains certain values or not. After doing some digging I think I need to use lateral view explode(), but I am not a HiveQL expert and I'm not sure exactly how to use this in the way I need.
EX: (simplified for clarity and brevity)
// ui_events table schema
eventDate, eventType, eventData
// serialized JSON string stored in eventData
{ foo: { bar: [{ x: 1, y: 0 }, { x: 0, y: 1 }] } }
// HiveQL query
select
eventDate,
x,
y
from ui_events
lateral view json_tuple(eventData, 'foo') as foo
lateral view json_tuple(foo, 'bar') as bar
// <-- how to select only sub-item(s) in bar where x = 0 and y = 1
where
eventType = 'custom'
and // <-- how to only return records where at least 1 `bar` item was found above?
Any help would be greatly appreciated. Thanks!
Read comments in the code. You can filter the dataset as you want:
with
my_table as(
select stack(2, '{ "foo": { "bar": [{ "x": 1, "y": 0 }, { "x": 0, "y": 1 }] } }',
'{ "foo": { } }'
) as EventData
)
select * from
(
select --get_json_object returns string, not array.
--remove outer []
--and replace delimiter between },{ with ,,,
--to be able to split array
regexp_replace(regexp_replace(get_json_object(EventData, '$.foo.bar'),'^\\[|\\]$',''),
'\\},\\{', '},,,{'
)bar
from my_table t
) s --explode array
lateral view explode (split(s.bar,',,,')) b as bar_element
--get struct elements
lateral view json_tuple(b.bar_element, 'x','y') e as x, y
Result:
s.bar b.bar_element e.x e.y
{"x":1,"y":0},,,{"x":0,"y":1} {"x":1,"y":0} 1 0
{"x":1,"y":0},,,{"x":0,"y":1} {"x":0,"y":1} 0 1
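The split trick in the query can be traced step by step in plain Python. This sketch assumes the array arrives as a compact string (as in the result above) and that its elements are flat objects; nested arrays of objects would break the `},{` boundary marker. The function name `explode_bar` is hypothetical.

```python
import json
import re


def explode_bar(bar_json):
    """Split a serialized array of flat objects into its element strings."""
    if not bar_json:
        return []  # nothing to explode when $.foo.bar is missing
    inner = re.sub(r"^\[|\]$", "", bar_json)   # remove outer []
    marked = re.sub(r"\},\{", "},,,{", inner)  # mark element boundaries
    return marked.split(",,,")


bar = '[{"x":1,"y":0},{"x":0,"y":1}]'
elements = [json.loads(e) for e in explode_bar(bar)]

# Keep only the sub-items the question asks for: x = 0 and y = 1.
matches = [e for e in elements if e["x"] == 0 and e["y"] == 1]
print(matches)  # [{'x': 0, 'y': 1}]
```

In the Hive query, the equivalent filter would be a condition on `e.x` and `e.y` after the lateral views.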

Querying an array of objects in JSONB

I have a table with a column of the data type JSONB. Each row in the column has a JSON that looks something like this:
[
{
"A":{
"AA": "something",
"AB": false
}
},
{
"B": {
"BA":[
{
"BAAA": [1,2,3,4]
},
{
"BABA": {
....
}
}
]
}
}
]
Note: the JSON is a complete mess of lists and objects, and it has a total of 300 lines. Not my data but I am stuck with it. :(
I am using postgresql version 12
How would I write the following queries:
Return all rows that have the value of AB set to false.
Return the values of BAAA in each row.
You can find the AB = false rows with a JSON Path query:
select *
from test
where data @@ '$[*].A.AB == false'
If you don't know where exactly the key AB is located, you can use:
select *
from test
where data @@ '$[*].**.AB == false'
To display all elements from the array as rows, you can use:
select id, e.*
from test
cross join jsonb_array_elements(jsonb_path_query_first(data, '$[*].B.BA.BAAA')) with ordinality as e(item, idx)
I include a column "id" as a placeholder for the primary key column, so that the source of the array element can be determined in the output.
Online example
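For readers unfamiliar with JSON Path, the two fixed-path lookups above can be read as the following rough Python equivalents operating on one already-parsed document. This is a sketch of what the paths select; it does not reproduce every rule of PostgreSQL's lax path mode, and both function names are hypothetical.

```python
def has_ab_false(doc):
    """True if any top-level array element has A.AB equal to false."""
    return any(
        isinstance(el, dict)
        and isinstance(el.get("A"), dict)
        and el["A"].get("AB") is False
        for el in doc
    )


def baaa_values(doc):
    """Return the first BAAA array found under $[*].B.BA, or []."""
    for el in doc:
        b = el.get("B") if isinstance(el, dict) else None
        ba = b.get("BA") if isinstance(b, dict) else None
        if not isinstance(ba, list):
            continue
        for item in ba:
            if isinstance(item, dict) and "BAAA" in item:
                return item["BAAA"]
    return []


doc = [
    {"A": {"AA": "something", "AB": False}},
    {"B": {"BA": [{"BAAA": [1, 2, 3, 4]}, {"BABA": {}}]}},
]
print(has_ab_false(doc), baaa_values(doc))  # True [1, 2, 3, 4]
```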

Postgres WHERE array contains empty string

I'm trying to do select * from demo where demojson->'sub'->'item' = array("") but this doesn't work. I'd like to find the following
All rows where .sub.item in the JSON column is an array containing exactly one empty string ([""])
All rows where .sub.item in the JSON column is an array that may contain more than one item, but at least one of the items is an empty string. (["not empty", "also not empty", ""])
demojson column could contain for example
{
"key": "value",
"sub": {
"item": [""]
}
}
Have you tried
SELECT * from demo
WHERE demojson->'sub'->>'item' = '[""]';
Here the ->> operator returns the JSON object field as text.
And another solution
SELECT * from demo
WHERE json_array_length(demojson->'sub'->'item') = 1 AND
demojson->'sub'->'item'->>0 = '';
Here the ->> operator returns the first element of the JSON array as text.
Because JSONLint does not validate the supplied example text, I used the following:
CREATE TABLE info (id int, j JSON);
insert into info values
(1, '{"key":"k1", "sub": {"item":["i1","i2"]}}'),
(2, '{"key":"k2", "sub": {"item":[""]}}'),
(3, '{"key":"k3", "sub": {"item":["i2","i3"]}}');
Using the where clause in this way, it works:
select * from info
where j->'sub'->>'item' = '[""]';
+----+------------------------------------+
| id | j |
+----+------------------------------------+
| 2 | {"key":"k2", "sub": {"item":[""]}} |
+----+------------------------------------+
You can check it here: http://rextester.com/VEPY57423
Try the following:
SELECT * FROM demo
WHERE demojson->'sub'->'item' = to_jsonb(ARRAY['']);
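The two cases in the question differ in kind: the first is an equality test, the second a containment test. The Python sketch below shows the distinction on parsed values, and also why comparing the serialized text (as the `->>` answers do) is sensitive to whitespace, since a json column keeps the input text as written. The function names are hypothetical.

```python
import json


def is_exactly_one_empty_string(item):
    """Case 1: the array is exactly [""]."""
    return item == [""]


def contains_empty_string(item):
    """Case 2: the array has at least one empty-string element."""
    return isinstance(item, list) and "" in item


spaced_text = '[ "" ]'  # same array as [""], different whitespace

print(spaced_text == '[""]')                                 # False
print(is_exactly_one_empty_string(json.loads(spaced_text)))  # True
print(contains_empty_string(["not empty", "also not empty", ""]))  # True
```

In PostgreSQL, the structural variants of these checks are available on jsonb values, where the containment operator `@>` plays the role of the second function.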

Complex count based on the latest date of a month

I have a model:
class HistoricalRecord(models.Model):
history_id = models.CharField(max_length=32)
type = models.CharField(max_length=8)
history_date = models.DateField()
How can I get the count of each type of HistoricalRecord by getting only the latest object (based on the history_id) for a given month. For example:
With these example objects:
HistoricalRecord.objects.create(history_id="ABC1", type="A", history_date=date(2000, 10, 5))
HistoricalRecord.objects.create(history_id="ABC1", type="A", history_date=date(2000, 10, 27))
HistoricalRecord.objects.create(history_id="DEF1", type="A", history_date=date(2000, 10, 16))
HistoricalRecord.objects.create(history_id="ABC1", type="B", history_date=date(2000, 10, 8))
The result should be:
[
{
"type": "A",
"type_count": 2
},
{
"type": "B",
"type_count": 0
}
]
"A" is 2 because the latest HistoricalRecord object with history_id "ABC1" is on the 27th and the type is A; the other one is the record with history_id "DEF1".
I've tried:
HistoricalRecord.objects.filter(history_date__range=(month_start, month_end)).order_by("type").values("type").annotate(type_count=Count("type"))
but obviously this is incorrect since it gets all the values for the month. The structure of the result doesn't have to be exactly like above, as long as it clearly conveys the count of each type.
This can likely be done with .extra(), add this to the query:
.extra(
where=["""history_date = (SELECT MAX(history_date) FROM historical_record hr
WHERE hr.history_id = historical_record.history_id
AND hr.history_date < %s)"""],
params=[month_end]
)
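To pin down the expected result independently of the ORM, here is a plain-Python sketch of the counting rule: within the month, keep only the newest row per history_id, then count those rows by type. Tuples stand in for HistoricalRecord rows, and the function name `count_latest_types` is hypothetical; it can serve as a reference when checking what the query returns.

```python
from collections import Counter
from datetime import date

# Plain tuples stand in for model rows: (history_id, type, history_date).
records = [
    ("ABC1", "A", date(2000, 10, 5)),
    ("ABC1", "A", date(2000, 10, 27)),
    ("DEF1", "A", date(2000, 10, 16)),
    ("ABC1", "B", date(2000, 10, 8)),
]


def count_latest_types(rows, month_start, month_end):
    latest = {}  # history_id -> (history_date, type) of the newest row
    all_types = set()
    for history_id, type_, history_date in rows:
        if not (month_start <= history_date <= month_end):
            continue
        all_types.add(type_)
        if history_id not in latest or history_date > latest[history_id][0]:
            latest[history_id] = (history_date, type_)
    counts = Counter(type_ for _, type_ in latest.values())
    # Types seen in the month but never newest are reported with 0.
    return {t: counts.get(t, 0) for t in sorted(all_types)}


print(count_latest_types(records, date(2000, 10, 1), date(2000, 10, 31)))
# {'A': 2, 'B': 0}
```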