SQL Rows to JSON Array with grouping and aggregation

My goal is to take data from a row with a specific ID and convert it into a JSON object to insert into another table. What I'm starting with looks like this:
Event_Details
------------------------------------
 ID | ID2 | First_Name | Last_Name |
------------------------------------
 1X | 2B  | John       | Smith     |
 2X | 2B  | Adam       | John      |
 3X | 2B  | Sarah      | Jones     |
 1X | 5C  | Joe        | Rob       |
What I want looks like this:
[
  {
    "id2": "2B",
    "event": {
      "ID": "1X",
      "First_Name": "John",
      "Last_Name": "Smith"
    }
  },
  {
    "id2": "5C",
    "event": {
      "ID": "1X",
      "First_Name": "Joe",
      "Last_Name": "Rob"
    }
  }
]
I need to group the items into a single JSON array by "ID", but I want the "id2" outside of the "event" object.
This is what I have so far, which handles the outer aggregation; I'm just having trouble nesting the query for the inner object:
select json_agg(b)
from (select ID2 as "ID2"
      from event_details) b

I believe this is what you are looking for:
select json_agg(jsonb_build_object(
         'id2', id2,
         'event', jsonb_build_object(
           'ID', id,
           'First_Name', first_name,
           'Last_Name', last_name)))
from event_details
group by id;
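If you only ever need the array for one ID, as in the sample output, you can filter instead of grouping; a minimal sketch using the question's table:
select json_agg(jsonb_build_object(
         'id2', id2,
         'event', jsonb_build_object(
           'ID', id,
           'First_Name', first_name,
           'Last_Name', last_name)))
from event_details
where id = '1X';
One caveat: jsonb does not preserve key order, so the keys may come back in a different order than the sample; if that matters, json_build_object keeps the keys in the order they are written.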

Related

Returning a complex SQL query with joins as an array of JSON objects

Using a SQL query, how can I return an array of JSON objects that looks like this:
{
  "result": [
    {
      "RentBookRegistrationId": 1,
      "date": "15-08-2022",
      "PersonName": "Peter",
      "Books": [
        { "name": "Ulysses" },
        { "name": "Hamlet" }
      ],
      "Processes": [
        { "no": 1, "name": "Online booking" },
        { "no": 2, "name": "Reserved beforehand" },
        { "no": 4, "name": "Vending machined used" }
      ]
    }
  ]
}
From a SQL Server database that looks like this:
Table: RentBookRegistration
+----+------------+-----------------------------+
| id | date | person |
+----+------------+-----------------------------+
| 1 | 15-08-2022 | {"no": 1, "name": "Peter"} |
+----+------------+-----------------------------+
| 2 | 16-08-2022 | {"no": 2, "name": "Anna"} |
+----+------------+-----------------------------+
| 3 | 17-08-2022 | {"no": 1, "name": "Peter"} |
+----+------------+-----------------------------+
| 4 | 17-08-2022 | {"no": 2, "name": "Mark"} |
+----+------------+-----------------------------+
Table: BookData
+----+------------------------+-----------------------------------------------------------+
| id | rentBookRegistrationId | book |
+----+------------------------+-----------------------------------------------------------+
| 1 | 1 | {"name": "Ulysses", "author": "James Joyce", "year": 1918} |
+----+------------------------+-----------------------------------------------------------+
| 2 | 1 | {"name": "Hamlet", "author": "Shakespeare", "year": 1601} |
+----+------------------------+-----------------------------------------------------------+
| 3 | 2 | {"name": "Dune", "author": "Frank Herbert", "year": 1965} |
+----+------------------------+-----------------------------------------------------------+
| 4 | 3 | {"name": "Hamlet", "author": "Shakespeare", "year": 1601} |
+----+------------------------+-----------------------------------------------------------+
| 5 | 4 | {"name": "Hamlet", "author": "Shakespeare", "year": 1601} |
+----+------------------------+-----------------------------------------------------------+
Table: ProcessData
+----+------------------------+-----------------------------------------------------------+
| id | rentBookRegistrationId | processUsed |
+----+------------------------+-----------------------------------------------------------+
| 1 | 1 | {"no": 1, "name": "Online booking"} |
+----+------------------------+-----------------------------------------------------------+
| 2 | 1 | {"no": 2, "name": "Reserved beforehand"} |
+----+------------------------+-----------------------------------------------------------+
| 3 | 1 | {"no": 4, "name": "Vending machined used"} |
+----+------------------------+-----------------------------------------------------------+
| 4 | 2 | {"no": 1, "name": "Online booking"} |
+----+------------------------+-----------------------------------------------------------+
| 5 | 2 | {"no": 4, "name": "Vending machined used"} |
+----+------------------------+-----------------------------------------------------------+
| 6 | 3 | {"no": 2, "name": "Reserved beforehand"} |
+----+------------------------+-----------------------------------------------------------+
The table layout might seem a bit stupid, but the tables are simplified to make the question straightforward.
This is how far I've come so far:
select … from RentBookRegistration R where R.PersonName = 'Peter'
This isn't pretty. You are storing JSON data and then want to consume that JSON data and turn it into different JSON data. Honestly, you would be better off storing your data in a normalised format, and then you could build the JSON far more easily.
That said, you can do this, but you need to consume your JSON data first, with OPENJSON, and then turn it back into JSON with FOR JSON:
SELECT (SELECT RBR.id AS RentBookRegistrationId,
               RBR.date,
               p.name,
               (SELECT b.name
                FROM dbo.BookData BD
                     CROSS APPLY OPENJSON(BD.book)
                                 WITH (name nvarchar(50)) b
                WHERE BD.rentBookRegistrationId = RBR.id
                FOR JSON AUTO) AS Books,
               (SELECT pU.no,
                       pU.name
                FROM dbo.ProcessData PD
                     CROSS APPLY OPENJSON(PD.processUsed)
                                 WITH (no int,
                                       name nvarchar(50)) pU
                WHERE PD.rentBookRegistrationId = RBR.id
                FOR JSON AUTO) AS Processes
        FROM OPENJSON(RBR.person)
             WITH (no int,
                   name nvarchar(50)) p
        FOR JSON PATH, ROOT('result'))
FROM dbo.RentBookRegistration RBR;
I assume you want 1 row per person.
If you can't fix the table structure, you can use JSON_VALUE and JSON_QUERY to do what you want. But I recommend fixing the table structures: better performance, better resource use, and much simpler, more readable queries.
https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver16
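For illustration, a minimal sketch of the JSON_VALUE route (table and column names taken from the question); it filters on a property inside the stored JSON without reshaping anything:
SELECT RBR.id,
       RBR.date,
       JSON_VALUE(RBR.person, '$.name') AS PersonName
FROM dbo.RentBookRegistration RBR
WHERE JSON_VALUE(RBR.person, '$.name') = 'Peter';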

PostgreSQL: How to query a jsonb column having data in an array

I have a table customrule with this structure:
id - int
name - varchar
actions - jsonb
I have already read about the -> operator, but it does not seem to work in my case, as the data is stored as an array.
+------+----+------------------------------------------------------------------------+
| Name | Id | Actions                                                                |
+------+----+------------------------------------------------------------------------+
| CR-1 | 1  | [{"name": "Action1", "count": "1"}, {"name": "Action2", "count": "2"}] |
| CR-2 | 2  | [{"name": "Action5", "count": "1"}, {"name": "Action4", "count": "2"}] |
| CR-3 | 3  | [{"name": "Action1", "count": "1"}, {"name": "Action1", "count": "2"}] |
+------+----+------------------------------------------------------------------------+
I want to query this data and get all records which have Action1 in the actions column, which should return rows 1 and 3.
You need to use the containment operator @> with an array parameter:
select id, name, actions
from customrule
where actions @> '[{"name": "Action1"}]'
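If customrule grows large, note that @> can be served by a GIN index; a sketch (the index name is illustrative):
create index customrule_actions_idx on customrule using gin (actions jsonb_path_ops);
The jsonb_path_ops operator class yields smaller, faster indexes than the default jsonb_ops, at the cost of supporting only containment-style queries.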

How to count instances inside a json object array?

I have some table that includes an array of jsonb objects as a column:
| event_id | attendees |
|----------|----------------------------------------------------------------------------------------------------------------------------------------------|
| 1 | [{"name": "john smith", "username": "jsmith"}, {"name": "jeff jones", "username": "jjones"}, {"name": "steve woods", "username": "swoods"}] |
| 2 | [{"name": "al williams", "username": "awilliams"}, {"name": "james lee", "username": "jlee"}, {"name": "bob thomas", "username": "bthomas"}] |
| 3 | [{"name": "doug hanes", "username": "dhanes"}, {"name": "stan peters", "username": "speters"}, {"name": "jane kay", "username": "jkay"}] |
I would like to get the count of all attendees whose username matches some condition (let's say whose username starts with "j") for each event.
Looking at the documentation, I couldn't really find anything I could use for jsonb object arrays. The closest thing I could see was the jsonb_array_elements function, but that returns a set and not individual values, so something like:
select event_id, count(jsonb_array_elements(attendees) ->> 'username')
from events
where jsonb_array_elements(attendees) ->> 'username' like 'j%'
group by event_id
would obviously not work. Is there something that would return this output (count of usernames that begin with j for each event):
| event_id | count |
|----------|-------|
| 1 | 2 |
| 2 | 1 |
| 3 | 1 |
Well, just split your SQL logic into two parts.
As below, you can get all the usernames for each event_id:
select
  event_id,
  jsonb_array_elements(attendees) ->> 'username' as user_name
from
  events;
event_id | user_name
----------+-----------
1 | jsmith
1 | jjones
1 | swoods
2 | awilliams
2 | jlee
2 | bthomas
3 | dhanes
3 | speters
3 | jkay
(9 rows)
Then we can compute statistics over the JSON elements per event_id. For example, to count the usernames starting with a given character such as 'j' for each event_id, the complete SQL would be:
with tmp as (
  select
    event_id,
    jsonb_array_elements(attendees) ->> 'username' as user_name
  from
    events
)
select
  event_id,
  count(1)
from
  tmp
where
  user_name like 'j%'
group by
  event_id
order by
  event_id;
event_id | count
----------+-------
1 | 2
2 | 1
3 | 1
(3 rows)
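For what it's worth, the same counting logic can be written without the CTE by expanding the array in a lateral join; a sketch against the same events table:
select e.event_id,
       count(*)
from events e
cross join lateral jsonb_array_elements(e.attendees) as a(elem)
where a.elem ->> 'username' like 'j%'
group by e.event_id
order by e.event_id;
As with the CTE version, events with no matching attendees are simply absent from the result rather than reported with a count of 0.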

PostgreSQL set field of JSON object in JSON array

I have a table like this:
| id (SERIAL) | game (TEXT) | players (JSONB)                                       |
+-------------+-------------+-------------------------------------------------------+
| 1           | chess       | [{name: Joe, role: admin}, {name: Mike, role: user}] |
| 2           | football    | [{name: Foo, role: user}, {name: Bar, role: user}]   |
+-------------+-------------+-------------------------------------------------------+
I want to set the role of a player (Joe) to a certain value (user) in a certain game (chess), so the result should look like this:
| id (SERIAL) | game (TEXT) | players (JSONB)                                      |
+-------------+-------------+------------------------------------------------------+
| 1           | chess       | [{name: Joe, role: user}, {name: Mike, role: user}]  |
| 2           | football    | [{name: Foo, role: user}, {name: Bar, role: user}]   |
+-------------+-------------+------------------------------------------------------+
Is it possible to achieve this with a single query?
This is possible by recreating the json array on each update.
SQL for table creation and example data insertion:
CREATE TABLE test_table(
  id BIGSERIAL PRIMARY KEY,
  game TEXT,
  players JSONB
);
INSERT INTO test_table(game, players)
VALUES
  ('chess', '[{"name": "Joe", "role": "admin"}, {"name": "Mike", "role": "user"}]'),
  ('football', '[{"name": "Foo", "role": "user"}, {"name": "Bar", "role": "user"}]');
The inserted data:
+----+----------+----------------------------------------------------------------------+
| id | game | players |
+----+----------+----------------------------------------------------------------------+
| 1 | chess | [{"name": "Joe", "role": "admin"}, {"name": "Mike", "role": "user"}] |
| 2 | football | [{"name": "Foo", "role": "user"}, {"name": "Bar", "role": "user"}] |
+----+----------+----------------------------------------------------------------------+
Update query:
WITH json_rows AS (
  SELECT id, jsonb_array_elements(players) AS json_data
  FROM test_table
  WHERE game = 'chess'
),
updated_rows AS (
  SELECT id,
         array_to_json(array_agg(
           CASE WHEN json_data -> 'name' = '"Joe"'
                THEN jsonb_set(json_data, '{role}', '"user"')
                ELSE json_data
           END)) AS updated_json
  FROM json_rows
  GROUP BY id
)
UPDATE test_table SET players = u.updated_json
FROM updated_rows u
WHERE test_table.id = u.id;
Results of the query:
+----+----------+---------------------------------------------------------------------+
| id | game | players |
+----+----------+---------------------------------------------------------------------+
| 2 | football | [{"name": "Foo", "role": "user"}, {"name": "Bar", "role": "user"}] |
| 1 | chess | [{"name": "Joe", "role": "user"}, {"name": "Mike", "role": "user"}] |
+----+----------+---------------------------------------------------------------------+
The query works in the following way:
1. Convert the json array to json rows and filter them by the game property. This is done in the json_rows CTE.
2. Update the json data in the json rows where the user "Joe" is found.
3. Once you have the new json values, just do an update based on the id.
Note: As you can see, in the current implementation the json array gets recreated (only in the rows that need to be updated). This may cause a change in the order of the elements inside the array.
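If that ordering needs to be preserved, here is a variation (a sketch, not part of the original answer) that carries each element's original position through with WITH ORDINALITY and rebuilds the array with an explicit ORDER BY:
UPDATE test_table t
SET players = u.updated_json
FROM (
  SELECT tt.id,
         jsonb_agg(
           CASE WHEN a.elem ->> 'name' = 'Joe'
                THEN jsonb_set(a.elem, '{role}', '"user"')
                ELSE a.elem
           END ORDER BY a.ord) AS updated_json
  FROM test_table tt
  CROSS JOIN LATERAL jsonb_array_elements(tt.players)
       WITH ORDINALITY AS a(elem, ord)
  WHERE tt.game = 'chess'
  GROUP BY tt.id
) u
WHERE t.id = u.id;
Using jsonb_agg here also avoids the intermediate json/jsonb conversion, since it aggregates directly to jsonb.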

Rank over partition from PostgreSQL in Elasticsearch

We are facing a problem migrating a large data set from Postgres into Elasticsearch.
We have a schema similar to this:
+---------------+--------------+------------+-----------+
| user_id | created_at | latitude | longitude |
+---------------+--------------+------------+-----------+
| 5 | 23.1.2015 | 12.49 | 20.39 |
+---------------+--------------+------------+-----------+
| 2 | 23.1.2015 | 12.42 | 20.32 |
+---------------+--------------+------------+-----------+
| 2 | 24.1.2015 | 12.41 | 20.31 |
+---------------+--------------+------------+-----------+
| 5 | 25.1.2015 | 12.45 | 20.32 |
+---------------+--------------+------------+-----------+
| 1 | 23.1.2015 | 12.43 | 20.34 |
+---------------+--------------+------------+-----------+
| 1 | 24.1.2015 | 12.42 | 20.31 |
+---------------+--------------+------------+-----------+
We are able to find the latest position by created_at thanks to the rank function in SQL:
... WITH ranked AS (
  SELECT user_id, created_at, latitude, longitude,
         rank() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS r
  FROM locations)
SELECT user_id, created_at, latitude, longitude FROM ranked WHERE r = 1
and the result is only the newest location for each user:
+---------------+--------------+------------+-----------+
| user_id | created_at | latitude | longitude |
+---------------+--------------+------------+-----------+
| 2 | 24.1.2015 | 12.41 | 20.31 |
+---------------+--------------+------------+-----------+
| 5 | 25.1.2015 | 12.45 | 20.32 |
+---------------+--------------+------------+-----------+
| 1 | 24.1.2015 | 12.42 | 20.31 |
+---------------+--------------+------------+-----------+
After we import the data into Elasticsearch, our document model looks like:
{
"location" : { "lat" : 12.45, "lon" : 46.84 },
"user_id" : 5,
"created_at" : "2015-01-24T07:55:20.606+00:00"
}
etc...
I am looking for an equivalent of this SQL query in an Elasticsearch query. I think it must be possible, but I have not found out how yet.
You can achieve this using field collapsing combined with inner_hits.
{
  "collapse": {
    "field": "user_id",
    "inner_hits": {
      "name": "order by created_at",
      "size": 1,
      "sort": [
        { "created_at": "desc" }
      ]
    }
  }
}
Detailed Article: https://blog.francium.tech/sql-window-function-partition-by-in-elasticsearch-c2e3941495b6
It is simple: if you want the newest record (for a given id), you just need the records for which no newer ones (with the same id) exist. (This assumes that, for a given id, no two records share the same created_at date.)
SELECT * FROM locations ll
WHERE NOT EXISTS (
SELECT * FROM locations nx
WHERE nx.user_id = ll.user_id
AND nx.created_at > ll.created_at
);
EDITED (it appears the OP wants the newest observation, not the oldest)
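For completeness, on the Postgres side DISTINCT ON is the most idiomatic way to phrase newest-row-per-user; a sketch against the question's schema:
SELECT DISTINCT ON (user_id) user_id, created_at, latitude, longitude
FROM locations
ORDER BY user_id, created_at DESC;
DISTINCT ON keeps the first row within each user_id group, and the ORDER BY makes that first row the newest one.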
Use top_hits:
"aggs": {
  "user_id": {
    "terms": { "field": "user_id" },
    "aggs": {
      "top_location": {
        "top_hits": {
          "size": 1,
          "sort": { "created_at": "desc" },
          "_source": []
        }
      }
    }
  }
}
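Note that the terms aggregation returns only its top 10 buckets by default, so with more than 10 distinct user_id values you would need to raise the "size" parameter of the terms aggregation itself (separate from the top_hits size); setting the request-level "size" to 0 skips returning ordinary hits when only the aggregation results are wanted.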