Rank over partition from PostgreSQL in Elasticsearch - SQL

We are facing a problem migrating a large data set from PostgreSQL into Elasticsearch.
Our schema looks roughly like this:
+---------+------------+----------+-----------+
| user_id | created_at | latitude | longitude |
+---------+------------+----------+-----------+
| 5       | 23.1.2015  | 12.49    | 20.39     |
| 2       | 23.1.2015  | 12.42    | 20.32     |
| 2       | 24.1.2015  | 12.41    | 20.31     |
| 5       | 25.1.2015  | 12.45    | 20.32     |
| 1       | 23.1.2015  | 12.43    | 20.34     |
| 1       | 24.1.2015  | 12.42    | 20.31     |
+---------+------------+----------+-----------+
We can find the latest position per user (by created_at) with the rank() window function in SQL:
WITH ranked_locations AS (
  SELECT user_id, created_at, latitude, longitude,
         rank() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS r
  FROM locations
)
SELECT user_id, created_at, latitude, longitude FROM ranked_locations WHERE r = 1;
and the result contains only the newest location for each user:
+---------+------------+----------+-----------+
| user_id | created_at | latitude | longitude |
+---------+------------+----------+-----------+
| 2       | 24.1.2015  | 12.41    | 20.31     |
| 5       | 25.1.2015  | 12.45    | 20.32     |
| 1       | 24.1.2015  | 12.42    | 20.31     |
+---------+------------+----------+-----------+
After importing the data into Elasticsearch, our document model looks like this:
{
  "location": { "lat": 12.45, "lon": 46.84 },
  "user_id": 5,
  "created_at": "2015-01-24T07:55:20.606+00:00"
}
etc...
I am looking for an Elasticsearch equivalent of this SQL query. I think it must be possible, but I have not found out how yet.

You can achieve this using field collapsing combined with inner_hits.
{
  "collapse": {
    "field": "user_id",
    "inner_hits": {
      "name": "order by created_at",
      "size": 1,
      "sort": [
        { "created_at": "desc" }
      ]
    }
  }
}
Detailed Article: https://blog.francium.tech/sql-window-function-partition-by-in-elasticsearch-c2e3941495b6

It is simple: if you want the newest record for a given id, you just need the records for which no newer ones (with the same id) exist. (This assumes that, for a given id, no two records share the same created_at.)
SELECT * FROM locations ll
WHERE NOT EXISTS (
    SELECT * FROM locations nx
    WHERE nx.user_id = ll.user_id
      AND nx.created_at > ll.created_at
);
EDITED (it appears the OP wants the newest observation, not the oldest)
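For completeness, PostgreSQL also offers a shorter formulation. The following is only a sketch against the same locations table (not part of the original answer), using DISTINCT ON:
-- Sketch: DISTINCT ON keeps the first row per user_id according to the
-- ORDER BY, i.e. the newest location when sorting created_at descending.
SELECT DISTINCT ON (user_id)
       user_id, created_at, latitude, longitude
FROM locations
ORDER BY user_id, created_at DESC;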

Use a terms aggregation on user_id with a top_hits sub-aggregation (run with "size": 0 so only the aggregation buckets are returned):
"aggs": {
"user_id": {
"terms": {"field": "user_id"},
"aggs": {
"top_location": {
"top_hits": {
"size": 1,
"sort": { "created_at": "asc" },
"_source": []
}
}
}
}
}

Related

Create a nested json with column values as key-value pairs

I am trying to build a JSON from the following tables
table : car_makers
+------+-------------+---------+
| cmid | companyname | country |
+------+-------------+---------+
| 1 | Toyota | Japan |
| 2 | Volkswagen | Germany |
| 3 | Nissan | Japan |
+------+-------------+---------+
Table : cars
+------+---------+-----------+
| cmid | carname | cartype |
+------+---------+-----------+
| 1 | Camry | Sedan |
| 1 | Corolla | Sedan |
| 2 | Golf | Hatchback |
| 2 | Tiguan | SUV |
| 3 | Qashqai | SUV |
+------+---------+-----------+
I am trying to create a nested JSON of this structure :
{
  "companyName": "Volkswagen",
  "carType": "Germany",
  "cars": {
    "Tiguan": "SUV",
    "Golf": "Hatchback"
  }
}
but the best I could do with this query
select json_build_object('companyName', companyName, 'carType', country,
         'cars', JSON_AGG(json_build_object('carName', carName, 'carType', carType)))
from car_makers cm
join cars c on c.cmid = cm.cmid
group by companyName, country
is this -
{
  "companyName": "Volkswagen",
  "carType": "Germany",
  "cars": [
    { "carName": "Tiguan", "carType": "SUV" },
    { "carName": "Golf", "carType": "Hatchback" }
  ]
}
So, how can I correct my current query to replace the nested JSON array with a JSON element of key-value pairs built from the column values?
Here is the fiddle with the sample data and the query I have tried.
You can use json_object_agg:
select json_build_object('companyName', c.companyName,
         'country', c.country,
         'cars', json_object_agg(c1.carName, c1.carType))
from car_makers c
join cars c1 on c.cmid = c1.cmid
group by c.companyName, c.country
See fiddle.
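If you also want the whole result as one JSON document, the grouped objects can themselves be aggregated. This is only a hypothetical extension of the query above, not something the question asked for:
-- Sketch: wrap each maker's object into a single JSON array.
select json_agg(maker) as makers
from (
  select json_build_object(
           'companyName', c.companyName,
           'country', c.country,
           'cars', json_object_agg(c1.carName, c1.carType)
         ) as maker
  from car_makers c
  join cars c1 on c.cmid = c1.cmid
  group by c.companyName, c.country
) s;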

Returning a complex SQL query with joins as an array of JSON objects

Using a SQL query, how can I return an array of JSON objects that looks like this:
{
  "result": [
    {
      "RentBookRegistrationId": 1,
      "date": "15-08-2022",
      "PersonName": "Peter",
      "Books": [
        { "name": "Ulysses" },
        { "name": "Hamlet" }
      ],
      "Processes": [
        { "no": 1, "name": "Online booking" },
        { "no": 2, "name": "Reserved beforehand" },
        { "no": 4, "name": "Vending machined used" }
      ]
    }
  ]
}
From a SQL Server database that looks like this:
Table: RentBookRegistration
+----+------------+-----------------------------+
| id | date | person |
+----+------------+-----------------------------+
| 1 | 15-08-2022 | {"no": 1, "name": "Peter"} |
+----+------------+-----------------------------+
| 2 | 16-08-2022 | {"no": 2, "name": "Anna"} |
+----+------------+-----------------------------+
| 3 | 17-08-2022 | {"no": 1, "name": "Peter"} |
+----+------------+-----------------------------+
| 4 | 17-08-2022 | {"no": 2, "name": "Mark"} |
+----+------------+-----------------------------+
Table: BookData
+----+------------------------+-----------------------------------------------------------+
| id | rentBookRegistrationId | book |
+----+------------------------+-----------------------------------------------------------+
| 1 | 1 | {"name": "Ulysses", "author": "James Joyce", "year": 1918}|
+----+------------------------+-----------------------------------------------------------+
| 2 | 1 | {"name": "Hamlet", "author": "Shakespeare", "year": 1601} |
+----+------------------------+-----------------------------------------------------------+
| 3 | 2 | {"name": "Dune", "author": "Frank Herbert", "year": 1965} |
+----+------------------------+-----------------------------------------------------------+
| 4 | 3 | {"name": "Hamlet", "author": "Shakespeare", "year": 1601} |
+----+------------------------+-----------------------------------------------------------+
| 5 | 4 | {"name": "Hamlet", "author": "Shakespeare", "year": 1601} |
+----+------------------------+-----------------------------------------------------------+
Table: ProcessData
+----+------------------------+-----------------------------------------------------------+
| id | rentBookRegistrationId | processUsed |
+----+------------------------+-----------------------------------------------------------+
| 1 | 1 | {"no": 1, "name": "Online booking"} |
+----+------------------------+-----------------------------------------------------------+
| 2 | 1 | {"no": 2, "name": "Reserved beforehand"} |
+----+------------------------+-----------------------------------------------------------+
| 3 | 1 | {"no": 4, "name": "Vending machined used"} |
+----+------------------------+-----------------------------------------------------------+
| 3 | 2 | {"no": 1, "name": "Online booking"} |
+----+------------------------+-----------------------------------------------------------+
| 4 | 2 | {"no": 4, "name": "Vending machined used"} |
+----+------------------------+-----------------------------------------------------------+
| 5 | 3 | {"no": 2, "name": "Reserved beforehand"} |
+----+------------------------+-----------------------------------------------------------+
The table layout might seem a bit odd, but the tables have been simplified to keep the question straightforward.
This is how far I've come so far:
select … from RentBookRegistration R where R.PersonName = 'Peter'
This isn't pretty. You are storing JSON data and then want to consume that JSON data and turn it into different JSON data. Honestly, you would be better off storing your data in a normalised format; then you can build the JSON much more easily.
That said, you can do this, but you need to consume your JSON data first with OPENJSON and then turn it back into JSON with FOR JSON:
SELECT (SELECT RBR.id AS RentBookRegistrationId,
               RBR.date,
               p.name,
               (SELECT b.name
                FROM dbo.BookData BD
                     CROSS APPLY OPENJSON(BD.book)
                                 WITH (name nvarchar(50)) b
                WHERE BD.rentBookRegistrationId = RBR.id
                FOR JSON AUTO) AS Books,
               (SELECT pU.no,
                       pU.name
                FROM dbo.ProcessData PD
                     CROSS APPLY OPENJSON(PD.processUsed)
                                 WITH (no int,
                                       name nvarchar(50)) pU
                WHERE PD.rentBookRegistrationId = RBR.id
                FOR JSON AUTO) AS Processes
        FROM OPENJSON(RBR.person)
             WITH (no int,
                   name nvarchar(50)) p
        FOR JSON PATH, ROOT('result'))
FROM dbo.RentBookRegistration RBR;
I assume you want 1 row per person.
db<>fiddle
If you can't fix the table structure, you can use JSON_VALUE and JSON_QUERY to do what you want. But I recommend fixing the table structures: better performance, better resource use, and much simpler, more readable queries.
https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver16
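For illustration, here is a minimal, untested sketch of the JSON_VALUE approach mentioned above, assuming SQL Server 2016+ and the table and column names from the question; it only pulls scalar values out and is not a complete solution:
-- Sketch only: JSON_VALUE extracts scalar values from the stored JSON columns.
SELECT RBR.id                               AS RentBookRegistrationId,
       RBR.[date],
       JSON_VALUE(RBR.person, '$.name')     AS PersonName,
       JSON_VALUE(PD.processUsed, '$.name') AS ProcessName
FROM dbo.RentBookRegistration RBR
JOIN dbo.ProcessData PD
    ON PD.rentBookRegistrationId = RBR.id
WHERE JSON_VALUE(RBR.person, '$.name') = 'Peter';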

How to modify the following cypher syntax in AgensGraph?

MATCH (wu:wiki_user)
OPTIONAL MATCH (n:wiki_doc{author:wu.uid}), (o:wiki_doc{editor:wu.uid})
RETURN wu.uid AS User_id, wu.org AS Organization, wu.email AS email, wu.token AS balance,
count(n) AS Writing, count(o) AS Modifying;
 user_id | organization | email           | balance | writing | modifying
---------+--------------+-----------------+---------+---------+-----------
 "ailee" | "Org2"       | "hazel#gbc.com" |       5 |       0 |         0
 "hazel" | "Org1"       | "hazel#gbc.com" |       5 |       2 |         2
match (n:wiki_doc{editor:'hazel'}) return n;
n
wiki_doc[9.11]
{"bid": "hazel_doc1", "cid": "Basic", "org": "Org1", "title": "Hello world!",
"author": "hazel", "editor": "hazel", "revnum": 1, "created": "2018-09-25
09:00:000", "hasfile": 2, "contents": "I was wrong", "modified": "2018-09-25
10:00:000"}
(1 row)
In fact, hazel has modified only one document, but the query above reports 2 for Modifying.
How can the query be modified so that the correct count is shown?
Use count(DISTINCT ...) so that the two OPTIONAL MATCH patterns do not multiply each other's rows:
MATCH (wu:wiki_user)
OPTIONAL MATCH (n:wiki_doc{author:wu.uid}), (o:wiki_doc{editor:wu.uid})
RETURN wu.uid AS User_id, wu.org AS Organization, wu.email AS email, wu.token AS balance,
       count(distinct id(n)) AS Writing, count(distinct id(o)) AS Modifying;
 user_id | organization | email           | balance | writing | modifying
---------+--------------+-----------------+---------+---------+-----------
 "ailee" | "Org2"       | "hazel#gbc.com" |       5 |       0 |         0
 "hazel" | "Org1"       | "hazel#gbc.com" |       5 |       2 |         1
(2 rows)

Postgresql join on array and transform to json

I would like to join on an array column containing ids and transform the result of this subselect into JSON (a JSON array).
I have the following model:

The lnam_refs column contains identifiers that refer to the lnam column.
I would like to transform the lnam_refs column into something like [row_to_json(), row_to_json()] or [] or [row_to_json()] or …
I tried several methods but I cannot achieve a clean result…
To try to be clearer :
Table in input:
id | label | lnam | lnam_refs
--------+----------------------+----------+-----------------------
1 | 'master1' | 11111111 | {33333333}
2 | 'master2' | 22222222 | {44444444,55555555}
3 | 'slave1' | 33333333 | {}
4 | 'slave2' | 44444444 | {}
5 | 'slave3' | 55555555 | {}
6 | 'master3' | 66666666 | {}
Results Expected:
id | label | lnam | lnam_refs | slaves
--------+----------------------+----------+-----------------------+---------------------------------
1 | 'master1' | 11111111 | {33333333} | [ {id: 3, label: 'slave1', lnam: 33333333, lnam_refs: []} ]
2 | 'master2' | 22222222 | {44444444,55555555} | [ {id: 4, label: 'slave2', lnam: 44444444, lnam_refs: []}, {id: 5, label: 'slave3', lnam: 55555555, lnam_refs: []} ]
6 | 'master3' | 66666666 | {} | []
Thanks for your help !
Here's one way to do it. (I created a table called t with that data you supplied.)
SELECT *,
       (SELECT JSON_AGG(ROW_TO_JSON(t2))
        FROM t t2
        WHERE label LIKE 'slave%'
          AND lnam = ANY(t1.lnam_refs)) AS slaves
FROM t t1
WHERE label LIKE 'master%'
I use the label field in the WHERE clause as I don't know how else you're determining which records should be master etc.
Result:
1;master1;11111111;{33333333};[{"id":3,"label":"slave1","lnam":33333333,"lnam_refs":[]}]
2;master2;22222222;{44444444,55555555};[{"id":4,"label":"slave2","lnam":44444444,"lnam_refs":[]}, {"id":5,"label":"slave3","lnam":55555555,"lnam_refs":[]}]
6;master3;66666666;{};
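One note on the output above: master3 gets NULL rather than the empty array the question asked for. A small, untested tweak with COALESCE (building on the same query) should cover that case:
-- Sketch: COALESCE turns the NULL produced for masters without slaves into [].
SELECT *,
       COALESCE(
         (SELECT JSON_AGG(ROW_TO_JSON(t2))
          FROM t t2
          WHERE label LIKE 'slave%'
            AND lnam = ANY(t1.lnam_refs)),
         '[]'::json
       ) AS slaves
FROM t t1
WHERE label LIKE 'master%';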

How to create a hierarchical JSON object using ltree query results? (PostgreSQL)

I'm trying to create a storage system for custom categories using Postgres.
After looking around for potential solutions I settled on trying ltree.
Here is an example of the raw data:
+----+---------+---------------------------------+-----------+
| id | user_id | path | name |
+----+---------+---------------------------------+-----------+
| 1 | 1 | root.test | test |
| 2 | 1 | root.test.inbox | inbox |
| 3 | 1 | root.personal | personal |
| 4 | 1 | root.project | project |
| 5 | 1 | root.project.idea | idea |
| 6 | 1 | root.personal.events | events |
| 7 | 1 | root.personal.events.janaury | january |
| 8 | 1 | root.project.objective | objective |
| 9 | 1 | root.personal.events.february | february |
| 10 | 1 | root.project.objective.january | january |
| 11 | 1 | root.project.objective.february | february |
+----+---------+---------------------------------+-----------+
I thought it might be easier to first order the results and remove the top level from the returned path, using:
select id, name, subpath(path, 1) as path, nlevel(subpath(path, 1)) as level from testLtree order by level, path
I get:
+----+-----------+----------------------------+-------+
| id | name | path | level |
+----+-----------+----------------------------+-------+
| 3 | personal | personal | 1 |
| 4 | project | project | 1 |
| 1 | test | test | 1 |
| 6 | events | personal.events | 2 |
| 5 | idea | project.idea | 2 |
| 8 | objective | project.objective | 2 |
| 2 | inbox | test.inbox | 2 |
| 9 | february | personal.events.february | 3 |
| 7 | january | personal.events.january | 3 |
| 11 | february | project.objective.february | 3 |
| 10 | january | project.objective.january | 3 |
+----+-----------+----------------------------+-------+
I'm hoping to transform this result into nested JSON. I would like an output similar to this:
personal: {
  id: 3,
  name: 'personal',
  children: {
    events: {
      id: 6,
      name: 'events',
      children: {
        january: {
          id: 7,
          name: 'january',
          children: null
        },
        february: {
          id: 9,
          name: 'february',
          children: null
        }
      }
    }
  }
},
project: {
  id: 4,
  name: 'project',
  children: {
    idea: {
      id: 5,
      name: 'idea',
      children: null
    },
    objective: {
      id: 8,
      name: 'objective',
      children: {
        january: {
          id: 10,
          name: 'january',
          children: null
        },
        february: {
          id: 11,
          name: 'february',
          children: null
        }
      }
    }
  }
},
test: {
  id: 1,
  name: 'test',
  children: {
    inbox: {
      id: 2,
      name: 'inbox',
      children: null
    }
  }
}
I've been looking around for the best way to do this but haven't come across any solution that makes sense to me. However, as I am new to Postgres and SQL in general, this is expected.
I think I may have to use a recursive query? I'm a bit confused about the best method for this. Any help/advice is much appreciated, and if anything is unclear please ask.
I've put everything into a sqlfiddle below;
http://sqlfiddle.com/#!17/1713e/5
I ran into the same problem as you. I struggled with this in PostgreSQL and it became overly complex to solve. Since I'm using Django (a Python framework), I decided to solve it in Python instead. In case it helps anyone in the same situation, I would like to share the code:
https://gist.github.com/eherrerosj/4685e3dc843e94f3ef8645d31dbe490c
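For readers who want to stay in SQL, below is a minimal, untested sketch of a recursive approach, assuming the testLtree table from the question and a hypothetical helper function build_tree. json_object_agg returns NULL when a node has no direct children, which matches the children: null shape shown above.
-- Sketch only (hypothetical build_tree helper, not from the original answer).
-- For a given parent path, collect its direct children as a JSON object and
-- recurse into each child to build the nested "children" value.
CREATE OR REPLACE FUNCTION build_tree(parent_path ltree)
RETURNS json
LANGUAGE plpgsql STABLE AS $$
BEGIN
    RETURN (
        SELECT json_object_agg(
                   t.name,
                   json_build_object(
                       'id',       t.id,
                       'name',     t.name,
                       'children', build_tree(t.path)
                   )
               )
        FROM testLtree t
        -- the lquery 'root.*{1}' matches exactly the direct children of 'root'
        WHERE t.path ~ (parent_path::text || '.*{1}')::lquery
    );
END;
$$;

SELECT build_tree('root');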