Recursive query using PostgreSQL - sql

My database contains data (images, for example), and this data can be modified by a program (image processing, for example), so I get a new image derived from the other one, and this new image can be modified as well, and so on.
Two images can also be combined to create a new one, for example: image a + image b = image c.
So in my database I have a table called "derived_from" with 2 columns (id_previous, id_new): id_previous is the image before processing and id_new is the result. So I can have a "change history" like this:
+------------------+------------------+
| id_previous | id_new |
+------------------+------------------+
| a | c |
| b | c |
| c | d |
| d | e |
+------------------+------------------+
So my question is:
Is it possible to write a recursive query that returns the whole history of a data ID?
Something like this:
Select * from derived_from where id_new = 'e'
Should return (d,c,b,a)
Thank you for your help

Yes, you can achieve this with a recursive CTE:
with recursive r as (
select id_previous
from derived_from
where id_new = 'e'
union
select d.id_previous
from derived_from d
join r on id_new = r.id_previous
)
select id_previous
from r
http://rextester.com/NZKT73800
Notes:
UNION can stop the recursion even when you have loops. With UNION ALL, you should handle loops yourself, unless you are really sure you have no loops.
This will give you separate rows (one for each "ancestor"). You can aggregate them too, but separate rows are typically much easier to consume than comma-separated lists or arrays.
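The first note, about UNION and loops, can be checked with a small script. This is only a sketch: it runs against an in-memory SQLite database through Python's sqlite3 module rather than PostgreSQL (both accept this WITH RECURSIVE form), and the helper names make_db and ancestors, as well as the extra (e, a) row that closes a loop, are made up for the illustration.

```python
import sqlite3

def make_db(rows):
    """Create an in-memory derived_from table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE derived_from (id_previous TEXT, id_new TEXT)")
    conn.executemany("INSERT INTO derived_from VALUES (?, ?)", rows)
    return conn

def ancestors(conn, image_id):
    """Return every id that image_id was derived from, directly or indirectly."""
    rows = conn.execute(
        """
        WITH RECURSIVE r(id_previous) AS (
            SELECT id_previous FROM derived_from WHERE id_new = ?
            UNION  -- rows that were already produced are dropped, so loops end
            SELECT d.id_previous
            FROM derived_from d
            JOIN r ON d.id_new = r.id_previous
        )
        SELECT id_previous FROM r
        """,
        (image_id,),
    ).fetchall()
    return {row[0] for row in rows}

history = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]

# Data from the question: no loops.
plain = make_db(history)
print(sorted(ancestors(plain, "e")))

# Same data plus an assumed row (e, a), which makes a -> c -> d -> e -> a a loop.
looped = make_db(history + [("e", "a")])
print(sorted(ancestors(looped, "e")))
```

With the looped data the query still terminates, and 'e' shows up among its own ancestors, which is one way to detect that a loop exists.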

You can use a recursive CTE:
with recursive cte as (
select df.id_new, df.id_previous as parent
from derived_from df
where df.id_new = 'e'
union all
select cte.id_new, df.id_previous
from cte join
derived_from df
on cte.parent = df.id_new
)
select id_new, array_agg(parent)
from cte
group by id_new;
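Because this version uses UNION ALL, a loop in the data would make it recurse forever. A minimal sketch of one way to guard against that, again run on SQLite through Python's sqlite3 module instead of PostgreSQL: carry a depth counter and stop at a limit. The names history and MAX_DEPTH and the limit of 100 are assumptions for the example, and the aggregation is done in Python because SQLite has no array_agg.

```python
import sqlite3

MAX_DEPTH = 100  # hypothetical safety limit, pick one that fits your data

def history(conn, image_id, max_depth=MAX_DEPTH):
    """List the ancestors of image_id, nearest first, stopping at max_depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE cte(parent, depth) AS (
            SELECT id_previous, 1 FROM derived_from WHERE id_new = ?
            UNION ALL
            SELECT df.id_previous, cte.depth + 1
            FROM cte
            JOIN derived_from df ON df.id_new = cte.parent
            WHERE cte.depth < ?  -- the guard: never go deeper than the limit
        )
        SELECT parent, MIN(depth) AS first_depth
        FROM cte
        GROUP BY parent          -- collapses duplicates that UNION ALL keeps
        ORDER BY first_depth, parent
        """,
        (image_id, max_depth),
    ).fetchall()
    return [parent for parent, _ in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE derived_from (id_previous TEXT, id_new TEXT)")
conn.executemany(
    "INSERT INTO derived_from VALUES (?, ?)",
    [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")],
)
print(history(conn, "e"))
```

The depth column also gives a natural ordering for the aggregated result, which array_agg(parent) alone does not guarantee.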

Related

Use recursion to get all children of node in Bigquery SQL table

I'm working with a dataset in bigquery that has parent-child relationships, but doesn't indicate final_parent...
My data looks something like this:
| id | parent |
| -----| --------|
| AA | AB |
| AB | AC |
| .. | .. |
The rows are either questions or answers. All answers roll up to a single question, but you can also answer an answer, so there is this recursive graph structure... What I want is to get all the answers to a single question, starting with the row id of that question...
I generated the following query - I think it is logically correct for the task:
WITH RECURSIVE tbl_1 AS(
(SELECT *
FROM source_table
WHERE (id = xxxxxxxxxxx) OR (parent = xxxxxxxxxxx))
UNION ALL
(SELECT *
FROM source_table
WHERE (parent IN (SELECT DISTINCT id FROM tbl_1)
AND (id NOT IN (SELECT DISTINCT id FROM tbl_1))))
)
SELECT *
FROM tbl_1
However I get the following error...
ERROR:
400 A recursive reference from inside an expression subquery is not allowed at [9:49]
I think this is just something that hasn't been implemented yet in bigquery? Any advice on how to do it despite this? Thanks so much!!
Try below
with recursive tbl as (
select *, 1 pos from your_table
where question not in (select answer from your_table)
union all
select t1.question, t2.answer, pos + 1
from tbl t1
join your_table t2
on t2.question = t1.answer
)
select question, string_agg(answer order by pos) answers
from tbl
group by question
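The error in the question comes from referencing the recursive table inside IN (SELECT ...) subqueries; joining to it instead, as the answer above does, avoids that. A minimal sketch of the same join shape using the id/parent columns from the question follows. It is run on SQLite through Python's sqlite3 module, not on BigQuery, so treat it as an illustration of the query structure only; the sample rows and the function name thread are invented.

```python
import sqlite3

def thread(conn, question_id):
    """Return the question row id plus the ids of all answers below it."""
    rows = conn.execute(
        """
        WITH RECURSIVE tbl_1(id, parent) AS (
            SELECT id, parent FROM source_table WHERE id = ?
            UNION ALL
            -- the recursive reference appears only in a JOIN, not in a subquery
            SELECT s.id, s.parent
            FROM source_table s
            JOIN tbl_1 t ON s.parent = t.id
        )
        SELECT id FROM tbl_1
        """,
        (question_id,),
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (id TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?)",
    [
        ("Q1", None),  # a question
        ("A1", "Q1"),  # an answer to Q1
        ("A2", "A1"),  # an answer to an answer
        ("A3", "Q1"),
        ("Q2", None),  # an unrelated question
        ("B1", "Q2"),
    ],
)
print(sorted(thread(conn, "Q1")))
```

The NOT IN check from the original query is not needed when the data is a tree, because each row is reached through exactly one parent.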

Creating a category tree table from an array of categories in PostgreSQL

How to generate ids and parent_ids from the arrays of categories. The number or depth of subcategories can be anything between 1-10 levels.
Example PostgreSQL column. Datatype character varying array.
data_column
character varying[] |
----------------------------------
[root_1, child_1, childchild_1] |
[root_1, child_1, childchild_2] |
[root_2, child_2] |
I would like to convert the column of arrays into the table as shown below that I assume is called the Adjacency List Model. I know there is also the Nested Tree Sets Model and Materialised Path model.
Final output table
id | title | parent_id
------------------------------
1 | root_1 | null
2 | root_2 | null
3 | child_1 | 1
4 | child_2 | 2
5 | childchild_1 | 3
6 | childchild_2 | 3
Final output tree hierarchy
root_1
--child_1
----childchild_1
----childchild_2
root_2
--child_2
step-by-step demo: db<>fiddle
You can do this with a recursive CTE
WITH RECURSIVE cte AS
( SELECT data[1] as title, 2 as idx, null as parent, data FROM t -- 1
UNION
SELECT data[idx], idx + 1, title, data -- 2
FROM cte
WHERE idx <= cardinality(data)
)
SELECT DISTINCT -- 3
title,
parent
FROM cte
1. The starting query of the recursion: Get all root elements and the data you'll need within the recursion.
2. The recursive part: Get the element at the new index and increase the index.
3. After recursion: Query the columns you finally need. The DISTINCT removes tied elements (e.g. two times the same root_1).
Now you have created the hierarchy. Now you need the ids.
You can generate them in many different ways, for example using the row_number() window function:
WITH RECURSIVE cte AS (...)
SELECT
*,
row_number() OVER ()
FROM (
SELECT DISTINCT
title,
parent
FROM cte
) s
Now every row has its own id. The ordering could be adjusted (for example, by adding an ORDER BY to the window), but without further information about the data there is little to base it on. The algorithm stays the same either way.
With an id on each row, we can use a self join to look up the parent id via the parent title column. Because a self join would repeat the select query, it makes sense to encapsulate it in a second CTE to avoid code duplication. The final result is:
WITH RECURSIVE cte AS
( SELECT data[1] as title, 2 as idx, null as parent, data FROM t
UNION
SELECT data[idx], idx + 1, title, data
FROM cte
WHERE idx <= cardinality(data)
), numbered AS (
SELECT
*,
row_number() OVER ()
FROM (
SELECT DISTINCT
title,
parent
FROM cte
) s
)
SELECT
n1.row_number as id,
n1.title,
n2.row_number as parent_id
FROM numbered n1
LEFT JOIN numbered n2 ON n1.parent = n2.title
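To make the three stages (collect distinct title/parent pairs, number them, look up the parent id by title) easier to follow, here is a minimal sketch of the same idea in plain Python. It is not the SQL above, just an illustration: the function name build_adjacency_list is invented, ids are assigned level by level to match the output table in the question, and, like the join on n1.parent = n2.title, it assumes that every title is unique across the whole tree.

```python
def build_adjacency_list(paths):
    """Turn category paths into (id, title, parent_id) rows."""
    seen = set()
    nodes = []  # (depth, title, parent_title), in first-seen order
    for path in paths:
        parent = None
        for depth, title in enumerate(path):
            # Same effect as SELECT DISTINCT title, parent in the CTE.
            if (title, parent) not in seen:
                seen.add((title, parent))
                nodes.append((depth, title, parent))
            parent = title

    # Number roots first, then children, and so on (the sort is stable).
    nodes.sort(key=lambda node: node[0])
    ids = {}
    for new_id, (_, title, _) in enumerate(nodes, start=1):
        ids.setdefault(title, new_id)

    # The "self join": look up each parent's id by its title.
    return [(ids[title], title, ids.get(parent)) for _, title, parent in nodes]

data_column = [
    ["root_1", "child_1", "childchild_1"],
    ["root_1", "child_1", "childchild_2"],
    ["root_2", "child_2"],
]
for row in build_adjacency_list(data_column):
    print(row)
```

If the same title can occur under different parents, matching on the title alone is ambiguous; in that case the full path would have to serve as the key instead.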

Get specific row from each group

My question is very similar to this, except I want to be able to filter by some criteria.
I have a table "DOCUMENT" which looks something like this:
|ID|CONFIG_ID|STATE |MAJOR_REV|MODIFIED_ON|ELEMENT_ID|
+--+---------+----------+---------+-----------+----------+
| 1|1234 |Published | 2 |2019-04-03 | 98762 |
| 2|1234 |Draft | 1 |2019-01-02 | 98762 |
| 3|5678 |Draft | 3 |2019-01-02 | 24244 |
| 4|5678 |Published | 2 |2017-10-04 | 24244 |
| 5|5678 |Draft | 1 |2015-05-04 | 24244 |
It's actually a few more columns, but I'm trying to keep this simple.
For each CONFIG_ID, I would like to select the latest (MAX(MAJOR_REV) or MAX(MODIFIED_ON)) - but I might want to filter by additional criteria, such as state (e.g., the latest published revision of a document) and/or date (the latest revision, published or not, as of a specific date; or: all documents where a revision was published/modified within a specific date interval).
To make things more interesting, there are some other tables I want to join in.
Here's what I have so far:
SELECT
allDocs.ID,
d.CONFIG_ID,
d.[STATE],
d.MAJOR_REV,
d.MODIFIED_ON,
d.ELEMENT_ID,
f.ID FILE_ID,
f.[FILENAME],
et.COLUMN1,
e.COLUMN2
FROM DOCUMENT allDocs -- Get all document revisions
CROSS APPLY ( -- Then for each config ID, only look at the latest revision
SELECT TOP 1
ID,
MODIFIED_ON,
CONFIG_ID,
MAJOR_REV,
ELEMENT_ID,
[STATE]
FROM DOCUMENT
WHERE CONFIG_ID=allDocs.CONFIG_ID
ORDER BY MAJOR_REV desc
) as d
LEFT OUTER JOIN ELEMENT e ON e.ID = d.ELEMENT_ID
LEFT OUTER JOIN ELEMENT_TYPE et ON e.ELEMENT_TYPE_ID=et.ID
LEFT OUTER JOIN TREE t ON t.NODE_ID = d.ELEMENT_ID
OUTER APPLY ( -- This is another optional 1:1 relation, but it's wrongfully implemented as m:n
SELECT TOP 1
FILE_ID
FROM DOCUMENT_FILE_RELATION
WHERE DOCUMENT_ID=d.ID
ORDER BY MODIFIED_ON DESC
) as df -- There should never be more than 1, but we're using TOP 1 just in case, to avoid duplicates
LEFT OUTER JOIN [FILE] f on f.ID=df.FILE_ID
WHERE
allDocs.CONFIG_ID = '5678' -- Just for testing purposes
and d.state ='Released' -- One possible filter criterion, there may be others
It looks like the results are correct, but multiple identical rows are returned.
My guess is that for documents with 4 revisions, the same values are found 4 times and returned.
A simple SELECT DISTINCT would solve this, but I'd prefer to fix my query.
This would be a classic row_number & partition by question I think.
;with rows as
(
select <your-columns>,
row_number() over (partition by config_id order by <whatever you want>) as rn
from document
join <anything else>
where <whatever>
)
select * from rows where rn=1
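A filled-in version of that template, as a hedged sketch: it uses the sample rows from the question, runs on SQLite (version 3.25 or later, for window functions) through Python's sqlite3 module rather than SQL Server, and the function name latest_per_config and the lower-case column names are assumptions made for the example.

```python
import sqlite3

def latest_per_config(conn, state=None):
    """Return the highest MAJOR_REV row per CONFIG_ID, optionally per state."""
    return conn.execute(
        """
        WITH ranked AS (
            SELECT id, config_id, state, major_rev,
                   ROW_NUMBER() OVER (
                       PARTITION BY config_id ORDER BY major_rev DESC
                   ) AS rn
            FROM document
            -- Filtering before ranking gives "the latest revision in this state".
            WHERE (? IS NULL OR state = ?)
        )
        SELECT id, config_id, state, major_rev
        FROM ranked
        WHERE rn = 1
        ORDER BY config_id
        """,
        (state, state),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (id INTEGER, config_id TEXT, state TEXT, major_rev INTEGER)"
)
conn.executemany(
    "INSERT INTO document VALUES (?, ?, ?, ?)",
    [
        (1, "1234", "Published", 2),
        (2, "1234", "Draft", 1),
        (3, "5678", "Draft", 3),
        (4, "5678", "Published", 2),
        (5, "5678", "Draft", 1),
    ],
)
print(latest_per_config(conn))
print(latest_per_config(conn, "Published"))
```

Where the filter goes matters: inside the CTE it means "the latest published revision", whereas a filter next to rn = 1 would mean "the latest revision, but only if it is published".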

Hierarchy Without CTE - Get Direct Children

I have a table for assets:
id|name|parentId
The view I'm trying to build for an asset is:
{
'Id': ......,
'Name': ....,
'ChildrenIds': []
}
I need a query that selects the TOP 50 assets and their direct children (so there could be more than 50 results).
I have a CTE that works, but it's slow (5 seconds; parentId and id are indexed):
WITH MyCte as
(
SELECT TOP 50 a.Id, a.Name, a.ParentAssetId
FROM assets a
UNION ALL
SELECT a2.Id, a2.Name, a2.ParentAssetId
FROM assets a2
INNER JOIN MyCte cte ON cte.Id = a2.ParentAssetId
)
SELECT * From MyCte;
This join query does half of what I want.
SELECT TOP 50 a.Id, a.Name, a.ParentAssetId
FROM assets a
LEFT JOIN assets a2 ON a2.ParentAssetId = a.Id
The problem with the JOIN is that it gives me 50 results, and that's it. I need the descendant info to build a view. I could do 2 queries, but I'd rather not do that.
Any suggestions?
Maybe there is a better way for me to build this view? Without the 50 + N requirement? You can use a GROUP BY with STRING_AGG, but I worry about the size limitation.
SAMPLE DATA:
1,Site1,NULL
2,Site2,1
3,Site3,1
4,Site4,2
5,Site5,NULL
TOP 3 ORDER BY id DESC results will return
1,Site1,NULL
2,Site2,1
3,Site3,1
4,Site4,2
BUT I guess ideally something like this:
1,Site1,NULL|2,Site2,1|3,Site3,1
2,Site2,1|4,Site4,2
3,Site3,1
You can use a temp table to achieve what you need.
SELECT TOP (50) a.Id, a.Name, a.ParentAssetId
INTO #Assets
FROM assets a;
INSERT INTO #Assets
SELECT a2.Id, a2.Name, a2.ParentAssetId
FROM #Assets a
JOIN assets a2 ON a2.ParentAssetId = a.Id;
SELECT *
FROM #Assets;
Note that this is not deterministic because there's no ORDER BY when using TOP.
You could use this CTE and make a view from it:
WITH MyCte as
(
SELECT TOP 50 a.Id, a.Name, a.ParentAssetId
FROM assets a
)
SELECT cte.*, a1.Id as ChildId, a1.Name as ChildName
FROM MyCte cte
INNER JOIN assets a1
ON a1.ParentAssetId=cte.Id
Admittedly this will give you a different kind of result set than the UNION CTE in your question, but I'm assuming that you can make a simple adjustment to your consumer application to handle it. It might even be easier/more performant for the app this way, since the relationships are present in the row, and don't have to be extrapolated.
That said, if you are working with a recent-enough Version of SQL Server, you might look into the built-in JSON functions, since it looks like that is the output you are ultimately trying to generate.
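As a rough illustration of how a consumer could turn such joined rows into the Id/Name/ChildrenIds shape from the question, here is a minimal sketch. It uses SQLite through Python's sqlite3 module instead of SQL Server (so LIMIT stands in for TOP), a LEFT JOIN instead of the INNER JOIN above so that assets without children are kept, and invented names (top_assets_with_children, parent_id).

```python
import sqlite3

def top_assets_with_children(conn, limit):
    """Return one dict per top asset, with the ids of its direct children."""
    rows = conn.execute(
        """
        WITH top_assets AS (
            SELECT id, name FROM assets ORDER BY id LIMIT ?
        )
        SELECT t.id, t.name, c.id
        FROM top_assets t
        LEFT JOIN assets c ON c.parent_id = t.id
        ORDER BY t.id, c.id
        """,
        (limit,),
    ).fetchall()

    views = {}
    for asset_id, name, child_id in rows:
        view = views.setdefault(
            asset_id, {"Id": asset_id, "Name": name, "ChildrenIds": []}
        )
        if child_id is not None:  # NULL means the asset has no children
            view["ChildrenIds"].append(child_id)
    return list(views.values())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (id INTEGER, name TEXT, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?)",
    [(1, "Site1", None), (2, "Site2", 1), (3, "Site3", 1),
     (4, "Site4", 2), (5, "Site5", None)],
)
for view in top_assets_with_children(conn, 3):
    print(view)
```

The ORDER BY inside the CTE makes the choice of the top rows deterministic, which a bare TOP 50 is not.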
According to what you provide, and if I understand, I think you're looking for
WITH CTE AS
(
SELECT TOP 3 *
FROM T
ORDER BY ID DESC
)
SELECT *
FROM CTE
UNION
SELECT *
FROM T
WHERE ID IN (SELECT ParentId FROM CTE);
Returns:
+----+-------+----------+
| ID | Name | ParentId |
+----+-------+----------+
| 1 | Site1 | |
| 2 | Site2 | 1 |
| 3 | Site3 | 1 |
| 4 | Site4 | 2 |
| 5 | Site5 | |
+----+-------+----------+
Here is a db<>fiddle
UPDATE:
Since you're looking for a way to pass an INT value representing the number of rows used in TOP, you can create an inline table-valued function as
CREATE FUNCTION dbo.MyFunction (@Rows INT = 1)
RETURNS TABLE
AS
RETURN
(
WITH CTE AS
(
SELECT TOP (@Rows) *
FROM T
ORDER BY ID DESC
)
SELECT *
FROM CTE
UNION
SELECT *
FROM T
WHERE ID IN (SELECT ParentId FROM CTE)
);
and just call it as
SELECT *
FROM dbo.MyFunction(2)
Demo

Convert SQL to LINQ for same table query

I've been trying to write a LINQ query, but the GroupBy performance is horrifically slow, so I wrote my query in SQL instead. It's really speedy, but I can't get LINQPad to convert it to LINQ for me. Can anybody help me convert this SQL to LINQ, please:
(SELECT mm.rcount, * FROM
(SELECT m.TourID AS myId, COUNT(m.RecordType) AS rcount FROM
(
((SELECT *
FROM Bookings h
WHERE h.RecordType = 'H' AND h.TourArea like '%bull%')
union
(SELECT *
FROM Bookings t
WHERE t.RecordType = 'T' and t.TourGuideName like '%bull%'))
) m
group by m.TourID) mm
INNER JOIN Bookings b ON mm.myId= b.TourID
WHERE b.RecordType = 'H');
here's my LINQ effort but it takes like 20 seconds to iterate over 200 records:
var heads = from head in db.GetTable<BookingType>()
where head.RecordType == "H" &&
head.TourArea.Contains("bull")
select head;
var tgs = from tourguides in db.GetTable<BookingType>()
where tourguides.RecordType == "T" &&
tourguides.TourGuideName.Contains("bull")
select tourguides;
var all = heads.Union(tgs);
var groupedshit = from r in all
group r by r.BookingID into g
select g;
return heads;
Edit 1:
Here's my database structure:
BookingID [PK] | TourID | RecordType | TourArea | TourGuideName | ALoadOfOtherFields
And here's some sample data:
1 | 1 | H | Bullring | null
2 | 1 | T | null | Bulldog
3 | 2 | H | Bullring | null
4 | 2 | T | null | Bulldog
5 | 2 | T | null | bull stamp
There will only ever be a single H (head) record but could potentially have many T (tour guide) records. After the grouping if I select a new (like this question: How to use LINQ to SQL to create ranked search results?) on the .Contains('bull') with a .Count() I can then get ranked searching (which is the whole point of this exercise).
Edit 2:
I've added in a property for search rank in the class itself to avoid the problem of then converting my results into a key/value pair. I don't know if this is best practice but it works.
/// <summary>
/// Search Ranking
/// </summary>
public int? SearchRank { get; set; }
and then I execute a SQL query directly using linq-to-sql:
IEnumerable<BookingType> results = db.ExecuteQuery<BookingType>
("(SELECT mm.rcount AS SearchRank, b.* FROM (SELECT m.TourID AS myId, COUNT(m.RecordType) AS rcount FROM (((SELECT * FROM Bookings h WHERE h.RecordType = 'H' AND h.TourArea like '%{0}%') union (SELECT * FROM Bookings t WHERE t.RecordType = 'T' and t.TourGuideName like '%{0}%')) ) m group by m.TourID) mm INNER JOIN Bookings b ON mm.myId= b.TourID WHERE b.RecordType = 'H')", "bull");
I can add in as many 'AND's and 'OR's as I like now without Linq-to-sql going mental (the query it generated was a crazy 200 lines long!).
Ranked search, voilà!
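For readers who want to try the ranking query outside LINQ to SQL, here is a minimal sketch of it on the sample data above. It runs on SQLite through Python's sqlite3 module, the snake_case column names and the function name ranked_search are invented, and the search term is passed as a bound parameter with the wildcards added on the Python side.

```python
import sqlite3

def ranked_search(conn, term):
    """Return (booking_id, tour_id, search_rank) for matching head records."""
    pattern = f"%{term}%"
    return conn.execute(
        """
        SELECT b.booking_id, b.tour_id, mm.rcount AS search_rank
        FROM (
            SELECT m.tour_id, COUNT(*) AS rcount
            FROM (
                SELECT booking_id, tour_id FROM bookings
                WHERE record_type = 'H' AND tour_area LIKE ?
                UNION
                SELECT booking_id, tour_id FROM bookings
                WHERE record_type = 'T' AND tour_guide_name LIKE ?
            ) m
            GROUP BY m.tour_id
        ) mm
        JOIN bookings b ON b.tour_id = mm.tour_id
        WHERE b.record_type = 'H'
        ORDER BY search_rank DESC, b.booking_id
        """,
        (pattern, pattern),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (booking_id INTEGER PRIMARY KEY, tour_id INTEGER, "
    "record_type TEXT, tour_area TEXT, tour_guide_name TEXT)"
)
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "H", "Bullring", None),
        (2, 1, "T", None, "Bulldog"),
        (3, 2, "H", "Bullring", None),
        (4, 2, "T", None, "Bulldog"),
        (5, 2, "T", None, "bull stamp"),
    ],
)
print(ranked_search(conn, "bull"))
```

Tour 2 ranks higher because three of its records match the term, against two for tour 1. SQLite's LIKE ignores case for ASCII letters by default, which is why "Bullring" matches "bull" here.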
You don't have to use a union at all. You can combine the conditions in a single where clause with || and &&; something like this should work:
var result= from b in DB.GetTable<Booking>()
where (b.recordType =="H" || b.recordType=="T")
&&b.TourArea.Contains("bull")
group b by b.Booking_Id into g
select g;
Why bother converting it? You can just call the SQL you have optimized.