Postgres self-join recursive CTE ancestry chain

I have a pilates_bill table representing direct ancestry (not a tree structure)
bill_id (pk) | previous_bill_id (self-join fk)
=============+================================
1            | 2
2            | 3
3            | 4
4            | 5
5            | NULL
I need to produce a list (parent, grandparent, etc.) of all ancestors for any given row (the examples below start with 1).
Obtaining a list of bill_ids with the ancestor chain using a recursive CTE:
WITH RECURSIVE chain(from_id, to_id) AS (
SELECT NULL::integer, 1 -- starting id
UNION
SELECT c.to_id, pilates_bill.previous_bill_id
FROM chain c
LEFT OUTER JOIN pilates_bill ON (pilates_bill.bill_id = to_id)
WHERE c.to_id IS NOT NULL
)
SELECT from_id FROM chain WHERE from_id IS NOT NULL;
The result is 1,2,3,4,5, as expected.
But now, when I try to produce the table rows in order of ancestry, the result is broken:
SELECT * FROM pilates_bill WHERE bill_id IN
(
WITH RECURSIVE chain(from_id, to_id) AS (
SELECT NULL::integer, 1
UNION
SELECT c.to_id, pilates_bill.previous_bill_id
FROM chain c
LEFT OUTER JOIN pilates_bill ON (pilates_bill.bill_id = to_id)
WHERE c.to_id IS NOT NULL
)
SELECT from_id FROM chain WHERE from_id IS NOT NULL
)
The row order is 5,1,2,3,4.
What am I doing wrong here?

The rows returned by a SQL query come back in no guaranteed order unless you specify an ORDER BY.
You can calculate the depth by keeping track of it in the recursive CTE, and then order by it:
WITH RECURSIVE chain(from_id, to_id, depth) AS
(
SELECT NULL::integer
, 1
, 1
UNION
SELECT c.to_id
, pb.previous_bill_id
, depth + 1
FROM chain c
LEFT JOIN
pilates_bill pb
ON pb.bill_id = c.to_id
WHERE c.to_id IS NOT NULL
)
SELECT *
FROM chain
ORDER BY
depth
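As a rough illustration of the depth-tracking idea, here is a minimal Python sketch that runs a similar recursive CTE against an in-memory SQLite database. SQLite stands in for Postgres here, which is an assumption made only to keep the example self-contained. The function name `ancestry_in_order` is hypothetical, and the sample rows assume the table contains a 4 -> 5 link, as the expected result 1,2,3,4,5 implies. The chain starts from the table row itself, so the final SELECT returns table rows ordered by depth:

```python
import sqlite3


def ancestry_in_order(start_id):
    """Return (bill_id, previous_bill_id) rows for start_id and its ancestors, nearest first."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pilates_bill (
            bill_id INTEGER PRIMARY KEY,
            previous_bill_id INTEGER
        );
        -- Assumed sample data (includes the 4 -> 5 link).
        INSERT INTO pilates_bill VALUES (1, 2), (2, 3), (3, 4), (4, 5), (5, NULL);
        """
    )
    rows = conn.execute(
        """
        WITH RECURSIVE chain(bill_id, previous_bill_id, depth) AS (
            -- Anchor: the starting row, at depth 1.
            SELECT bill_id, previous_bill_id, 1
            FROM pilates_bill
            WHERE bill_id = ?
            UNION ALL
            -- Step: follow previous_bill_id to the next ancestor.
            SELECT pb.bill_id, pb.previous_bill_id, c.depth + 1
            FROM chain c
            JOIN pilates_bill pb ON pb.bill_id = c.previous_bill_id
        )
        SELECT bill_id, previous_bill_id
        FROM chain
        ORDER BY depth
        """,
        (start_id,),
    ).fetchall()
    conn.close()
    return rows


print(ancestry_in_order(1))
# [(1, 2), (2, 3), (3, 4), (4, 5), (5, None)]
```

Without the explicit ORDER BY on depth, the database would be free to return these rows in any order, which is what the IN (...) version of the query ran into.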

Related

SQL Algorithm for grouping all "equivalent" strings

I have an input table with two columns, each holding a string representing an id_number; the columns are called id1 and id2. The two id_numbers that appear as a pair in any given row are defined as being equivalent to each other. If one of those id_numbers also appears in another row, then all the strings in both rows are equivalent to each other, and so on. The goal is to return a table with two columns, one containing all unique id_numbers and another identifying their grouping by equivalency.
Sample Data:
create table input_table (
id1 varchar(100),
id2 varchar(100)
)
insert into input_table(id1,id2)
values
('a','b'),
('b','c'),
('d','a'),
('a','b'),
('f','g'),
('f','k'),
('l','m')
Expected Output:
| Id | Grouping |
| a | 1 |
| b | 1 |
| c | 1 |
| d | 1 |
| f | 2 |
| g | 2 |
| k | 2 |
| l | 3 |
| m | 3 |
To further explain the results:
Row 1 tells us a=b so they are assigned to group 1
Row 2 tells us b=c, since b is already in group 1, c is also assigned to group 1
Row 3 tells us d=a, since a is already in group 1, d is also assigned to group 1
Row 4 tells us a=b, which we already know so we don't need to do anything
Row 5 tells us f=g, since neither are in an existing group, we assign them to group 2
etc.

Your table describes an undirected graph: the distinct Id1 and Id2 values across all tuples ("rows") are the nodes, and the tuples themselves are the edges, since each one links two Id values in a relationship. That means we can apply graph-theory techniques to your problem, which can be stated as finding the disconnected components of the graph and assigning each of them an identifier (each disconnected component being one set of connected nodes).
...with that formalism out of the way, I'll say I don't think it's necessarily correct to use the word "equivalent" or "equivalence": just because a binary relation (each of your tuples) is transitive (i.e. nondirected node edges are a transitive relation) it says nothing about what the nodes themselves represent, so I'm guessing you meant to say the Id1/Id2 values represent things-that-are-equivalent in your problem-domain, which is fair, but in the context of my answer I'll refrain from using that term (not least because "set equivalence" is something else entirely)
ANYWAY...
Step 1: Normalize your bad data:
Assuming that those Id1/Id2 values are comparable values (as in: every value can be sorted consistently and deterministically by a less-than comparison), such as strings or integers, then do that first so we can generate a normalized representation of your data.
This normalized representation of your data would be the same as your current data, except that:
Every row's Id1 value must be < (i.e. less-than) its Id2 value.
There are no duplicate rows.
So if there's a row where Id2 < Id1 you should swap the values, so ( 'd', 'a' ) becomes ( 'a', 'd' ), and the duplicate ('a', 'b') row would be removed.
This also means that every (disconnected) component in the graph we're representing now has a minimum node which we can treat as the (arbitrary) "root" of that component - which means we will have a target to look for from any node (but we don't know which values are minimums in each component yet, hang on...)
To aid identification, let's call the smaller value in a pair the Smol value and the larger one the Big value.
(You can do this step as another CTE step, but for clarity I'm using a table-variable @normalized, like so):
DECLARE @horribleInputData TABLE (
Id1 char(1) NOT NULL,
Id2 char(1) NOT NULL
);
INSERT INTO @horribleInputData ( Id1, Id2 ) VALUES
( 'a', 'b' ),
( 'b', 'c' ),
( 'd', 'a' ),
( 'a', 'b' ),
( 'f', 'g' ),
( 'f', 'k' ),
( 'l', 'm' );
--
DECLARE @normalized TABLE (
Smol char(1) NOT NULL,
Big char(1) NOT NULL,
INDEX IDX1 ( Smol ),
INDEX IDX2 ( Big ),
CHECK ( Smol < Big )
);
INSERT INTO @normalized ( Smol, Big )
SELECT
DISTINCT -- Exclude duplicate rows.
CASE WHEN Id1 < Id2 THEN Id1 ELSE Id2 END AS Smol, -- Sort the values horizontally, not vertically.
CASE WHEN Id1 < Id2 THEN Id2 ELSE Id1 END AS Big
FROM
@horribleInputData
WHERE
Id1 <> Id2; -- Exclude zero-information rows.
After running the above, the @normalized table looks like this:
| Smol | Big |
| a | b |
| a | d |
| b | c |
| f | g |
| f | k |
| l | m |
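For readers who want to check the normalization step outside of SQL Server, the same three rules (order each pair, drop self-pairs, drop duplicates) can be sketched in a few lines of Python. The function name `normalize` is hypothetical and this is only an illustration of the rules above, not part of the T-SQL solution:

```python
def normalize(pairs):
    """Order each pair as (smaller, larger), drop self-pairs and duplicate rows."""
    return sorted({(min(a, b), max(a, b)) for a, b in pairs if a != b})


# Sample rows from the question.
pairs = [("a", "b"), ("b", "c"), ("d", "a"), ("a", "b"),
         ("f", "g"), ("f", "k"), ("l", "m")]

print(normalize(pairs))
# [('a', 'b'), ('a', 'd'), ('b', 'c'), ('f', 'g'), ('f', 'k'), ('l', 'm')]
```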
Step 2: Designate the minimum node as root in each disconnected component:
If we treat the node with the minimum value in each disconnected component as the "root" of a directed graph, then we can find connected nodes by finding a path from each node to that root. But what else defines a root? In this case, it is a node with no incoming Smol-to-Big reference, i.e. a value that never appears as a Big. So we can simply take @normalized and anti-join it to itself: for every Smol, see if that Smol is a Big to any other Smol; if none exist, then that Smol is the Smollest and is therefore a root:
SELECT
DISTINCT
l.Smol AS Smollest
FROM
@normalized AS l
LEFT OUTER JOIN @normalized AS r ON l.Smol = r.Big
WHERE
r.Big IS NULL;
Which gives us this output:
Smollest
a
f
l
...which tells us that there are 3 disconnected components in this graph, and what their minimum nodes are.
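The anti-join amounts to a set difference: the values that occur as a Smol minus the values that occur as a Big. A small Python sketch of that idea, using the normalized sample rows (the name `find_roots` is hypothetical):

```python
def find_roots(edges):
    """Return values that appear as Smol but never as Big (the anti-join)."""
    smols = {smol for smol, _ in edges}
    bigs = {big for _, big in edges}
    return sorted(smols - bigs)


# Normalized (Smol, Big) rows from Step 1.
normalized = [("a", "b"), ("a", "d"), ("b", "c"),
              ("f", "g"), ("f", "k"), ("l", "m")]

print(find_roots(normalized))
# ['a', 'f', 'l']
```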
Step 3: the srsbsns part
Which means we can now try to trace a route from each node to a root using a recursive CTE, which is a technique in SQL for traversing hierarchical directed graphs (either from top-to-bottom, or bottom-to-top), so it's a good thing we converted the data into a directed graph first.
For example:
SQL Server CTE and recursion example
SQL Server 2012 CTE Find Root or Top Parent of Hierarchical Data
Identifying equivalent sets in SQL Server
Like so:
-- This `smolRoots` CTE is the same as the query from Step 2 above:
WITH smolRoots AS (
SELECT
DISTINCT
l.Smol AS Smollest
FROM
-- This is an anti-join (a LEFT OUTER JOIN combined with an IS NULL predicate):
@normalized AS l
LEFT OUTER JOIN @normalized AS r ON l.Smol = r.Big
WHERE
r.Big IS NULL
),
-- Generate a simple flat list of every Id1/Id2 value:
everyNode AS (
SELECT DISTINCT Smol AS Id FROM @normalized
UNION
SELECT DISTINCT Big AS Id FROM @normalized
),
-- Now do the tree-walk, it's like a nature-walk but more math-y:
recursiveCte AS (
-- Each root (Smollest) value:
SELECT
CONVERT( char(1), NULL ) AS Smol,
s.Smollest AS Big,
s.Smollest AS SmolRoot,
0 AS Depth
FROM
smolRoots AS s
UNION ALL
-- Then recursively UNION ALL (concatenate) all other rows that can be connected by their Smol-to-Big values:
SELECT
f.Smol,
f.Big,
r.SmolRoot,
r.Depth + 1 AS Depth
FROM
@normalized AS f
INNER JOIN recursiveCte AS r ON f.Smol = r.Big
)
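The above recursiveCte, if evaluated as-is, will return this output:
| Smol | Big | SmolRoot | Depth |
| NULL | a | a | 0 |
| NULL | f | f | 0 |
| NULL | l | l | 0 |
| l | m | l | 1 |
| f | g | f | 1 |
| f | k | f | 1 |
| a | b | a | 1 |
| a | d | a | 1 |
| b | c | a | 2 |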
Notice how for each row, each Smol and Big value is mapped to one of the 3 SmolRoot values identified in Step 2.
...so I hope you can now start to see how this works. But we're not done yet...
Step 3.2: I lied about Step 3 being Step 3, it's really Step 3.1:
We still need to query the recursiveCte's results to convert those SmolRoot values into unique identifiers for each disconnected component (i.e. each set), so let's generate a new 1-based int value for each distinct value in SmolRoot. We can do this with ROW_NUMBER(), but you could also use GENERATE_SERIES, and I'm sure other techniques exist to do this too:
WITH smolRoots AS ( /* same as above */ ),
everyNode AS ( /* same as above */ ),
recursiveCte AS ( /* same as above */ ),
numberForEachRoot AS ( -- Generate distinct numbers (from 1, 2, ...) for each SmolRoot value, i.e. a number for each disconnected-component or disjoint graph:
SELECT
SmolRoot,
ROW_NUMBER() OVER ( ORDER BY SmolRoot ) AS DisconnectedComponentNumber
FROM
recursiveCte AS r
GROUP BY
r.SmolRoot
)
And numberForEachRoot looks like this:
| SmolRoot | DisconnectedComponentNumber |
| a | 1 |
| f | 2 |
| l | 3 |
Step 5: There is no Step 4
Now ignore all of the above and start over with that everyNode CTE that was tucked away in Step 3: take the everyNode CTE and JOIN it to recursiveCte and numberForEachRoot to get the actual output you're after:
WITH
smolRoots AS ( /* same as above */ ),
everyNode AS ( /* same as above */ ),
recursiveCte AS ( /* same as above */ ),
numberForEachRoot AS ( /* same as above */ )
SELECT
e.Id,
n.DisconnectedComponentNumber
FROM
everyNode AS e
INNER JOIN recursiveCte AS r ON e.Id = r.Big
INNER JOIN numberForEachRoot AS n ON r.SmolRoot = n.SmolRoot
ORDER BY
e.Id;
Which gives us...
| Id | DisconnectedComponentNumber |
| a | 1 |
| b | 1 |
| c | 1 |
| d | 1 |
| f | 2 |
| g | 2 |
| k | 2 |
| l | 3 |
| m | 3 |
This technique also works on graphs with cycles, though you might get duplicate output rows in that case, but adjusting the above query to filter those out is a trivial exercise left to the reader.
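As a way to cross-check the grouping independently of the SQL, here is a minimal Python sketch that uses a union-find (disjoint-set) structure. This is a different technique from the root-based recursive CTE above, swapped in only for verification; the function name `group_equivalent` is hypothetical. It also copes with a case the root-based walk seems to handle less cleanly: when a component has more than one value that never appears as a Big (for example the pairs ('a','c') and ('b','c'), where both a and b would be picked as roots), the shared node would be reached from two different roots.

```python
def group_equivalent(pairs):
    """Map each id to a 1-based group number using union-find."""
    parent = {}

    def find(x):
        # Return the representative of x's set, registering x if it is new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as the representative of the merged set.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    # Number the groups by their smallest member, in sorted order.
    roots = sorted({find(x) for x in parent})
    number = {root: i for i, root in enumerate(roots, start=1)}
    return {x: number[find(x)] for x in sorted(parent)}


pairs = [("a", "b"), ("b", "c"), ("d", "a"), ("a", "b"),
         ("f", "g"), ("f", "k"), ("l", "m")]
for node, group in group_equivalent(pairs).items():
    print(node, group)

print(group_equivalent([("a", "c"), ("b", "c")]))
# {'a': 1, 'b': 1, 'c': 1}
```

For the sample data this yields the same grouping as the expected output in the question (a, b, c, d in group 1; f, g, k in group 2; l, m in group 3).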
Also it seems I spent too much time on this answer, that means the Vyvanse is working tonight. And yup, it's just gone 4am where I am right now, good job, me. Now where's my Ambien gone?
Step ∞: Just give me the solution I can copy-and-paste into my CS homework and/or Nissan car firmware
Alright, you asked for it...
DECLARE @horribleInputData TABLE (
Id1 char(1) NOT NULL,
Id2 char(1) NOT NULL
);
INSERT INTO @horribleInputData ( Id1, Id2 ) VALUES
('a','b'),
('b','c'),
('d','a'),
('a','b'),
('f','g'),
('f','k'),
('l','m'),
-- Also adding this to show that it works for cycles in graphs too:
( '0', '1' ),
( '1', '2' ),
( '2', '3' ),
( '3', '0' );
-------------
-- 1. Normalize to a table form that's easier to work with: given that transitivity exists, enforce `Id1 < Id2` so we can have a "direction" to look in.
-- This can be a CTE step instead of a TABLE too, if you dare. I'm curious what the execution plan would be in that case.
DECLARE @normalized TABLE (
Smol char(1) NOT NULL,
Big char(1) NOT NULL,
INDEX IDX1 ( Smol ),
INDEX IDX2 ( Big ),
CHECK ( Smol < Big )
);
INSERT INTO @normalized ( Smol, Big )
SELECT
DISTINCT
CASE WHEN Id1 < Id2 THEN Id1 ELSE Id2 END AS Smol,
CASE WHEN Id1 < Id2 THEN Id2 ELSE Id1 END AS Big
FROM
@horribleInputData
WHERE
Id1 <> Id2 -- Exclude zero-information rows.
ORDER BY
Smol, -- Make it easier to read, interactively.
Big;
/*
Smol Big
----------
a b
a d
b c
f g
f k
l m
*/
-- Also, just gonna bury this in here and see what happens:
DECLARE @superImportantPart nvarchar(300) = CONVERT( nvarchar(300), 0x4800450059002000450056004500520059004F004E00450020004900200043004F0050005900200041004E004400200050004100530054004500200043004F00440045002000460052004F004D00200053005400410043004B004F0056004500520046004C004F005700200057004900540048004F0055005400200055004E004400450052005300540041004E00440049004E00470020005700480041005400200049005400200044004F004500530020004F005200200048004F005700200049005400200057004F0052004B00530020004F005400480045005200570049005300450020004900200057004F0055004C004400200048004100560045002000520045004D004F005600450044002000540048004900530020005000520049004E0054002000530054004100540045004D0045004E005400 );
RAISERROR( @superImportantPart, /*severity:*/ 0, /*state:*/ 1 ) WITH NOWAIT;
-- Then trace a route from every Big to its smallest connected Smol.
-- Each Big sharing the same Smol is in the same connected graph, the set of distinct Smol nodes identifies each output set.
-- 1. Get all roots first: these will be `Smol` nodes that never appear in `Big`.
WITH smolRoots AS (
SELECT
DISTINCT
l.Smol AS Smollest
FROM
@normalized AS l
LEFT OUTER JOIN @normalized AS r ON l.Smol = r.Big
WHERE
r.Big IS NULL
/*
Smollest
-----
a
f
l
*/
),
everyNode AS (
SELECT DISTINCT Smol AS Id FROM @normalized
UNION
SELECT DISTINCT Big AS Id FROM @normalized
),
-- The tree-walk:
recursiveCte AS (
-- Each root (Smol) value:
SELECT
CONVERT( char(1), NULL ) AS Smol,
s.Smollest AS Big,
s.Smollest AS SmolRoot,
0 AS Depth
FROM
smolRoots AS s
UNION ALL
-- Then the magic happens...
SELECT
n.Smol,
n.Big,
r.SmolRoot,
r.Depth + 1 AS Depth
FROM
@normalized AS n
INNER JOIN recursiveCte AS r ON n.Smol = r.Big
/*
Smol Big SmolRoot Depth
-----------------------------
NULL a a 0
NULL f f 0
NULL l l 0
l m l 1
f g f 1
f k f 1
a b a 1
a d a 1
b c a 2
*/
),
numberForEachRoot AS ( -- Generate distinct numbers (from 1, 2, ...) for each SmolRoot number, i.e. a number for each disconnected-component or disjoint graph:
SELECT
SmolRoot,
ROW_NUMBER() OVER ( ORDER BY SmolRoot ) AS DisconnectedComponentNumber
FROM
recursiveCte AS r
GROUP BY
r.SmolRoot
/*
SmolRoot DisconnectedComponentNumber
-------------------
a 1
f 2
l 3
*/
)
-- Then ignore all of the above and start with `everyNode` and JOIN it to `recursiveCte` and `numberForEachRoot`:
SELECT
e.Id,
n.DisconnectedComponentNumber
FROM
everyNode AS e
INNER JOIN recursiveCte AS r ON e.Id = r.Big
INNER JOIN numberForEachRoot AS n ON r.SmolRoot = n.SmolRoot
ORDER BY
e.Id;
/*
Id DisconnectedComponentNumber
a 1
f 2
l 3
m 3
g 2
k 2
b 1
d 1
c 1
*/

How can I write a SQL query to calculate the quantity of components sold with their parent assemblies? (Postgres 11/recursive CTE?)

My goal
To calculate the sum of components sold as part of their parent assemblies.
I'm sure this must be a common use case, but I haven't yet found documentation that leads to the result I'm looking for.
Background
I'm running Postgres 11 on CentOS 7.
I have some tables as follows:
CREATE TABLE the_schema.names_categories (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
thing_name TEXT NOT NULL,
thing_category TEXT NOT NULL
);
CREATE TABLE the_schema.relator (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
parent_name TEXT NOT NULL,
child_name TEXT NOT NULL,
child_quantity INTEGER NOT NULL
);
CREATE TABLE the_schema.sales (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
sold_name TEXT NOT NULL,
sold_quantity INTEGER NOT NULL
);
And a view like so, which is mainly to associate the category key with relator.child_name for filtering:
CREATE VIEW the_schema.relationships_with_child_catetgory AS (
SELECT
r.parent_name,
r.child_name,
r.child_quantity,
n.thing_category AS child_category
FROM
the_schema.relator r
INNER JOIN
the_schema.names_categories n
ON r.child_name = n.thing_name
);
And these tables contain some data like this:
INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('subChild1', 'component'), ('subChild2', 'component');
INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 10), ('child1', 'subChild2', 2);
INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2);
I need to construct a query that, given these data, will return something like the following:
child_name | sum_sold
------------+----------
subChild1 | 30
subChild2 | 6
(2 rows)
The problem is that I haven't the first idea how to go about this and in fact it's getting scarier as I type. I'm having a really hard time visualizing the connections that need to be made, so it's difficult to get started in a logical way.
Usually, Molinaro's SQL Cookbook has something to get started on, and it does have a section on hierarchical queries, but near as I can tell, none of them serve this particular purpose.
Based on my research on this site, it seems like I probably need to use a recursive CTE (Common Table Expression), as demonstrated in this question/answer, but I'm having considerable difficulty understanding this method and how to use it for my case.
Aping the example from E. Brandstetter's answer linked above, I arrive at:
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
r.child_name,
s.sold_quantity AS total
FROM
the_schema.sales s
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON s.sold_name = r.parent_name
UNION ALL
SELECT
c.sold_name,
r.child_name,
(c.total * r.child_quantity)
FROM
cte c
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON r.parent_name = c.child_name
) SELECT * FROM cte
which gets part of the way there:
sold_name | child_name | total
-----------+------------+-------
parent1 | child1 | 1
parent1 | child1 | 2
parent1 | subChild1 | 10
parent1 | subChild1 | 20
parent1 | subChild2 | 2
parent1 | subChild2 | 4
(6 rows)
However, these results include undesired rows (the first two), and when I try to filter the CTE by adding where r.child_category = 'component' to both parts, the query returns no rows:
sold_name | child_name | total
-----------+------------+-------
(0 rows)
and when I try to group/aggregate, it gives the following error:
ERROR: aggregate functions are not allowed in a recursive query's recursive term
I'm stuck on how to get the undesired rows filtered out and the aggregation happening; clearly I'm failing to comprehend how this recursive CTE works. All guidance is appreciated!

Basically you have the solution. If you stored the quantities and categories in your CTE as well, you can simply add a WHERE filter and a SUM aggregation afterwards:
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name
My entire query looks like this (which only differs in the details I mentioned above from yours):
demo:db<>fiddle
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
s.sold_quantity,
r.child_name,
r.child_quantity,
nc.thing_category as category
FROM
sales s
JOIN relator r
ON s.sold_name = r.parent_name
JOIN names_categories nc
ON r.child_name = nc.thing_name
UNION ALL
SELECT
cte.sold_name,
cte.sold_quantity,
r.child_name,
r.child_quantity,
nc.thing_category
FROM cte
JOIN relator r ON cte.child_name = r.parent_name
JOIN names_categories nc
ON r.child_name = nc.thing_name
)
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name
Note: I didn't use your view, because I found it handier to fetch the data directly from the tables instead of joining data I already have. But that's just the way I personally like it :)
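To try the idea without a Postgres instance, here is a minimal Python sketch that runs a variation of the query against an in-memory SQLite database (an assumption made only so the example is self-contained; the function name `components_sold` is hypothetical and the schema prefix is dropped). Unlike the queries above, this variation carries a running product down the hierarchy, so that an intermediate quantity other than 1 (for example two child1 per parent1) would also be multiplied in. With the sample data both approaches give the same numbers.

```python
import sqlite3


def components_sold():
    """Return (component_name, total_sold) rows for the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE names_categories (thing_name TEXT NOT NULL, thing_category TEXT NOT NULL);
        CREATE TABLE relator (parent_name TEXT NOT NULL, child_name TEXT NOT NULL,
                              child_quantity INTEGER NOT NULL);
        CREATE TABLE sales (sold_name TEXT NOT NULL, sold_quantity INTEGER NOT NULL);

        INSERT INTO names_categories VALUES
            ('parent1', 'bundle'), ('child1', 'assembly'),
            ('subChild1', 'component'), ('subChild2', 'component');
        INSERT INTO relator VALUES
            ('parent1', 'child1', 1), ('child1', 'subChild1', 10), ('child1', 'subChild2', 2);
        INSERT INTO sales VALUES ('parent1', 1), ('parent1', 2);
        """
    )
    rows = conn.execute(
        """
        WITH RECURSIVE cte AS (
            -- Anchor: direct children of each sold item.
            SELECT r.child_name, s.sold_quantity * r.child_quantity AS total
            FROM sales s
            JOIN relator r ON s.sold_name = r.parent_name
            UNION ALL
            -- Step: multiply the running total by each deeper child's quantity.
            SELECT r.child_name, cte.total * r.child_quantity
            FROM cte
            JOIN relator r ON cte.child_name = r.parent_name
        )
        -- Filtering and aggregation happen outside the recursive term.
        SELECT cte.child_name, SUM(cte.total)
        FROM cte
        JOIN names_categories nc ON nc.thing_name = cte.child_name
        WHERE nc.thing_category = 'component'
        GROUP BY cte.child_name
        ORDER BY cte.child_name
        """
    ).fetchall()
    conn.close()
    return rows


print(components_sold())
# [('subChild1', 30), ('subChild2', 6)]
```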

Well, I figured out that the CTE can be used as a subquery, which permits the filtering and aggregation that I needed:
SELECT
cte.child_name,
sum(cte.total)
FROM
(
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
r.child_name,
s.sold_quantity AS total
FROM
the_schema.sales s
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON s.sold_name = r.parent_name
UNION ALL
SELECT
c.sold_name,
r.child_name,
(c.total * r.child_quantity)
FROM
cte c
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON r.parent_name = c.child_name
) SELECT * FROM cte ) AS cte
INNER JOIN
the_schema.relationships_with_child_catetgory r1
ON cte.child_name = r1.child_name
WHERE r1.child_category = 'component'
GROUP BY cte.child_name
;
which gives the desired rows:
child_name | sum
------------+-----
subChild2 | 6
subChild1 | 30
(2 rows)
This is good, and probably enough for the actual case at hand, but I suspect there's a cleaner way to go about this, so I'll be eager to read any other answers.

How do I combine multiple parent-child relationships with different lengths using T-SQL?

Summary
In an Azure database (using SQL Server Management Studio 17, so T-SQL) I seek to concatenate multiple parent-child relationships of different lengths.
Base Table
My table is of this form:
ID parent
1 2
2 NULL
3 2
4 3
5 NULL
Feel free to use this code to generate and fill it:
DECLARE @t TABLE (
ID int,
parent int
)
INSERT @t VALUES
( 1, 2 ),
( 2, NULL ),
( 3, 2 ),
( 4, 3 ),
( 5, NULL )
Issue
How do I receive a table with the path concatenation as shown in the following table?
ID path parentcount
1 2->1 1
2 2 0
3 2->3 1
4 2->3->4 2
5 5 0
Detail
The real table has many more rows and the longest path should contain ~15 IDs. So it would be ideal to find a solution that is dynamic in the aspect of parent count definition.
Also: I do not necessarily need the column 'parentcount', so feel free to skip that in answers.
select @@version:
Microsoft SQL Azure (RTM) - 12.0.2000.8

You can use a recursive CTE for this:
with cte as (
select id, parent, convert(varchar(max), concat(id, '')) as path, 0 as parentcount
from @t t
union all
select cte.id, t.parent, convert(varchar(max), concat(t.id, '->', path)), parentcount + 1
from cte join
@t t
on cte.parent = t.id
)
select top (1) with ties *
from cte
order by row_number() over (partition by id order by parentcount desc);
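For comparison, here is a small Python sketch of a top-down variation on the same idea, run against an in-memory SQLite database (an assumption made to keep it self-contained; SQLite uses `||` for concatenation instead of `concat`). It starts from the rows with no parent and appends each child's ID to its parent's path, so no "top (1) with ties" filtering is needed afterwards. The function name `build_paths` is hypothetical.

```python
import sqlite3


def build_paths():
    """Return (id, path, parentcount) for every row of the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t (id INTEGER, parent INTEGER);
        INSERT INTO t VALUES (1, 2), (2, NULL), (3, 2), (4, 3), (5, NULL);
        """
    )
    rows = conn.execute(
        """
        WITH RECURSIVE cte(id, path, parentcount) AS (
            -- Anchor: root rows, whose path is just their own ID.
            SELECT id, CAST(id AS TEXT), 0
            FROM t
            WHERE parent IS NULL
            UNION ALL
            -- Step: extend the parent's path with the child's ID.
            SELECT t.id, cte.path || '->' || t.id, cte.parentcount + 1
            FROM t
            JOIN cte ON t.parent = cte.id
        )
        SELECT id, path, parentcount
        FROM cte
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows


for row in build_paths():
    print(row)
# (1, '2->1', 1)
# (2, '2', 0)
# (3, '2->3', 1)
# (4, '2->3->4', 2)
# (5, '5', 0)
```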

Clearly Gordon nailed it with a recursive CTE, but here is another option using the HierarchyID data type.
Example
Declare @YourTable Table ([ID] int,[parent] int)
Insert Into @YourTable Values
(1,2)
,(2,NULL)
,(3,2)
,(4,3)
,(5,NULL)
;with cteP as (
Select ID
,Parent
,HierID = convert(hierarchyid,concat('/',ID,'/'))
From @YourTable
Where Parent is Null
Union All
Select ID = r.ID
,Parent = r.Parent
,HierID = convert(hierarchyid,concat(p.HierID.ToString(),r.ID,'/'))
From @YourTable r
Join cteP p on r.Parent = p.ID
)
Select ID
,Parent
,[Path] = HierID.GetDescendant ( null , null ).ToString()
,ParentCount = HierID.GetLevel() - 1
From cteP A
Order By A.HierID

Hierarchical SQL Queries: Best SQL query to obtain the whole branch of a tree from a [nodeid, parentid] pairs table given the end node id

Is there any way to send a recursive query in SQL?
Given the end node id, I need all the rows up to the root node (which has parentid = NULL) ordered by level. E.g. if I have something like:
nodeid | parentid
a | NULL
b | a
c | b
after querying for end_node_id = c, I'd get something like:
nodeid | parentid | depth
a | NULL | 0
b | a | 1
c | b | 2
(Instead of the depth I can also work with the distance to the given end node)
The only (and obvious) way I could come up with is doing a single query per row until I reach the parent node.
Is there a more efficient way of doing it?

If you are using SQL Server 2005 or later, you can do this:
Test data:
DECLARE @tbl TABLE(nodeId VARCHAR(10),parentid VARCHAR(10))
INSERT INTO @tbl
VALUES ('a',null),('b','a'),('c','b')
Query
;WITH CTE
AS
(
SELECT
tbl.nodeId,
tbl.parentid,
0 AS Depth
FROM
@tbl as tbl
WHERE
tbl.parentid IS NULL
UNION ALL
SELECT
tbl.nodeId,
tbl.parentid,
CTE.Depth+1 AS Depth
FROM
@tbl AS tbl
JOIN CTE
ON tbl.parentid=CTE.nodeId
)
SELECT
*
FROM
CTE
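The query above walks the whole tree down from the root. To get only the branch that ends at a given node, one option is to walk upwards from that node instead and derive the depth from the distance. The following Python sketch does this against an in-memory SQLite database (an assumption for self-containment); the function name `branch_to_root` and the extra sibling row 'x' are hypothetical, added only to show that other branches are left out.

```python
import sqlite3


def branch_to_root(end_node):
    """Return (nodeid, parentid, depth) from the root down to end_node."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tbl (nodeid TEXT PRIMARY KEY, parentid TEXT);
        -- 'x' is an extra sibling that must not show up in the branch of 'c'.
        INSERT INTO tbl VALUES ('a', NULL), ('b', 'a'), ('c', 'b'), ('x', 'a');
        """
    )
    rows = conn.execute(
        """
        WITH RECURSIVE up(nodeid, parentid, dist) AS (
            -- Anchor: the end node, at distance 0 from itself.
            SELECT nodeid, parentid, 0
            FROM tbl
            WHERE nodeid = ?
            UNION ALL
            -- Step: move to the parent of the previous row.
            SELECT t.nodeid, t.parentid, up.dist + 1
            FROM tbl t
            JOIN up ON t.nodeid = up.parentid
        )
        -- The root has the largest distance, so depth = max distance - distance.
        SELECT nodeid, parentid, (SELECT MAX(dist) FROM up) - dist AS depth
        FROM up
        ORDER BY depth
        """,
        (end_node,),
    ).fetchall()
    conn.close()
    return rows


print(branch_to_root("c"))
# [('a', None, 0), ('b', 'a', 1), ('c', 'b', 2)]
```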

I ended up with the following solutions (where level is the distance to the end node).
Oracle, using hierarchical queries (thanks to the info provided by @Mureinik):
SELECT IDCATEGORY, IDPARENTCATEGORY, LEVEL
FROM TNODES
START WITH IDCATEGORY=122
CONNECT BY IDCATEGORY = PRIOR IDPARENTCATEGORY;
Example using a view so it boils down to a single standard SQL query (requires >= 10g):
CREATE OR REPLACE VIEW VNODES AS
SELECT CONNECT_BY_ROOT IDCATEGORY "IDBRANCH", IDCATEGORY, IDPARENTCATEGORY, LEVEL AS LVL
FROM TNODES
CONNECT BY IDCATEGORY = PRIOR IDPARENTCATEGORY;
SELECT * FROM VNODES WHERE IDBRANCH = 122 ORDER BY LVL ASC;
http://sqlfiddle.com/#!4/18ba80/3
Postgres >= 8.4, using a WITH RECURSIVE Common Table Expression query:
WITH RECURSIVE BRANCH(IDPARENTCATEGORY, IDCATEGORY, LEVEL) AS (
SELECT IDPARENTCATEGORY, IDCATEGORY, 1 AS LEVEL FROM TNODES WHERE IDCATEGORY = 122
UNION ALL
SELECT p.IDPARENTCATEGORY, p.IDCATEGORY, LEVEL+1
FROM BRANCH pr, TNODES p
WHERE p.IDCATEGORY = pr.IDPARENTCATEGORY
)
SELECT IDCATEGORY,IDPARENTCATEGORY, LEVEL
FROM BRANCH
ORDER BY LEVEL ASC
Example using a view so it boils down to a single standard SQL query:
CREATE OR REPLACE VIEW VNODES AS
WITH RECURSIVE BRANCH(IDBRANCH,IDPARENTCATEGORY,IDCATEGORY,LVL) AS (
SELECT IDCATEGORY AS IDBRANCH, IDPARENTCATEGORY, IDCATEGORY, 1 AS LVL FROM TNODES
UNION ALL
SELECT pr.IDBRANCH, p.IDPARENTCATEGORY, p.IDCATEGORY, LVL+1
FROM BRANCH pr, TNODES p
WHERE p.IDCATEGORY = pr.IDPARENTCATEGORY
)
SELECT IDBRANCH, IDCATEGORY, IDPARENTCATEGORY, LVL
FROM BRANCH;
SELECT * FROM VNODES WHERE IDBRANCH = 122 ORDER BY LVL ASC;
http://sqlfiddle.com/#!11/42870/2

For Oracle, as requested in the comments, you can use the connect by operator to produce the hierarchy, and the level pseudocolumn to get the depth:
SELECT nodeid, parentid, LEVEL
FROM t
START WITH parentid IS NULL
CONNECT BY parentid = PRIOR nodeid;

Query to List all hierarchical parents and siblings and their childrens, but not list own childrens

I have a basic SQL table with a simple hierarchical connection between rows. That is, there is a ParentID for every row, which connects it to another row. It is as follows:
AccountID | AccountName | ParentID
---------------------------------------
1 Mathew 0
2 Philip 1
3 John 2
4 Susan 2
5 Anita 1
6 Aimy 1
7 Elsa 3
8 Anna 7
.............................
.................................
45 Kristoff 8
Hope the structure is clear
But my requirement for listing these is a little weird. When we pass an AccountID, it should list all of its parents and siblings, and the siblings' children. But it should never list any child of that AccountID, at any level. I can explain that in a little more detail with a picture. Sorry for the quality of the picture; mine is an old phone cam.
When we pass AccountID 4, it should list all of its parents and siblings, but it should not list 4,6,7,8,9,10. That means the account and any of its children should be left out of the result (based on the tree elements in the picture). Hope the explanation is clear.

If I've got it right and you need to output the whole table except 4 and all of its descendants, then try this recursive query:
WITH CT AS
(
SELECT * FROM T WHERE AccountID=4
UNION ALL
SELECT T.* FROM T
JOIN CT ON T.ParentID = CT.AccountId
)
SELECT * FROM T WHERE AccountID
NOT IN (SELECT AccountID FROM CT)
SQLFiddle demo
Answering the question in the comment:
"So it will not traverse to the top. It only traverse to specified account. For example if I pass 4 as first parameter and 2 as second parameter, the result should be these values 2,5,11,12"
You should start from ID=2 and travel to the bottom, excluding ID=4, so that you cut off the whole subtree under ID=4:
WITH CT AS
(
SELECT * FROM T WHERE AccountID=2
UNION ALL
SELECT T.* FROM T
JOIN CT ON T.ParentID = CT.AccountId
WHERE T.AccountId<>4
)
SELECT * FROM CT
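The first query (everything except one account's subtree) can be tried out with the Python sketch below, which uses an in-memory SQLite database as a stand-in (an assumption for self-containment). It loads only the eight rows visible in the question's table, not the larger tree from the picture, so the IDs in the result differ from those discussed in the comments. The function name `all_except_subtree` is hypothetical.

```python
import sqlite3


def all_except_subtree(account_id):
    """Return AccountIDs of every row outside the subtree rooted at account_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE T (AccountID INTEGER PRIMARY KEY, AccountName TEXT, ParentID INTEGER);
        INSERT INTO T VALUES
            (1, 'Mathew', 0), (2, 'Philip', 1), (3, 'John', 2), (4, 'Susan', 2),
            (5, 'Anita', 1), (6, 'Aimy', 1), (7, 'Elsa', 3), (8, 'Anna', 7);
        """
    )
    rows = conn.execute(
        """
        WITH RECURSIVE CT(AccountID) AS (
            -- Anchor: the account to exclude.
            SELECT AccountID FROM T WHERE AccountID = ?
            UNION ALL
            -- Step: all descendants of the excluded account.
            SELECT T.AccountID
            FROM T
            JOIN CT ON T.ParentID = CT.AccountID
        )
        SELECT AccountID
        FROM T
        WHERE AccountID NOT IN (SELECT AccountID FROM CT)
        ORDER BY AccountID
        """,
        (account_id,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(all_except_subtree(4))  # [1, 2, 3, 5, 6, 7, 8]
print(all_except_subtree(3))  # [1, 2, 4, 5, 6]
```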

Try this:
;with cte as
(select accountid,parentid, 0 as level from tbl
where parentid = 0
union all
select t.accountid,t.parentid,(level+1) from
cte c inner join tbl t on c.accountid= t.parentid
)
select * from cte
where level < (select level from cte where accountid = @accountid)
When you pass in the parameter @accountid, this will return the accountid values of all nodes on levels before that of the parameter.
If you want to return everything on the same level as the input except the input itself, you can change the where clause to:
where level <=(select level from cte where accountid= @accountid )
and accountid <> @accountid
In your example, if @accountid = 4, this will return the values 1,2,3 (ancestors) as well as 5,13,14 (siblings).

Does this return what you are after?
declare @AccountID int
set @AccountID = 4
;with parents
as (
select AccountID, AccountName, ParentID
from Account
where AccountID = (select ParentID from Account Where AccountID = @AccountID)
union all
select A.AccountID, A.AccountName, A.ParentID
from Account as A
join parents as P
on P.ParentID = A.AccountID
),
children
as (
select AccountID, AccountName, ParentID
from parents
union all
select A.AccountID, A.AccountName, A.ParentID
from Account as A
join children as C
on C.AccountID = A.ParentID
where A.AccountID <> @AccountID
)
select distinct AccountID, AccountName, ParentID
from children
order by AccountID

For me it sounds like you want to go up in the tree. So considering this test data
DECLARE @tbl TABLE(AccountID INT,AccountName VARCHAR(100),ParentID INT)
INSERT INTO @tbl
VALUES
(1,'Mathew',0),
(2,'Philip',1),
(3,'John',2),
(4,'Susan',2),
(5,'Anita',1),
(6,'Aimy',1),
(7,'Elsa',3),
(8,'Anna',7)
Then I would write a query like this:
DECLARE @AcountID INT=4
;WITH CTE
AS
(
SELECT
tbl.AccountID,
tbl.AccountName,
tbl.ParentID
FROM
@tbl AS tbl
WHERE
tbl.AccountID=@AcountID
UNION ALL
SELECT
tbl.AccountID,
tbl.AccountName,
tbl.ParentID
FROM
@tbl AS tbl
JOIN CTE
ON CTE.ParentID=tbl.AccountID
)
SELECT
*
FROM
CTE
WHERE
NOT CTE.AccountID=@AcountID
This will return a result like this:
2 Philip 1
1 Mathew 0
