Recursive SQL - count number of descendants in hierarchical structure

Consider a database table with the following columns:
mathematician_id
name
advisor1
advisor2
The database represents data from the Math Genealogy Project, where each mathematician usually has one single advisor, but there are situations when there are two advisors.
Visual aid to make things clearer:
How do I count the number of descendants for each of the mathematicians?
I should probably use Common Table Expressions (WITH RECURSIVE), but I am pretty much stuck at the moment. All the similar examples I found deal with hierarchies having only one parent, not two.
Update:
I adapted the solution for SQL Server provided by Vladimir Baranov to also work in PostgreSQL:
WITH RECURSIVE cte AS (
SELECT m.id as start_id,
m.id,
m.name,
m.advisor1,
m.advisor2,
1 AS level
FROM public.mathematicians AS m
UNION ALL
SELECT cte.start_id,
m.id,
m.name,
m.advisor1,
m.advisor2,
cte.level + 1 AS level
FROM public.mathematicians AS m
INNER JOIN cte ON cte.id = m.advisor1
OR cte.id = m.advisor2
),
cte_distinct AS (
SELECT DISTINCT start_id, id
FROM cte
)
SELECT cte_distinct.start_id,
m.name,
COUNT(*)-1 AS descendants_count
FROM cte_distinct
INNER JOIN public.mathematicians AS m ON m.id = cte_distinct.start_id
GROUP BY cte_distinct.start_id, m.name
ORDER BY cte_distinct.start_id

You didn't say what DBMS you use. I'll use SQL Server for this example, but it will work in other databases that support recursive queries as well.
Sample data
I entered only the right part of your tree, starting from Euler.
The most interesting part is the multiple paths between Lagrange and Dirichlet.
CREATE TABLE #T (ID int, name nvarchar(50), Advisor1ID int, Advisor2ID int);
INSERT INTO #T (ID, name, Advisor1ID, Advisor2ID) VALUES
(1, 'Euler', NULL, NULL),
(2, 'Lagrange', 1, NULL),
(3, 'Laplace', NULL, NULL),
(4, 'Fourier', 2, NULL),
(5, 'Poisson', 2, 3),
(6, 'Dirichlet', 4, 5),
(7, 'Lipschitz', 6, NULL),
(8, 'Klein', NULL, 7),
(9, 'Lindemann', 8, NULL),
(10, 'Furtwangler', 8, NULL),
(11, 'Hilbert', 9, NULL),
(12, 'Taussky-Todd', 10, NULL);
This is what it looks like:
SELECT * FROM #T;
+----+--------------+------------+------------+
| ID | name | Advisor1ID | Advisor2ID |
+----+--------------+------------+------------+
| 1 | Euler | NULL | NULL |
| 2 | Lagrange | 1 | NULL |
| 3 | Laplace | NULL | NULL |
| 4 | Fourier | 2 | NULL |
| 5 | Poisson | 2 | 3 |
| 6 | Dirichlet | 4 | 5 |
| 7 | Lipschitz | 6 | NULL |
| 8 | Klein | NULL | 7 |
| 9 | Lindemann | 8 | NULL |
| 10 | Furtwangler | 8 | NULL |
| 11 | Hilbert | 9 | NULL |
| 12 | Taussky-Todd | 10 | NULL |
+----+--------------+------------+------------+
Query
It is a classic recursive query with two interesting points.
1) The recursive part of the CTE joins to the anchor part using both Advisor1ID and Advisor2ID:
INNER JOIN CTE
ON CTE.ID = T.Advisor1ID
OR CTE.ID = T.Advisor2ID
2) Since there can be multiple paths to a descendant, the recursive query may output the same node several times. To eliminate these duplicates I used DISTINCT in CTE_Distinct. It may be possible to solve it more efficiently.
To better understand how the query works, run each CTE separately and examine the intermediate results.
WITH
CTE
AS
(
SELECT
T.ID AS StartID
,T.ID
,T.name
,T.Advisor1ID
,T.Advisor2ID
,1 AS Lvl
FROM #T AS T
UNION ALL
SELECT
CTE.StartID
,T.ID
,T.name
,T.Advisor1ID
,T.Advisor2ID
,CTE.Lvl + 1 AS Lvl
FROM
#T AS T
INNER JOIN CTE
ON CTE.ID = T.Advisor1ID
OR CTE.ID = T.Advisor2ID
)
,CTE_Distinct
AS
(
SELECT DISTINCT
StartID
,ID
FROM CTE
)
SELECT
CTE_Distinct.StartID
,T.name
,COUNT(*) AS DescendantCount
FROM
CTE_Distinct
INNER JOIN #T AS T ON T.ID = CTE_Distinct.StartID
GROUP BY
CTE_Distinct.StartID
,T.name
ORDER BY CTE_Distinct.StartID;
Result
+---------+--------------+-----------------+
| StartID | name | DescendantCount |
+---------+--------------+-----------------+
| 1 | Euler | 11 |
| 2 | Lagrange | 10 |
| 3 | Laplace | 9 |
| 4 | Fourier | 8 |
| 5 | Poisson | 8 |
| 6 | Dirichlet | 7 |
| 7 | Lipschitz | 6 |
| 8 | Klein | 5 |
| 9 | Lindemann | 2 |
| 10 | Furtwangler | 2 |
| 11 | Hilbert | 1 |
| 12 | Taussky-Todd | 1 |
+---------+--------------+-----------------+
Here DescendantCount counts the node itself as a descendant. You can subtract 1 from this result if you want to see 0 instead of 1 for the leaf nodes.
Here is SQL Fiddle.
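The duplicate elimination from point 2 can also be pushed into the recursion itself on engines that accept UNION (rather than UNION ALL) inside a recursive CTE, such as PostgreSQL and SQLite; SQL Server does not allow this. The following is a minimal sketch rather than a drop-in replacement: it runs the sample data through Python's built-in sqlite3 module so it works without a server, and the table name mathematicians and the lowercase column names are assumptions made for the illustration.

```python
import sqlite3

# Sample rows from the answer above: (id, name, advisor1, advisor2).
ROWS = [
    (1, "Euler", None, None),
    (2, "Lagrange", 1, None),
    (3, "Laplace", None, None),
    (4, "Fourier", 2, None),
    (5, "Poisson", 2, 3),
    (6, "Dirichlet", 4, 5),
    (7, "Lipschitz", 6, None),
    (8, "Klein", None, 7),
    (9, "Lindemann", 8, None),
    (10, "Furtwangler", 8, None),
    (11, "Hilbert", 9, None),
    (12, "Taussky-Todd", 10, None),
]

# UNION (not UNION ALL) drops a (start_id, id) pair that was already
# produced, so a node reached through several paths is kept only once.
QUERY = """
WITH RECURSIVE cte(start_id, id) AS (
    SELECT id, id FROM mathematicians
    UNION
    SELECT cte.start_id, m.id
    FROM mathematicians AS m
    JOIN cte ON cte.id = m.advisor1 OR cte.id = m.advisor2
)
SELECT start_id, COUNT(*) - 1 AS descendants_count
FROM cte
GROUP BY start_id
ORDER BY start_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mathematicians ("
    "id INTEGER PRIMARY KEY, name TEXT, advisor1 INTEGER, advisor2 INTEGER)"
)
conn.executemany("INSERT INTO mathematicians VALUES (?, ?, ?, ?)", ROWS)
descendants = dict(conn.execute(QUERY).fetchall())
conn.close()

print(descendants)
```

Because only start_id and id are carried through the recursion, a level column cannot be kept in this variant: rows that differ only in level would no longer count as duplicates.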

Related

Replace value in column based on another column

I have the following table:
+----+--------+------------+----------------------+
| ID | Name | To_Replace | Replaced |
+----+--------+------------+----------------------+
| 1 | Fruits | 1 | Fruits |
| 2 | Apple | 1-2 | Fruits-Apple |
| 3 | Citrus | 1-3 | Fruits-Citrus |
| 4 | Orange | 1-3-4 | Fruits-Citrus-Orange |
| 5 | Empire | 1-2-5 | Fruits-Apple-Empire |
| 6 | Fuji | 1-2-6 | Fruits-Apple-Fuji |
+----+--------+------------+----------------------+
How can I create the column Replaced ? I thought of creating 10 maximum columns (I know there are no more than 10 nested levels) and fetch the ID from every substring split by '-', and then concatenating them if not null into Replaced, but I think there is a simpler solution.
While what you ask for is technically feasible (probably using a recursive query or a tally), I will take a different stance and suggest that you fix your data model instead.
You should not be storing multiple values as a delimited list in a single database column. This defeats the purpose of a relational database, and makes simple things both unnecessarily complicated and inefficient.
Instead, you should have a separate table to store that data, with each replacement id on a separate row, and possibly a column that indicates the sequence of each element in the list.
For your sample data, this would look like:
id replace_id seq
1 1 1
2 1 1
2 2 2
3 1 1
3 3 2
4 1 1
4 3 2
4 4 3
5 1 1
5 2 2
5 5 3
6 1 1
6 2 2
6 6 3
Now you can efficiently generate the expected result with either a join, a subquery, or a lateral join. Assuming that your table is called mytable and that the mapping table is mymapping, the lateral join solution would be:
select t.*, x.replaced
from mytable t
outer apply (
select string_agg(t1.name, '-') within group(order by m.seq) replaced
from mymapping m
inner join mytable t1 on t1.id = m.replace_id
where m.id = t.id
) x
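To make the normalized layout concrete, here is a rough sketch of how the delimited To_Replace strings could be turned into mapping rows and then joined back to the names. It uses Python's built-in sqlite3 module so it runs without a server; the table names mytable and mymapping follow the answer above, and the final concatenation is done in Python only to keep the ordering explicit. Treat it as an illustration, not as the only way to do the migration.

```python
import sqlite3
from itertools import groupby

# Sample rows from the question: (id, name, delimited list of ids).
ROWS = [
    (1, "Fruits", "1"),
    (2, "Apple", "1-2"),
    (3, "Citrus", "1-3"),
    (4, "Orange", "1-3-4"),
    (5, "Empire", "1-2-5"),
    (6, "Fuji", "1-2-6"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE mymapping (id INTEGER, replace_id INTEGER, seq INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)", [(i, name) for i, name, _ in ROWS]
)

# One mapping row per list element; seq records the element's position.
mapping = [
    (row_id, int(part), seq)
    for row_id, _, id_list in ROWS
    for seq, part in enumerate(id_list.split("-"), start=1)
]
conn.executemany("INSERT INTO mymapping VALUES (?, ?, ?)", mapping)

# Join the mapping back to the names, ordered by seq within each id.
joined = conn.execute(
    """
    SELECT m.id, t1.name
    FROM mymapping AS m
    JOIN mytable AS t1 ON t1.id = m.replace_id
    ORDER BY m.id, m.seq
    """
).fetchall()
conn.close()

# Concatenate the names of each id in seq order.
replaced = {
    row_id: "-".join(name for _, name in group)
    for row_id, group in groupby(joined, key=lambda r: r[0])
}
print(replaced)
```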
You can try something like this:
CREATE TABLE #Data ( ID INT, [Name] VARCHAR(10), To_Replace VARCHAR(10) );
INSERT INTO #Data ( ID, [Name], To_Replace ) VALUES
( 1, 'Fruits', '1' ),
( 2, 'Apple', '1-2' ),
( 3, 'Citrus', '1-3' ),
( 4, 'Orange', '1-3-4' ),
( 5, 'Empire', '1-2-5' ),
( 6, 'Fuji', '1-2-6' );
SELECT
*
FROM #Data AS d
OUTER APPLY (
SELECT STRING_AGG ( [Name], '-' ) AS Replaced FROM #Data WHERE ID IN (
SELECT CAST ( [value] AS INT ) FROM STRING_SPLIT ( d.To_Replace, '-' )
)
) List
ORDER BY ID;
Returns
+----+--------+------------+----------------------+
| ID | Name | To_Replace | Replaced |
+----+--------+------------+----------------------+
| 1 | Fruits | 1 | Fruits |
| 2 | Apple | 1-2 | Fruits-Apple |
| 3 | Citrus | 1-3 | Fruits-Citrus |
| 4 | Orange | 1-3-4 | Fruits-Citrus-Orange |
| 5 | Empire | 1-2-5 | Fruits-Apple-Empire |
| 6 | Fuji | 1-2-6 | Fruits-Apple-Fuji |
+----+--------+------------+----------------------+
UPDATE
Ensure the id list order is maintained when aggregating names.
CREATE TABLE #Data ( ID INT, [Name] VARCHAR(10), To_Replace VARCHAR(10) );
INSERT INTO #Data ( ID, [Name], To_Replace ) VALUES
( 1, 'Fruits', '1' ),
( 2, 'Apple', '1-2' ),
( 3, 'Citrus', '1-3' ),
( 4, 'Orange', '1-3-4' ),
( 5, 'Empire', '1-2-5' ),
( 6, 'Fuji', '1-2-6' ),
( 7, 'Test', '6-2-7' );
SELECT
*
FROM #Data AS d
OUTER APPLY (
-- WITHIN GROUP states the aggregation order explicitly; an ORDER BY inside a
-- TOP 100 PERCENT subquery is not guaranteed to be honored by SQL Server.
SELECT STRING_AGG ( Names.[Name], '-' ) WITHIN GROUP ( ORDER BY ids.row_id ) AS Replaced
FROM ( SELECT CAST ( '<ids><id>' + REPLACE ( d.To_Replace, '-', '</id><id>' ) + '</id></ids>' AS XML ) AS id_list ) AS xIds
CROSS APPLY (
SELECT
x.f.value('.', 'INT' ) AS name_id,
ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) AS row_id
FROM xIds.id_list.nodes('//ids/id') x(f)
) AS ids
INNER JOIN #Data AS Names ON Names.ID = ids.name_id
) List
ORDER BY ID;
Returns
+----+--------+------------+----------------------+
| ID | Name | To_Replace | Replaced |
+----+--------+------------+----------------------+
| 1 | Fruits | 1 | Fruits |
| 2 | Apple | 1-2 | Fruits-Apple |
| 3 | Citrus | 1-3 | Fruits-Citrus |
| 4 | Orange | 1-3-4 | Fruits-Citrus-Orange |
| 5 | Empire | 1-2-5 | Fruits-Apple-Empire |
| 6 | Fuji | 1-2-6 | Fruits-Apple-Fuji |
| 7 | Test | 6-2-7 | Fuji-Apple-Test |
+----+--------+------------+----------------------+
I'm sure there's optimization that can be done here, but this solution seems to guarantee the list order is kept.

UPDATE based on multiple "WHERE IN" conditions

Let's say I have a table I want to update based on multiple conditions. Each of these conditions is an equal-sized array, and the only valid cases are the ones which match the same index in the arrays.
That is, if we use the following SQL clause
UPDATE Foo
SET bar = 1
WHERE a IN ( 1, 2, 3, 4, 5)
AND b IN ( 6, 7, 8, 9, 0)
AND c IN ('a', 'b', 'c', 'd', 'e')
bar will be set to 1 for any row which has, for example, a = 1, b = 8, c = 'e'.
That is not what I want.
I need a clause where only a = 1, b = 6, c = 'a' or a = 2, b = 7, c = 'b' (etc.) works.
Obviously I could rewrite the clause as
UPDATE Foo
SET bar = 1
WHERE (a = 1 AND b = 6 AND c = 'a')
OR (a = 2 AND b = 7 AND c = 'b')
OR ...
This would work, but it's hardly extensible. Given the values of the conditions are variable and obtained programmatically, it'd be far better if I could set each array in one place instead of having to build a string-building loop to get that WHERE call right.
So, is there a better, more elegant way to have the same behavior as this last block?
Use a table value constructor:
UPDATE f
SET bar = 1
FROM Foo AS f
WHERE EXISTS (
SELECT * FROM (VALUES (1,6,'a'),(2,7,'b'),(3,8,'c')) AS Trios(a,b,c)
WHERE Trios.a = f.a AND Trios.b = f.b AND Trios.c = f.c
)
You can use values() and join:
UPDATE f
SET bar = 1
FROM Foo f JOIN
(VALUES (1, 6, 'a'),
(2, 7, 'b'),
. . .
) v(a, b, c)
ON f.a = v.a AND f.b = v.b AND f.c = v.c;
Try this; it might work:
CREATE TABLE #Temp ( a int, b int, c varchar(50))
INSERT INTO #Temp(a,b,c)
VALUES(1, 6, 'a'),
(2, 7, 'b'),
(3, 8, 'c'),
(4, 9, 'd'),
(5, 0, 'e')
UPDATE F
SET bar = 1
FROM FOO F INNER JOIN #Temp T
ON F.a = T.a AND F.b = T.b AND F.c = T.c
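Since the question mentions that the three arrays are built programmatically, here is a small sketch of how they could be zipped into rows on the client side and matched as whole tuples, which avoids any string-building loop. It uses Python's built-in sqlite3 module so it runs without a server; the Foo contents and the temporary table name Trios are made up for the illustration, and the same idea carries over to a temp table or table-valued parameter in SQL Server.

```python
import sqlite3

# The three equal-sized "arrays" from the question.
a_vals = [1, 2, 3, 4, 5]
b_vals = [6, 7, 8, 9, 0]
c_vals = ["a", "b", "c", "d", "e"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (a INTEGER, b INTEGER, c TEXT, bar INTEGER DEFAULT 0)")
# Hypothetical rows: some match a full trio, some only match column by column.
conn.executemany(
    "INSERT INTO Foo (a, b, c) VALUES (?, ?, ?)",
    [(1, 6, "a"), (1, 8, "e"), (2, 7, "b"), (5, 0, "e"), (3, 9, "c")],
)

# zip() pairs the values by index, so each row is one valid combination.
conn.execute("CREATE TEMP TABLE Trios (a INTEGER, b INTEGER, c TEXT)")
conn.executemany("INSERT INTO Trios VALUES (?, ?, ?)", zip(a_vals, b_vals, c_vals))

# Update only the rows whose (a, b, c) matches one complete trio.
conn.execute(
    """
    UPDATE Foo SET bar = 1
    WHERE EXISTS (
        SELECT 1 FROM Trios AS t
        WHERE t.a = Foo.a AND t.b = Foo.b AND t.c = Foo.c
    )
    """
)
updated = conn.execute(
    "SELECT a, b, c FROM Foo WHERE bar = 1 ORDER BY a"
).fetchall()
conn.close()

print(updated)
```

The rows (1, 8, 'e') and (3, 9, 'c') would pass the three separate IN lists from the question, but they are left untouched here because no single trio contains them.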
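When you read the data, don't store it as separate values but as a single string per combination, and then use the following. Put a delimiter between the parts so that, for example, a = 1, b = 16 and a = 11, b = 6 cannot produce the same string:
update foo
set bar = 1
where concat(a, '|', b, '|', c) in ('1|6|a', '2|7|b', '3|8|c', '4|9|d', '5|0|e')
It may not be the most elegant way, but it is very practical and simple.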
I could be entirely off the mark here--I'm not sure if you're passing in a set of values or what-have-you--but my first thought is using a series of CTEs.
I'm making considerable assumptions about your data, but here's an example you can run in SSMS based on my thoughts of your question.
-- Create #Data and insert some, er... data ---
CREATE TABLE #Data ( id INT IDENTITY(100,1) PRIMARY KEY, a VARCHAR(1), b VARCHAR(1), c VARCHAR(1) );
INSERT INTO #Data ( a ) VALUES ('1'), ('2'), ('3'), ('4'), ('5');
INSERT INTO #Data ( b ) VALUES ('6'), ('7'), ('8'), ('9'), ('0');
INSERT INTO #Data ( c ) VALUES ('a'), ('b'), ('c'), ('d'), ('e');
So let's assume this is your data. I've kept it simple to make it easier to understand.
+-----+---+---+---+
| id | a | b | c |
+-----+---+---+---+
| 100 | 1 | | |
| 101 | 2 | | |
| 102 | 3 | | |
| 103 | 4 | | |
| 104 | 5 | | |
| 105 | | 6 | |
| 106 | | 7 | |
| 107 | | 8 | |
| 108 | | 9 | |
| 109 | | 0 | |
| 110 | | | a |
| 111 | | | b |
| 112 | | | c |
| 113 | | | d |
| 114 | | | e |
+-----+---+---+---+
Query the data with aligned "array" indexes:
;WITH CTE_A AS (
SELECT
id,
ROW_NUMBER() OVER ( ORDER BY id ) AS a_row_id,
a
FROM #Data WHERE a IS NOT NULL
)
, CTE_B AS (
SELECT
id,
ROW_NUMBER() OVER ( ORDER BY id ) AS b_row_id,
b
FROM #Data WHERE b IS NOT NULL
)
, CTE_C AS (
SELECT
id,
ROW_NUMBER() OVER ( ORDER BY id ) AS c_row_id,
c
FROM #Data WHERE c IS NOT NULL
)
SELECT
CTE_A.id, CTE_A.a_row_id, CTE_A.a
, CTE_B.id, CTE_B.b_row_id, CTE_B.b
, CTE_C.id, CTE_C.c_row_id, CTE_C.c
FROM CTE_A
JOIN CTE_B ON CTE_A.a_row_id = CTE_B.b_row_id
JOIN CTE_C ON CTE_A.a_row_id = CTE_C.c_row_id;
Which returns:
+-----+----------+---+-----+----------+---+-----+----------+---+
| id | a_row_id | a | id | b_row_id | b | id | c_row_id | c |
+-----+----------+---+-----+----------+---+-----+----------+---+
| 100 | 1 | 1 | 105 | 1 | 6 | 110 | 1 | a |
| 101 | 2 | 2 | 106 | 2 | 7 | 111 | 2 | b |
| 102 | 3 | 3 | 107 | 3 | 8 | 112 | 3 | c |
| 103 | 4 | 4 | 108 | 4 | 9 | 113 | 4 | d |
| 104 | 5 | 5 | 109 | 5 | 0 | 114 | 5 | e |
+-----+----------+---+-----+----------+---+-----+----------+---+
Again, assumptions made on your data (in particular an id exists that can be sorted), but this basically pivots it by linking the a, b and c values on their relative "index" (ROW_NUMBER). By using ROW_NUMBER in this way, we can create a makeshift array index value ( a_row_id, b_row_id, c_row_id ) that can be used to join the resulting values.
This example can easily be changed to an UPDATE statement.
Does this address your question?

Query to group fully recursive relationships in many-to-many junction table

Apologies if this is a duplicate somewhere, searching here and the web seems to have similar but not exact matches to my problem so I decided to post.
I'm calling this a fully recursive grouping in a many-to-many relationship. I have tried writing joins and ctes to do this but without full recursion I'm only getting one level deep, and I'm not exactly thrilled about trying to write nested dynamic cursors.
Assume expressions can be derived from this "junction" table to find distinct students / classes, therefore the question can be summarized by using only one set.
SELECT * FROM student_class
+-------------+----------+--------------+
| student_id | class_id | group_number |
+-------------+----------+--------------+
| 1 | A | null |
| 1 | C | null |
| 2 | A | null |
| 2 | B | null |
| 2 | C | null |
| 3 | E | null |
| 4 | B | null |
| 4 | F | null |
+-------------+----------+--------------+
The question is how to populate a group number through a recursive relationship for each student and for each class. Ex: if student_id 1 has class_id A, then what other students have class_id A? For those other students, what other classes do they have? For each of those other classes, which other students have those classes? Then keep recursing through results until no more dependencies are found.
So in this example the final update would only contain two groups, read like this since no other students have class_id C and student_id 3 has no other classes:
+-------------+----------+--------------+
| student_id | class_id | group_number |
+-------------+----------+--------------+
| 1 | A | 1 |
| 1 | C | 1 |
| 2 | A | 1 |
| 2 | B | 1 |
| 2 | C | 1 |
| 3 | E | 2 |
| 4 | B | 1 |
| 4 | F | 1 |
+-------------+----------+--------------+
This is really tricky. It is a graph walking algorithm (which is not that hard). But you have to define the graph. You can define a link between two classes by using a self join. That can then be used for a recursive CTE.
So, to get equivalence classes (kind of punny here), you can do:
with cs as (
select *
from (values (1, 'A'), (1, 'C'), (2, 'A'), (2, 'B'), (2, 'C'), (3, 'E'), (4, 'B'), (4, 'F')) v(student, class)
),
cc as (
select distinct cs1.class as class1, cs2.class as class2
from cs cs1 join
cs cs2
on cs1.student = cs2.student
),
cte as (
select cc.class1 as class, cc.class2 as grp, cast(',' + cc.class1 + ',' as varchar(max)) as grps
from cc
union all
select cte.class, cc.class2,
cast(grps + cc.class2 + ',' as varchar(max)) as grps
from cte join
cc
on cc.class1 = cte.grp and cte.grps not like '%,' + cc.class1 + ',%'
)
select cte.class, min(cte.grp)
from cte
group by cte.class;
If you want to convert them to numbers:
select cte.class, min(cte.grp),
dense_rank() over (order by min(cte.grp)) as group_number
from cte
group by cte.class;
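The graph walk described above can also be sketched on an engine that accepts UNION inside a recursive CTE (SQLite and PostgreSQL do, SQL Server does not), where the built-in de-duplication replaces the hand-made list of visited classes. This is an illustrative variant under that assumption, not the query from the answer: it runs through Python's built-in sqlite3 module, the CTE name reach is made up, and the group numbers are assigned in Python in the spirit of dense_rank.

```python
import sqlite3

# (student_id, class_id) rows from the question.
PAIRS = [
    (1, "A"), (1, "C"),
    (2, "A"), (2, "B"), (2, "C"),
    (3, "E"),
    (4, "B"), (4, "F"),
]

# cc links two classes that share a student; reach follows those links
# transitively. UNION discards pairs that were already found, so the
# recursion stops once no new (class, grp) pair appears.
QUERY = """
WITH RECURSIVE
cc(class1, class2) AS (
    SELECT DISTINCT s1.class_id, s2.class_id
    FROM student_class AS s1
    JOIN student_class AS s2 ON s1.student_id = s2.student_id
),
reach(class, grp) AS (
    SELECT class1, class2 FROM cc
    UNION
    SELECT reach.class, cc.class2
    FROM reach
    JOIN cc ON cc.class1 = reach.grp
)
SELECT class, MIN(grp) FROM reach GROUP BY class ORDER BY class
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_class (student_id INTEGER, class_id TEXT)")
conn.executemany("INSERT INTO student_class VALUES (?, ?)", PAIRS)
representative = dict(conn.execute(QUERY).fetchall())
conn.close()

# Number the groups by their smallest class, like DENSE_RANK would.
numbers = {
    rep: n for n, rep in enumerate(sorted(set(representative.values())), start=1)
}
group_number = {cls: numbers[rep] for cls, rep in representative.items()}
print(group_number)
```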

Oracle Connect_is_leaf similar in SQL server

Here is my query, which is in Oracle SQL syntax. How can I change it to SQL Server format?
Are there any alternatives for connect_by_isleaf?
(
select PARTY_KEY, ltrim(sys_connect_by_path(alt_name, '|'), '|') AS alt_name_list
from
(select PARTY_KEY, alt_name, row_number() over(partition by PARTY_KEY order by alt_name) rno
from (
select party_key, (select alt_name_type_desc from "CRMS"."PRJ_APP_ALT_NAME_TYPE" where alt_name_type_cd = alt_name_type) || ' - ' || alt_name as alt_name
from "CDD_PROFILES"."PRJ_PRF_ALT_NAME" order by party_key, alt_name_type
) alt
)
where connect_by_isleaf = 1
connect by PARTY_KEY = prior PARTY_KEY
and rno = prior rno+1
start with rno = 1
)
I tried to use a WITH ... AS clause, but somehow it is not working.
Thanks in advance.
The equivalent in SQL Server is called a "recursive CTE".
You can read about it here:
https://learn.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-2017
Oracle Hierarchical queries can be rewritten as recursive CTE statements in databases that support them (SQL Server included). A classic set of hierarchical data would be an organization hierarchy such as the one below:
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE ORGANIZATIONS
([ID] int primary key
, [ORG_NAME] varchar(30)
, [ORG_TYPE] varchar(30)
, [PARENT_ID] int foreign key references organizations)
;
INSERT INTO ORGANIZATIONS
([ID], [ORG_NAME], [ORG_TYPE], [PARENT_ID])
VALUES
(1, 'ACME Corp', 'Company', NULL),
(2, 'Finance', 'Division', 1),
(6, 'Accounts Payable', 'Department', 2),
(7, 'Accounts Receivables', 'Department', 2),
(8, 'Payroll', 'Department', 2),
(3, 'Operations', 'Division', 1),
(4, 'Human Resources', 'Division', 1),
(10, 'Benefits Admin', 'Department', 4),
(5, 'Marketing', 'Division', 1),
(9, 'Sales', 'Department', 5)
;
In the recursive CTE t1 below, the select statement before the union all is the anchor query and the select statement after the union all is the recursive part. The recursive part has exactly one reference to t1 in its from clause. The org_path column simulates Oracle's sys_connect_by_path function by concatenating the org_names together. The level column simulates Oracle's LEVEL pseudocolumn and is used in the output query to determine the leaf status (the is_leaf column), similar to Oracle's connect_by_isleaf pseudocolumn:
with t1(id, org_name, org_type, parent_id, org_path, level) as (
select o.*
, cast('|' + org_name as varchar(max))
, 1
from organizations o
where parent_id is null
union all
select o.*
, t1.org_path+cast('|'+o.org_name as varchar(max))
, t1.level+1
from organizations o
join t1
on t1.id = o.parent_id
)
select t1.*
, case when t1.level < lead(t1.level) over (order by org_path) then 0 else 1 end is_leaf
from t1 order by org_path
Results:
| id | org_name | org_type | parent_id | org_path | level | is_leaf |
|----|----------------------|------------|-----------|-------------------------------------------|-------|---------|
| 1 | ACME Corp | Company | (null) | |ACME Corp | 1 | 0 |
| 2 | Finance | Division | 1 | |ACME Corp|Finance | 2 | 0 |
| 6 | Accounts Payable | Department | 2 | |ACME Corp|Finance|Accounts Payable | 3 | 1 |
| 7 | Accounts Receivables | Department | 2 | |ACME Corp|Finance|Accounts Receivables | 3 | 1 |
| 8 | Payroll | Department | 2 | |ACME Corp|Finance|Payroll | 3 | 1 |
| 4 | Human Resources | Division | 1 | |ACME Corp|Human Resources | 2 | 0 |
| 10 | Benefits Admin | Department | 4 | |ACME Corp|Human Resources|Benefits Admin | 3 | 1 |
| 5 | Marketing | Division | 1 | |ACME Corp|Marketing | 2 | 0 |
| 9 | Sales | Department | 5 | |ACME Corp|Marketing|Sales | 3 | 1 |
| 3 | Operations | Division | 1 | |ACME Corp|Operations | 2 | 1 |
To select just the leaf nodes, change the output query from above to another CTE (T2) dropping the order by clause or moving it to final output query and limiting by the is_leaf column:
with t1(id, org_name, org_type, parent_id, org_path, level) as (
select o.*
, cast('|' + org_name as varchar(max))
, 1
from organizations o
where parent_id is null
union all
select o.*
, t1.org_path+cast('|'+o.org_name as varchar(max))
, t1.level+1
from organizations o
join t1
on t1.id = o.parent_id
), t2 as (
select t1.*
, case when t1.level < lead(t1.level) over (order by org_path) then 0 else 1 end is_leaf
from t1
)
select * from t2 where is_leaf = 1
Results:
| id | org_name | org_type | parent_id | org_path | level | is_leaf |
|----|----------------------|------------|-----------|-------------------------------------------|-------|---------|
| 6 | Accounts Payable | Department | 2 | |ACME Corp|Finance|Accounts Payable | 3 | 1 |
| 7 | Accounts Receivables | Department | 2 | |ACME Corp|Finance|Accounts Receivables | 3 | 1 |
| 8 | Payroll | Department | 2 | |ACME Corp|Finance|Payroll | 3 | 1 |
| 10 | Benefits Admin | Department | 4 | |ACME Corp|Human Resources|Benefits Admin | 3 | 1 |
| 9 | Sales | Department | 5 | |ACME Corp|Marketing|Sales | 3 | 1 |
| 3 | Operations | Division | 1 | |ACME Corp|Operations | 2 | 1 |
Alternatively, if you realize that leaf nodes can be identified by their lack of child nodes, you can flip this on its head: start with the leaf nodes and search up the tree, retaining all the original record values, building out the org_path in reverse, and passing along the next parent id as next_id. In the final output stage, selecting only those records whose next_id is null yields the same results as the prior query:
with t1(id, org_name, org_type, parent_id, org_path, level, next_id) as (
select o.*
, cast('|'+org_name as varchar(max))
, 1
, parent_id
from organizations o
where not exists (select 1 from organizations c where c.parent_id = o.id)
union all
select t1.id
, t1.org_name
, t1.org_type
, t1.parent_id
, cast('|'+p.org_name as varchar(max))+t1.org_path
, level+1
, p.parent_id
from organizations p
join t1
on t1.next_id = p.id
)
select * from t1 where next_id is null order by org_path
Results:
| id | org_name | org_type | parent_id | org_path | level | next_id |
|----|----------------------|------------|-----------|-------------------------------------------|-------|---------|
| 6 | Accounts Payable | Department | 2 | |ACME Corp|Finance|Accounts Payable | 3 | (null) |
| 7 | Accounts Receivables | Department | 2 | |ACME Corp|Finance|Accounts Receivables | 3 | (null) |
| 8 | Payroll | Department | 2 | |ACME Corp|Finance|Payroll | 3 | (null) |
| 10 | Benefits Admin | Department | 4 | |ACME Corp|Human Resources|Benefits Admin | 3 | (null) |
| 9 | Sales | Department | 5 | |ACME Corp|Marketing|Sales | 3 | (null) |
| 3 | Operations | Division | 1 | |ACME Corp|Operations | 2 | (null) |
One of these two methods may prove more performant than the other, but you'll need to try them each out on your data to see which one works better.
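The "no child rows" test from the bottom-up query can also be applied directly to the top-down query as the leaf check, which may be a useful third variant to compare. The sketch below is only an illustration: it runs on SQLite through Python's built-in sqlite3 module so that it needs no server, it drops the org_type column for brevity, and it uses SQLite's || operator where SQL Server would use +.

```python
import sqlite3

# (id, org_name, parent_id) rows from the schema setup above.
ORGS = [
    (1, "ACME Corp", None),
    (2, "Finance", 1),
    (6, "Accounts Payable", 2),
    (7, "Accounts Receivables", 2),
    (8, "Payroll", 2),
    (3, "Operations", 1),
    (4, "Human Resources", 1),
    (10, "Benefits Admin", 4),
    (5, "Marketing", 1),
    (9, "Sales", 5),
]

# Top-down walk that builds org_path; a row is a leaf when no
# organization names it as parent (the NOT EXISTS test).
QUERY = """
WITH RECURSIVE t1(id, org_name, org_path, level) AS (
    SELECT id, org_name, '|' || org_name, 1
    FROM organizations
    WHERE parent_id IS NULL
    UNION ALL
    SELECT o.id, o.org_name, t1.org_path || '|' || o.org_name, t1.level + 1
    FROM organizations AS o
    JOIN t1 ON t1.id = o.parent_id
)
SELECT t1.org_path,
       NOT EXISTS (
           SELECT 1 FROM organizations AS c WHERE c.parent_id = t1.id
       ) AS is_leaf
FROM t1
ORDER BY t1.org_path
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organizations ("
    "id INTEGER PRIMARY KEY, org_name TEXT, parent_id INTEGER)"
)
conn.executemany("INSERT INTO organizations VALUES (?, ?, ?)", ORGS)
rows = conn.execute(QUERY).fetchall()
conn.close()

leaf_paths = [path for path, is_leaf in rows if is_leaf]
for path in leaf_paths:
    print(path)
```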

How can I get linked list like (Parent - Child) mapped columns in a single Query?

Consider the following table:
+--------+-------+--+
| Parent | Child | |
+--------+-------+--+
| 1 | 2 | |
| 10 | 13 | |
| 2 | 3 | |
| 3 | 4 | |
| 13 | 14 | |
| 4 | 5 | |
| 15 | 16 | |
| 5 | 1 | |
+--------+-------+--+
In this table I'm following the hierarchy of parent child. From this table I want a result as the below table
+--------+-------+--+
| Parent | Child | |
+--------+-------+--+
| 1 | 2 | |
| 2 | 3 | |
| 3 | 4 | |
| 4 | 5 | |
| 5 | 1 | |
+--------+-------+--+
I want to get the hierarchy in my code (1-2-3-4-5-1). At present I run one query per child after getting its parent (sometimes a child can be one of the previous parents, as in 5-1). For a long hierarchy this executes many queries. How can I make this more efficient?
;with cte(parent,child) as (
select parent, child
from sometable
where parent = 1 --- seed
UNION ALL
select t.parent, t.child
from sometable t
join cte on cte.child = t.parent
)
select *
from cte;
To avoid infinite loops, you will have to store the list of traversed ids:
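;with cte(parent,child,traversed) as (
-- traversed is a comma-delimited list of the parents visited so far
select parent, child, cast(',' + cast(parent as varchar(10)) + ',' as varchar(max))
from sometable
where parent = 1 --- seed
UNION ALL
select t.parent, t.child, cte.traversed + cast(t.parent as varchar(10)) + ','
from sometable t
join cte on cte.child = t.parent
-- stop as soon as the next parent is already in the list
where cte.traversed not like '%,' + cast(t.parent as varchar(10)) + ',%'
)
select parent, child
from cte;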
But it won't run anywhere near as fast since it's having to do the LIKE checks.
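On engines that accept UNION instead of UNION ALL in a recursive CTE (SQLite and PostgreSQL do, SQL Server does not), the cycle can be broken without a traversed list, because a row that was already produced is not added again. The sketch below assumes such an engine and runs on SQLite through Python's built-in sqlite3 module; the table name sometable follows the query above. It only works as long as no extra column, such as a depth counter, makes the repeated rows differ.

```python
import sqlite3

# (Parent, Child) rows from the question, including the 5 -> 1 cycle.
EDGES = [
    (1, 2), (10, 13), (2, 3), (3, 4),
    (13, 14), (4, 5), (15, 16), (5, 1),
]

# When the walk returns to parent 1 it finds (1, 2) again; UNION rejects
# the duplicate row, so nothing new is queued and the recursion ends.
QUERY = """
WITH RECURSIVE cte(parent, child) AS (
    SELECT parent, child FROM sometable WHERE parent = 1
    UNION
    SELECT t.parent, t.child
    FROM sometable AS t
    JOIN cte ON cte.child = t.parent
)
SELECT parent, child FROM cte
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (parent INTEGER, child INTEGER)")
conn.executemany("INSERT INTO sometable VALUES (?, ?)", EDGES)
chain = conn.execute(QUERY).fetchall()
conn.close()

print(sorted(chain))
```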
Please try:
DECLARE @table as TABLE(Parent int, Child int)
insert into @table values
(1, 2),
(10, 13),
(2, 3),
(3, 4),
(13, 14),
(4, 5),
(5, 1)
select * from @table
declare @ParentID int
set @ParentID=1
;WITH T(Parent, Child)AS
(
SELECT Parent, Child from @table where Parent=@ParentID
UNION ALL
SELECT T1.Parent, T1.Child FROM @table T1 INNER JOIN T ON T1.Parent=T.Child
WHERE T.Child<>@ParentID -- stop once the chain returns to the starting parent
)
select * from T
order by Parent
The manual covers that: http://msdn.microsoft.com/en-us/library/ms186243(v=sql.105).aspx
SO really shouldn't be used for asking questions that already have good answers in the manuals.