I'm working on a DotNetNuke module that includes a tree-style navigation menu.
So far, I have it working, in the sense that child-nodes are connected to their correct parent-nodes, but the node-siblings are still out of order. There's a field called TabOrder, used to determine the order of siblings, but due to the recursion, I can't get them sorted properly.
I'm trying to do this in a SQL Server stored procedure, which may be a mistake, but I feel I'm so close that there must be a solution. Does anyone have any idea what I'm doing wrong?
I'd appreciate any ideas you have. Thanks in advance.
Solution:
I finally found a solution to my question. The key was to recursively create a Tab Lineage (TabLevel + TabOrder) from the Root Tab to the Leaf Tabs. Once that was created, I was able to order the returned records properly.
However, as I was coming back to post this I saw MarkXA's answer, which is probably the best solution. I didn't know the method GetNavigationNodes even existed.
I think he is correct that using GetNavigationNodes is a more future-proof solution, but for the time being I'll use my SQL-based solution. --What can I say? I learn the hard way.
Here it is:
ALTER procedure [dbo].[Nav_GetTabs]
    @CurrentTabID int = 0
AS
--============================================================
--create and populate @TabLineage table variable with Tab Lineage
--
--"Lineage" consists of the concatenation of TabLevel & TabOrder, concatenated recursively from the root to leaf.
--The lineage is VERY important, making it possible to properly order the Tab links in the navigation module.
--This will be used as a lookup table to match Tabs with their lineage.
--============================================================
DECLARE @TabLineage table
(
    TabID int,
    Lineage varchar(100)
);
WITH TabLineage AS
(
    --start with root Tabs
    SELECT T.TabID, T.ParentID,
           CAST(REPLICATE('0', 5 - LEN(CAST(T2.[Level] as varchar(10)) + CAST(T2.TabOrder as varchar(10))))
                + CAST(T2.[Level] as varchar(10)) + CAST(T2.TabOrder as varchar(10)) as varchar(100)) AS Lineage
    FROM Tabs T
    INNER JOIN Tabs T2 ON T.TabID = T2.TabID
    INNER JOIN TabPermission TP ON T.TabID = TP.TabID
    WHERE T.ParentID IS NULL
    AND T.IsDeleted = 0
    AND T.IsVisible = 1
    AND TP.RoleID = -1
    UNION ALL
    --continue recursively, from parent to child Tabs
    SELECT T.TabID, T.ParentID,
           CAST(TL.Lineage + REPLICATE('0', 5 - LEN(CAST(T2.[Level] as varchar(10)) + CAST(T2.TabOrder as varchar(10))))
                + CAST(T2.[Level] as varchar(10)) + CAST(T2.TabOrder as varchar(10)) as varchar(100)) AS Lineage
    FROM Tabs T
    INNER JOIN Tabs T2 ON T.TabID = T2.TabID
    INNER JOIN TabPermission TP ON T.TabID = TP.TabID
    INNER JOIN TabLineage TL ON T.ParentID = TL.TabID
    WHERE T.IsDeleted = 0
    AND T.IsVisible = 1
    AND TP.RoleID = -1
)
--insert results of recursive query into table variable
INSERT @TabLineage
SELECT TL.TabID, TL.Lineage FROM TabLineage TL ORDER BY TL.Lineage
OPTION (maxrecursion 10); --to increase number of traversed generations, increase "maxrecursion"
--============================================================
--create and populate @Ancestor table variable with @CurrentTab ancestors
--
--"Ancestors" are Tabs following the path from @CurrentTab to the root Tab it's descended from (inclusively).
--These are Tab links we want to see in the navigation.
--============================================================
DECLARE @Ancestor table
(
    TabID int
);
WITH Ancestor AS
(
    --start with @CurrentTab
    SELECT T.TabID, T.ParentID FROM Tabs T WHERE T.TabID = @CurrentTabID
    UNION ALL
    --continue recursively, from child to parent Tab
    SELECT T.TabID, T.ParentID
    FROM Ancestor A INNER JOIN Tabs T ON T.TabID = A.ParentID
)
--insert results of recursive query into table variable
INSERT @Ancestor
SELECT A.TabID FROM Ancestor A
OPTION (maxrecursion 10); --to increase number of traversed generations, increase "maxrecursion"
--============================================================
--retrieve Tabs to display in navigation
--This section UNIONs three query results together, giving us what we want:
-- 1. All Tabs at Level 0.
-- 2. All Tabs in @CurrentTab's lineage.
-- 3. All Tabs which are children of Tabs in @CurrentTab's lineage.
--============================================================
WITH TabNav (TabID, TabLevel, TabName, Lineage) AS
(
    --retrieve all Tabs at Level 0 -- (Root Tabs)
    (SELECT T.TabID, T.[Level] AS TabLevel, T.TabName, TL.Lineage
    FROM Tabs T
    INNER JOIN TabPermission TP ON (T.TabID = TP.TabID AND TP.RoleID = -1)
    INNER JOIN @TabLineage TL ON T.TabID = TL.TabID
    WHERE T.IsDeleted = 0
    AND T.IsVisible = 1
    AND T.[Level] = 0
    UNION
    --retrieve Tabs in @CurrentTab's lineage
    SELECT T.TabID, T.[Level] AS TabLevel, T.TabName, TL.Lineage
    FROM Tabs T
    INNER JOIN TabPermission TP ON (T.TabID = TP.TabID AND TP.RoleID = -1)
    INNER JOIN @Ancestor A ON T.TabID = A.TabID
    INNER JOIN @TabLineage TL ON T.TabID = TL.TabID
    WHERE T.IsDeleted = 0
    AND T.IsVisible = 1
    UNION
    --retrieve Tabs which are children of Tabs in @CurrentTab's lineage
    SELECT T.TabID, T.[Level] AS TabLevel, T.TabName, TL.Lineage
    FROM Tabs T
    INNER JOIN TabPermission TP ON (T.TabID = TP.TabID AND TP.RoleID = -1)
    INNER JOIN @Ancestor A ON T.ParentID = A.TabID
    INNER JOIN @TabLineage TL ON T.TabID = TL.TabID
    WHERE T.IsDeleted = 0
    AND T.IsVisible = 1)
)
--finally, return the Tabs to be included in the navigation module
SELECT TabID, TabLevel, TabName FROM TabNav ORDER BY Lineage;
--============================================================
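The lineage idea is easier to see outside SQL. The following is only a rough Python sketch, not the stored procedure itself: the `tabs` dictionary, the sample tab names in the comments, and the helpers `segment` and `build_lineage` are made up for illustration, and it assumes (as the procedure does) that Level and TabOrder together fit in five characters.

```python
def segment(level, tab_order, width=5):
    """Zero-padded Level + TabOrder, like REPLICATE('0', 5 - LEN(...)) + ... in the procedure."""
    return (str(level) + str(tab_order)).zfill(width)


def build_lineage(tabs):
    """tabs maps tab_id -> (parent_id, level, tab_order); returns tab_id -> lineage string."""
    lineage = {}

    def resolve(tab_id):
        if tab_id not in lineage:
            parent_id, level, tab_order = tabs[tab_id]
            # A child's lineage is its parent's lineage plus its own padded segment.
            prefix = resolve(parent_id) if parent_id is not None else ""
            lineage[tab_id] = prefix + segment(level, tab_order)
        return lineage[tab_id]

    for tab_id in tabs:
        resolve(tab_id)
    return lineage


# Hypothetical sample data: (parent_id, level, tab_order)
tabs = {
    1: (None, 0, 1),  # Home
    2: (None, 0, 3),  # About
    3: (1, 1, 2),     # Home > News
    4: (1, 1, 1),     # Home > Blog
    5: (2, 1, 1),     # About > Team
}
lineage = build_lineage(tabs)
ordered = sorted(tabs, key=lambda tab_id: lineage[tab_id])
```

Sorting by the lineage string puts each parent directly before its children and orders siblings by TabOrder, because every child's key starts with its parent's key. Here `ordered` comes out as `[1, 4, 3, 2, 5]`.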
The answer is "don't use SQL". There's already a method DotNetNuke.UI.Navigation.GetNavigationNodes that does this for you, and if you use it then your module won't break if and when the database schema changes. Even if you need to do something that GetNavigationNodes won't handle, you're still better off retrieving the pages via the API to be futureproof. Going directly to the database is just asking for trouble :)
Here is a boilerplate example (not based on the OP's code) of a recursive tree CTE, which shows how to sort a tree:
DECLARE @Contacts table (id varchar(6), first_name varchar(10), reports_to_id varchar(6))
INSERT @Contacts VALUES ('1','Jerome', NULL )  -- tree is as follows:
INSERT @Contacts VALUES ('2','Joe'   ,'1')     --                      1-Jerome
INSERT @Contacts VALUES ('3','Paul'  ,'2')     --                     /        \
INSERT @Contacts VALUES ('4','Jack'  ,'3')     --              2-Joe           9-Bill
INSERT @Contacts VALUES ('5','Daniel','3')     --            /       \              \
INSERT @Contacts VALUES ('6','David' ,'2')     --     3-Paul          6-David       10-Sam
INSERT @Contacts VALUES ('7','Ian'   ,'6')     --     /      \            /    \
INSERT @Contacts VALUES ('8','Helen' ,'6')     -- 4-Jack  5-Daniel   7-Ian    8-Helen
INSERT @Contacts VALUES ('9','Bill ' ,'1')     --
INSERT @Contacts VALUES ('10','Sam'  ,'9')     --
DECLARE @Root_id varchar(6)
--get all nodes 2 and below
SET @Root_id=2
PRINT '@Root_id='+COALESCE(''''+@Root_id+'''','null')
;WITH StaffTree AS
(
    SELECT
        c.id, c.first_name, c.reports_to_id, c.reports_to_id as Manager_id, cc.first_name AS Manager_first_name, 1 AS LevelOf
    FROM @Contacts c
    LEFT OUTER JOIN @Contacts cc ON c.reports_to_id=cc.id
    WHERE c.id=@Root_id OR (@Root_id IS NULL AND c.reports_to_id IS NULL)
    UNION ALL
    SELECT
        s.id, s.first_name, s.reports_to_id, t.id, t.first_name, t.LevelOf+1
    FROM StaffTree t
    INNER JOIN @Contacts s ON t.id=s.reports_to_id
    WHERE s.reports_to_id=@Root_id OR @Root_id IS NULL OR t.LevelOf>1
)
SELECT * FROM StaffTree ORDER BY LevelOf, first_name
OUTPUT:
@Root_id='2'
id     first_name reports_to_id Manager_id Manager_first_name LevelOf
------ ---------- ------------- ---------- ------------------ -----------
2      Joe        1             1          Jerome             1
6      David      2             2          Joe                2
3      Paul       2             2          Joe                2
5      Daniel     3             3          Paul               3
8      Helen      6             6          David              3
7      Ian        6             6          David              3
4      Jack       3             3          Paul               3
(7 row(s) affected)
The key is the LevelOf column. Notice that it is just a literal 1 when selecting the main parent in the anchor part of the CTE. The LevelOf column is then incremented in the UNION ALL portion of the recursive CTE. Each recursive pass (not each row) hits that UNION ALL once and applies the increment, so every row produced in the same pass gets the same level. There is not a whole lot more to it than that.
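To make the "one increment per pass" point concrete, here is a small Python sketch of the same level-by-level expansion. It is an illustration only: the `staff_tree` function is hypothetical, and the data is copied from the Contacts example above.

```python
contacts = [
    ("1", "Jerome", None), ("2", "Joe", "1"), ("3", "Paul", "2"),
    ("4", "Jack", "3"), ("5", "Daniel", "3"), ("6", "David", "2"),
    ("7", "Ian", "6"), ("8", "Helen", "6"), ("9", "Bill", "1"), ("10", "Sam", "9"),
]


def staff_tree(rows, root_id):
    """Return (level, first_name, id) tuples for root_id and everyone below it."""
    names = {cid: name for cid, name, _ in rows}
    result = []
    frontier = [root_id]  # the anchor member: level 1
    level = 1
    while frontier:
        result.extend((level, names[cid], cid) for cid in frontier)
        # One pass of the recursive member: everyone reporting to the current frontier.
        frontier = [cid for cid, _, boss in rows if boss in frontier]
        level += 1  # incremented once per pass, not once per row
    return sorted(result)  # same effect as ORDER BY LevelOf, first_name


rows = staff_tree(contacts, "2")
```

With root `"2"` the ids come back in the order 2, 6, 3, 5, 8, 7, 4, matching the result set above.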
Related
I have run into this a couple of times where a client is able to import data into a catalog with parent child relationships and I run into problems with said relationships. I need to find a way to prevent the following:
Object 1 has a child of Object 2
Object 2 has a child of Object 3
Object 3 has a child of Object 1
This throws the server into an infinite recursive loop and ultimately brings it to its knees. I can't seem to wrap my head around a SQL query that I could use to detect such recursive madness. The problem is prevalent enough that I need to find some solution. I've tried queries using CTE, nested selects/sub-selects and just can't seem to write one that will solve this issue. Any help would be greatly appreciated.
with recursive parents as (
select
s.id,
s.parent_id,
1 as depth
from categories s
where s.id = <passed in id>
union all
select
t.id,
t.parent_id,
c.depth + 1 as depth
from categories t
inner join parents c
on t.id = c.parent_id
where t.id <> t.parent_id)
select distinct parent_id from parents where parent_id <> 0 order by depth desc
This is what I finally came up with to "detect" a cycle condition:
with recursive find_cycle as (
select
categories_id,
parent_id,
0 depth
from
categories
where categories_id = <passed in id>
union all
select
f.categories_id,
c.parent_id,
f.depth + 1
from
categories c
inner join find_cycle f
ON f.parent_id = c.categories_id
where c.parent_id <> c.categories_id
and f.parent_id <> f.categories_id
)
select
f.parent_id as categories_id,
c.parent_id
from find_cycle f
inner join categories c
on f.parent_id = c.categories_id
where exists (
select
1
from find_cycle f
inner join categories c
on f.parent_id = c.categories_id
where f.parent_id = <passed in id>)
order by depth desc;
It returns rows with the offending path, or no rows if no cycle is detected. Thanks for all the tips, folks.
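For anyone who wants to sanity-check imported data before it reaches the database, the same walk up the parent chain can be sketched in Python. This is not the query above, just an illustration: `find_cycle` and the `parent_of` mapping are hypothetical, and it assumes, like the queries here, that a parent id of 0 marks a root.

```python
def find_cycle(parent_of, start_id, root_marker=0):
    """Follow parent links from start_id; return the looping path, or [] if there is none."""
    path = []
    seen = set()
    current = start_id
    while current != root_marker and current in parent_of:
        if current in seen:
            # We are back at a node already on the path, so the tail of the path is the cycle.
            return path[path.index(current):] + [current]
        seen.add(current)
        path.append(current)
        current = parent_of[current]
    return []


# Object 1 has child 2, 2 has child 3, 3 has child 1; objects 4 and 5 are healthy.
parent_of = {1: 3, 2: 1, 3: 2, 4: 0, 5: 4}
cycle = find_cycle(parent_of, 1)
no_cycle = find_cycle(parent_of, 5)
```

Because visited ids are remembered, the walk always stops, so no depth limit is needed. Here `cycle` is `[1, 3, 2, 1]` and `no_cycle` is `[]`.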
Here is the MariaDB function I came up with that will return 0 if there is not a cycle and 1 if there is a cycle for the id passed in to the function.
create function `detect_cycle`(id int, max_depth int) RETURNS tinyint(1)
begin
declare cycle_exists int default 0;
select (case when count(*) = 1 then 0 else 1 end) into cycle_exists
from
(
with recursive find_cycle as (
select
categories_id,
parent_id,
0 depth
from
categories
where categories_id = id
union all
select
f.categories_id,
c.parent_id,
f.depth + 1
from
categories c
inner join find_cycle f
ON f.parent_id = c.categories_id
where
c.parent_id <> c.categories_id
and f.parent_id <> f.categories_id
and f.depth < max_depth
)
select
c.parent_id
from find_cycle f
inner join categories c
on f.parent_id = c.categories_id
order by depth desc
limit 1
) __temp
where parent_id = 0;
return cycle_exists;
end;
This can then be called by executing
select categories_id, detect_cycle(categories_id, 5) as cycle_exists
from categories
where categories_id = <whatever id you want to check for a cycle condition>;
Here is a stored procedure that will accomplish the same thing but is generic enough to handle any table, id column, parent column combination.
CREATE PROCEDURE `detect_cycle`(table_name varchar(64), id_column varchar(32), parent_id_column varchar(32), max_depth int)
BEGIN
declare id int default 0;
declare sql_query text default '';
declare where_clause text default '';
declare done bool default false;
declare id_cursor cursor for select root_id from __temp_ids;
declare continue handler for not found set done = true;
drop temporary table if exists __temp_ids;
create temporary table __temp_ids(root_id int not null primary key);
set sql_query = concat('
insert into __temp_ids
select
`',id_column,'`
from ',table_name);
prepare statement from sql_query;
execute statement;
drop temporary table if exists __temp_cycle;
create temporary table __temp_cycle (id int not null, parent_id int not null);
open id_cursor;
id_loop: loop
fetch from id_cursor into id;
if done then
leave id_loop;
end if;
set where_clause = concat('where `',id_column,'` = ',id);
set sql_query = concat('
insert into __temp_cycle
select
t.`',id_column,'`,
t.`',parent_id_column,'`
from
(
with recursive find_cycle as (
select
`',id_column,'`,
`',parent_id_column,'`,
0 depth
from
`',table_name,'`
',where_clause,'
union all
select
f.`',id_column,'`,
c.`',parent_id_column,'`,
f.depth + 1
from
`',table_name,'` c
inner join find_cycle f
ON f.`',parent_id_column,'` = c.`',id_column,'`
where
c.`',parent_id_column,'` <> c.`',id_column,'`
and f.`',parent_id_column,'` <> f.`',id_column,'`
and f.depth < ',max_depth,'
)
select
c.`',id_column,'`,
c.`',parent_id_column,'`
from find_cycle f
inner join `',table_name,'` c
on f.`',parent_id_column,'` = c.`',id_column,'`
order by depth desc
limit 1
) t
where t.`',parent_id_column,'` > 0');
prepare statement from sql_query;
execute statement;
end loop;
close id_cursor;
deallocate prepare statement;
select distinct
*
from __temp_cycle;
drop temporary table if exists __temp_ids;
drop temporary table if exists __temp_cycle;
END
usage:
call detect_cycle(table_name, id_column, parent_id_column, max_depth);
This will return a result set of all cycle conditions within the given table.
Looks like you have this figured out to stop a cycling event but are looking for ways to identify a cycle. In that case, consider using a path:
with recursive parents as (
select
s.id,
s.parent_id,
1 as depth,
CONCAT(s.id,'>',s.parent_id) as path,
NULL as cycle_detection
from categories s
where s.id = <passed in id>
union all
select
t.id,
t.parent_id,
c.depth + 1 as depth,
CONCAT(c.path, '>', t.parent_id),
CASE WHEN c.path LIKE CONCAT('%',t.parent_id,'>%') THEN 'cycle' END
from categories t
inner join parents c
on t.id = c.parent_id
where t.id <> t.parent_id)
select distinct parent_id, cycle_detection from parents where parent_id <> 0 order by depth desc
I may be a bit off my syntax since it's been forever since I wrote mysql/mariadb syntax, but this is the basic idea. Capture the path that the recursion took and then see if your current item is already in the path.
If the depth of the resulting tree is not extremely deep then you can detect cycles by storing the bread crumbs that the recursive CTE is walking. Knowing the bread crumbs you can detect cycles easily.
For example:
with recursive
n as (
select id, parent_id, concat('/', id, '/') as path
from categories where id = 2
union all
select c.id, c.parent_id, concat(n.path, c.id, '/')
from n
join categories c on c.parent_id = n.id
where n.path not like concat('%/', c.id, '/%') -- cycle pruning here!
)
select * from n;
Result:
id parent_id path
--- ---------- -------
2 1 /2/
3 2 /2/3/
1 3 /2/3/1/
See running example at DB Fiddle.
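The bread-crumb pruning can also be sketched in Python. This is a loose illustration of the same idea rather than a translation of the query: `walk_with_breadcrumbs` and the `children_of` mapping are hypothetical names, and the data reproduces the three-row cycle from the result above.

```python
def walk_with_breadcrumbs(children_of, start_id):
    """Return (id, path) rows reachable from start_id, skipping ids already on the path."""
    rows = [(start_id, f"/{start_id}/")]
    frontier = list(rows)
    while frontier:
        next_frontier = []
        for node_id, path in frontier:
            for child in children_of.get(node_id, []):
                if f"/{child}/" in path:  # cycle pruning, like the NOT LIKE test
                    continue
                next_frontier.append((child, f"{path}{child}/"))
        rows.extend(next_frontier)
        frontier = next_frontier
    return rows


# id 2 has parent 1, id 3 has parent 2, id 1 has parent 3 (a cycle)
children_of = {1: [2], 2: [3], 3: [1]}
rows = walk_with_breadcrumbs(children_of, 2)
```

The delimiters on both sides of the id matter: searching for `/1/` rather than `1` keeps id 1 from matching inside id 10 or 21.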
I have one table category_code having data like
SELECT Item, Code, Prefix from category_codes
Item Code Prefix
Bangles BL BL
Chains CH CH
Ear rings ER ER
Sets Set ST
Rings RING RG
Yellow GOld YG YG........
I have another table item_categories having data like
select code,name from item_categories
code name
AQ.TM.PN AQ.TM.PN
BL.YG.CH.ME.PN BL.YG.CH.ME.PN
BS.CZ.ST.YG.PN BS.CZ.ST.YG.PN
CR.YG CR.YG.......
I want to update the item_categories.name column with the corresponding category_code.item values, like this:
code name
BL.YG.CH.ME.PN Bangles.Yellow Gold.Chains.. . . .
Please suggest a good solution for that. Thanks in advance.
First, split the code into several rows, join them with the category codes, and then concatenate the results to update the table.
Here is an example, based on the data you gave:
create table #category_code (item varchar(max), code varchar(max), prefix varchar(max));
create table #item_categories (code varchar(max), name varchar(max));
insert into #category_code (item, code, prefix) values ('Bangles','BL','BL'),('Chains','CH','CH'),('Ear rings','ER','ER'), ('Sets','Set','ST'),('Rings','RING','RG'), ('Yellow gold','YG','YG');
insert into #item_categories (code, name) values ('AQ.TM.PN','AQ.TM.PN'),('BL.YG.CH.ME.PN','BL.YG.CH.ME.PN'),('BS.CZ.ST.YG.PN','BS.CZ.ST.YG.PN')
;with splitted as ( -- split the codes into individual code
-- note: string_split does not guarantee the order of its output rows, so the numbering below may not follow the order of the parts in the code
select row_number() over (partition by ic.code order by ic.code) as id, ic.code, x.value, cc.item
from #item_categories ic
outer apply string_split(ic.code, '.') x -- SQL Server 2016+, otherwise, use another method to split the data
left join #category_code cc on cc.code = x.value -- some values are missing in your example, but you can use an inner join
)
, joined as ( -- then joined them to concat the name
select id, convert(varchar(max),code) as code, convert(varchar(max),coalesce(item + ',','')) as Item
from splitted
where id = 1
union all
select s.id, convert(varchar(max), s.code), convert(varchar(max), j.item + coalesce(s.item + ',',''))
from splitted s
inner join joined j on j.id = s.id - 1 and j.code = s.code
)
update #item_categories
set name = substring (j.item ,1,case when len(j.item) > 1 then len(j.item)-1 else 0 end)
output deleted.name, inserted.name
from #item_categories i
inner join joined j on j.code = i.code
inner join (select code, max(id)maxid from joined group by code) mj on mj.code = j.code and mj.maxid = j.id
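The split, look up, and concatenate steps are easy to check in a few lines of Python. This is only a sketch of the logic, not a replacement for the update: `expand_code` and the `category_items` dictionary are hypothetical, the lookup is keyed on the Code column as in the join above, and it joins with '.' (as in the question's expected output) rather than ','.

```python
# Code -> Item, taken from the sample rows in the question
category_items = {
    "BL": "Bangles", "CH": "Chains", "ER": "Ear rings",
    "Set": "Sets", "RING": "Rings", "YG": "Yellow Gold",
}


def expand_code(code, lookup, sep="."):
    """Split a dotted code, translate the parts that have a category, and join them again."""
    parts = code.split(sep)
    # Parts without a matching category are dropped, like the left join + coalesce above.
    return sep.join(lookup[part] for part in parts if part in lookup)


name = expand_code("BL.YG.CH.ME.PN", category_items)
```

Unlike the SQL version, `str.split` keeps the parts in their original order, so the translated name always follows the order of the code. Here `name` is `"Bangles.Yellow Gold.Chains"`.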
Hopefully I'm missing a simple solution to this.
I have two tables. One contains a list of companies. The second contains a list of publishers. The mapping between the two is many to many. What I would like to do is bundle or group all of the companies in table A which have any relationship to a publisher in table B and vice versa.
The final result would look something like this (GROUPID is the key field). Row 1 and 2 are in the same group because they share the same company. Row 3 is in the same group because the publisher Y was already mapped over to company A. Row 4 is in the group because Company B was already mapped to group 1 through Publisher Y.
Said simply, any time there is any kind of shared relationship across Company and Publisher, that pair should be assigned to the same group.
ROW GROUPID Company Publisher
1 1 A Y
2 1 A X
3 1 B Y
4 1 B Z
5 2 C W
6 2 C P
7 2 D W
Fiddle
Update:
My bounty version: Given the table in the fiddle above of simply Company and Publisher pairs, populate the GROUPID field above. Think of it as creating a Family ID that encompasses all related parents/children.
SQL Server 2012
I thought about using recursive CTE, but, as far as I know, it's not possible in SQL Server to use UNION to connect anchor member and a recursive member of recursive CTE (I think it's possible to do in PostgreSQL), so it's not possible to eliminate duplicates.
declare @i int
;with cte as (
    select
        GroupID,
        row_number() over(order by Company) as rn
    from Table1
)
update cte set GroupID = rn
select @i = @@rowcount
-- while some rows updated
while @i > 0
begin
    update T1 set
        GroupID = T2.GroupID
    from Table1 as T1
    inner join (
        select T2.Company, min(T2.GroupID) as GroupID
        from Table1 as T2
        group by T2.Company
    ) as T2 on T2.Company = T1.Company
    where T1.GroupID > T2.GroupID
    select @i = @@rowcount
    update T1 set
        GroupID = T2.GroupID
    from Table1 as T1
    inner join (
        select T2.Publisher, min(T2.GroupID) as GroupID
        from Table1 as T2
        group by T2.Publisher
    ) as T2 on T2.Publisher = T1.Publisher
    where T1.GroupID > T2.GroupID
    -- will be > 0 if any rows updated
    select @i = @i + @@rowcount
end
;with cte as (
    select
        GroupID,
        dense_rank() over(order by GroupID) as rn
    from Table1
)
update cte set GroupID = rn
sql fiddle demo
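The loop above is a "spread the smallest GroupID until nothing changes" algorithm. Here is a rough Python sketch of the same idea, useful for checking expected results on small data. The function name `assign_groups` is made up, and it assumes each (Company, Publisher) pair appears only once.

```python
def assign_groups(pairs):
    """pairs: list of unique (company, publisher) tuples; returns pair -> dense group id."""
    # Start with a distinct id per row, like the row_number() initialisation.
    group = {pair: i for i, pair in enumerate(pairs, start=1)}
    changed = True
    while changed:
        changed = False
        for key_index in (0, 1):  # 0 = company pass, 1 = publisher pass
            lowest = {}
            for pair, gid in group.items():
                key = pair[key_index]
                lowest[key] = min(gid, lowest.get(key, gid))
            for pair in group:
                best = lowest[pair[key_index]]
                if group[pair] > best:
                    group[pair] = best
                    changed = True
    # Renumber the surviving ids 1, 2, 3, ... like the final dense_rank() update.
    ranks = {gid: rank for rank, gid in enumerate(sorted(set(group.values())), start=1)}
    return {pair: ranks[gid] for pair, gid in group.items()}


pairs = [("A", "Y"), ("A", "X"), ("B", "Y"), ("B", "Z"),
         ("C", "W"), ("C", "P"), ("D", "W")]
groups = assign_groups(pairs)
```

On the sample data from the question, the four A/B rows end up in group 1 and the three C/D rows in group 2.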
I've also tried a breadth-first search algorithm. I thought it could be faster (it's better in terms of complexity), so I'll provide that solution here. I've found that it's not faster than the set-based SQL approach, though:
declare @Company nvarchar(2), @Publisher nvarchar(2), @GroupID int
declare @Queue table (
    Company nvarchar(2), Publisher nvarchar(2), ID int identity(1, 1),
    primary key(Company, Publisher)
)
select @GroupID = 0
while 1 = 1
begin
    select top 1 @Company = Company, @Publisher = Publisher
    from Table1
    where GroupID is null
    if @@rowcount = 0 break
    select @GroupID = @GroupID + 1
    insert into @Queue(Company, Publisher)
    select @Company, @Publisher
    while 1 = 1
    begin
        select top 1 @Company = Company, @Publisher = Publisher
        from @Queue
        order by ID asc
        if @@rowcount = 0 break
        update Table1 set
            GroupID = @GroupID
        where Company = @Company and Publisher = @Publisher
        delete from @Queue where Company = @Company and Publisher = @Publisher
        ;with cte as (
            select Company, Publisher from Table1 where Company = @Company and GroupID is null
            union all
            select Company, Publisher from Table1 where Publisher = @Publisher and GroupID is null
        )
        insert into @Queue(Company, Publisher)
        select distinct c.Company, c.Publisher
        from cte as c
        where not exists (select * from @Queue as q where q.Company = c.Company and q.Publisher = c.Publisher)
    end
end
sql fiddle demo
I've tested my version and Gordon Linoff's to see how they perform. The CTE approach looks much worse; I gave up waiting for it to complete on more than 1000 rows.
Here's sql fiddle demo with random data. My results were:
128 rows:
my RBAR solution: 190ms
my SQL solution: 27ms
Gordon Linoff's solution: 958ms
256 rows:
my RBAR solution: 560ms
my SQL solution: 1226ms
Gordon Linoff's solution: 45371ms
It's random data, so the results may not be very consistent. I think the timings could be changed by indexes, but I don't think that would change the overall picture.
Old version, using a temporary table and calculating GroupID without touching the initial table:
declare @i int
-- creating table to gather all possible GroupID for each row
create table #Temp
(
    Company varchar(1), Publisher varchar(1), GroupID varchar(1),
    primary key (Company, Publisher, GroupID)
)
-- initializing it with data
insert into #Temp (Company, Publisher, GroupID)
select Company, Publisher, Company
from Table1
select @i = @@rowcount
-- while some rows inserted into #Temp
while @i > 0
begin
    -- expand #Temp in both directions
    ;with cte as (
        select
            T2.Company, T1.Publisher,
            T1.GroupID as GroupID1, T2.GroupID as GroupID2
        from #Temp as T1
        inner join #Temp as T2 on T2.Company = T1.Company
        union
        select
            T1.Company, T2.Publisher,
            T1.GroupID as GroupID1, T2.GroupID as GroupID2
        from #Temp as T1
        inner join #Temp as T2 on T2.Publisher = T1.Publisher
    ), cte2 as (
        select
            Company, Publisher,
            case when GroupID1 < GroupID2 then GroupID1 else GroupID2 end as GroupID
        from cte
    )
    insert into #Temp
    select Company, Publisher, GroupID
    from cte2
    -- don't insert duplicates
    except
    select Company, Publisher, GroupID
    from #Temp
    -- will be > 0 if any row inserted
    select @i = @@rowcount
end
select
    Company, Publisher,
    dense_rank() over(order by min(GroupID)) as GroupID
from #Temp
group by Company, Publisher
=> sql fiddle example
Your problem is a graph-walking problem of finding connected subgraphs. It is a little more challenging because your data structure has two types of nodes ("companies" and "publishers") rather than one type.
You can solve this with a single recursive CTE. The logic is as follows.
First, convert the problem into a graph with only one type of node. I do this by making the nodes companies and the edges links between companies, using the publisher information. This is just a join:
select t1.company as node1, t2.company as node2
from table1 t1 join
     table1 t2
     on t1.publisher = t2.publisher
(For efficiency's sake, you could also add t1.company <> t2.company but that is not strictly necessary.)
Now, this is a "simple" graph walking problem, where a recursive CTE is used to create all connections between two nodes. The recursive CTE walks through the graph using join. Along the way, it keeps a list of all nodes visited. In SQL Server, this needs to be stored in a string.
The code needs to ensure that it doesn't visit a node twice for a given path, because this can result in infinite recursion (and an error). If the above is called edges, the CTE that generates all pairs of connected nodes looks like:
cte as (
select e.node1, e.node2, cast('|'+e.node1+'|'+e.node2+'|' as varchar(max)) as nodes,
1 as level
from edges e
union all
select c.node1, e.node2, c.nodes+e.node2+'|', 1+c.level
from cte c join
edges e
on c.node2 = e.node1 and
c.nodes not like '%|'+e.node2+'|%' -- match the delimited name, so a name contained in a longer name does not count as visited
)
Now, with this list of connected nodes, assign each node the minimum of all the nodes it is connected to, including itself. This serves as an identifier of connected subgraphs. That is, all companies connected to each other via the publishers will have the same minimum.
The final two steps are to enumerate this minimum (as the GroupId) and to join the GroupId back to the original data.
The full (and I might add tested) query looks like:
with edges as (
select t1.company as node1, t2.company as node2
from table1 t1 join
table1 t2
on t1.publisher = t2.publisher
),
cte as (
select e.node1, e.node2,
cast('|'+e.node1+'|'+e.node2+'|' as varchar(max)) as nodes,
1 as level
from edges e
union all
select c.node1, e.node2,
c.nodes+e.node2+'|',
1+c.level
from cte c join
edges e
on c.node2 = e.node1 and
c.nodes not like '%|'+e.node2+'|%' -- match the delimited name, so a name contained in a longer name does not count as visited
),
nodes as (
select node1,
(case when min(node2) < node1 then min(node2) else node1 end
) as grp
from cte
group by node1
)
select t.company, t.publisher, grp.GroupId
from table1 t join
(select n.node1, dense_rank() over (order by grp) as GroupId
from nodes n
) grp
on t.company = grp.node1;
Note that this works for finding any connected subgraphs. It does not assume any particular number of levels.
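The same steps (project onto a company-only graph, find everything reachable, label by the minimum, then rank) can be sketched in Python for checking results on small inputs. This is an illustration of the approach, not a translation of the query; `company_groups` is a hypothetical name.

```python
from collections import defaultdict


def company_groups(pairs):
    """pairs: iterable of (company, publisher); returns company -> dense group id."""
    # Step 1: companies sharing a publisher are neighbours (the "edges" join).
    by_publisher = defaultdict(set)
    for company, publisher in pairs:
        by_publisher[publisher].add(company)
    neighbours = defaultdict(set)
    for companies in by_publisher.values():
        for company in companies:
            neighbours[company] |= companies
    neighbours = dict(neighbours)

    # Step 2: for each company, collect everything reachable and keep the minimum.
    label = {}
    for start in neighbours:
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node] - seen:  # never revisit a node
                seen.add(nxt)
                stack.append(nxt)
        label[start] = min(seen)

    # Step 3: enumerate the minimums, like dense_rank() over (order by grp).
    ranks = {m: rank for rank, m in enumerate(sorted(set(label.values())), start=1)}
    return {company: ranks[m] for company, m in label.items()}


pairs = [("A", "Y"), ("A", "X"), ("B", "Y"), ("B", "Z"),
         ("C", "W"), ("C", "P"), ("D", "W")]
groups = company_groups(pairs)
```

A visited set per start node plays the role of the `nodes` string in the CTE. On the sample data, A and B map to group 1 and C and D to group 2.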
EDIT:
The question of performance for this is vexing. At a minimum, the above query will run better with an index on Publisher. Better yet is to take @MikaelEriksson's suggestion, and put the edges in a separate table.
Another question is whether you look for equivalency classes among the Companies or the Publishers. I took the approach of using Companies, because I think that has better "explainability" (my inclination to respond was based on numerous comments that this could not be done with CTEs).
I am guessing that you could get reasonable performance from this, although that requires more knowledge of your data and system than provided in the OP. It is quite likely, though, that the best performance will come from a multiple query approach.
Here is my solution: SQL Fiddle
The nature of the relationships requires looping, as far as I can tell.
Here is the SQL:
--drop TABLE Table1
CREATE TABLE Table1
([row] int identity (1,1),GroupID INT NULL,[Company] varchar(2), [Publisher] varchar(2))
;
INSERT INTO Table1
(Company, Publisher)
select
left(newid(), 2), left(newid(), 2)
declare @i int = 1
while @i < 8
begin
    ;with cte(Company, Publisher) as (
        select
            left(newid(), 2), left(newid(), 2)
        from Table1
    )
    insert into Table1(Company, Publisher)
    select distinct c.Company, c.Publisher
    from cte as c
    where not exists (select * from Table1 as t where t.Company = c.Company and t.Publisher = c.Publisher)
    set @i = @i + 1
end;
CREATE NONCLUSTERED INDEX IX_Temp1 on Table1 (Company)
CREATE NONCLUSTERED INDEX IX_Temp2 on Table1 (Publisher)
declare @counter int=0
declare @row int=0
declare @lastnullcount int=0
declare @currentnullcount int=0
WHILE EXISTS (
    SELECT *
    FROM Table1
    where GroupID is null
)
BEGIN
    SET @counter=@counter+1
    SET @lastnullcount =0
    SELECT TOP 1
        @row=[row]
    FROM Table1
    where GroupID is null
    order by [row] asc
    SELECT @currentnullcount=count(*) from table1 where groupid is null
    WHILE @lastnullcount <> @currentnullcount
    BEGIN
        SELECT @lastnullcount=count(*)
        from table1
        where groupid is null
        UPDATE Table1
        SET GroupID=@counter
        WHERE [row]=@row
        UPDATE t2
        SET t2.GroupID=@counter
        FROM Table1 t1
        INNER JOIN Table1 t2 on t1.Company=t2.Company
        WHERE t1.GroupID=@counter
        AND t2.GroupID IS NULL
        UPDATE t2
        SET t2.GroupID=@counter
        FROM Table1 t1
        INNER JOIN Table1 t2 on t1.publisher=t2.publisher
        WHERE t1.GroupID=@counter
        AND t2.GroupID IS NULL
        SELECT @currentnullcount=count(*)
        from table1
        where groupid is null
    END
END
SELECT * FROM Table1
Edit:
Added indexes, as I would expect them on the real table, to be more in line with the other data sets Roman is using.
You are trying to find all of the connected components of your graph, which can only be done iteratively. If you know the maximum width of any connected component (i.e. the maximum number of links you will have to take from one company/publisher to another), you could in principle do it something like this:
SELECT
MIN(x2.groupID) AS groupID,
x1.Company,
x1.Publisher
FROM Table1 AS x1
INNER JOIN (
SELECT
MIN(x2.Company) AS groupID,
x1.Company,
x1.Publisher
FROM Table1 AS x1
INNER JOIN Table1 AS x2
ON x1.Publisher = x2.Publisher
GROUP BY
x1.Publisher,
x1.Company
) AS x2
ON x1.Company = x2.Company
GROUP BY
x1.Publisher,
x1.Company;
You have to keep nesting the subquery (alternating joins on Company and Publisher, and with the deepest subquery saying MIN(Company) rather than MIN(groupID)) to the maximum iteration depth.
I don't really recommend this, though; it would be cleaner to do this outside of SQL.
Disclaimer: I don't know anything about SQL Server 2012 (or any other version); it may have some kind of additional scripting ability to let you do this iteration dynamically.
This is a recursive solution, using XML:
with a as ( -- recursive result, containing shorter subsets and duplicates
select cast('<c>' + company + '</c>' as xml) as companies
,cast('<p>' + publisher + '</p>' as xml) as publishers
from Table1
union all
select a.companies.query('for $c in distinct-values((for $i in /c return string($i),
sql:column("t.company")))
order by $c
return <c>{$c}</c>')
,a.publishers.query('for $p in distinct-values((for $i in /p return string($i),
sql:column("t.publisher")))
order by $p
return <p>{$p}</p>')
from a join Table1 t
on ( a.companies.exist('/c[text() = sql:column("t.company")]') = 0
or a.publishers.exist('/p[text() = sql:column("t.publisher")]') = 0)
and ( a.companies.exist('/c[text() = sql:column("t.company")]') = 1
or a.publishers.exist('/p[text() = sql:column("t.publisher")]') = 1)
), b as ( -- remove the shorter versions from earlier steps of the recursion and the duplicates
select distinct -- distinct cannot work on xml types, hence cast to nvarchar
cast(companies as nvarchar) as companies
,cast(publishers as nvarchar) as publishers
,DENSE_RANK() over(order by cast(companies as nvarchar), cast(publishers as nvarchar)) as groupid
from a
where not exists (select 1 from a as s -- s is a proper subset of a
where (cast('<s>' + cast(s.companies as varchar)
+ '</s><a>' + cast(a.companies as varchar) + '</a>' as xml)
).value('if((count(/s/c) > count(/a/c))
and (some $s in /s/c/text() satisfies
(some $a in /a/c/text() satisfies $s = $a))
) then 1 else 0', 'int') = 1
)
and not exists (select 1 from a as s -- s is a proper subset of a
where (cast('<s>' + cast(s.publishers as nvarchar)
+ '</s><a>' + cast(a.publishers as nvarchar) + '</a>' as xml)
).value('if((count(/s/p) > count(/a/p))
and (some $s in /s/p/text() satisfies
(some $a in /a/p/text() satisfies $s = $a))
) then 1 else 0', 'int') = 1
)
), c as ( -- cast back to xml
select cast(companies as xml) as companies
,cast(publishers as xml) as publishers
,groupid
from b
)
select Co.company.value('(./text())[1]', 'varchar') as company
,Pu.publisher.value('(./text())[1]', 'varchar') as publisher
,c.groupid
from c
cross apply companies.nodes('/c') as Co(company)
cross apply publishers.nodes('/p') as Pu(publisher)
where exists(select 1 from Table1 t -- restrict to only the combinations that exist in the source
where t.company = Co.company.value('(./text())[1]', 'varchar')
and t.publisher = Pu.publisher.value('(./text())[1]', 'varchar')
)
The set of companies and the set of publishers are kept in XML fields in the intermediate steps, and some casting between xml and nvarchar is necessary due to limitations of SQL Server (such as not being able to group by or use DISTINCT on XML columns).
Bit late to the challenge, and since SQLFiddle seems to be down ATM I'll have to guess your data-structures. Nevertheless, it seemed like a fun challenge (and it was =) so here's what I made from it :
Setup:
IF OBJECT_ID('t_link') IS NOT NULL DROP TABLE t_link
IF OBJECT_ID('t_company') IS NOT NULL DROP TABLE t_company
IF OBJECT_ID('t_publisher') IS NOT NULL DROP TABLE t_publisher
IF OBJECT_ID('tempdb..#link_A') IS NOT NULL DROP TABLE #link_A
IF OBJECT_ID('tempdb..#link_B') IS NOT NULL DROP TABLE #link_B
GO
CREATE TABLE t_company ( company_id int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
company_name varchar(100) NOT NULL)
GO
CREATE TABLE t_publisher (publisher_id int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
publisher_name varchar(100) NOT NULL)
CREATE TABLE t_link (company_id int NOT NULL FOREIGN KEY (company_id) REFERENCES t_company (company_id),
publisher_id int NOT NULL FOREIGN KEY (publisher_id) REFERENCES t_publisher (publisher_id),
PRIMARY KEY (company_id, publisher_id),
group_id int NULL
)
GO
-- example content
-- ROW GROUPID Company Publisher
--1 1 A Y
--2 1 A X
--3 1 B Y
--4 1 B Z
--5 2 C W
--6 2 C P
--7 2 D W
INSERT t_company (company_name) VALUES ('A'), ('B'), ('C'), ('D')
INSERT t_publisher (publisher_name) VALUES ('X'), ('Y'), ('Z'), ('W'), ('P')
INSERT t_link (company_id, publisher_id)
SELECT company_id, publisher_id
FROM t_company, t_publisher
WHERE (company_name = 'A' AND publisher_name = 'Y')
OR (company_name = 'A' AND publisher_name = 'X')
OR (company_name = 'B' AND publisher_name = 'Y')
OR (company_name = 'B' AND publisher_name = 'Z')
OR (company_name = 'C' AND publisher_name = 'W')
OR (company_name = 'C' AND publisher_name = 'P')
OR (company_name = 'D' AND publisher_name = 'W')
GO
/*
-- volume testing
TRUNCATE TABLE t_link
DELETE t_company
DELETE t_publisher
DECLARE @company_count int = 1000,
@publisher_count int = 450,
@links_count int = 800
INSERT t_company (company_name)
SELECT company_name = Convert(varchar(100), NewID())
FROM master.dbo.fn_int_list(1, @company_count)
UPDATE STATISTICS t_company
INSERT t_publisher (publisher_name)
SELECT publisher_name = Convert(varchar(100), NewID())
FROM master.dbo.fn_int_list(1, @publisher_count)
UPDATE STATISTICS t_publisher
-- Random links between the companies & publishers
DECLARE @count int
SELECT @count = 0
WHILE @count < @links_count
BEGIN
SELECT TOP 30 PERCENT row_id = IDENTITY(int, 1, 1), company_id = company_id + 0
INTO #link_A
FROM t_company
ORDER BY NewID()
SELECT TOP 30 PERCENT row_id = IDENTITY(int, 1, 1), publisher_id = publisher_id + 0
INTO #link_B
FROM t_publisher
ORDER BY NewID()
INSERT TOP (@links_count - @count) t_link (company_id, publisher_id)
SELECT A.company_id,
B.publisher_id
FROM #link_A A
JOIN #link_B B
ON A.row_id = B.row_id
WHERE NOT EXISTS ( SELECT *
FROM t_link old
WHERE old.company_id = A.company_id
AND old.publisher_id = B.publisher_id)
SELECT @count = @count + @@ROWCOUNT
DROP TABLE #link_A
DROP TABLE #link_B
END
*/
Actual grouping:
IF OBJECT_ID('tempdb..#links') IS NOT NULL DROP TABLE #links
GO
-- apply grouping
-- init
SELECT row_id = IDENTITY(int, 1, 1),
company_id,
publisher_id,
group_id = 0
INTO #links
FROM t_link
-- don't see an index that would be actually helpful here right-away, using row_id to avoid HEAP
CREATE CLUSTERED INDEX idx0 ON #links (row_id)
--CREATE INDEX idx1 ON #links (company_id)
--CREATE INDEX idx2 ON #links (publisher_id)
UPDATE #links
SET group_id = row_id
-- start grouping
WHILE @@ROWCOUNT > 0
BEGIN
UPDATE #links
SET group_id = new_group_id
FROM #links upd
CROSS APPLY (SELECT new_group_id = Min(group_id)
FROM #links new
WHERE new.company_id = upd.company_id
OR new.publisher_id = upd.publisher_id
) x
WHERE upd.group_id > new_group_id
-- select * from #links
END
-- remove 'holes'
UPDATE #links
SET group_id = (SELECT COUNT(DISTINCT o.group_id)
FROM #links o
WHERE o.group_id <= upd.group_id)
FROM #links upd
GO
UPDATE t_link
SET group_id = new.group_id
FROM t_link upd
LEFT OUTER JOIN #links new
ON new.company_id = upd.company_id
AND new.publisher_id = upd.publisher_id
GO
SELECT row = ROW_NUMBER() OVER (ORDER BY group_id, company_name, publisher_name),
l.group_id,
c.company_name, -- c.company_id,
p.publisher_name -- , p.publisher_id
from t_link l
JOIN t_company c
ON l.company_id = c.company_id
JOIN t_publisher p
ON p.publisher_id = l.publisher_id
ORDER BY 1
At first sight this approach hasn't been tried yet by anyone else; interesting to see how this can be done in a variety of ways... (I preferred not to read them upfront as it would spoil the puzzle =)
Results look as expected (as far as I understand the requirements and the example), and performance isn't too shabby either, although there is no real indication of the number of records this should work on. I'm not sure how it would scale, but I don't expect too many problems either...
In SQL Server 2008, I have a tree. I need to get all child nodes of node n (see diagram) and all child nodes of these child nodes, etc. until the leaf nodes, which is fairly trivial. I also need to be able to say 'Take node o, go up the tree until we reach m and, because m is a child of node n, set some property of node o to some property of node m'. Node o could be 3 levels deep (as illustrated), 45 levels deep, or x levels deep.
This gets all children of a given node (or area):
--Return all sub-area structure of an area:
WITH temp_areas (ParentAreaID, AreaID, [Name], [Level]) AS
(
SELECT ParentAreaID, AreaID, [Name], 0
FROM lib_areas WHERE AreaID = @AreaID
UNION ALL
SELECT B.ParentAreaID, B.AreaID, B.[Name], A.Level + 1
FROM temp_areas AS A, lib_areas AS B
WHERE A.AreaID = B.ParentAreaID
)
INSERT INTO #files (id) SELECT fileid FROM lib_filesareasxref where areaid in (select areaid from temp_areas)
while exists (select * from #files)
begin
select top 1
@ID = id
from
#files ORDER BY id DESC
delete from #files where id = @ID
This will track back from @node_o until it reaches @node_m or it reaches the top of the tree (if @node_m is not above @node_o).
WITH
parents
AS
(
SELECT
A.ParentAreaID, A.AreaID, A.[Name], 0 AS [Level] -- the column needs a name so the recursive part can reference B.Level
FROM
lib_areas AS A
WHERE
A.AreaID = @node_o
UNION ALL
SELECT
A.ParentAreaID, A.AreaID, A.[Name], B.Level + 1
FROM
lib_areas AS A
INNER JOIN
parents AS B
ON A.AreaID = B.ParentAreaID
WHERE
B.AreaID <> @node_m
)
SELECT
*
FROM
parents
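One caveat for very deep trees (the question mentions 45 or more levels): by default SQL Server aborts a recursive CTE once it exceeds 100 recursion levels. The sketch below is not tied to lib_areas; it just uses a counter CTE (the names `n` and `i` are made up for illustration) to show how the MAXRECURSION query hint raises that limit.

```sql
-- Counter CTE that recurses 149 times, which is past the default limit of 100
WITH n AS
(
    SELECT 1 AS i
    UNION ALL
    SELECT i + 1 FROM n WHERE i < 150
)
SELECT MAX(i) AS deepest   -- returns 150
FROM n
OPTION (MAXRECURSION 200); -- without this hint the statement fails after 100 levels
```

MAXRECURSION 0 removes the limit entirely, which is only safe when the data is known to contain no cycles.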
I'd propose using a hierarchyid data type in your table, and using the GetAncestor method.
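A minimal sketch of that idea, assuming a hypothetical table variable `@areas` with a hierarchyid column `Node` (this is not the lib_areas schema from the question). Because a hierarchyid value encodes its own path, the ancestor of o that sits directly below n can be computed from the two levels, with no recursion.

```sql
-- Hypothetical tree: n -> m -> x -> o
DECLARE @areas TABLE (Node hierarchyid PRIMARY KEY, Name varchar(50));
INSERT INTO @areas VALUES
    (hierarchyid::Parse('/1/'),       'n'),
    (hierarchyid::Parse('/1/1/'),     'm'),
    (hierarchyid::Parse('/1/1/1/'),   'x'),
    (hierarchyid::Parse('/1/1/1/1/'), 'o');

DECLARE @node_o hierarchyid = hierarchyid::Parse('/1/1/1/1/');
DECLARE @node_n hierarchyid = hierarchyid::Parse('/1/');

-- Assumes @node_o lies strictly below @node_n; otherwise the argument
-- to GetAncestor would be negative and the call would fail.
SELECT a.Name              -- returns 'm'
FROM @areas AS a
WHERE a.Node = @node_o.GetAncestor(@node_o.GetLevel() - @node_n.GetLevel() - 1);
```

Here o is at level 4 and n at level 1, so GetAncestor(2) walks two steps up from o and lands on m, however deep o happens to be.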
I'm trying to write a recursive query in SQL Server that basically lists a parent-child hierarchy from a given parent. A parent can have multiple children and a child can belong to multiple parents so it is stored in a many-to-many relation.
I modified the following query from another somewhat related question, however this doesn't go all the way down the tree and only selects the first-level children...
DECLARE @ObjectId uniqueidentifier
SET @ObjectId = '1A213431-F83D-49E3-B5E2-42AA6EB419F1';
WITH Tree AS
(
SELECT A.*
FROM Objects_In_Objects A
WHERE A.ParentObjectId = @ObjectId
UNION ALL
SELECT B.*
FROM Tree A
JOIN Objects_In_Objects B
ON A.ParentObjectId = B.ObjectId
)
SELECT *
FROM Tree
INNER JOIN Objects ar on tree.ObjectId = ar.ObjectId
Does anyone know how to modify the query to go all the way down the 'tree'? Or is this not possible using the above construction?
Objects
Columns: ObjectId | Name
Objects_In_Objects
Columns: ObjectId | ParentObjectId
Sample data:
Objects
ObjectId | Name
1A213431-F83D-49E3-B5E2-42AA6EB419F1 | Main container
63BD908B-54B7-4D62-BE13-B888277B7365 | Sub container
71526E15-F713-4F03-B707-3F5529D6B25E | Sub container 2
ADA9A487-7256-46AD-8574-0CE9475315E4 | Object in multiple containers
Objects In Objects
ObjectId | ParentObjectId
ADA9A487-7256-46AD-8574-0CE9475315E4 | 71526E15-F713-4F03-B707-3F5529D6B25E
ADA9A487-7256-46AD-8574-0CE9475315E4 | 63BD908B-54B7-4D62-BE13-B888277B7365
63BD908B-54B7-4D62-BE13-B888277B7365 | 1A213431-F83D-49E3-B5E2-42AA6EB419F1
71526E15-F713-4F03-B707-3F5529D6B25E | 1A213431-F83D-49E3-B5E2-42AA6EB419F1
Such a recursive CTE (Common Table Expression) will go all the way.
Try this:
;WITH Tree AS
(
SELECT A.ObjectID, A.Name, o.ParentObjectID, 1 AS 'Level'
FROM dbo.Objects A
INNER JOIN dbo.Objects_In_Objects o ON A.ObjectID = o.ParentObjectID
WHERE A.ObjectId = @ObjectId -- use the A.ObjectId here
UNION ALL
SELECT A2.ObjectID, A2.Name, B.ParentObjectID, t.Level + 1 AS 'Level'
FROM Tree t
INNER JOIN dbo.Objects_In_Objects B ON B.ParentObjectID = t.ObjectID
INNER JOIN dbo.Objects A2 ON A2.ObjectId = B.ObjectId
)
SELECT *
FROM Tree
INNER JOIN dbo.Objects ar on tree.ObjectId = ar.ObjectId
If you change this - does this work for you now? (I added a Level column - typically helps to understand the "depth" in the hierarchy for every row)
I do seem to get the proper output on my SQL Server instance, at least...
declare @Objects_In_Objects table
(
ObjectID uniqueidentifier,
ParentObjectId uniqueidentifier
)
declare @Objects table
(
ObjectId uniqueidentifier,
Name varchar(50)
)
insert into @Objects values
('1A213431-F83D-49E3-B5E2-42AA6EB419F1', 'Main container'),
('63BD908B-54B7-4D62-BE13-B888277B7365', 'Sub container'),
('71526E15-F713-4F03-B707-3F5529D6B25E', 'Sub container 2'),
('ADA9A487-7256-46AD-8574-0CE9475315E4', 'Object in multiple containers')
insert into @Objects_In_Objects values
('ADA9A487-7256-46AD-8574-0CE9475315E4', '71526E15-F713-4F03-B707-3F5529D6B25E'),
('ADA9A487-7256-46AD-8574-0CE9475315E4', '63BD908B-54B7-4D62-BE13-B888277B7365'),
('63BD908B-54B7-4D62-BE13-B888277B7365', '1A213431-F83D-49E3-B5E2-42AA6EB419F1'),
('71526E15-F713-4F03-B707-3F5529D6B25E', '1A213431-F83D-49E3-B5E2-42AA6EB419F1')
DECLARE @ObjectId uniqueidentifier
SET @ObjectId = '1A213431-F83D-49E3-B5E2-42AA6EB419F1';
WITH Tree AS
(
SELECT A.ObjectID,
A.ParentObjectId
FROM @Objects_In_Objects A
WHERE A.ParentObjectId = @ObjectId
UNION ALL
SELECT B.ObjectID,
B.ParentObjectId
FROM Tree A
JOIN @Objects_In_Objects B
ON B.ParentObjectId = A.ObjectId
)
SELECT *
FROM Tree
INNER JOIN @Objects ar on tree.ObjectId = ar.ObjectId;
Is this what you are looking for? https://data.stackexchange.com/stackoverflow/q/111357/
Can this help you?
http://www.aghausman.net/sql_server/storingretrieving-hierarchical-data-in-sql-server-database.html