Find whether graph has a cycle - sql

I want to find out whether it is possible to detect cycles in hierarchical or chain data with SQL.
For example, I have the following schema:
http://sqlfiddle.com/#!3/27269
create table node (
id INTEGER
);
create table edges (
id INTEGER,
node_a INTEGER,
node_b INTEGER
);
create table graph (
id INTEGER,
edge_id INTEGER);
INSERT INTO node VALUES (1) , (2), (3), (4);
INSERT INTO edges VALUES (1, 1, 2), (2, 2, 3) , (3, 3, 4) , (4, 4, 1);
-- first graph [id = 1] with cycle (1 -> 2 -> 3 -> 4 -> 1)
INSERT INTO graph VALUES (1, 1), (1, 2), (1, 3), (1, 4);
-- second graph [id = 2] without cycle (1 -> 2 -> 3 -> 4)
INSERT INTO graph VALUES (2, 1), (2, 2), (2, 3);
In the graph table, records with the same ID belong to one graph.
I need a query that returns the IDs of all graphs that have at least one cycle.
For the example above, the query should return 1, which is the id of the first graph.

First, I assume this is a directed graph. In an undirected graph, any single edge already forms a trivial cycle (a -> b -> a).
The only tricky part of the recursive CTE is stopping once you've hit a cycle, so that you don't get infinite recursion.
Try this:
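with cte as (
-- anchor: every edge starts a walk (column names follow the edges table in the question)
select e.node_a, e.node_b, iscycle = 0
from edges e
union all
-- recursive step: extend the walk by one edge and flag a return to the start node
select cte.node_a, e.node_b,
(case when cte.node_a = e.node_b then 1 else 0 end) as iscycle
from cte join
edges e
on cte.node_b = e.node_a
where cte.iscycle = 0
)
select max(iscycle)
from cte;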
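The query above looks at the whole edges table rather than at each graph separately. A minimal sketch of a per-graph variant follows, run through Python's sqlite3 module so it can be tried without a server. It assumes the schema from the question; the function names build_demo_db and graphs_with_cycles are made up for illustration. It also relies on SQLite accepting UNION (not only UNION ALL) in a recursive CTE, which drops rows that were already produced and so ends the recursion on cyclic data. SQL Server does not allow that, so a port would need its own guard, such as a visited-path column.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding the sample data from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE edges (id INTEGER, node_a INTEGER, node_b INTEGER);
        CREATE TABLE graph (id INTEGER, edge_id INTEGER);
        INSERT INTO edges VALUES (1, 1, 2), (2, 2, 3), (3, 3, 4), (4, 4, 1);
        INSERT INTO graph VALUES (1, 1), (1, 2), (1, 3), (1, 4);
        INSERT INTO graph VALUES (2, 1), (2, 2), (2, 3);
    """)
    return conn


def graphs_with_cycles(conn):
    """Return the ids of graphs in which some walk returns to its start node."""
    sql = """
    WITH RECURSIVE walk(graph_id, start_node, current_node, iscycle) AS (
        -- anchor: every edge of every graph starts a walk
        SELECT g.id, e.node_a, e.node_b, e.node_a = e.node_b
        FROM graph g
        JOIN edges e ON e.id = g.edge_id
        UNION
        -- step: follow one more edge that belongs to the same graph
        SELECT w.graph_id, w.start_node, e.node_b, e.node_b = w.start_node
        FROM walk w
        JOIN graph g ON g.id = w.graph_id
        JOIN edges e ON e.id = g.edge_id AND e.node_a = w.current_node
        WHERE w.iscycle = 0
    )
    SELECT DISTINCT graph_id FROM walk WHERE iscycle = 1 ORDER BY graph_id
    """
    return [row[0] for row in conn.execute(sql)]
```

Calling graphs_with_cycles(build_demo_db()) on the sample data reports only the first graph.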

I wrote an SQL query based on @gordon-linoff's answer. In some cases it ran into an infinite loop, so I added a nodes_path column and check whether the current connection already appears in that column.
This is the script:
create table edges (
node_a varchar(20),
node_b varchar(20)
);
INSERT INTO edges VALUES ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'D'), ('D', 'K'), ('K', 'A')
GO
with cte as (
SELECT
e.node_a
, e.node_b
, 0 as depth
, iscycle = 0
, CAST(e.node_a +' -> '+ e.node_b AS varchar(MAX)) as nodes_path
FROM edges e
UNION ALL
SELECT
cte.node_a
, e.node_b
, depth + 1
, (case when cte.node_a = e.node_b then 1 else 0 end) as iscycle
, CAST(cte.nodes_path+' -> '+ e.node_b AS varchar(MAX)) as nodes_path
FROM cte
JOIN edges e ON cte.node_b = e.node_a AND cte.nodes_path NOT LIKE '%' + CAST(cte.node_a+' -> '+ e.node_b AS varchar(500)) + '%'
where iscycle = 0
)
SELECT * -- max(iscycle)
FROM cte
option (maxrecursion 300) --just for safety :)
I don't know whether it is efficient when there are millions of records, so if you can see a way to optimize this query, please share your opinion.

Related

SQL Server Query to fetch nested data

I have a table like this -
declare @tmpData as table
(
MainId int,
RefId int
)
INSERT INTO @tmpData
(MainId,
RefId)
VALUES (1, NULL),
(2, 1),
(3, 2),
(4, 3),
(5, NULL),
(6, 5);
So, if I pass a value, for example 1, it should return all rows that are linked to 1 directly or indirectly.
Here 1 is the RefId of MainId 2, 2 is the RefId of MainId 3, and so on. MainId 5 and 6 are not related to 1, so they should not be in the output.
Can anyone please provide a SQL Server query for this? Thanks.
I tried a left join of the table with itself on MainId and RefId, but did not get the desired output.
You need a recursive CTE:
WITH R
AS (SELECT t.MainId,
t.RefId
FROM @tmpData t
WHERE t.MainId = 1
UNION ALL
SELECT t.MainId,
t.RefId
FROM @tmpData t
JOIN R
ON t.RefId = R.MainId)
SELECT *
FROM R
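A minimal sketch of the same traversal, run through Python's sqlite3 module so it can be tried without a SQL Server instance. The table variable becomes an ordinary table named tmpData, and the function names are made up for illustration; the CTE itself mirrors the one above, with the start value passed as a parameter.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmpData (MainId INTEGER, RefId INTEGER)")
    conn.executemany(
        "INSERT INTO tmpData VALUES (?, ?)",
        [(1, None), (2, 1), (3, 2), (4, 3), (5, None), (6, 5)],
    )
    return conn


def linked_rows(conn, start_id):
    """Return the start row plus every row that references it directly or indirectly."""
    sql = """
    WITH RECURSIVE r(MainId, RefId) AS (
        -- anchor: the row we start from
        SELECT MainId, RefId FROM tmpData WHERE MainId = ?
        UNION ALL
        -- step: rows whose RefId points at a row already collected
        SELECT t.MainId, t.RefId
        FROM tmpData t
        JOIN r ON t.RefId = r.MainId
    )
    SELECT MainId, RefId FROM r ORDER BY MainId
    """
    return conn.execute(sql, (start_id,)).fetchall()
```

With the sample data, linked_rows(conn, 1) yields the rows for MainId 1 to 4 and leaves out 5 and 6.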

Find and classify sequential patterns for a distinct group T-SQL

I need help finding and classifying sequential patterns for each distinct key.
From the data I have, I need to create a new table that contains the key and a pattern identifier that belongs to that key.
From the example below the pattern is as follows:
Keys #1 and #3 have the values 1, 2 and 3. Key #2 has the values 8, 9 and 10. When a distinct pattern exists for a key, e.g. (1, 2, 3), I need to create an entry in the table for the key # and an identifier for that specific pattern.
Data:
key value
1 1
1 2
1 3
2 8
2 9
2 10
3 1
3 2
3 3
Expected Output:
key pattern
1 1
2 2
3 1
Fiddle:
http://sqlfiddle.com/#!6/4fe39
Example table:
CREATE TABLE yourtable
([key] int, [value] int)
;
INSERT INTO yourtable
([key], [value])
VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 8),
(2, 9),
(2, 10),
(3, 1),
(3, 2),
(3, 3)
;
You can concatenate the values together in several ways. The traditional method in SQL Server uses for xml:
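-- table and column names follow the example table in the question;
-- [key] is bracketed because KEY is a reserved word
select k.[key],
stuff( (select ',' + cast(t.[value] as varchar(255))
from yourtable t
where k.[key] = t.[key]
order by t.[value]
for xml path ('')
), 1, 1, ''
) as ids
from (select distinct [key] from yourtable) k;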
You can convert this to a unique number using a CTE/subquery:
with cte as (
select k.[key],
stuff( (select ',' + cast(t.[value] as varchar(255))
from yourtable t
where k.[key] = t.[key]
order by t.[value]
for xml path ('')
), 1, 1, ''
) as ids
from (select distinct [key] from yourtable) k
)
-- keys with the same concatenated list get the same pattern number
select cte.*, dense_rank() over (order by ids) as ids_id
from cte;
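The same idea can be sketched outside the database: collect the values of each key into one comparable signature, then number the distinct signatures. The Python below is only an illustration. The name classify_patterns is made up, values are sorted within a key (an assumption, matching the ORDER BY above), and patterns are numbered in order of first appearance by ascending key rather than by dense_rank over the string.

```python
from collections import defaultdict

# Sample rows (key, value) from the question.
ROWS = [
    (1, 1), (1, 2), (1, 3),
    (2, 8), (2, 9), (2, 10),
    (3, 1), (3, 2), (3, 3),
]


def classify_patterns(rows):
    """Map each key to a pattern id; keys with the same set of values share an id."""
    values_by_key = defaultdict(list)
    for key, value in rows:
        values_by_key[key].append(value)

    pattern_ids = {}  # signature tuple -> pattern id
    result = {}
    for key in sorted(values_by_key):
        signature = tuple(sorted(values_by_key[key]))
        if signature not in pattern_ids:
            pattern_ids[signature] = len(pattern_ids) + 1
        result[key] = pattern_ids[signature]
    return result
```

For the sample rows, keys 1 and 3 end up with the same pattern id and key 2 gets its own.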

SQL Coalesce with missing values

I have two tables, master and child. The master's primary key MM is an INT. The child table has a compound key of two columns (MM and POS) and a value column (VV):
MM (INT)
POS (INT, values 1-32)
VV (INT, values 1-9)
Sample master table data:
(1, other data)
(2, other data)
(3, other data)
Sample child table data
(1, 1,2)
(1, 2,2)
(1, 4,1)
(1,15,1)
(2, 4,5)
(2, 5,3)
(2,31,7)
(3,3,1)
(4,18,2)
(4,19,5)
For a report I would like to de-normalize the data with an output like this:
(1,'22010000000000100000000000000000')
(2,'00053000000000000000000000000070')
(3,'00100000000000000000000000000000')
(4,'00000000000000000250000000000000')
I was thinking of using a select query with coalesce, but the output is not exactly what I want:
(1,'22110')
(2,'537')
(3,'1')
(4,'25')
How do I fill in the missing data with zeros?
One way I can think to do this uses a decimal value with a precision of 32 and sum() and then convert back to a zero-padded string:
select mm,
right(replicate('0', 32) + cast(sum(val) as varchar(32)), 32)
from (select c.*,
-- shift each digit to its position: VV followed by (32 - POS) zeros
cast(cast(c.vv as varchar(32)) + replicate('0', 32 - c.pos) as decimal(32, 0)) as val
from child c
) c
group by mm;
EDIT:
The above isn't generalizable (say, above 38 characters or to use letters as well as digits). Here is a more generalizable, but longer version:
select c.mm,
(max(case when pos = 1 then valc else '0' end) +
max(case when pos = 2 then valc else '0' end) +
max(case when pos = 3 then valc else '0' end) +
. . .
max(case when pos = 32 then valc else '0' end)
)
from (select c.*, cast(vv as varchar(255)) as valc
from child c
) c
group by c.mm;
I should note that if you want to handle a master with no children, then use a left join. That aspect of the problem seems less interesting than combining the values in the appropriate positions.
Try it like this
DECLARE @master TABLE(MM INT,OtherData VARCHAR(100));
INSERT INTO @master VALUES
(1, 'Other Data 1')
,(2, 'Other Data 2')
,(3, 'Other Data 3');
DECLARE @child TABLE(MM INT, POS INT, VV INT);
INSERT INTO @child VALUES
(1, 1,2)
,(1, 2,2)
,(1, 4,1)
,(1,15,1)
,(2, 4,5)
,(2, 5,3)
,(2,31,7)
,(3,3,1)
,(4,18,2)
,(4,19,5);
--One CTE to get 32 numbers
WITH Numbers(Nr) AS
(SELECT TOP 32 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM sys.objects) --get 32 numbers
--another CTE to get distinct MMs
,MMs AS
(
SELECT c.MM
,m.OtherData
FROM @child AS c
LEFT JOIN @master AS m ON c.MM=m.MM
GROUP BY c.MM,m.OtherData
)
--In this CTE the CROSS JOIN with Numbers creates 32 rows per MM; each position that has a corresponding child row carries its value. COALESCE puts a zero in place of every NULL
,Masked AS
(
SELECT MMs.MM
,MMs.OtherData
,Nr
,COALESCE(VV,0) AS Val
FROM MMs
CROSS JOIN Numbers
LEFT JOIN @child AS c1 ON c1.MM=MMs.MM AND c1.POS=Nr
)
--The final SELECT uses FOR XML PATH to turn the 32 numbers in rows back into a string
SELECT *
,(
SELECT Masked.Val AS [*]
FROM Masked
WHERE Masked.MM=MMs.MM
ORDER BY Masked.Nr --keep the digits in position order
FOR XML PATH('')
)
FROM MMs
The result
1 22010000000000100000000000000000
2 00053000000000000000000000000070
3 00100000000000000000000000000000
4 00000000000000000250000000000000
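The position-filling idea behind both answers can be shown in a few lines of Python: start from a row of 32 zeros per MM and overwrite the slot at each POS with its VV. This is only an illustrative sketch; the name denormalize and the width parameter are assumptions, and it presumes single-character values as in the question (VV between 1 and 9).

```python
# Sample child rows (MM, POS, VV) from the question.
CHILD_ROWS = [
    (1, 1, 2), (1, 2, 2), (1, 4, 1), (1, 15, 1),
    (2, 4, 5), (2, 5, 3), (2, 31, 7),
    (3, 3, 1),
    (4, 18, 2), (4, 19, 5),
]


def denormalize(child_rows, width=32):
    """Return {MM: string of `width` characters}, with '0' in every unused position."""
    slots = {}
    for mm, pos, vv in child_rows:
        # one list of '0' characters per MM, created on first use
        digits = slots.setdefault(mm, ["0"] * width)
        digits[pos - 1] = str(vv)  # POS is 1-based
    return {mm: "".join(digits) for mm, digits in slots.items()}
```

Each resulting string is exactly 32 characters long, so missing positions show up as zeros instead of being collapsed.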

EXISTS and NOT EXISTS in a correlated subquery

I've been trying to work out how to do a particular query for a day or so now and it has gotten to the point where I need some outside help. Hence my question.
Given the following data;
DECLARE @Data AS TABLE
(
OrgId INT,
ThingId INT
)
DECLARE @ReplacementData AS TABLE
(
OldThingId INT,
NewThingId INT
)
INSERT INTO @Data (OrgId, ThingId)
VALUES (1, 2), (1, 3), (1, 4),
(2, 1), (2, 4),
(3, 3), (3, 4)
INSERT INTO @ReplacementData (OldThingId, NewThingId)
VALUES (3, 4), (2, 5)
I want to find any organisation that has a "thing" that has been replaced, as denoted in the @ReplacementData table variable. I'd want to see the org id, the thing they have that has been replaced, and the id of the thing that should replace it. So, for example, given the data above, I should see:
Org id, Thing Id, Replacement Thing Id org doesn't have but should have
1, 2, 5 -- As Org 1 has 2, but not 5
I've had many attempts at trying to get this working, and I just can't seem to get my head around how to go about it. The following are a couple of my attempts, but I think I am just way off;
-- Attempt using correlated subqueries and EXISTS clauses
-- Show all orgs that have the old thing, but not the new thing
-- Ideally, limit results to OrgId, OldThingId and the NewThingId that they should now have too
SELECT *
FROM @Data d
WHERE EXISTS (SELECT *
FROM @Data oldstuff
WHERE oldstuff.OrgId = d.OrgId
AND oldstuff.ThingId IN
(SELECT OldThingID
FROM @ReplacementData))
AND NOT EXISTS (SELECT *
FROM @Data oldstuff
WHERE oldstuff.OrgId = d.OrgId
AND oldstuff.ThingId IN
(SELECT NewThingID
FROM @ReplacementData))
-- Attempt at using a JOIN to only include those old things that the org has (via the where clause)
-- Also try exists to show missing new things.
SELECT *
FROM @Data d
LEFT JOIN @ReplacementData rd ON rd.OldThingId = d.ThingId
WHERE NOT EXISTS (
SELECT *
FROM @Data dta
INNER JOIN @ReplacementData rep ON rep.NewThingId = dta.ThingId
WHERE dta.OrgId = d.OrgId
)
AND rd.OldThingId IS NOT NULL
Any help on this is much appreciated. I may well be going about it completely wrong, so please let me know if there is a better way of tackling this type of problem.
Try this out and let me know.
DECLARE @Data AS TABLE
(
OrgId INT,
ThingId INT
)
DECLARE @ReplacementData AS TABLE
(
OldThingId INT,
NewThingId INT
)
INSERT INTO @Data (OrgId, ThingId)
VALUES (1, 2), (1, 3), (1, 4),
(2, 1), (2, 4),
(3, 3), (3, 4)
INSERT INTO @ReplacementData (OldThingId, NewThingId)
VALUES (3, 4), (2, 5)
-- pair each org's old thing with its replacement, then keep only the pairs
-- for which the org has no row holding the new thing (anti-join)
SELECT D.OrgId, RD.*
FROM @Data D
JOIN @ReplacementData RD
ON D.ThingId=RD.OldThingId
LEFT OUTER JOIN @Data EXCLUDE
ON D.OrgId = EXCLUDE.OrgId
AND RD.NewThingId = EXCLUDE.ThingId
WHERE EXCLUDE.OrgId IS NULL
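Since the question asks about correlated subqueries, here is a hedged sketch of the same anti-join written with NOT EXISTS instead of LEFT JOIN ... IS NULL. It runs through Python's sqlite3 module with ordinary tables named Data and ReplacementData standing in for the table variables; the function names are made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create in-memory tables holding the sample data from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Data (OrgId INTEGER, ThingId INTEGER);
        CREATE TABLE ReplacementData (OldThingId INTEGER, NewThingId INTEGER);
        INSERT INTO Data VALUES (1, 2), (1, 3), (1, 4), (2, 1), (2, 4), (3, 3), (3, 4);
        INSERT INTO ReplacementData VALUES (3, 4), (2, 5);
    """)
    return conn


def missing_replacements(conn):
    """Return (OrgId, OldThingId, NewThingId) where the org has the old thing but not the new one."""
    sql = """
    SELECT d.OrgId, rd.OldThingId, rd.NewThingId
    FROM Data d
    JOIN ReplacementData rd ON rd.OldThingId = d.ThingId
    WHERE NOT EXISTS (
        -- correlated on both the org and the specific replacement
        SELECT 1
        FROM Data x
        WHERE x.OrgId = d.OrgId
          AND x.ThingId = rd.NewThingId
    )
    ORDER BY d.OrgId, rd.OldThingId
    """
    return conn.execute(sql).fetchall()
```

The key difference from the attempts in the question is that the subquery is correlated on the particular NewThingId of the matched replacement, not on any replacement at all. On the sample data it returns the single row for org 1, old thing 2, new thing 5.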

Multiple parents tree (or digraph) implementation sql server 2005

I need to implement a multi-parented tree (or digraph) in SQL Server 2005.
I've read several articles, but most of them use single-parented trees with a unique root, like the following one.
-My PC
-Drive C
-Documents and Settings
-Program Files
-Adobe
-Microsoft
-Folder X
-Drive D
-Folder Y
-Folder Z
In this one, everything derives from a root element (My PC).
In my case, a child could have more than 1 parent, like the following:
G   A
 \ /
  B
 / \
X   C
   / \
  D   E
   \ /
    F
So I have the following code:
create table #ObjectRelations
(
Id varchar(20),
NextId varchar(20)
)
insert into #ObjectRelations values ('G', 'B')
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('B', 'X')
insert into #ObjectRelations values ('C', 'E')
insert into #ObjectRelations values ('C', 'D')
insert into #ObjectRelations values ('E', 'F')
insert into #ObjectRelations values ('D', 'F')
declare @id varchar(20)
set @id = 'A';
WITH Objects (Id, NextId) AS
( -- This is the 'Anchor' or starting point of the recursive query
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
WHERE rel.Id = @id
UNION ALL -- This is the recursive portion of the query
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
INNER JOIN Objects -- Note the reference to CTE table name (Recursive Join)
ON rel.Id = Objects.NextId
)
SELECT o.*
FROM Objects o
drop table #ObjectRelations
Which returns the following SET:
Id NextId
-------------------- --------------------
A B
B C
B X
C E
C D
D F
E F
Expected result SET:
Id NextId
-------------------- --------------------
G B
A B
B C
B X
C E
C D
D F
E F
Note that the relation G->B is missing, because the query asks for a starting object, and using A as the starting point ignores the G->B relationship. A starting object doesn't work for me anyway, because I don't know the root object in advance.
So this code doesn't work in my case. A starting object is obvious in a single-parent tree (it is always the root object), but a multi-parent tree can have more than one "root" object, where a root is an object without a parent (ancestor). In the example, G and A are the roots.
So I'm kind of stuck here... I need to modify the query to NOT ask for a starting object and recursively traverse the entire tree.
I don't know if that's possible with the (Id, NextId) implementation... maybe I need to store it like a graph using some kind of incidence matrix, adjacency matrix or whatever (see http://willets.org/sqlgraphs.html).
Any help? What do you think, guys?
Thank you very much for your time =)
Cheers!
Well, I finally came up with the following solution.
It's the way I found to support multi-root trees and also cyclic digraphs.
create table #ObjectRelations
(
Id varchar(20),
NextId varchar(20)
)
/* Cycle */
/*
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('C', 'A')
*/
/* Multi root */
insert into #ObjectRelations values ('G', 'B')
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('B', 'X')
insert into #ObjectRelations values ('C', 'E')
insert into #ObjectRelations values ('C', 'D')
insert into #ObjectRelations values ('E', 'F')
insert into #ObjectRelations values ('D', 'F')
declare @startIds table
(
Id varchar(20) primary key
)
;WITH
Ids (Id) AS
(
SELECT Id
FROM #ObjectRelations
),
NextIds (Id) AS
(
SELECT NextId
FROM #ObjectRelations
)
INSERT INTO @startIds
/* In the cyclic case this select returns nothing, because every object has a predecessor */
SELECT DISTINCT
Ids.Id
FROM
Ids
LEFT JOIN
NextIds on Ids.Id = NextIds.Id
WHERE
NextIds.Id IS NULL
UNION
/* So let's just pick any one. (How the starting object of a cycle is chosen doesn't matter for this problem.) */
SELECT TOP 1 Id FROM Ids
;WITH Objects (Id, NextId, [Level], Way) AS
( -- This is the 'Anchor' or starting point of the recursive query
SELECT rel.Id,
rel.NextId,
1,
CAST(rel.Id as VARCHAR(MAX))
FROM #ObjectRelations rel
WHERE rel.Id IN (SELECT Id FROM @startIds)
UNION ALL -- This is the recursive portion of the query
SELECT rel.Id,
rel.NextId,
[Level] + 1,
RecObjects.Way + ', ' + rel.Id
FROM #ObjectRelations rel
INNER JOIN Objects RecObjects -- Note the reference to CTE table name (Recursive Join)
ON rel.Id = RecObjects.NextId
WHERE RecObjects.Way NOT LIKE '%' + rel.Id + '%'
)
SELECT DISTINCT
Id,
NextId,
[Level]
FROM Objects
ORDER BY [Level]
drop table #ObjectRelations
Could be useful for somebody. It is for me =P
Thanks
If you want to use all root objects as starting objects, you should first update your data to include information about the root objects (and the leaves). You should add the following inserts:
insert into #ObjectRelations values (NULL, 'G')
insert into #ObjectRelations values (NULL, 'A')
insert into #ObjectRelations values ('X', NULL)
insert into #ObjectRelations values ('F', NULL)
Of course you could also write your anchor query in such a way that you select as root nodes the records that have an Id that does not occur as a NextId, but this is easier.
Next, modify your anchor query to look like this:
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
WHERE rel.Id IS NULL
If you run this query, you'll see that you get many duplicates: many arcs occur multiple times. This is because you now have two results from your anchor query, and therefore the tree is traversed twice.
This can be fixed by changing your select statement to this (note the DISTINCT):
SELECT DISTINCT o.*
FROM Objects o
If you don't want to do the inserts suggested by Ronald, this would do:
WITH CTE_MultiParent (Id, NextId)
AS
(
-- anchor: roots are the Ids that never occur as a NextId
SELECT Id, NextId FROM #ObjectRelations
WHERE Id NOT IN
(
SELECT DISTINCT NextId FROM #ObjectRelations
)
UNION ALL
SELECT ObjR.Id, ObjR.NextId FROM #ObjectRelations ObjR INNER JOIN CTE_MultiParent
ON CTE_MultiParent.NextId = ObjR.Id
)
SELECT DISTINCT * FROM CTE_MultiParent
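A minimal sketch of the root-detection approach, run through Python's sqlite3 module so it can be tried without a server. The temp table becomes an ordinary table named ObjectRelations, and the names EDGES, build_demo_db and reachable_edges are made up for illustration. It uses UNION inside the recursive CTE, which SQLite allows and which removes repeated rows during the recursion; SQL Server only permits UNION ALL there, hence the DISTINCT in the final select above.

```python
import sqlite3

# Edge list (Id, NextId) from the question.
EDGES = [
    ("G", "B"), ("A", "B"), ("B", "C"), ("B", "X"),
    ("C", "E"), ("C", "D"), ("E", "F"), ("D", "F"),
]


def build_demo_db():
    """Create an in-memory table holding the sample relations."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ObjectRelations (Id TEXT, NextId TEXT)")
    conn.executemany("INSERT INTO ObjectRelations VALUES (?, ?)", EDGES)
    return conn


def reachable_edges(conn):
    """Return every edge reachable from a root (an Id that never occurs as a NextId)."""
    sql = """
    WITH RECURSIVE walk(Id, NextId) AS (
        -- anchor: edges leaving a root object
        SELECT Id, NextId
        FROM ObjectRelations
        WHERE Id NOT IN (
            SELECT NextId FROM ObjectRelations WHERE NextId IS NOT NULL
        )
        UNION
        -- step: edges leaving any object reached so far
        SELECT rel.Id, rel.NextId
        FROM ObjectRelations rel
        JOIN walk ON rel.Id = walk.NextId
    )
    SELECT Id, NextId FROM walk ORDER BY Id, NextId
    """
    return conn.execute(sql).fetchall()
```

On the sample data this returns all eight relations, including G -> B, without naming a starting object. A purely cyclic graph has no roots, so this sketch would return nothing for it; that case needs a chosen starting object, as in the accepted solution.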