Recursive SQL query to find all matching identifiers - sql

I have a table with the following structure:
CREATE TABLE Source
(
[ID1] INT,
[ID2] INT
);
INSERT INTO Source ([ID1], [ID2])
VALUES (1, 2), (2, 3), (4, 5),
(2, 5), (6, 7)
Example of Source and Result tables:
The Source table stores which id matches which other id. From the diagram it can be seen that 1, 2, 3, 4 and 5 are identical, and that 6 and 7 are identical. I need a SQL query that produces a Result table with all matches between ids.
I found this question on the site, "Recursive query in SQL Server",
which is similar to my task but expects a different result.
I tried to adapt its code to my task, but it does not work. It fails with: "The statement terminated. The maximum recursion 100 has been exhausted before statement completion."
;WITH CTE
AS
(
SELECT DISTINCT
M1.ID1,
M1.ID1 as ID2
FROM Source M1
LEFT JOIN Source M2
ON M1.ID1 = M2.ID2
WHERE M2.ID2 IS NULL
UNION ALL
SELECT
C.ID2,
M.ID1
FROM CTE C
JOIN Source M
ON C.ID1 = M.ID1
)
SELECT * FROM CTE ORDER BY ID1
Thanks a lot for the help!

This is a challenging question. You are trying to walk through a graph in two directions. There are two key ideas:
Add "reverse" edges, so the graph behaves like a digraph but with edges in both directions.
Keep a list of the nodes that have already been visited on each path. In SQL Server, a delimited string is one way to do that.
So:
with s as (
select id1, id2 from source
union -- on purpose
select id2, id1 from source
),
cte as (
select s.id1, s.id2, ',' + cast(s.id1 as varchar(max)) + ',' + cast(s.id2 as varchar(max)) + ',' as ids
from s
union all
select cte.id1, s.id2, ids + cast(s.id2 as varchar(max)) + ','
from cte join
s
on cte.id2 = s.id1
where cte.ids not like '%,' + cast(s.id2 as varchar(max)) + ',%'
)
select *
from cte
order by 1, 2;
Here is a db<>fiddle.
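For readers who want to see the two ideas outside SQL, here is a minimal Python sketch. It is not the answer's query: it tracks visited nodes per starting id instead of a path string, and the function name `all_matches` is made up for illustration. It only assumes the sample rows from the question.

```python
from collections import defaultdict, deque

def all_matches(pairs):
    """Return every ordered (a, b) pair of distinct ids that are connected."""
    # Idea 1: store each edge in both directions.
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    result = set()
    for start in list(neighbours):
        # Idea 2: remember what was already visited so cycles terminate.
        visited = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append(nxt)
                    result.add((start, nxt))
    return sorted(result)

source = [(1, 2), (2, 3), (4, 5), (2, 5), (6, 7)]
matches = all_matches(source)
```

With the sample data this yields 20 pairs for the group 1..5 and 2 pairs for the group 6, 7.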

Since all node connections are bidirectional, add the reversed relations to the original list.
Find all possible paths from each node. This is almost the usual recursion; the only difference is that we need to keep the root id1.
Avoid cycles. We need to watch for them because the edges have no direction.
Source:
;with src as(
select id1, id2 from source
union
-- reversed connections
select id2, id1 from source
), rec as (
select id1, id2, CAST(CONCAT('/', src.id1, '/', src.id2, '/') as varchar(8000)) path
from src
union all
-- keep the root id1 from the start of each path
select rec.id1, src.id2, CAST(CONCAT(rec.path, src.id2, '/') as varchar(8000))
from rec
-- usual recursion
inner join src on src.id1 = rec.id2
-- avoid cycles
where rec.path not like CONCAT('%/', src.id2, '/%')
)
select id1, id2, path
from rec
order by 1, 2
output
| id1 | id2 | path |
|-----|-----|-----------|
| 1 | 2 | /1/2/ |
| 1 | 3 | /1/2/3/ |
| 1 | 4 | /1/2/5/4/ |
| 1 | 5 | /1/2/5/ |
| 2 | 1 | /2/1/ |
| 2 | 3 | /2/3/ |
| 2 | 4 | /2/5/4/ |
| 2 | 5 | /2/5/ |
| 3 | 1 | /3/2/1/ |
| 3 | 2 | /3/2/ |
| 3 | 4 | /3/2/5/4/ |
| 3 | 5 | /3/2/5/ |
| 4 | 1 | /4/5/2/1/ |
| 4 | 2 | /4/5/2/ |
| 4 | 3 | /4/5/2/3/ |
| 4 | 5 | /4/5/ |
| 5 | 1 | /5/2/1/ |
| 5 | 2 | /5/2/ |
| 5 | 3 | /5/2/3/ |
| 5 | 4 | /5/4/ |
| 6 | 7 | /6/7/ |
| 7 | 6 | /7/6/ |
http://sqlfiddle.com/#!18/76114/13
source table will contain about 100,000 records
Nothing will help much with that. The task itself is expensive: it finds all possible connections, which is close to a CROSS JOIN within each connected group, and the result ends up with even more rows than the input.
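To put that cost in perspective: a group of n connected ids produces n * (n - 1) ordered pairs, while labelling each id with one representative of its group needs only one row per id. The sketch below is an assumption about what might be acceptable as output, not something the asker requested; it uses a small union-find in Python, and `component_labels` is a hypothetical name.

```python
def component_labels(pairs):
    """Map every id to the smallest id of its connected group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the representative of the merged group.
            if root_a < root_b:
                parent[root_b] = root_a
            else:
                parent[root_a] = root_b

    return {x: find(x) for x in list(parent)}

source = [(1, 2), (2, 3), (4, 5), (2, 5), (6, 7)]
labels = component_labels(source)
```

If the full pair list is really needed, it can be generated afterwards by joining ids that share a label, at which point the quadratic size is unavoidable.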

It looks like I came up with an answer similar to the other posters'. My approach was to insert the existing value pairs and then insert the reverse of each pair.
Once the list of value pairs is expanded, you can traverse the table to find all the pairs.
CREATE TABLE #Source
([ID1] int, [ID2] int);
INSERT INTO #Source
(
[ID1]
,[ID2]
)
VALUES
(1, 2)
,(2, 3)
,(4, 5)
,(2, 5)
,(6, 7)
INSERT INTO #Source
(
[ID1]
,[ID2]
)
SELECT
[ID2]
,[ID1]
FROM #Source
;WITH expanded AS
(
SELECT DISTINCT
ID1 = s1.ID1
,ID2 = s1.ID2
FROM #Source s1
LEFT JOIN #Source s2 ON s1.ID2 = s2.ID1
UNION
SELECT DISTINCT
ID1 = s1.ID1
,ID2 = s2.ID2
FROM #Source s1
LEFT JOIN #Source s2 ON s1.ID2 = s2.ID1
WHERE s1.ID1 <> s2.ID2
)
,recur AS
(
SELECT DISTINCT
e1.ID1
,e1.ID2
FROM expanded e1
LEFT JOIN expanded e2 ON e1.ID2 = e2.ID1
WHERE e1.ID1 <> e1.ID2
UNION ALL
SELECT DISTINCT
e1.ID1
,e2.ID2
FROM expanded e1
INNER JOIN expanded e2 ON e1.ID2 = e2.ID1
WHERE e1.ID1 <> e2.ID2
)
SELECT DISTINCT
ID1, ID2
FROM recur
ORDER BY ID1, ID2
DROP TABLE #Source

This is a way to get that output by brute force, but may not be the best solution with a different/larger data set:
select sub1.rnk as ID1
,sub2.rnk as ID2
from
(
select a.*
,rank() over (partition by 1 order by id1, id2) as RNK
from source a
) sub1
cross join
(
select a.*
,rank() over (partition by 1 order by id1, id2) as RNK
from source a
) sub2
where sub1.rnk <> sub2.rnk
union all
select id1 as ID1
,id2 as ID2
from source
where id1 = 6
union all
select id2 as ID1
,id1 as ID2
from source
where id1 = 6;

Related

How can i find all linked rows by array values in postgres?

i have a table like this:
id | arr_val | grp
-----------------
1 | {10,20} | -
2 | {20,30} | -
3 | {50,5} | -
4 | {30,60} | -
5 | {1,5} | -
6 | {7,6} | -
I want to find out which rows are in a group together.
In this example, rows 1, 2 and 4 form one group because 1 and 2 have a common element, and so do 2 and 4. Rows 3 and 5 form a group because they have a common element. Row 6 has no common elements with any other row, so it forms a group by itself.
The result should look like this:
id | arr_val | grp
-----------------
1 | {10,20} | 1
2 | {20,30} | 1
3 | {50,5} | 2
4 | {30,60} | 1
5 | {1,5} | 2
6 | {7,6} | 3
I think I need a recursive CTE because my problem is graph-like, but I am not sure how to do that.
Additional info and background:
The table has ~2,500,000 rows.
In reality, the problem I am trying to solve has more fields and conditions for finding a group:
id | arr_val | date | val | grp
---------------------------------
1 | {10,20} | -
2 | {20,30} | -
Not only do the elements of a group need to be linked by common elements in arr_val; they also all need to have the same value in val and need to be linked by a timespan in date (gaps and islands). I solved the other two conditions, but now the condition in this question has been added. If there is an easy way to handle all three in one query, that would be awesome, but it is not necessary.
----Edit-----
While both answers work for the small example, they do not work for a table with many more rows. In both answers the number of rows in the recursive part explodes and is only reduced at the end.
A solution should work for data like this too:
id | arr_val | grp
-----------------
1 | {1} | -
2 | {1} | -
3 | {1} | -
4 | {1} | -
5 | {1} | -
6 | {1} | -
7 | {1} | -
8 | {1} | -
9 | {1} | -
10 | {1} | -
11 | {1} | -
more rows........
Is there a solution to that problem?
You can handle this as a recursive CTE. Define the edges between the ids based on common values. Then traverse the edges and aggregate:
with recursive nodes as (
select id, val
from t cross join
unnest(arr_val) as val
),
edges as (
select distinct n1.id as id1, n2.id as id2
from nodes n1 join
nodes n2
on n1.val = n2.val
),
cte as (
select id1, id2, array[id1] as visited, 1 as lev
from edges
where id1 = id2
union all
select cte.id1, e.id2, visited || e.id2,
lev + 1
from cte join
edges e
on cte.id2 = e.id1
where e.id2 <> all(cte.visited)
),
vals as (
select id1, array_agg(distinct id2 order by id2) as id2s
from cte
group by id1
)
select *, dense_rank() over (order by id2s) as grp
from vals;
Here is a db<>fiddle.
Here is an approach at this graph-walking problem:
with recursive cte as (
select id, arr_val, array[id] path from mytable
union all
select t.id, t.arr_val, c.path || t.id
from cte c
inner join mytable t on t.arr_val && c.arr_val and not t.id = any(c.path)
)
select c.id, c.arr_val, dense_rank() over(order by min(x.id)) grp
from cte c
cross join lateral unnest(c.path) as x(id)
group by c.id, c.arr_val
order by c.id
The common-table-expression walks the graph, recursively looking for "adjacent" nodes to the current node, while keeping track of already-visited nodes. Then the outer query aggregates, identifies groups using the least node per path, and finally ranks the groups.
Demo on DB Fiddle:
id | arr_val | grp
-: | :------ | --:
1 | {10,20} | 1
2 | {20,30} | 1
3 | {50,5} | 2
4 | {30,60} | 1
5 | {1,5} | 2
6 | {7,6} | 3
While Gordon Linoff's solution is the fastest I found for a small amount of data where the groups are not too big, it does not work for bigger datasets and bigger groups.
I changed his solution to make it work.
I moved the edges to an indexed table:
create table edges
(
id1 integer not null,
id2 integer not null,
constraint staffel_group_nodes_pk
primary key (id1, id2)
);
insert into edges(id1, id2) with
nodes(id, arr_val) as (
select id, arr_val
from my_table
)
select n1.id as id1, n2.id as id2
from nodes n1
join
nodes n2
on n1.arr_val && n2.arr_val ;
That alone didn't help. I changed his recursive part too:
with recursive
cte as (
select id1, array [id1] as visited
from edges
where id1 = id2
union all
select unnested.id1, array_agg(distinct unnested.vis) as visited
from (
select cte.id1,
unnest(cte.visited || e.id2) as vis
from cte
join
edges e
on e.id1 = any (cte.visited)
and e.id2 <> all (cte.visited)) as unnested
group by unnested.id1
),
vals as (
select id1, array_agg(distinct vis) as id2s
from (
select cte.id1,
unnest(cte.visited) as vis
from cte) as unnested
group by unnested.id1
)
select id1,id2s, dense_rank() over (order by id2s) as grp
from vals;
At every step I group all searches by their starting point. This greatly reduces the number of paths walked in parallel and works surprisingly fast.
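If doing the grouping outside the database is an option, the same result can be sketched without walking any paths at all. The Python below is an illustration only, not part of the answer above: it merges rows that share an array element with a union-find, and the name `group_rows` and the in-memory row list are assumptions. Groups are numbered in order of their smallest id, which matches the expected output in the question.

```python
def group_rows(rows):
    """rows: iterable of (id, values). Returns {id: grp}, grp starting at 1."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_row_with = {}  # array element -> first row id that contained it
    for row_id, values in rows:
        find(row_id)  # register rows that share nothing with anyone
        for value in values:
            if value in first_row_with:
                # Same element seen before: merge the two rows' groups.
                parent[find(row_id)] = find(first_row_with[value])
            else:
                first_row_with[value] = row_id

    group_of_root = {}
    result = {}
    for row_id in sorted(parent):
        root = find(row_id)
        group_of_root.setdefault(root, len(group_of_root) + 1)
        result[row_id] = group_of_root[root]
    return result

example = [(1, [10, 20]), (2, [20, 30]), (3, [50, 5]),
           (4, [30, 60]), (5, [1, 5]), (6, [7, 6])]
groups = group_rows(example)
```

The work grows with the total number of array elements, so the case from the edit where every row contains `{1}` needs one merge per row instead of an exploding set of paths.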

SQL Server recursive query to show path of parents

I am working with SQL Server statements and have one table like:
| item | value | parentItem |
+------+-------+------------+
| 1 | 2test | 2 |
| 2 | 3test | 3 |
| 3 | 4test | 4 |
| 5 | 1test | 1 |
| 6 | 3test | 3 |
| 7 | 2test | 2 |
And I would like to get the below result using a SQL Server statement:
| item1 | value1 |
+-------+--------------------------+
| 1 | /4test/3test/2test |
| 2 | /4test/3test |
| 3 | /4test |
| 5 | /4test/3test/2test/1test |
| 6 | /4test/3test |
| 7 | /4test/3test/2test |
I haven't been able to work out the correct SQL to get the values for all the ids according to parentItem.
I have tried this SQL:
with all_path as
(
select item, value, parentItem
from table
union all
select a.item, a.value, a.parentItem
from table a, all_path b
where a.item = b.parentItem
)
select
item as item1,
stuff(select '/' + value
from all_path
order by item asc
for xml path ('')), 1, 0, '') as value1
from
all_path
But got the "value1" column in result like
/4test/4test/4test/3test/3test/3test/3test/2test/2test/2test/2test
Could you please help me with that? Thanks a lot.
Based on the expected output you gave, use the recursive part to concatenate the values:
;with yourTable as (
select item, value, parentItem
from (values
(1,'2test',2)
,(2,'3test',3)
,(3,'4test',4)
,(5,'1test',1)
,(6,'3test',3)
,(7,'2test',2)
)x (item,value,parentItem)
)
, DoRecursivePart as (
select 1 as Pos, item, convert(varchar(max),value) value, parentItem
from yourTable
union all
select drp.pos +1, drp.item, convert(varchar(max), yt.value + '/' + drp.value), yt.parentItem
from yourTable yt
inner join DoRecursivePart drp on drp.parentItem = yt.item
)
select drp.item, '/' + drp.value
from DoRecursivePart drp
inner join (select item, max(pos) mpos
from DoRecursivePart
group by item) [filter] on [filter].item = drp.item and [filter].mpos = drp.Pos
order by item
gives
item value
----------- ------------------
1 /4test/3test/2test
2 /4test/3test
3 /4test
5 /4test/3test/2test/1test
6 /4test/3test
7 /4test/3test/2test
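The same idea (prepend each ancestor's value while walking up, then keep the longest path per item) can be tried without a SQL Server instance. The sketch below runs it in SQLite through Python's `sqlite3` module, so the syntax differs from T-SQL (`WITH RECURSIVE`, `||` for concatenation); the table name `items` and the CTE name `walk` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item INTEGER, value TEXT, parentItem INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "2test", 2), (2, "3test", 3), (3, "4test", 4),
     (5, "1test", 1), (6, "3test", 3), (7, "2test", 2)],
)

query = """
WITH RECURSIVE walk(item, path, parentItem, pos) AS (
    -- anchor: every item starts with its own value
    SELECT item, '/' || value, parentItem, 1 FROM items
    UNION ALL
    -- step: prepend the parent's value and move one level up
    SELECT w.item, '/' || i.value || w.path, i.parentItem, w.pos + 1
    FROM walk AS w
    JOIN items AS i ON i.item = w.parentItem
)
SELECT w.item, w.path
FROM walk AS w
JOIN (SELECT item, MAX(pos) AS mpos FROM walk GROUP BY item) AS f
  ON f.item = w.item AND f.mpos = w.pos   -- keep only the longest path
ORDER BY w.item
"""
paths = conn.execute(query).fetchall()
```

Each item's walk stops when its current parentItem has no matching row (item 4 does not exist in the sample), so the row with the highest `pos` holds the complete path.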
Here's the sample data
drop table if exists dbo.test_table;
go
create table dbo.test_table(
item int not null,
[value] varchar(100) not null,
parentItem int not null);
insert dbo.test_table values
(1,'test1',2),
(2,'test2',3),
(3,'test3',4),
(5,'test4',1),
(6,'test5',3),
(7,'test6',2);
Here's the query
;with recur_cte(item, [value], parentItem, h_level) as (
select item, [value], parentItem, 1
from dbo.test_table tt
union all
select rc.item, tt.[value], tt.parentItem, rc.h_level+1
from dbo.test_table tt join recur_cte rc on tt.item=rc.parentItem)
select rc.item,
stuff((select '/' + cast(parentItem as varchar)
from recur_cte c2
where rc.item = c2.item
order by h_level desc FOR XML PATH('')), 1, 1, '') [value1]
from recur_cte rc
group by item;
Here's the results
item value1
1 4/3/2
2 4/3
3 4
5 4/3/2/1
6 4/3
7 4/3/2

SQL select all rows in a single row's "history"

I have a table that looks like this:
ID | PARENT_ID
--------------
0 | NULL
1 | 0
2 | NULL
3 | 1
4 | 2
5 | 4
6 | 3
Being an SQL noob, I'm not sure if I can accomplish what I would like in a single command.
What I would like is to start at row 6, and recursively follow the "history", using the PARENT_ID column to reference the ID column.
The result (in my mind) should look something like:
6|3
3|1
1|0
0|NULL
I already tried something like this:
SELECT T1.ID
FROM Table T1, Table T2
WHERE T1.ID = 6
OR T1.PARENT_ID = T2.PARENT_ID;
but that just gave me a strange result.
With a recursive cte.
If you want to start from the maximum id:
with recursive cte (id, parent_id) as (
select t.*
from (
select *
from tablename
order by id desc
limit 1
) t
union all
select t.*
from tablename t inner join cte c
on t.id = c.parent_id
)
select * from cte
See the demo.
If you want to start specifically from id = 6:
with recursive cte (id, parent_id) as (
select *
from tablename
where id = 6
union all
select t.*
from tablename t inner join cte c
on t.id = c.parent_id
)
select * from cte;
See the demo.
Results:
| id | parent_id |
| --- | --------- |
| 6 | 3 |
| 3 | 1 |
| 1 | 0 |
| 0 | |
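For anyone who wants to reproduce this locally, here is a small sketch that runs the same kind of recursive CTE in SQLite via Python's `sqlite3` module. The `depth` column and the `history` helper are additions for the example, used only to make the output order explicit; they are not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(0, None), (1, 0), (2, None), (3, 1), (4, 2), (5, 4), (6, 3)],
)

def history(conn, start_id):
    """Follow parent_id links from start_id up to a row with no parent."""
    query = """
    WITH RECURSIVE cte(id, parent_id, depth) AS (
        SELECT id, parent_id, 0 FROM tablename WHERE id = ?
        UNION ALL
        SELECT t.id, t.parent_id, c.depth + 1
        FROM tablename AS t
        JOIN cte AS c ON t.id = c.parent_id
    )
    SELECT id, parent_id FROM cte ORDER BY depth
    """
    return conn.execute(query, (start_id,)).fetchall()

rows = history(conn, 6)
```

The recursion ends on its own when it reaches a row whose parent_id is NULL, because the join finds no further match.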

SQL - Combine data from several columns into one column

I am creating a complicated CTE query in MSSQL,
whose result will look something like this:
| Id1 | Id2 | Id3 |
| 1 | 2 | 3 |
| 5 | 4 | 1 |
| 6 | 5 | 2 |
And now I need to combine all the data into one column, something like this:
| Ids |
| 1 |
| 2 |
| 3 |
| 5 |
| 4 |
| 1 |
| 6 |
| 5 |
| 2 |
I want to avoid using union all with a separate select for each column.
Thanks
My favorite way of doing this uses cross apply:
select v.id
from t cross apply
(values (t.id1), (t.id2), (t.id3)) v(id);
Like the version using unpivot this only reads the table once. A version using union all would scan the table three times. However, cross apply is much more powerful than unpivot and requires less typing.
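As a rough analogy outside SQL, the single-pass behaviour looks like flattening each row into its values while iterating over the rows once. This Python sketch only illustrates the shape of the result; the `rows` list stands in for the three-column table from the question.

```python
from itertools import chain

# Each tuple is one row: (Id1, Id2, Id3).
rows = [(1, 2, 3), (5, 4, 1), (6, 5, 2)]

# One pass over the rows; every row contributes its three values in order.
ids = list(chain.from_iterable(rows))
```

Duplicates are kept, as in the expected output in the question.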
AFAIK, there are no options other than using a UNION operation. The basic purpose of UNION is exactly this: combining records from multiple sources/result sets. Use UNION ALL rather than UNION so that the duplicate values in your expected output are kept. So you can do something like:
select Id1 from tbl1
union all
select Id3 from tbl1
union all
select Id2 from tbl1
You could use UNPIVOT
SELECT Ids
FROM
(
SELECT Id1, Id2, Id3
FROM CTE
) d
UNPIVOT
(
Ids for id in (Id1, Id2, Id3)
) u
Use UNPIVOT to get your result:
CREATE TABLE #table( Id1 INT ,Id2 INT , Id3 INT )
INSERT INTO #table( Id1 ,Id2 , Id3 )
SELECT 1 , 2 , 3 UNION ALL
SELECT 5 , 4 , 1 UNION ALL
SELECT 6 , 5 , 2
SELECT _Result [Result]
FROM
(
SELECT Id1 ,Id2 , Id3
FROM #table
)A
UNPIVOT
(
_Result FOR Id IN (Id1 , Id2 , Id3)
) UNPvt

Get hierarchical structure using SQL Server

I have a self-referencing table with a primary key, id and a foreign key parent_id.
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PK | NULL | IDENTITY |
| parent_id | int(11) | YES | | NULL | |
| name | varchar(255) | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+
I have a table like the following (data reduced for clarity):
Table MySiteMap
Id Name parent_id
1 A NULL
2 B 1
3 C 1
4 D 1
20 B1 2
21 B2 2
30 C1 3
31 C2 3
40 D1 4
41 D2 4
I would like to get the hierarchical structure using a SQL Server query:
A
|
B
|
| B1
| B2
C
|
| C1
| C2
D
|
| D1
| D2
Any suggestions?
You can use Common Table Expressions.
WITH LeveledSiteMap(Id, Name, Level)
AS
(
SELECT Id, Name, 1 AS Level
FROM MySiteMap
WHERE Parent_Id IS NULL
UNION ALL
SELECT m.Id, m.Name, l.Level + 1
FROM MySiteMap AS m
INNER JOIN LeveledSiteMap AS l
ON m.Parent_Id = l.Id
)
SELECT *
FROM LeveledSiteMap
Use this:
;WITH CTE(Id, Name, parent_id, [Level], ord) AS (
SELECT
MySiteMap.Id,
CONVERT(nvarchar(255), MySiteMap.Name) AS Name,
MySiteMap.parent_id,
1,
CONVERT(nvarchar(255), MySiteMap.Id) AS ord
FROM MySiteMap
WHERE MySiteMap.parent_id IS NULL
UNION ALL
SELECT
MySiteMap.Id,
CONVERT(nvarchar(255), REPLICATE(' ', [Level]) + '|' + REPLICATE(' ', [Level]) + MySiteMap.Name) AS Name,
MySiteMap.parent_id,
CTE.[Level] + 1,
CONVERT(nvarchar(255),CTE.ord + CONVERT(nvarchar(255), MySiteMap.Id)) AS ord
FROM MySiteMap
JOIN CTE ON MySiteMap.parent_id =CTE.Id
WHERE MySiteMap.parent_id IS NOT NULL
)
SELECT Name
FROM CTE
ORDER BY ord
For this:
A
| B
| B1
| B2
| C
| C1
| C2
| D
| D1
| D2
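The ordering trick above relies on a sort key built by concatenating ids as text, which may misorder rows once ids of different lengths are mixed. As a point of comparison, here is a minimal Python sketch of the same depth-first, indented listing. It is not the answer's method; `render_tree` and the two-space indent are choices made for the example.

```python
from collections import defaultdict

def render_tree(rows):
    """rows: iterable of (id, name, parent_id). Returns indented lines."""
    children = defaultdict(list)
    names = {}
    for node_id, name, parent_id in rows:
        names[node_id] = name
        children[parent_id].append(node_id)

    lines = []

    def visit(node_id, level):
        lines.append("  " * level + names[node_id])
        # Visit children in id order, each one level deeper.
        for child in sorted(children[node_id]):
            visit(child, level + 1)

    for root in sorted(children[None]):  # roots have parent_id NULL/None
        visit(root, 0)
    return lines

site_map = [(1, "A", None), (2, "B", 1), (3, "C", 1), (4, "D", 1),
            (20, "B1", 2), (21, "B2", 2), (30, "C1", 3), (31, "C2", 3),
            (40, "D1", 4), (41, "D2", 4)]
tree = render_tree(site_map)
```

In SQL, a comparable effect is usually achieved by building the sort key from fixed-width (zero-padded) ids or ids joined with a separator.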
I started with a query of my own, but looking at it now, it is similar to Mark's.
I will add it anyway, since I also created a sqlfiddle with both my query and Mark's.
WITH tList (id,name,parent_id,nameLevel)
AS
(
SELECT t.id, t.name, t.parent_id, 1 AS nameLevel
FROM t as t
WHERE t.parent_id IS NULL
UNION ALL
SELECT tnext.id, tnext.name, tnext.parent_id, tList.nameLevel + 1
FROM t AS tnext
INNER JOIN tList AS tlist
ON tnext.parent_id = tlist.id
)
SELECT id,name,isnull(parent_id,0) 'parent_id',nameLevel FROM tList order by nameLevel;
A good blog:
SQL Query – How to get data in Hierarchical Structure?
I know changing the structure of a table is always a critical operation, but since SQL Server 2008 introduced the HierarchyId data type, I really like working with it. Maybe have a look at:
http://www.codeproject.com/Articles/37171/HierarchyID-Data-Type-in-SQL-Server
http://www.codeproject.com/Tips/740553/Hierarchy-ID-in-SQL-Server
I am sure you will quickly understand how to use this data type and its functions. SQL code using this data type is more structured and can perform better than CTEs.