SQL query for topological sort

I have a directed acyclic graph:
DROP TABLE IF EXISTS #Edges
CREATE TABLE #Edges(from_node int, to_node int);
INSERT INTO #Edges VALUES (1,2),(1,3),(1,4),(5,1);
I want to list all nodes, always listing a to_node before its from_node.
For example: 2, 3, 4, 1, 5.
This is also called a topological ordering. How can it be done in SQL?
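To make the requirement concrete, here is a minimal Python sketch of a checker for it: an ordering is accepted when every to_node appears before its from_node. The function name `is_valid_order` is made up for illustration; the edges are the ones from the question.

```python
def is_valid_order(edges, order):
    """Return True if every to_node is listed before its from_node."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[to_node] < position[from_node]
               for from_node, to_node in edges)

edges = [(1, 2), (1, 3), (1, 4), (5, 1)]
print(is_valid_order(edges, [2, 3, 4, 1, 5]))  # True
print(is_valid_order(edges, [1, 2, 3, 4, 5]))  # False
```

Such a checker can be handy for validating whatever a SQL solution returns.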

You can use a recursive CTE to calculate the depth. Then order by the depth:
with cte as (
select e.from_node, e.to_node, 1 as lev
from edges e
where not exists (select 1 from edges e2 where e2.to_node = e.from_node)
union all
select e.from_node, e.to_node, lev + 1
from cte join
edges e
on e.from_node = cte.to_node
)
select *
from cte
order by lev desc;
EDIT:
I notice that you do not have "1" in your edges list. To handle this:
with cte as (
select 1 as from_node, e.from_node as to_node, 1 as lev
from edges e
where not exists (select 1 from edges e2 where e2.to_node = e.from_node)
union all
select e.from_node, e.to_node, lev + 1
from cte join
edges e
on e.from_node = cte.to_node
-- where lev < 5
)
select *
from cte
order by lev desc;
Here is a db<>fiddle.

DROP TABLE IF EXISTS #topological_sorted
CREATE TABLE #topological_sorted(id int identity(1,1) primary key, n int);
WITH rcte(n) AS (
SELECT e1.to_node
FROM #Edges AS e1
LEFT JOIN #Edges AS e2 ON e1.to_node = e2.from_node
WHERE e2.from_node IS NULL
UNION ALL
SELECT e.from_node
FROM #Edges AS e
JOIN rcte ON e.to_node = rcte.n
)
INSERT INTO #topological_sorted(n)
SELECT *
FROM rcte;
SELECT * FROM #topological_sorted
Nodes might be listed several times. We only want to keep the first occurrence:
DROP TABLE IF EXISTS #topological_sorted_2
SELECT *, MIN(id) OVER (PARTITION BY n) AS idm
INTO #topological_sorted_2
FROM #topological_sorted
ORDER BY id;
SELECT * FROM #topological_sorted_2
WHERE id=idm
ORDER BY id;

I found this question while running into a similar problem. Since neither Ludovic's nor Gordon's answer fully answered my particular questions, and there's not enough room in the comments, I decided to write up my own answer.
Recursive query solution is good
Basically, Gordon's answer is based on a graph traversal of all paths. The cte.lev column actually represents the length of the path from some start node to cte.to_node.
What about loops?
What's not clear is what to return when multiple paths into a particular node are possible (i.e. if the undirected version of the DAG has loops). For example, in the following graph
1
^
|\
2 \
^ /
|/
3
node 1 is reachable from the initial node 3 at distance 1 directly and at distance 2 via node 2. Hence node 1 is expanded twice, with different values of path length.
Let v be the greater of the two values. In general, v can be defined as the length of the longest path from a start node to the given node. This value corresponds to a topological ordering. It essentially splits the nodes into chunks so that for any two nodes n1, n2 with values v1, v2 respectively, n1 comes before n2 when v1<v2, and the ordering of n1, n2 is arbitrary when v1=v2. (I have no exact proof, but by contradiction: if this ordering didn't hold, there would have to be a counter-directed edge or an edge within a chunk, so the value v wouldn't be the length of the longest path.)
Hence the SQL is (original example fiddle, my looped example fiddle):
with cte as (
select 0 as from_node, e.from_node as to_node, 1 as lev
from edges e
where not exists (select 1 from edges e2 where e2.to_node = e.from_node)
union all
select e.from_node, e.to_node, lev + 1
from cte join
edges e
on e.from_node = cte.to_node
)
select to_node, max(lev)
from cte
group by to_node
order by max(lev)
(This is close to Ludovic's answer, but Ludovic relies on ordering by id, which IMHO cannot guarantee the proper ordering in the general case.)
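As a rough check of the longest-path idea, the query can be run against the looped example (edges 3 -> 2, 2 -> 1, 3 -> 1). The sketch below uses Python's sqlite3 module rather than SQL Server, so the `WITH RECURSIVE` spelling and the in-memory table are assumptions of this illustration, not part of the answer's fiddles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges(from_node INTEGER, to_node INTEGER)")
# Looped example: 3 -> 2 -> 1 plus the shortcut 3 -> 1.
conn.executemany("INSERT INTO edges VALUES (?, ?)", [(3, 2), (2, 1), (3, 1)])

query = """
WITH RECURSIVE cte(from_node, to_node, lev) AS (
    SELECT 0, e.from_node, 1
    FROM edges e
    WHERE NOT EXISTS (SELECT 1 FROM edges e2 WHERE e2.to_node = e.from_node)
    UNION ALL
    SELECT e.from_node, e.to_node, cte.lev + 1
    FROM cte JOIN edges e ON e.from_node = cte.to_node
)
SELECT to_node, MAX(lev) AS max_lev
FROM cte
GROUP BY to_node
ORDER BY max_lev
"""
levels = conn.execute(query).fetchall()
print(levels)  # [(3, 1), (2, 2), (1, 3)]
```

Node 1 is reached at levels 2 and 3, and grouping by to_node keeps the larger value, which places it after node 2.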
Optimization
The recursive CTE now generates rows with to_node and the length of the path to it. If some node is reached by multiple paths of the same length, each of those paths expands into new rows at the next level of recursion, which generates duplicate rows and for some graphs can lead to combinatorial explosion. For example, in the following graph (let the edges be directed from left to right)
  B   E
 / \ / \
A   D   G
 \ / \ /
  C   F
the D node is reached from A via two paths, but the algorithm does not take this into account; hence E has two paths, as does F, and G has as many as four.
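The path counts can be observed directly by counting how many rows the recursive CTE produces per node. This is a small sketch in Python with sqlite3 (an assumption of this illustration; the answer itself targets SQL Server and Oracle), with the start node de-duplicated in the non-recursive part so that the counts equal the number of paths from A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges(from_node TEXT, to_node TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", [
    ("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
    ("D", "E"), ("D", "F"), ("E", "G"), ("F", "G"),
])

query = """
WITH RECURSIVE cte(to_node, lev) AS (
    SELECT DISTINCT e.from_node, 1
    FROM edges e
    WHERE NOT EXISTS (SELECT 1 FROM edges e2 WHERE e2.to_node = e.from_node)
    UNION ALL
    SELECT e.to_node, cte.lev + 1
    FROM cte JOIN edges e ON e.from_node = cte.to_node
)
SELECT to_node, COUNT(*) FROM cte GROUP BY to_node ORDER BY to_node
"""
rows_per_node = dict(conn.execute(query).fetchall())
print(rows_per_node)
# {'A': 1, 'B': 1, 'C': 1, 'D': 2, 'E': 2, 'F': 2, 'G': 4}
```

Each further diamond in the chain doubles the count again, which is where the explosion comes from.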
For an SQL-based solution in an ideal world, adding distinct would suffice, which would eliminate the duplicate expansion of the D-E and D-F edges:
select distinct 0 as from_node, e.from_node as to_node, 1 as lev
from edges e
where not exists (select 1 from edges e2 where e2.to_node = e.from_node)
union all
select distinct e.from_node, e.to_node, lev + 1
from cte join
edges e
on e.from_node = cte.to_node
Unfortunately this doesn't work: SQL Server raises the error "DISTINCT operator is not allowed in the recursive part of a recursive common table expression 'cte'." (I actually work with Oracle, where the result is analogous: ORA-32486 unsupported operation in recursive branch of recursive WITH clause.) Similarly, neither GROUP BY nor query nesting tricks can be used.
At this point I gave up on SQL Server, but for Oracle there is one more solution, based on window functions. In the recursive part of the query it is possible to treat a bunch of duplicate rows as a partition, number the rows within that partition, and keep only one of the potentially many duplicates.
with edges (from_node,to_node) as (
select 'A','B' from dual union all
select 'A','C' from dual union all
select 'B','D' from dual union all
select 'C','D' from dual union all
select 'D','E' from dual union all
select 'D','F' from dual union all
select 'E','G' from dual union all
select 'F','G' from dual
)
, cte (from_node, to_node, lev, dup) as (
select distinct null as from_node, e.from_node as to_node, 0 as lev, 1 as dup
from edges e
where not exists (select 1 from edges e2 where e2.to_node = e.from_node)
union all
select e.from_node, e.to_node, cte.lev + 1
, row_number() over (partition by e.to_node, cte.lev order by null) as dup
from cte
join edges e on e.from_node = cte.to_node
where cte.dup = 1
)
select to_node, lev from cte where dup = 1 order by lev
The drawback is that the row_number of the current level of recursion cannot be filtered in the where condition. Hence we must accept that duplicate rows pass through and expand into the next level of recursion, where they are finally pruned. This heuristic is still useful, though: I was querying the Oracle dba_dependencies table and the query didn't terminate at all without it.
I didn't find a way to make this small trick work in SQL Server, since SQL Server handles window functions in recursive queries differently. Sorry for cluttering the question with Oracle issues, but I consider this topic interesting for anyone who finds this question.

I needed a topological sort for a SQLite application, and the following works for SQLite 3.37.0, using Tomáš's code and data. In SQLite, DISTINCT works within a recursive CTE. I have added an additional dependency between his nodes 'C' and 'F' to make things a little more interesting, but it works the same without this edge.
I need to determine the order of processing entities in a dependency management system, similar to Ludovic's need, so I changed the sort order to DESC so that the first item returned is the first item to process.
DROP TABLE IF EXISTS edges;
CREATE TABLE edges(from_node int, to_node int);
INSERT INTO edges VALUES ('A','B'),('A','C'),('B','D'),('C','D')
, ('D','E'),('D','F'),('E','G'),('F','G')
, ('C','F');
with recursive cte as (
select distinct 0 as from_node, e.from_node as to_node, 1 as lev
from edges e
where not exists (select 1 from edges e2 where e2.to_node = e.from_node)
union all
select e.from_node, e.to_node, lev + 1
from cte join
edges e
on e.from_node = cte.to_node
)
select to_node, max(lev) from cte group by to_node order by max(lev) desc
;
Result:
to_node max(lev)
------- --------
G 5
F 4
E 4
D 3
C 2
B 2
A 1
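A possible refinement, which the answer above does not use: SQLite's recursive CTEs also accept UNION instead of UNION ALL, in which case a row identical to one already produced is discarded rather than expanded again. The sketch below (Python with sqlite3; the helper name `generated_rows` is made up for illustration) compares the number of rows produced on the original eight edges, without the extra C-F edge.

```python
import sqlite3

EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "G"), ("F", "G")]

def generated_rows(set_operator):
    """Count rows produced by the recursive CTE; set_operator is 'UNION' or 'UNION ALL'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges(from_node TEXT, to_node TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", EDGES)
    query = f"""
    WITH RECURSIVE cte(from_node, to_node, lev) AS (
        SELECT DISTINCT '', e.from_node, 1
        FROM edges e
        WHERE NOT EXISTS (SELECT 1 FROM edges e2 WHERE e2.to_node = e.from_node)
        {set_operator}
        SELECT e.from_node, e.to_node, cte.lev + 1
        FROM cte JOIN edges e ON e.from_node = cte.to_node
    )
    SELECT COUNT(*) FROM cte
    """
    return conn.execute(query).fetchone()[0]

print(generated_rows("UNION ALL"), generated_rows("UNION"))  # 13 9
```

With UNION, the rows (D, E, 4) and (D, F, 4) are each produced once even though D is reached twice, so the duplicates stop propagating to G.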

Related

Count Similar Substrings SQL query

I've tried a few scenarios and googled a lot, but still can't find a solution.
I have a table of user names with entries something like the below:
UserName
Cakes420
18Jack01
18Jack04
16Jack22
22Jack16
Mapple7609
Chrom44
chrom22
chrom77
013Cake
016Cake
122Cake
123Cake87
So I need a query that checks for all records that share 4 or more (in sequence) characters in the table.
So I need to return something like:
Characters  Times Used  Names Sharing
----------  ----------  ----------------------------------------------
Cake        5           Cakes420, 013Cake, 016Cake, 122Cake, 123Cake87
Chro        3           Chrom44, chrom22, chrom77
or anything similar as I'd prefer not to repeat patterns, but hey, at this stage if it returns the values properly, I don't mind.
The shared characters can naturally appear in any place in the string, which is what makes this so difficult.
Should you do this in T-SQL? Probably not.
Can you do this in T-SQL? Yes.
Sample data
create table Names
(
Name nvarchar(20)
);
insert into Names (Name) values
('Cakes420'),
('18Jack01'),
('18Jack04'),
('16Jack22'),
('22Jack16'),
('Mapple7609'),
('Chrom44'),
('chrom22'),
('chrom77'),
('013Cake'),
('016Cake'),
('122Cake'),
('123Cake87');
Solution
Using STRING_AGG() for easy concatenation. Available from SQL Server 2017. Alternatives available for older SQL versions (use the search box on this site, there are many examples).
with rcte as
(
select n.Name,
convert(nvarchar(4), substring(n.Name, 1, 4)) as Part,
1 as PartFrom
from Names n
where len(n.Name) >= 4
union all
select r.Name,
convert(nvarchar(4), substring(r.Name, r.PartFrom+1, 4)), -- next 4-character window
r.PartFrom+1
from rcte r
where len(r.Name) >= r.PartFrom+4
),
cte_count as
(
select r.Part,
count(1) as PartCount
from rcte r
where r.Part not like '%[0-9]%' -- exclude parts with numbers in them
group by r.Part
having count(1) > 1
)
select c.Part,
c.PartCount,
string_agg(r.Name, ', ') as Names
from cte_count c
join rcte r
on r.Part = c.Part
group by c.Part,
c.PartCount
order by c.Part;
Result
Part PartCount Names
---- --------- ----------------------------------------------
Cake 5 Cakes420, 123Cake87, 122Cake, 016Cake, 013Cake
Chro 3 Chrom44, chrom22, chrom77
hrom 3 chrom77, chrom22, Chrom44
Jack 4 22Jack16, 16Jack22, 18Jack04, 18Jack01
Fiddle to see it in action with the intermediate CTE results.
Let's use Itzik Ben-Gan's Tally Function to break out a list of substrings, then group them. This is called an N-Gram, after the more common Trigram, which is a 3-character substring.
I've removed one extra cross-join from the function to speed it up slightly; it's now good for up to varchar(65536):
CREATE OR ALTER FUNCTION dbo.GetNums(@num AS BIGINT)
RETURNS TABLE
AS
RETURN
WITH
L0 AS ( SELECT 1 AS c
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),
L2 AS ( SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B ),
Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM L2 )
SELECT TOP(@num)
rownum AS rn
FROM Nums
ORDER BY rownum;
GO
DECLARE @substringLen int = 4;
SELECT
Characters,
[Times Used] = COUNT(*),
[Names Sharing] = STRING_AGG(Username, ', ')
FROM (
SELECT DISTINCT
-- remove DISTINCT if you want to know about multiple in a single username
t.Username,
Characters = SUBSTRING(t.Username, n.rn, @substringLen)
FROM myTable t
CROSS APPLY dbo.GetNums (LEN(t.UserName) - @substringLen + 1) n
) t
GROUP BY t.Characters
HAVING COUNT(*) > 1;
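For comparison outside the database, the same N-Gram grouping can be sketched in a few lines of Python. This is only an illustration: the function name `shared_ngrams` is made up, and lower-casing stands in for a case-insensitive collation, which is an assumption about how the SQL Server database is configured.

```python
from collections import defaultdict

def shared_ngrams(names, n=4):
    """Map each n-character substring (case-insensitive) to the names containing it,
    keeping only substrings that occur in more than one name."""
    owners = defaultdict(set)
    for name in names:
        lowered = name.lower()
        for start in range(len(lowered) - n + 1):
            owners[lowered[start:start + n]].add(name)
    return {gram: sorted(found) for gram, found in owners.items() if len(found) > 1}

names = ["Cakes420", "18Jack01", "18Jack04", "16Jack22", "22Jack16",
         "Mapple7609", "Chrom44", "chrom22", "chrom77",
         "013Cake", "016Cake", "122Cake", "123Cake87"]
shared = shared_ngrams(names)
print(len(shared["cake"]), len(shared["jack"]), len(shared["chro"]))  # 5 4 3
```

Like the query above, this keeps substrings containing digits (for example "18ja"); filter them out if only letter sequences are of interest.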

Why does dbms_random.value return the same value in graph queries (connect by)?

On Oracle 11.2.0.4.0, when I run the following query then each row gets a different result:
select r.n from (
select trunc(dbms_random.value(1, 100)) n from dual
) r
connect by level < 100; -- returns random values
But as soon as I use the obtained random value in a join or subquery then each row gets the same value from dbms_random.value:
select r.n, (select r.n from dual) from (
select trunc(dbms_random.value(1, 100)) n from dual
) r
connect by level < 100; -- returns the same value each time
Is it possible to make the second query return random values for each row?
UPDATE
My example was maybe over-simplified, here's what I am trying to do:
with reservations(val) as (
select 1 from dual union all
select 3 from dual union all
select 4 from dual union all
select 5 from dual union all
select 8 from dual
)
select * from (
select rnd.val, CONNECT_BY_ISLEAF leaf from (
select trunc(dbms_random.value(1, 10)) val from dual
) rnd
left outer join reservations res on res.val = rnd.val
connect by res.val is not null
)
where leaf = 1;
But with reservations which can go from 1 to 1.000.000.000 (and more).
Sometimes that query returns correctly (if it immediately picks a random value for which there is no reservation), and sometimes it gives an out of memory error because it always retries with the same value of dbms_random.value.
Your comment "...and I want to avoid concurrency problems" made me think.
Why don't you just try to insert a random number, watch out for duplicate violations, and retry until successful? Even a very clever solution that looks up available numbers might come up with identical new numbers in two separate sessions. So, only an inserted and committed reservation number is safe.
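A minimal sketch of that insert-and-retry loop, written in Python against an in-memory SQLite table rather than Oracle (so the table definition, the `reserve_random` name and the `max_tries` limit are all assumptions of this illustration). The primary key is what rejects duplicates.

```python
import random
import sqlite3

def reserve_random(conn, low, high, max_tries=100, rng=random):
    """Insert a random value in [low, high]; retry when the value is already taken."""
    for _ in range(max_tries):
        candidate = rng.randint(low, high)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO reservations(val) VALUES (?)", (candidate,))
            return candidate
        except sqlite3.IntegrityError:
            continue  # duplicate key: pick another number
    raise RuntimeError("no free value found within max_tries")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations(val INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO reservations(val) VALUES (?)",
                 [(1,), (3,), (4,), (5,), (8,)])
conn.commit()
new_val = reserve_random(conn, 1, 9, rng=random.Random(0))
print(new_val in {2, 6, 7, 9})  # True
```

Because the uniqueness check happens at insert time, two sessions that pick the same number cannot both succeed.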
You can move the connect-by clause inside the subquery:
select r.n, (select r.n from dual) from (
select trunc(dbms_random.value(1, 100)) n from dual
connect by level < 100
) r;
N (SELECTR.NFROMDUAL)
---------- -------------------
90 90
69 69
15 15
53 53
8 8
3 3
...
what I try to do is generate a sequence of random numbers and find the first one for which I don't have a record in some table
You could potentially do something like:
select r.n
from (
select trunc(dbms_random.value(1, 100)) n from dual
connect by level < 100
) r
where not exists (
select id from your_table where id = r.n
)
and rownum = 1;
but it will generate all 100 random values before checking any of them, which is a bit wasteful. And as you might not find a gap in those 100 (and there may be duplicates within those hundred), you either need a much larger range, which is also expensive, though it doesn't need to involve so many random calls:
select min(r.n) over (order by dbms_random.value) as n
from (
select level n from dual
connect by level < 100 -- or entire range of possible values
) r
where not exists (
select id from your_table where id = r.n
)
and rownum = 1;
Or repeat a single check until a match is found.
Another approach is to have a look-up table of all possible IDs with a column indicating if they are used or free, maybe with a bitmap index; and then use that to find the first (or any random) free value. But then you have to maintain that table too, and update atomically as you use and release the IDs in your main table, which means making things more complicated and serialising access - though you probably can't avoid that anyway really if you don't want to use a sequence. You could probably use a materialised view to simplify things.
And if you have a relatively small number of gaps (and you really want to reuse those) then you could possibly only search for a gap within the assigned range and then fall back to a sequence if there are no gaps. Say you only have values in the range 1 to 1000 currently used, with a few missing; you could look for a free value in that 1-1000 range, and if there are none then use a sequence to get 1001 instead, rather than always including your entire possible range of values in your gap search. That would also fill in gaps in preference to extending the used range, which may or may not be useful. (I'm not sure if "I don't need those numbers to be consecutive" means they should not be consecutive, or that it doesn't matter).
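The gap-first, sequence-as-fallback idea can be sketched as follows. This is plain Python over an in-memory list, so it ignores concurrency entirely; the name `next_id` is made up for illustration, and in a real database the fallback value would come from a sequence.

```python
def next_id(used):
    """Return the lowest unused value inside 1..max(used), else max(used) + 1."""
    taken = set(used)
    upper = max(taken, default=0)
    for candidate in range(1, upper + 1):
        if candidate not in taken:
            return candidate
    return upper + 1  # no gap: behave like a sequence

print(next_id([1, 3, 4, 5, 8]))  # 2
print(next_id([1, 2, 3]))        # 4
```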
Unless you particularly have a business need to fill in the gaps and for the assigned values to not be consecutive, though, I'd just use a sequence and ignore the gaps.
I managed to obtain a correct result with the following query but I am not sure if this approach is really advisable:
with
reservations(val) as (
select 1 from dual union all
select 3 from dual union all
select 4 from dual union all
select 5 from dual union all
select 8 from dual
),
rand(v) as (
select trunc(dbms_random.value(1, 10)) from dual
),
next_res(v, ok) as (
select v, case when exists (select 1 from reservations r where r.val = rand.v) then 0 else 1 end from rand
),
recursive(i, v, ok) AS (
select 0, 0, 0 from dual
union all
select i + 1, next_res.v, next_res.ok from recursive, next_res where i < 100 /*maxtries*/ and recursive.ok = 0
)
select v from recursive where ok = 1;

Teradata SQL Reverse Parent Child Hierarchy

I know how to build a hierarchy starting with the root node (i.e. where parent_id is null or something like that), but I can't find anything on how to build a hierarchy upward from the final child/edge node. I'd like to start with a child and build all the way back up to the top. Assume I don't know how many levels, or who the parent is, and we'll have to use SQL to figure it out.
Here is my base table:
old_entity_key,new_entity_key
1,2
2,3
3,4
4,5
5,6
Desired output:
new_entity_key,path
2,1/2
3,1/2/3
4,1/2/3/4
5,1/2/3/4/5
6,1/2/3/4/5/6
This is also acceptable:
new_entity_key,path
2,2/1
3,3/2/1
4,4/3/2/1
5,5/4/3/2/1
6,6/5/4/3/2/1
Here is the CTE I've started with:
with recursive history as (
select
old_entity_key,
new_entity_key,
cast(old_entity_key||'/'||new_entity_key as varchar(1000)) as path
from table
where new_entity_key not in (select old_entity_key from table)
and cast(start_time as date) between current_date - interval '3' day and current_date
union all
select
c.old_entity_key,
c.new_entity_key,
p.new_entity_key||'/'||c.path
from history c
join table p on p.new_entity_key = c.old_entity_key
)
select new_entity_key, old_entity_key, substr(path, 1, instr(path, '/') - 1) as original_entity_key, path
from history s;
The problem with the above query is that it runs forever. I think I've created an infinite loop. I've also tried using the below where filter in the bottom query of the union to try to find the root node, but Teradata gives me an error:
where p.new_entity_key in (select old_entity_key from table)
Any help would be greatly appreciated.
You'll need some sort of counter, and I think your join logic in your CTE doesn't make sense. I threw together a very simple volatile table example:
create volatile table tb
(old_entity_key char(1),
new_entity_key char(1),
rn integer)
on commit preserve rows;
insert into tb values ('1','2',1);
insert into tb values ('2','3',2);
insert into tb values ('3','4',3);
Now we can put together a recursive CTE:
with recursive history as (
select
old_entity_key,
new_entity_key,
cast(old_entity_key||'/'||new_entity_key as varchar(1000)) as path,
rn
from tb t
where
rn = 1
union all
select
t.old_entity_key,
t.new_entity_key,
h.path || '/' || t.new_entity_key,
t.rn
from
tb t
join history h
on t.rn = h.rn + 1
)
select * from history order by rn
The important things here are:
Limit your first pass (accomplished here by rn=1).
The second pass needs to pick up the "next" row, based on the previous row (t.rn = h.rn + 1)
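If no rn counter is available, one alternative is to seed the recursion with the row whose old key never appears as a new key, then join each step on old_entity_key = previous new_entity_key. The sketch below shows this in SQLite through Python's sqlite3 module, not Teradata, so the syntax is an assumption, as is the premise that the data contains no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb(old_entity_key INTEGER, new_entity_key INTEGER)")
conn.executemany("INSERT INTO tb VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)])

query = """
WITH RECURSIVE history(new_entity_key, path) AS (
    -- seed: the row whose old key never appears as a new key (start of the chain)
    SELECT t.new_entity_key, t.old_entity_key || '/' || t.new_entity_key
    FROM tb t
    WHERE t.old_entity_key NOT IN (SELECT new_entity_key FROM tb)
    UNION ALL
    -- step: the row whose old key equals the key reached so far
    SELECT t.new_entity_key, h.path || '/' || t.new_entity_key
    FROM tb t JOIN history h ON t.old_entity_key = h.new_entity_key
)
SELECT new_entity_key, path FROM history ORDER BY new_entity_key
"""
paths = conn.execute(query).fetchall()
for row in paths:
    print(row)
```

The recursion stops on its own once no row continues the chain, which is what keeps it from running forever.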

Ordering a SQL query based on the value in a column determining the value of another column in the next row

My table looks like this:
Value Previous Next
37 NULL 42
42 37 3
3 42 79
79 3 NULL
Except, that the table is all out of order. (There are no duplicates, so that is not an issue.) I was wondering if there was any way to make a query that would order the output, basically saying "Next row 'value' = this row 'next'" as it's shown above ?
I have no control over the database and how this data is stored. I am just trying to retrieve it and organize it. SQL Server I believe 2008.
I realize that this wouldn't be difficult to reorganize afterwards, but I was just curious if I could write a query that just did that out of the box so I wouldn't have to worry about it.
This should do what you need:
WITH CTE AS (
SELECT YourTable.*, 0 Depth
FROM YourTable
WHERE Previous IS NULL
UNION ALL
SELECT YourTable.*, Depth + 1
FROM YourTable JOIN CTE
ON YourTable.Value = CTE.Next
)
SELECT * FROM CTE
ORDER BY Depth;
[SQL Fiddle] (Referential integrity and indexes omitted for brevity.)
We use a recursive common table expression (CTE) to travel from the head of the list (WHERE Previous IS NULL) to the trailing nodes (ON YourTable.Value = CTE.Next) and at the same time memorize the depth of the recursion that was needed to reach the current node (in Depth).
In the end, we simply sort by the depth of recursion that was needed to reach each of the nodes (ORDER BY Depth).
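The same head-to-tail walk can be sketched outside SQL. The Python below is only an illustration of the traversal order, assuming exactly one head row (Previous is NULL) and no cycles; the function name `order_linked_rows` is made up.

```python
def order_linked_rows(rows):
    """Order (value, previous, next) rows by following the next pointers from the head."""
    by_value = {value: (value, previous, nxt) for value, previous, nxt in rows}
    head = next(value for value, previous, _ in rows if previous is None)
    ordered, current = [], head
    while current is not None:
        row = by_value[current]
        ordered.append(row)
        current = row[2]  # move to the row named by this row's next column
    return ordered

rows = [(3, 42, 79), (79, 3, None), (37, None, 42), (42, 37, 3)]
print([row[0] for row in order_linked_rows(rows)])  # [37, 42, 3, 79]
```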
Use a recursive query. With the one I list here, you can have multiple paths along your linked list:
with cte (Value, Previous, Next, Level)
as
(
select Value, Previous, Next, 0 as Level
from data
where Previous is null
union all
select d.Value, d.Previous, d.Next, Level + 1
from data d
inner join cte c on d.Previous = c.Value
)
select * from cte
fiddle here
If you are using Oracle, try START WITH ... CONNECT BY:
select ... start with initial-condition connect by
nocycle recursive-condition;
EDIT: For SQL-Server, use WITH syntax as below:
WITH rec(value, previous, next) AS
(SELECT value, previous, next
FROM table1
WHERE previous is null
UNION ALL
SELECT nextRec.value, nextRec.previous, nextRec.next
FROM table1 as nextRec, rec
WHERE rec.next = nextRec.value)
SELECT value, previous, next FROM rec;
One way to do this is with a join:
select t.*
from t left outer join
t tnext
on t.next = tnext.value
order by tnext.value
However, won't this do?
select t.*
from t
order by t.next
Something like this should work:
With Parent As (
Select
Value,
Previous,
Next
From
table
Where
Previous Is Null
Union All
Select
t.Value,
t.Previous,
t.Next
From
table t
Inner Join
Parent
On Parent.Next = t.Value
)
Select
*
From
Parent
Example

View to identify grouped values or object

As an example I have 5 objects. An object is the red dots bound together or adjacent to each other. In other words X+1 or X-1 or Y+1 or Y-1.
I need to create an MS SQL VIEW which will contain the first XY coordinate of each object, like:
X,Y
=======
1. 1,1
2. 1,8
3. 4,3
4. 5,7
5. 6,5
I can't figure out how to group it in a VIEW (NOT using a stored procedure). Any ideas would be of great help.
Thanks
The other answer is already pretty long, so I'm leaving it as-is. This answer is much better, simpler and also correct whereas the other one has some edge-cases that will produce a wrong answer - I shall leave that exercise to the reader.
Note: Line breaks are added for clarity. The entire block is a single query
;with Walker(StartX,StartY,X,Y,Visited) as (
select X,Y,X,Y,CAST('('+right(X,3)+','+right(Y,3)+')' as Varchar(Max))
from puzzle
union all
select W.StartX,W.StartY,P.X,P.Y,W.Visited+'('+right(P.X,3)+','+right(P.Y,3)+')'
from Walker W
join Puzzle P on
(W.X=P.X and W.Y=P.Y+1 OR -- these four lines "collect" a cell next to
W.X=P.X and W.Y=P.Y-1 OR -- the current one in any direction
W.X=P.X+1 and W.Y=P.Y OR
W.X=P.X-1 and W.Y=P.Y)
AND W.Visited NOT LIKE '%('+right(P.X,3)+','+right(P.Y,3)+')%'
)
select X, Y, Visited
from
(
select W.X, W.Y, W.Visited, rn=row_number() over (
partition by W.X,W.Y
order by len(W.Visited) desc)
from Walker W
left join Walker Other
on Other.StartX=W.StartX and Other.StartY=W.StartY
and (Other.Y<W.Y or (Other.Y=W.Y and Other.X<W.X))
where Other.X is null
) Z
where rn=1
The first step is to set up a "walker" recursive table expression that will start at every cell and travel as far as it can without retracing any step. Making sure that cells are not revisited is done by using the visited column, which stores each cell that has been visited from every starting point. In particular, this condition AND W.Visited NOT LIKE '%('+right(P.X,3)+','+right(P.Y,3)+')%' rejects cells that it has already visited.
To understand how the rest works, you need to look at the result generated by the "Walker" CTE by running "Select * from Walker order by StartX, StartY" after the CTE. A "piece" with 5 cells appears in at least 5 groups, each with a different (StartX,StartY), but each group has all the 5 (X,Y) pieces with different "Visited" paths.
The subquery (Z) uses a LEFT JOIN + IS NULL to weed the groups down to the single row in each group that contains the "first XY coordinate", defined by the condition
Other.StartX=W.StartX and Other.StartY=W.StartY
and (Other.Y<W.Y or (Other.Y=W.Y and Other.X<W.X))
The intention is for each cell that can be visited starting from (StartX, StartY), to compare against each other cell in the same group, and to find the cell where NO OTHER cell is on a higher row, or if they are on the same row, is to the left of this cell. This still leaves us with too many results, however. Consider just a 2-cell piece at (3,4) and (4,4):
StartX StartY X Y Visited
3 4 3 4 (3,4) ******
3 4 4 4 (3,4)(4,4)
4 4 4 4 (4,4)
4 4 3 4 (4,4)(3,4) ******
2 rows remain with the "first XY coordinate" of (3,4), marked with ******. We only need one row, so we use Row_Number and since we're numbering, we might as well go for the longest Visited path, which would give us as many of the cells within the piece as we can get.
The final outer query simply takes the first rows (RN=1) from each similar (X,Y) group.
To show ALL the cells of each piece, change the line
select X, Y, Visited
in the middle to
select X, Y, (
select distinct '('+right(StartX,3)+','+right(StartY,3)+')'
from Walker
where X=Z.X and Y=Z.Y
for xml path('')
) PieceCells
Which gives this output:
X Y PieceCells
1 1 (1,1)(2,1)(2,2)(3,2)
3 4 (3,4)(4,4)
5 6 (5,6)
7 5 (7,5)(8,5)(9,5)
8 1 (10,1)(8,1)(8,2)(9,1)(9,2)(9,3)
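The grouping that the walker performs, finding every set of connected cells and then picking one representative, can also be sketched as a plain flood fill. The Python below is an illustration only: it uses the sample coordinates that appear elsewhere in this thread, and it takes "first" to mean smallest X, then smallest Y, which is an assumption about the intended ordering.

```python
def first_cells(cells):
    """Return the smallest (x, y) cell of every group of 4-connected cells."""
    remaining = set(cells)
    firsts = []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            x, y = stack.pop()
            for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    component.add(neighbour)
                    stack.append(neighbour)
        firsts.append(min(component))
    return sorted(firsts)

cells = [(1, 1), (1, 2), (1, 8), (1, 9), (1, 10), (2, 2), (2, 3), (2, 8), (2, 9),
         (3, 9), (4, 3), (4, 4), (5, 7), (5, 8), (5, 9), (6, 5)]
print(first_cells(cells))  # [(1, 1), (1, 8), (4, 3), (5, 7), (6, 5)]
```

Unlike the recursive CTE, each cell is visited once, so the work grows linearly with the number of cells instead of with the number of non-retracing paths.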
OK, it's a little bit hard, but in any case I'm sure this problem cannot be solved in a simpler way.
So we have this table:
CREATE Table Tbl1(Id int, X int, Y int)
INSERT INTO Tbl1
SELECT 1,1,1 UNION ALL
SELECT 2,1,2 UNION ALL
SELECT 3,1,8 UNION ALL
SELECT 4,1,9 UNION ALL
SELECT 5,1,10 UNION ALL
SELECT 6,2,2 UNION ALL
SELECT 7,2,3 UNION ALL
SELECT 8,2,8 UNION ALL
SELECT 9,2,9 UNION ALL
SELECT 10,3,9 UNION ALL
SELECT 11,4,3 UNION ALL
SELECT 12,4,4 UNION ALL
SELECT 13,5,7 UNION ALL
SELECT 14,5,8 UNION ALL
SELECT 15,5,9 UNION ALL
SELECT 16,6,5
And here is the select query:
with cte1 as
/*at first we make recursion to define groups of filled adjacent cells*/
/*as output of cte we have a lot of strings like <X>cell(1)X</X><Y>cell(1)Y</Y>...<X>cell(n)X</X><Y>cell(n)Y</Y>*/
(
SELECT id,X,Y,CAST('<X>'+CAST(X as varchar(10))+'</X><Y>'+CAST(Y as varchar(10))+'</Y>' as varchar(MAX)) info
FROM Tbl1
UNION ALL
SELECT b.id,a.X,a.Y,CAST(b.info + '<X>'+CAST(a.X as varchar(10))+'</X><Y>'+CAST(a.Y as varchar(10))+'</Y>' as varchar(MAX))
FROM Tbl1 a JOIN cte1 b
ON ((((a.X=b.X+1) OR (a.X=b.X-1)) AND a.Y=b.Y) OR (((a.Y=b.Y+1) OR (a.Y=b.Y-1)) AND a.X=b.X))
AND a.id<>b.id
AND
b.info NOT LIKE
('%'+('<X>'+CAST(a.X as varchar(10))+'</X><Y>'+CAST(a.Y as varchar(10))+'</Y>')+'%')
),
cte2 as
/*In this query, we select only the longest sequence of cell connections (first filter)*/
/*And we convert the string to a new standard (x,y | x,y | x,y |...| x,y) (for further separation)*/
(
SELECT *, ROW_NUMBER()OVER(ORDER BY info) cellGroupId
FROM(
SELECT REPLACE(REPLACE(REPLACE(REPLACE(info,'</Y><X>','|'),'</X><Y>',','),'<X>',''),'</Y>','') info
FROM(
SELECT info, MAX(LEN(info))OVER(PARTITION BY id)maxlen FROM cte1
) AS tmpTbl
WHERE maxlen=LEN(info)
)AS tmpTbl
),
cte3 as
/*In this query, we separated strings like (x,y | x,y | x,y |...| x,y) to many (x,y)*/
(
SELECT cellGroupId, CAST(LEFT(XYInfo,CHARINDEX(',',XYInfo)-1) as int) X, CAST(RIGHT(XYInfo,LEN(XYInfo)-CHARINDEX(',',XYInfo)) as int) Y
FROM(
SELECT cellGroupId, tmpTbl2.n.value('.','varchar(MAX)') XYinfo
FROM
(SELECT CAST('<r><c>' + REPLACE(info,'|','</c><c>')+'</c></r>' as XML) n, cellGroupId FROM cte2) AS tmpTbl1
CROSS APPLY n.nodes('/r/c') tmpTbl2(n)
) AS tmpTbl
),
cte4 as
/*In this query, we finally determined group of individual objects*/
(
SELECT cellGroupId,X,Y
FROM(
SELECT cellGroupId,X,Y,ROW_NUMBER()OVER(PARTITION BY X,Y ORDER BY cellGroupId ASC)rn
FROM(
SELECT *,
MAX(SumOfAdjacentCellsByGroup)OVER(PARTITION BY X,Y) Max_SumOfAdjacentCellsByGroup_ByXY /*calculated max value of <the sum of the cells in the group> by each cell*/
FROM(
SELECT *, SUM(1)OVER(PARTITION BY cellGroupId) SumOfAdjacentCellsByGroup /*calculated the sum of the cells in the group*/
FROM cte3
)AS TmpTbl
)AS TmpTbl
/*We got rid of the subgroups (i.e. [(1,2)(2,2)(2,3)] its subgroup of [(1,2)(1,1)(2,2)(2,3)])*/
/*it was second filter*/
WHERE SumOfAdjacentCellsByGroup=Max_SumOfAdjacentCellsByGroup_ByXY
)AS TmpTbl
/*We got rid of the same groups (i.e. [(1,1)(1,2)(2,2)(2,3)] its same as [(1,2)(1,1)(2,2)(2,3)])*/
/*it was third filter*/
WHERE rn=1
)
SELECT X,Y /*result*/
FROM(SELECT a.X,a.Y, ROW_NUMBER()OVER(PARTITION BY cellGroupId ORDER BY id)rn
FROM cte4 a JOIN Tbl1 b ON a.X=b.X AND a.Y=b.Y)a /*connect back*/
WHERE rn=1 /*first XY coordinate*/
Let's assume your coordinates are stored in X,Y form, something like this:
CREATE Table Puzzle(
id int identity, Y int, X int)
INSERT INTO Puzzle VALUES
(1,1),(1,2),(1,8),(1,9),(1,10),
(2,2),(2,3),(2,8),(2,9),
(3,9),
(4,3),(4,4),
(5,7),(5,8),(5,9),
(6,5)
This query then shows your Puzzle in board form (run in TEXT mode in SQL Management Studio)
SELECT (
SELECT (
SELECT CASE WHEN EXISTS (SELECT *
FROM Puzzle T
WHERE T.X=X.X and T.Y=Y.Y)
THEN 'X' ELSE '.' END
FROM (values(0),(1),(2),(3),(4),(5),
(6),(7),(8),(9),(10),(11)) X(X)
ORDER BY X.X
FOR XML PATH('')) + Char(13) + Char(10)
FROM (values(0),(1),(2),(3),(4),(5),(6),(7)) Y(Y)
ORDER BY Y.Y
FOR XML PATH(''), ROOT('a'), TYPE
).value('(/a)[1]','varchar(max)')
It gives you this
............
.XX.....XXX.
..XX....XX..
.........X..
...XX.......
.......XXX..
.....X......
............
This query, done in 4 stages, gives you the top-left cell of each piece, if you define it as the leftmost cell of the topmost row.
-- the first table expression joins cells together on the Y-axis
;WITH FlattenOnY(Y,XLeft,XRight) AS (
-- start with all pieces
select Y,X,X
from puzzle
UNION ALL
-- keep connecting rightwards from each cell as far as possible
select B.Y,A.XLeft,B.X
from FlattenOnY A
join puzzle B on A.Y=B.Y and A.XRight+1=B.X
)
-- the second table expression flattens the results from the first, so that
-- it represents ALL the start-end blocks on each row of the Y-axis
,YPieces(Y,XLeft,XRight) as (
--
select Y,XLeft,Max(XRight)
from(
select Y,Min(XLeft)XLeft,XRight
from FlattenOnY
group by XRight,Y)Z
group by XLeft,Y
)
-- here, select * from YPieces will return the "blocks" such as
-- Row 1: 1-2 & 8-10
-- Row 2: 2-3 (equals Y,XLeft,XRight of 2,2,3)
-- etc
-- the third expression repeats the first, except it now combines on the X-axis
,FlattenOnX(Y,XLeft,CurPieceXLeft,CurPieceXRight,CurPieceY) AS (
-- start with all pieces
select Y,XLeft,XLeft,XRight,Y
from YPieces
UNION ALL
-- keep connecting downwards to overlapping blocks on the next row as far as possible
select A.Y,A.XLeft,B.XLeft,B.XRight,B.Y
from FlattenOnX A
join YPieces B on A.CurPieceY+1=B.Y and A.CurPieceXRight>=B.XLeft and B.XRight>=A.CurPieceXLeft
)
-- and again we repeat the 2nd expression as the 4th, for the final pieces
select Y,XLeft X
from (
select *, rn2=row_number() over (
partition by Y,XLeft
order by CurPieceY desc)
from (
select *, rn=row_number() over (
partition by CurPieceXLeft, CurPieceXRight, CurPieceY
order by Y)
from flattenOnX
) Z1
where rn=1) Z2
where rn2=1
The result being
Y X
----------- -----------
1 1
1 8
4 3
5 7
6 5
Or is your representation in flat form something like this? If it is, give us a shout and I'll redo the solution
create table Puzzle (
row int,
[0] bit, [1] bit, [2] bit, [3] bit, [4] bit, [5] bit,
[6] bit, [7] bit, [8] bit, [9] bit, [10] bit, [11] bit
)
insert Puzzle values
(0,0,0,0,0,0,0,0,0,0,0,0,0),
(1,0,1,1,0,0,0,0,0,1,1,1,0),
(2,0,0,1,1,0,0,0,0,1,1,0,0),
(3,0,0,0,0,0,0,0,0,0,1,0,0),
(4,0,0,0,1,1,0,0,0,0,0,0,0),
(5,0,0,0,0,0,0,0,1,1,1,0,0),
(6,0,0,0,0,0,1,0,0,0,0,0,0),
(7,0,0,0,0,0,0,0,0,0,0,0,0)