How to merge ranges from different tables - sql

Given the following two tables:
T1
------------------
From | To | Value
------------------
10 | 20 | XXX
20 | 30 | YYY
30 | 40 | ZZZ
T2
------------------
From | To | Value
------------------
10 | 15 | AAA
15 | 19 | BBB
19 | 39 | CCC
39 | 40 | DDD
What is the best way to get the result below, using T-SQL on SQL Server 2008?
The From/To ranges are sequential (there are no gaps), and each From always has the same value as the previous To.
Desired result
-------------------------------
From | To | Value1 | Value2
-------------------------------
10 | 15 | XXX | AAA
15 | 19 | XXX | BBB
19 | 20 | XXX | CCC
20 | 30 | YYY | CCC
30 | 39 | ZZZ | CCC
39 | 40 | ZZZ | DDD

First I declare data that looks like the data you posted. Please correct me if any assumptions I have made are wrong. Better would be to post your own declaration in the question so we are all working with the same data.
CREATE TABLE #T1 (
[From] INT,
[To] INT,
[Value] CHAR(3)
);
INSERT INTO #T1 (
[From],
[To],
[Value]
)
VALUES
(10, 20, 'XXX'),
(20, 30, 'YYY'),
(30, 40, 'ZZZ');
CREATE TABLE #T2 (
[From] INT,
[To] INT,
[Value] CHAR(3)
);
INSERT INTO #T2 (
[From],
[To],
[Value]
)
VALUES
(10, 15, 'AAA'),
(15, 19, 'BBB'),
(19, 39, 'CCC'),
(39, 40, 'DDD');
Here is my select query to generate your expected result:
SELECT
CASE
WHEN [#T1].[From] > [#T2].[From]
THEN [#T1].[From]
ELSE [#T2].[From]
END AS [From],
CASE
WHEN [#T1].[To] < [#T2].[To]
THEN [#T1].[To]
ELSE [#T2].[To]
END AS [To],
[#T1].[Value],
[#T2].[Value]
FROM #T1
INNER JOIN #T2 ON
(
[#T1].[From] <= [#T2].[From] AND
[#T1].[To] > [#T2].[From]
) OR
(
[#T2].[From] <= [#T1].[From] AND
[#T2].[To] > [#T1].[From]
);

Stealing @isme's data setup, I wrote the following:
;With EPs as (
select [From] as EP from #T1
union
select [To] from #T1
union
select [From] from #T2
union
select [To] from #T2
), OrderedEndpoints as (
select EP,ROW_NUMBER() OVER (ORDER BY EP) as rn from EPs
)
select
oe1.EP,
oe2.EP,
t1.Value,
t2.Value
from
OrderedEndpoints oe1
inner join
OrderedEndpoints oe2
on
oe1.rn = oe2.rn - 1
inner join
#T1 t1
on
oe1.EP < t1.[To] and
oe2.EP > t1.[From]
inner join
#T2 t2
on
oe1.EP < t2.[To] and
oe2.EP > t2.[From]
That is, you create a set containing all of the possible end points of periods (EPs), then you "sort" those and assign each one a row number (OrderedEndpoints).
Then the final query assembles each "adjacent" pair of rows together, and joins back to the original tables to find which rows from each one overlap the selected range.
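The endpoint idea can be cross-checked outside the database. Below is a minimal Python sketch (not T-SQL, and not the answerer's code) that applies the same three steps to the sample data from the question. The function name `merge_ranges` and the tuple layout are assumptions made for illustration.

```python
def merge_ranges(t1, t2):
    """Split two gap-free range lists on every endpoint and pair up their values.

    t1 and t2 are lists of (start, end, value) tuples with half-open ranges.
    """
    # Step 1: collect every distinct endpoint from both tables (the EPs CTE).
    endpoints = sorted({p for start, end, _ in t1 + t2 for p in (start, end)})

    merged = []
    # Step 2: walk adjacent endpoint pairs (the oe1.rn = oe2.rn - 1 self-join).
    for lo, hi in zip(endpoints, endpoints[1:]):
        # Step 3: keep the rows of each table that overlap the pair.
        for s1, e1, v1 in t1:
            if lo < e1 and hi > s1:
                for s2, e2, v2 in t2:
                    if lo < e2 and hi > s2:
                        merged.append((lo, hi, v1, v2))
    return merged


t1 = [(10, 20, "XXX"), (20, 30, "YYY"), (30, 40, "ZZZ")]
t2 = [(10, 15, "AAA"), (15, 19, "BBB"), (19, 39, "CCC"), (39, 40, "DDD")]

for row in merge_ranges(t1, t2):
    print(row)
```

For the sample data this prints the six rows of the desired result, in order of From.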

The query below finds the smallest ranges, then picks the values back out of the tables:
SELECT ranges.[From], ranges.[To], T1.Value AS Value1, T2.Value AS Value2
FROM (SELECT all_from.[From], MIN(all_to.[To]) AS [To]
FROM (SELECT T1.[From]
FROM T1
UNION
SELECT T2.[From]
FROM T2) all_from
JOIN (SELECT T1.[To]
FROM T1
UNION
SELECT T2.[To] -- upper bounds come from the To columns
FROM T2) all_to ON all_from.[From] < all_to.[To]
GROUP BY all_from.[From]) ranges
JOIN T1 ON ranges.[From] >= T1.[From] AND ranges.[To] <= T1.[To]
JOIN T2 ON ranges.[From] >= T2.[From] AND ranges.[To] <= T2.[To]
ORDER BY ranges.[From]

Thanks for the answers, but I ended up using a CTE, which I think is cleaner.
CREATE TABLE #T1 ([From] INT, [To] INT, [Value] CHAR(3));
CREATE TABLE #T2 ([From] INT, [To] INT, [Value] CHAR(3));
INSERT INTO #T1 ( [From], [To], [Value]) VALUES (10, 20, 'XXX'), (20, 30, 'YYY'), (30, 40, 'ZZZ');
INSERT INTO #T2 ( [From], [To], [Value]) VALUES (10, 15, 'AAA'), (15, 19, 'BBB'), (19, 39, 'CCC'), (39, 40, 'DDD');
;with merged1 as
(
select
t1.[From] as from1,
t1.[to] as to1,
t1.Value as Value1,
t2.[From] as from2,
t2.[to] as to2,
t2.Value as Value2
from #t1 t1
inner join #T2 t2
on t1.[From] < t2.[To]
and t1.[To] > t2.[From] -- strict, so ranges that merely touch do not pair up
)
,merged2 as
(
select
case when from2>=from1 then from2 else from1 end as [From]
,case when to2<=to1 then to2 else to1 end as [To]
,value1
,value2
from merged1
)
select * from merged2

Related

SQL Join data and get rows that don't match with NULL

I have two tables that I want join as follows:
Table 1
Code1 | Code2 | Date(1) | Amount(1)
A | AA | 201802 | 100
A | AA | 201803 | 50
A | AA | 201804 | 30
Table 2
Code1 | Code2 | Date(2) | Amount(2)
A | AA | 201801 | 20
A | AA | 201802 | 10
A | AA | 201803 | 10
And I want the resulting table to look like this:
Result
Code1 | Code2 | Date(1) | Date(2) | Amount(1) | Amount(2)
A | AA | NULL | 201801 | NULL | 20
A | AA | 201802 | 201802 | 100 | 10
A | AA | 201803 | 201803 | 50 | 10
A | AA | 201804 | NULL | 30 | NULL
So I need to join these two tables
on table1.Code1 = table2.Code1 AND table1.Code2 = table2.Code2 AND table1.Date(1) = table2.Date(2)
But I also want the rows where the dates don't match, with NULL in the columns related to the non-matching table (such as the row for Date(1) = 201804 in my example).
I have tried joining those two tables with left, right and outer joins, but I am still not successful in getting the rows with the nulls (probably because Code1 and Code2 don't exist for that particular missing row).
Maybe a cross apply could work, but I am not sure how to execute it.
I want the most efficient way in terms of performance because this is a part of a big query containing lots of data and lots of calculations.
UPDATE:
The code I used is:
Select table1.Code1, table1.Code2, table1.Date(1), table2.Date(2), table1.Amount(1), table2.Amount(2)
FROM Table1
Full Outer Join
table2 ON
table1.Code1 = table2.Code1
AND table1.Code2 = table2.Code2
AND table1.date(1) = table2.date(2)
Which gives me the following result:
Code1 | Code2 | Date(1) | Date(2) | Amount(1) | Amount(2)
A | AA | 201802 | 201802 | 100 | 10
A | AA | 201803 | 201803 | 50 | 10
Which is missing these two rows:
A | AA | NULL | 201801 | NULL | 20
A | AA | 201804 | NULL | 30 | NULL
You may try this.
--sample dataset
CREATE TABLE #tab1 (
Code1 varchar(10),
Code2 varchar(10),
Date1 int,
Amount1 int )
insert into #tab1
values
('A', 'AA', 201802, 100),
('A', 'AA', 201803, 50),
('A', 'AA', 201804, 30),
('B', 'AA', 201802, 100) --additional
CREATE TABLE #tab2 (
Code1 varchar(10),
Code2 varchar(10),
Date2 int,
Amount2 int )
insert into #tab2
values
('A', 'AA', 201802, 100),
('A', 'AA', 201803, 50),
('A', 'AA', 201801, 30)
query
SELECT *
FROM (
select
coalesce(table1.Code1,table2.Code1) as Code1,
coalesce(table1.Code2,table2.Code2) as Code2,
table1.Date1,
table2.Date2,
table1.Amount1,
table2.amount2
FROM #tab1 as Table1
Full Outer Join #tab2 as table2 ON
table1.Code1 = table2.Code1
AND table1.Code2 = table2.Code2
AND table1.date1= table2.date2
) as t1
CROSS APPLY ( --to exclude records not matched by "Code 1 and Code 2"
SELECT top 1
Code1
FROM #tab2 as t
where t.Code1 = t1.Code1
and t.Code2 = t1.Code2
) as c
ORDER BY t1.Date1
or like this:
select
coalesce(table1.Code1,table2.Code1) as Code1,
coalesce(table1.Code2,table2.Code2) as Code2,
table1.Date1,
table2.Date2,
table1.Amount1,
table2.amount2
FROM #tab1 as Table1
Full Outer Join #tab2 as table2 ON
table1.Code1 = table2.Code1
AND table1.Code2 = table2.Code2
AND table1.date1= table2.date2
where exists (select null --to exclude records not matched by "Code 1 and Code 2"
from #tab2 as t2
where coalesce(table1.Code1,table2.Code1) = t2.Code1
and coalesce(table1.Code2,table2.Code2) = t2.Code2)
ORDER BY table1.Date1
My suggested solution involves a full join and another join to a derived table that contains all the combinations of code1 and code2 that exists in both tables, using the intersect operator.
First, create and populate sample data (Please save us this step in your future questions):
CREATE TABLE #T1
(
Code1 char(1),
Code2 char(2),
Date1 char(6),
Amount1 int
)
CREATE TABLE #T2
(
Code1 char(1),
Code2 char(2),
Date2 char(6),
Amount2 int
)
INSERT INTO #T1 (Code1, Code2, Date1, Amount1) VALUES
('A', 'AA', '201802', 100)
,('A', 'AA', '201803', 50)
,('A', 'AA', '201804', 30)
,('B', 'AA', '201802', 30); -- Note: Added to the original sample data
INSERT INTO #T2 (Code1, Code2, Date2, Amount2) VALUES
('A', 'AA', '201801', 20)
,('A', 'AA', '201802', 10)
,('A', 'AA', '201803', 10)
,('A', 'AB', '201802', 10); -- Note: Added to the original sample data
The query:
SELECT ISNULL(T1.Code1, T2.Code1) As Code1,
ISNULL(T1.Code2, T2.Code2) As Code2,
Date1, Date2, Amount1, Amount2
FROM #T1 As T1
FULL JOIN #T2 As T2
ON T1.Code1 = T2.Code1
AND T1.Code2 = T2.Code2
AND T1.Date1 = T2.Date2
-- Remove this next join if you want to get rows where codes don't match
JOIN (
SELECT Code1, Code2
FROM #T1
INTERSECT
SELECT Code1, Code2
FROM #T2
) As CommonCodes
ON CommonCodes.Code1 = ISNULL(T1.Code1, T2.Code1)
AND CommonCodes.Code2 = ISNULL(T1.Code2, T2.Code2)
ORDER BY Date1
Results:
Code1 Code2 Date1 Date2 Amount1 Amount2
A AA NULL 201801 NULL 20
A AA 201802 201802 100 10
A AA 201803 201803 50 10
A AA 201804 NULL 30 NULL
You can see a live demo on rextester.
Your updated query should work if you ISNULL the CodeX columns.
create table #t1 (Code1 varchar(4), Code2 varchar(4), Date1 date, Amount1 int)
create table #t2 (Code1 varchar(4), Code2 varchar(4), Date2 date, Amount2 int)
insert into #t1
values
('A', 'AA', '2018-02-01', 100 ),
('A', 'AA', '2018-03-01', 50 ),
('A', 'AA', '2018-04-01', 30 )
insert into #t2
values
('A', 'AA', '2018-01-01', 20 ),
('A', 'AA', '2018-02-01', 10 ),
('A', 'AA', '2018-03-01', 10 )
SELECT
code1
,code2
,date1
,date2
,amount1
,amount2
FROM (
SELECT code1, code2 FROM #t1
INTERSECT
SELECT code1, code2 FROM #t2
) t0
CROSS APPLY (
SELECT
date1, date2, amount1, amount2
FROM #t1 t1
FULL OUTER JOIN #t2 t2 ON t1.Code1 = t2.Code1 and t1.Code2 = t2.Code2 and date1 = date2
WHERE
t0.code1 = isnull(t1.Code1, t2.code1)
and t0.code2 = isnull(t1.Code2, t2.code2)
) tt
ORDER BY
date1, date2
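The shared idea in the answers above (a full join on codes and date, restricted to code pairs that exist in both tables) can be checked with a small Python sketch. This is only an illustration, not a replacement for the SQL: it uses the extended sample data from the INTERSECT answer and assumes (Code1, Code2, Date) is unique within each table.

```python
t1 = [("A", "AA", "201802", 100), ("A", "AA", "201803", 50),
      ("A", "AA", "201804", 30), ("B", "AA", "201802", 30)]
t2 = [("A", "AA", "201801", 20), ("A", "AA", "201802", 10),
      ("A", "AA", "201803", 10), ("A", "AB", "201802", 10)]

# INTERSECT: code pairs present in both tables.
common = {(c1, c2) for c1, c2, _, _ in t1} & {(c1, c2) for c1, c2, _, _ in t2}

# Key each table by (Code1, Code2, Date) so matching is a dictionary lookup.
left = {(c1, c2, d): amount for c1, c2, d, amount in t1}
right = {(c1, c2, d): amount for c1, c2, d, amount in t2}

result = []
# FULL JOIN: take the keys found on either side.
for key in sorted(left.keys() | right.keys()):
    c1, c2, d = key
    if (c1, c2) not in common:
        continue  # drop codes that appear in only one table
    result.append((c1, c2,
                   d if key in left else None,    # Date1, None stands in for NULL
                   d if key in right else None,   # Date2
                   left.get(key), right.get(key)))

for row in result:
    print(row)
```

The four rows it produces match the Results table shown in the INTERSECT answer, with None in place of NULL.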

Sum all pair of rows in the table

I need a SQL query that sums every possible pair of rows in the table.
My table looks like this :
ID | Name | Value
1 | A | 100
2 | B | 150
3 | C | 250
4 | D | 600
In this case the query output should be :
FirstName | SecondName | Sum
A | B | 250
A | C | 350
A | D | 700
B | C | 400
B | D | 750
C | D | 850
Try this:
select
t1.Name as FirstName,
t2.Name as SecondName,
t1.Value+t2.Value as Sum
from yourtable as t1
inner join yourtable as t2 on (t1.ID<t2.ID)
Just INNER JOIN based on the condition that l.ID < r.ID. This ensures that a row is not joined to itself and that there are no duplicates of the form 1, 2 and 2, 1:
CREATE TABLE #t (ID INT, Name VARCHAR(100), Value INT);
INSERT INTO #t VALUES
(1, 'A', 100),
(2, 'B', 150),
(3, 'C', 250),
(4, 'D', 600);
SELECT l.Name FirstName, r.Name SecondName, l.Value + r.value [Sum]
FROM #t AS l
INNER JOIN #t AS r ON l.ID < r.ID
ORDER BY FirstName, SecondName
This is one approach that works:
CREATE TABLE #T (ID INT, Name VARCHAR (10), VALUE INT)
INSERT INTO #T VALUES (1, 'A', 100), (2, 'B', 150), (3, 'C', 250), (4, 'D', 600)
SELECT CASE WHEN T.Name < T2.Name THEN (T.Name + T2.Name)
ELSE (T2.Name + T.Name)
END AS FullName,
SUM (T.Value) AS TotalValue
FROM #T AS T
FULL OUTER JOIN #T AS T2 ON T.Name <> T2.Name
GROUP BY CASE WHEN T.Name < T2.Name THEN (T.Name + T2.Name)
ELSE (T2.Name + T.Name) END
A chance to use the rarely used Cartesian join, with the cross join operator:
select l.Name, r.Name, l.Value + r.Value as [Sum]
from DataTable l
cross join DataTable r
order by l.Name, r.Name;
Note that if the input table is large, this will produce a lot of rows. To avoid rows being combined both ways and with themselves ({A,B} and {B,A} appearing, {A,A} appearing), conditions can be added, for example where l.ID < r.ID.
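To see how much an extra condition trims a plain cross join, here is a small Python sketch over the question's four rows. It is an illustration only; the `l[0] < r[0]` filter plays the role of the `ID < ID` join condition used in the other answers.

```python
from itertools import product

rows = [(1, "A", 100), (2, "B", 150), (3, "C", 250), (4, "D", 600)]

# Plain cross join: every row paired with every row, including itself.
cross = [(l[1], r[1], l[2] + r[2]) for l, r in product(rows, repeat=2)]

# Cross join plus the condition l.ID < r.ID: each unordered pair exactly once.
pairs = [(l[1], r[1], l[2] + r[2])
         for l, r in product(rows, repeat=2)
         if l[0] < r[0]]

print(len(cross), "rows without a condition")
for first, second, total in pairs:
    print(first, second, total)
```

With n input rows the unfiltered join yields n * n rows, while the filtered one yields n * (n - 1) / 2, which is the six-row output the question asks for.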

Use data to name column

I have 2 tables and I want to run a query where I use a value in one of the tables to change what column dateadd uses.
table1
id value date1 date2 date3
-------|-------|------------|------------|-----------|
1 | 10 | 04/03/2018 | 04/03/2017 |01/03/2016 |
2 | 1 | 04/03/2018 | 05/03/2015 |02/03/2018 |
3 | 2 | 04/03/2016 | 06/03/2016 |03/03/2018 |
4 | 1 | 04/03/2015 | 07/03/2018 |04/03/2017 |
5 | 2 | 04/03/2017 | 09/03/2018 |05/03/2019 |
table2
id value
-------|-------|
1 | date1 |
2 | date3 |
3 | date3 |
4 | date2 |
5 | date1 |
The normal way to do ID 1 would be something like dateadd(month,10,date1). I'm not sure how to do this without me writing it every single time though.
select *
from table1
join table2 on table1.id = table2.id
where DATEADD(month, table1.value, table1.[table2.value]) between '1/1/18' and '12/31/18'
Twelfth's answer is correct. I just wanted to see if his theory works, and it does - here's a working implementation.
create table #table1 (id int, value int, date1 date, date2 date, date3 date)
create table #table2 (id int, colname varchar(5))
insert into #table1 values (1,10,'04/03/2018','04/03/2017','01/03/2016')
insert into #table1 values (2,1 ,'04/03/2018','05/03/2015','02/03/2018')
insert into #table1 values (3,2 ,'04/03/2016','06/03/2016','03/03/2018')
insert into #table1 values (4,1 ,'04/03/2015','07/03/2018','04/03/2017')
insert into #table1 values (5,2 ,'04/03/2017','09/03/2018','05/03/2019')
insert into #table2 values (1, 'date1')
insert into #table2 values (2, 'date3')
insert into #table2 values (3, 'date3')
insert into #table2 values (4, 'date2')
insert into #table2 values (5, 'date1')
select id, colname, newdate
from
(
select sq.id, sq.colname, dateadd(month, sq.value, sq.dn) as newdate
from #table1 t1
unpivot
(
dn for colname in ([date1], [date2], [date3])
)sq
inner join #table2 t2 on sq.id = t2.id and sq.colname = t2.colname
)sq where newdate between '1/1/2018' and '12/31/2018'
Output:
id colname newdate
2 date3 2018-03-03
3 date3 2018-05-03
4 date2 2018-08-03
I've had this as a theory, and you're actually the first questioner I can try to apply it with. The idea is to unpivot your data and then join on the value column.
select a.id, a.column_name, dateadd(month, a.value, a.date_value) as new_date
from table1
unpivot (
date_value -- one row per (id, date column); table1.value passes through unchanged
for column_name in (date1, date2, date3)
) a
inner join table2 t2 on a.id = t2.id and t2.value = a.column_name
where dateadd(month, a.value, a.date_value)
between '20180101' and '20181231'
I can't guarantee that will work and am curious how it does for you.
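The unpivot-then-join idea boils down to "for each row, look up which date column to use, then add the months". A minimal Python sketch of that logic follows, using the question's sample data. It assumes the dates are written month/day/year (which is what the output in the working answer implies), and `add_months` is a hypothetical helper that approximates DATEADD(month, ...).

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


# table1 rows; dates read as month/day/year (an assumption, see above).
table1 = [
    {"id": 1, "value": 10, "date1": date(2018, 4, 3), "date2": date(2017, 4, 3), "date3": date(2016, 1, 3)},
    {"id": 2, "value": 1, "date1": date(2018, 4, 3), "date2": date(2015, 5, 3), "date3": date(2018, 2, 3)},
    {"id": 3, "value": 2, "date1": date(2016, 4, 3), "date2": date(2016, 6, 3), "date3": date(2018, 3, 3)},
    {"id": 4, "value": 1, "date1": date(2015, 4, 3), "date2": date(2018, 7, 3), "date3": date(2017, 4, 3)},
    {"id": 5, "value": 2, "date1": date(2017, 4, 3), "date2": date(2018, 9, 3), "date3": date(2019, 5, 3)},
]
# table2 maps each id to the name of the date column to use.
table2 = {1: "date1", 2: "date3", 3: "date3", 4: "date2", 5: "date1"}

result = []
for row in table1:
    colname = table2[row["id"]]                        # join on id
    newdate = add_months(row[colname], row["value"])   # use the named column
    if date(2018, 1, 1) <= newdate <= date(2018, 12, 31):
        result.append((row["id"], colname, newdate.isoformat()))

for r in result:
    print(r)
```

Under those assumptions the sketch returns the same three rows (ids 2, 3 and 4) as the Output listed in the working implementation.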

Full Outer Join on Incomplete Data (by id variable)

I have two tables (see example data below). I need to keep all of the ID values in table 1 and merge table 1 with table 2 by sequence. The tricky part is that I also have to retain the field value1 from table 1 and value2 from table 2.
table 1 :
ID sequence value1
-------------------------
p1 1 5
p1 2 10
p2 1 15
p2 2 20
table 2 :
sequence value2
-------------------------
1 10
2 20
3 30
4 40
I need the resulting table to appear like so:
ID sequence value1 value2
----------------------------------
p1 1 5 10
p1 2 10 20
p1 3 - 30
p1 4 - 40
p2 1 15 10
p2 2 20 20
p2 3 - 30
p2 4 - 40
I have tried the following SQL code, but it doesn't merge the missing values from the value1 field in table 1 with the value2 field from table 2:
select t1.ID, t2.sequence, t1.value1, t2.value2 from
t2 full outer join t1 on t2.sequence=t1.sequence
Any assistance you can provide is greatly appreciated.
You can try something like this:
select coalesce(t1.[id], t3.[id]) as [id]
, t2.[sequence]
, t1.[value]
, t2.[value]
from [tbl2] t2
left join [tbl1] t1 on t1.[sequence] = t2.[sequence]
left join (select distinct [id] from [tbl1]) t3 on t1.[id] is null
SQLFiddle
One way with CROSS JOIN and OUTER APPLY:
CREATE TABLE #t1 (ID CHAR(2), S INT, V1 INT)
CREATE TABLE #t2 (S INT, V2 INT)
INSERT INTO #t1 VALUES
('p1', 1, 5),
('p1', 2, 10),
('p2', 1, 15),
('p2', 2, 20)
INSERT INTO #t2 VALUES
(1, 10),
(2, 20),
(3, 30),
(4, 40)
SELECT c.ID, t2.S, ca.V1, t2.V2 FROM #t2 t2
CROSS JOIN (SELECT DISTINCT ID FROM #t1) c
OUTER APPLY(SELECT * FROM #t1 t1 WHERE c.ID = t1.ID AND t1.S = t2.S) ca
ORDER BY c.ID, t2.S
Output:
ID S V1 V2
p1 1 5 10
p1 2 10 20
p1 3 NULL 30
p1 4 NULL 40
p2 1 15 10
p2 2 20 20
p2 3 NULL 30
p2 4 NULL 40
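The CROSS JOIN plus OUTER APPLY pattern is "build every (ID, sequence) combination, then look up value1 where it exists". A short Python sketch of the same logic on the sample data, for illustration only, with None standing in for NULL:

```python
t1 = [("p1", 1, 5), ("p1", 2, 10), ("p2", 1, 15), ("p2", 2, 20)]
t2 = [(1, 10), (2, 20), (3, 30), (4, 40)]

# Index table 1 by (ID, sequence) so a missing combination is a clean miss.
value1 = {(id_, seq): v1 for id_, seq, v1 in t1}

result = []
for id_ in sorted({id_ for id_, _, _ in t1}):   # SELECT DISTINCT ID
    for seq, v2 in t2:                          # CROSS JOIN with table 2
        # OUTER APPLY: keep the row even when table 1 has no match.
        result.append((id_, seq, value1.get((id_, seq)), v2))

for row in result:
    print(row)
```

Every ID gets all four sequences, and value1 is None for sequences 3 and 4, matching the output above.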
Given this schema:
create table #table_1
(
ID varchar(8) not null ,
sequence int not null ,
value int not null ,
primary key clustered ( ID , sequence ) ,
unique nonclustered ( sequence , ID ) ,
)
create table #table_2
(
sequence int not null ,
value int not null ,
primary key clustered ( sequence ) ,
)
go
insert #table_1 values ( 'p1' , 1 , 5 )
insert #table_1 values ( 'p1' , 2 , 5 )
insert #table_1 values ( 'p2' , 1 , 15 )
insert #table_1 values ( 'p2' , 2 , 20 )
insert #table_2 values ( 1 , 10 )
insert #table_2 values ( 2 , 20 )
insert #table_2 values ( 3 , 30 )
insert #table_2 values ( 4 , 40 )
go
This should get you what you want:
select ID = map.ID ,
sequence = map.sequence ,
value1 = t1.value ,
value2 = t2.value
from ( select distinct
t1.ID ,
t2.sequence
from #table_1 t1
cross join #table_2 t2
) map
left join #table_1 t1 on t1.ID = map.ID
and t1.sequence = map.sequence
join #table_2 t2 on t2.sequence = map.sequence
order by map.ID ,
map.sequence
go
Producing:
ID sequence value1 value2
== ======== ====== ======
p1 1 5 10
p1 2 5 20
p1 3 - 30
p1 4 - 40
p2 1 15 10
p2 2 20 20
p2 3 - 30
p2 4 - 40

Merge a two way relation in the same table in SQL Server

Current Data
ID | Name1 | Name2
<guid1> | XMind | MindNode
<guid2> | MindNode | XMind
<guid3> | avast | Hitman Pro
<guid4> | Hitman Pro | avast
<guid5> | PPLive | Hola!
<guid6> | ZenMate | Hola!
<guid7> | Hola! | PPLive
<guid8> | Hola! | ZenMate
Required Output
ID1 | ID2 | Name1 | Name2
<guid1> | <guid2> | XMind | MindNode
<guid3> | <guid4> | avast | Hitman Pro
<guid5> | <guid7> | PPLive | Hola!
<guid6> | <guid8> | Hola! | ZenMate
These are relations between apps. I want to show that avast and Hitman Pro have a relation, but in this view I do not need to show in which "direction" the relation goes. It's a given in this view that the relation goes both ways.
EDIT: Seems like my example was too simple. The solution doesn't work with more data.
CREATE TABLE #a (ID INT, Name1 VARCHAR(50), Name2 VARCHAR(50))
INSERT INTO #a VALUES ( 1, 'XMind', 'MindNode' )
INSERT INTO #a VALUES ( 2, 'MindNode', 'XMind' )
INSERT INTO #a VALUES ( 3, 'avast', 'Hitman Pro' )
INSERT INTO #a VALUES ( 4, 'Hitman Pro', 'avast' )
INSERT INTO #a VALUES ( 5, 'PPLive Video Accelerator', 'Hola! Better Internet' )
INSERT INTO #a VALUES ( 6, 'ZenMate', 'Hola! Better Internet' )
INSERT INTO #a VALUES ( 7, 'Hola! Better Internet', 'PPLive Video Accelerator' )
INSERT INTO #a VALUES ( 8, 'Hola! Better Internet', 'ZenMate' )
SELECT a1.ID AS ID1 ,
a2.ID AS ID2 ,
a1.Name1 ,
a2.Name1 AS Name2
FROM #a a1
JOIN #a a2 ON a1.Name1 = a2.Name2
AND a1.ID < a2.ID -- avoid duplicates
This works however so i guess it's the Guid that is messing with me.
EDIT AGAIN:
I haven't looked at this for a while and I thought it worked, but I just realized it does not. I've struggled all morning with this, but I must admit that SQL is not really my strong suit. The thing is this.
CREATE TABLE #a (ID int, Name1 VARCHAR(50), Name2 VARCHAR(50))
INSERT INTO #a VALUES ( 1, 'XMind', 'MindNode' )
INSERT INTO #a VALUES ( 2, 'MindNode', 'XMind' )
INSERT INTO #a VALUES ( 3, 'avast', 'Hitman Pro' )
INSERT INTO #a VALUES ( 4, 'PPLive Video Accelerator', 'Hola! Better Internet' )
INSERT INTO #a VALUES ( 5, 'ZenMate', 'Hola! Better Internet' )
INSERT INTO #a VALUES ( 6, 'Hitman Pro', 'avast' )
INSERT INTO #a VALUES ( 7, 'Hola! Better Internet', 'PPLive Video Accelerator' )
INSERT INTO #a VALUES ( 8, 'Hola! Better Internet', 'ZenMate' )
INSERT INTO #a VALUES ( 9, 'XX', 'A' )
INSERT INTO #a VALUES ( 10, 'XX', 'BB' )
INSERT INTO #a VALUES ( 11, 'BB', 'XX' )
INSERT INTO #a VALUES ( 12, 'A', 'XX' )
INSERT INTO #a VALUES ( 13, 'XX', 'CC' )
INSERT INTO #a VALUES ( 14, 'CC', 'XX' )
;With CTE as
(
SELECT a1.ID AS ID1 ,
a2.ID AS ID2 ,
a1.Name1 ,
a2.Name1 AS Name2,
CheckSum(Case when a1.Name1>a2.Name1 then a2.Name1+a1.Name1 else a1.Name1+a2.Name1 end) ck, -- just for display
Row_Number() over (Partition by CheckSum(Case when a1.Name1>a2.Name1 then a2.Name1+a1.Name1 else a1.Name1+a2.Name1 end)
order by CheckSum(Case when a1.Name1>a2.Name1 then a2.Name1+a1.Name1 else a1.Name1+a2.Name1 end)) as rn
FROM #a a1
JOIN #a a2 ON a1.Name1 = a2.Name2
)
Select ID1, ID2,Name1, Name2
from CTE C1
where rn=1
When i use this code it sure works fine with the names but it doesn't match the ID's correctly.
The result is
ID1 | ID2 | Name1 | Name2
12 | 9 | A | X (Correct)
7 | 5 | Hola! | ZenMate (Not Correct)
[..]
I've pulled my hair all morning but i can't figure this out. I still use Guid's as ID's and just use Int's here to make it a bit more readable.
CREATE TABLE #a (ID INT, Name1 VARCHAR(50), Name2 VARCHAR(50))
INSERT INTO #a VALUES ( 1, 'XMind', 'MindNode' )
INSERT INTO #a VALUES ( 2, 'MindNode', 'XMind' )
INSERT INTO #a VALUES ( 3, 'avast', 'Hitman Pro' )
INSERT INTO #a VALUES ( 4, 'Hitman Pro', 'avast' )
SELECT a1.ID AS ID1 ,
a2.ID AS ID2 ,
a1.Name1 ,
a2.Name1 AS Name2
FROM #a a1
JOIN #a a2 ON a1.Name1 = a2.Name2
AND a1.ID < a2.ID -- avoid duplicates
Given the amendment and extension of your question, a more complicated solution is required.
We form a CHECKSUM over a1.Name1 and a2.Name1, putting the two names in a fixed order first so that both directions of a pair yield the same value.
We then use ROW_NUMBER (Transact-SQL) to number the rows within each checksum and keep only the rows numbered 1.
CREATE TABLE #a (ID uniqueIdentifier, Name1 VARCHAR(50), Name2 VARCHAR(50))
INSERT INTO #a VALUES ( NewID(), 'XMind', 'MindNode' )
INSERT INTO #a VALUES ( NewID(), 'MindNode', 'XMind' )
INSERT INTO #a VALUES ( NewID(), 'avast', 'Hitman Pro' )
INSERT INTO #a VALUES ( NewID(), 'Hitman Pro', 'avast' )
INSERT INTO #a VALUES ( NewID(), 'PPLive Video Accelerator', 'Hola! Better Internet' )
INSERT INTO #a VALUES ( NewID(), 'ZenMate', 'Hola! Better Internet' )
INSERT INTO #a VALUES ( NewID(), 'Hola! Better Internet', 'PPLive Video Accelerator' )
INSERT INTO #a VALUES ( NewID(), 'Hola! Better Internet', 'ZenMate' )
INSERT INTO #a VALUES ( NewID(), 'XX', 'A' )
INSERT INTO #a VALUES ( NewID(), 'A', 'XX' )
INSERT INTO #a VALUES ( NewID(), 'XX', 'BB' )
INSERT INTO #a VALUES ( NewID(), 'BB', 'XX' )
INSERT INTO #a VALUES ( NewID(), 'XX', 'CC' )
INSERT INTO #a VALUES ( NewID(), 'CC', 'XX' )
;With CTE as
(
SELECT a1.ID AS ID1 ,
a2.ID AS ID2 ,
a1.Name1 ,
a2.Name1 AS Name2,
CheckSum(Case when a1.Name1>a2.Name1 then a2.Name1+a1.Name1 else a1.Name1+a2.Name1 end) ck, -- just for display
Row_Number() over (Partition by CheckSum(Case when a1.Name1>a2.Name1 then a2.Name1+a1.Name1 else a1.Name1+a2.Name1 end)
order by CheckSum(Case when a1.Name1>a2.Name1 then a2.Name1+a1.Name1 else a1.Name1+a2.Name1 end)) as rn
FROM #a a1
JOIN #a a2 ON a1.Name1 = a2.Name2
)
Select *
from CTE C1
where rn=1
Edit:
If you only want the rows where both fields match, the query would simply be:
SELECT a1.ID AS ID1 , a2.ID AS ID2 , a1.Name1 , a2.Name1 AS Name2
FROM #a a1
JOIN #a a2 ON a1.Name1 = a2.Name2 and a1.Name2 = a2.Name1 AND a1.ID < a2.ID
If the output should contain only two-way relations ('XX' + 'A') AND ('A' + 'XX'), try this:
;
WITH m (ID1, ID2, Name1, Name2) AS (
SELECT ID1, ID2, Name1, Name2
FROM (
SELECT a1.ID AS ID1
,a2.ID AS ID2
,a1.Name1 AS Name1
,a2.Name1 AS Name2
,ROW_NUMBER() OVER (PARTITION BY a1.Name1, a2.Name1 ORDER BY (SELECT 1)) AS n
FROM #a AS a1
JOIN #a AS a2
ON a1.Name1 = a2.Name2
AND a1.Name2 = a2.Name1
) AS T
WHERE n = 1
)
SELECT DISTINCT *
FROM (
SELECT ID1, ID2, Name1, Name2
FROM m
WHERE ID1 <= ID2
UNION ALL
SELECT ID2, ID1, Name2, Name1
FROM m
WHERE ID1 > ID2
) AS dm
It produces the output as follows:
+------+-----+--------------------------+-----------------------+
| ID1 | ID2 | Name1 | Name2 |
+------+-----+--------------------------+-----------------------+
| 1 | 2 | XMind | MindNode |
| 3 | 6 | avast | Hitman Pro |
| 4 | 7 | PPLive Video Accelerator | Hola! Better Internet |
| 5 | 8 | ZenMate | Hola! Better Internet |
| 9 | 12 | XX | A |
| 10 | 11 | XX | BB |
| 13 | 14 | XX | CC |
+------+-----+--------------------------+-----------------------+
Just rank your rows with ROW_NUMBER function and use this rank in join instead of original ID column:
CREATE TABLE #a (ID UNIQUEIDENTIFIER, Name1 VARCHAR(50), Name2 VARCHAR(50))
INSERT INTO #a VALUES ( NEWID(), 'XMind', 'MindNode' )
INSERT INTO #a VALUES ( NEWID(), 'MindNode', 'XMind' )
INSERT INTO #a VALUES ( NEWID(), 'avast', 'Hitman Pro' )
INSERT INTO #a VALUES ( NEWID(), 'Hitman Pro', 'avast' )
INSERT INTO #a VALUES ( NEWID(), 'PPLive Video Accelerator', 'Hola! Better Internet' )
INSERT INTO #a VALUES ( NEWID(), 'ZenMate', 'Hola! Better Internet' )
INSERT INTO #a VALUES ( NEWID(), 'Hola! Better Internet', 'PPLive Video Accelerator' )
INSERT INTO #a VALUES ( NEWID(), 'Hola! Better Internet', 'ZenMate' )
;WITH cte AS(SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) rn FROM #a)
SELECT a1.ID AS ID1 ,
a2.ID AS ID2 ,
a1.Name1 ,
a2.Name1 AS Name2
FROM cte a1
JOIN cte a2 ON a1.Name1 = a2.Name2 AND
a2.Name1 = a1.Name2 AND
a1.rn < a2.rn
Output:
ID1 ID2 Name1 Name2
Guid Guid XMind MindNode
Guid Guid avast Hitman Pro
Guid Guid PPLive Video Accelerator Hola! Better Internet
Guid Guid ZenMate Hola! Better Internet
I suggest this simple approach:
SELECT
t2.ID, t3.ID ID2,
t1.Name1,t1.Name2
FROM (
SELECT DISTINCT
CASE WHEN Name1 <= Name2 THEN Name1 ELSE Name2 END AS Name1,
CASE WHEN Name1 <= Name2 THEN Name2 ELSE Name1 END AS Name2
FROM
#a) t1
JOIN
#a t2 ON t1.Name1 = t2.Name1 AND t1.Name2 = t2.Name2 -- compare columns, not concatenations
JOIN
#a t3 ON t1.Name1 = t3.Name2 AND t1.Name2 = t3.Name1
For this:
ID | ID2 | Name1 | Name2
----+-----+-----------------------+---------------------------
12 | 9 | A | XX
3 | 4 | avast | Hitman Pro
11 | 10 | BB | XX
14 | 13 | CC | XX
7 | 5 | Hola! Better Internet | PPLive Video Accelerator
8 | 6 | Hola! Better Internet | ZenMate
2 | 1 | MindNode | XMind
You can solve this using a CROSS APPLY
SELECT a2.ID ID_1,a1.ID ID_2, a2.Name1 , a2.Name2
FROM #a a1
CROSS APPLY
(
SELECT ID, Name2, Name1
FROM #a aa
WHERE aa.Name1 = a1.Name2 AND a1.Name1 = aa.Name2 AND a1.ID > aa.ID
) a2
You can try also:
select min(ID) ID1,
max(ID) ID2,
Name1,
Name2
from ( -- Here I get all the IDs and each couple sorted
-- Change > to < if you don't like the order
select ID,
case
when Name1 > Name2 then Name1
else Name2
end Name1,
case
when Name1 > Name2 then Name2
else Name1
end Name2
from table1
) as t
group by Name1,
Name2
You can even transform this into a single query, without the inner one, but I think this way is more readable and makes my approach easier to understand.
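The grouping approach can be tried outside SQL as well. The Python sketch below is only an illustration of the same idea on the question's first data set, with integer IDs standing in for the GUIDs. Note that Python compares strings by code point, so the order of the two names within a couple may differ from what a SQL Server collation would give.

```python
rows = [
    (1, "XMind", "MindNode"),
    (2, "MindNode", "XMind"),
    (3, "avast", "Hitman Pro"),
    (4, "Hitman Pro", "avast"),
    (5, "PPLive", "Hola!"),
    (6, "ZenMate", "Hola!"),
    (7, "Hola!", "PPLive"),
    (8, "Hola!", "ZenMate"),
]

groups = {}
for id_, name1, name2 in rows:
    # Order each couple the same way so both directions share one key
    # (the two CASE expressions in the inner query).
    key = (max(name1, name2), min(name1, name2))
    groups.setdefault(key, []).append(id_)

# GROUP BY the ordered couple: smallest ID, largest ID, then the names.
result = sorted((min(ids), max(ids), key[0], key[1]) for key, ids in groups.items())

for row in result:
    print(row)
```

Each two-way relation collapses to a single row carrying both IDs, as in the required output.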