Matching records across multiple possible IDs - sql

I have multiple records with sparsely populated identifiers (I will call these ID Numbers). I can have a maximum of two different ID Numbers per record and want to be able to traverse all the related records together so that I can create a single shared identifier. I want to achieve this in a T-SQL query.
Essentially, here is some sample data:
+-------+-------+--------+-----+------+
| RowId | ID1 | ID2 | ID3 | ID4 |
+-------+-------+--------+-----+------+
| 1 | 11111 | | | |
| 2 | 11111 | | | |
| 3 | 11111 | AAAAA | | |
| 4 | | BBBBBB | BC1 | |
| 5 | | | BC1 | O111 |
| 6 | | GGGGG | BC1 | |
| 7 | | AAAAA | | O111 |
| 8 | | CCCCCC | | |
| 9 | 99999 | | | |
| 10 | 99999 | DDDDDD | | |
| 11 | | | | O222 |
| 12 | | EEEEEE | | O222 |
| 13 | | EEEEEE | | O333 |
+-------+-------+--------+-----+------+
So for example,
11111 is linked to AAAAA in RowId 3,
and AAAAA is also linked to O111 in RowId 7.
O111 is linked to BC1 in RowId 5.
BC1 is linked to BBBBBB in RowId 4,
etc.
Also,
I want to create a new single identifier once all of these rows are linked.
Here is the output I want to achieve for all of the data above:
Denormalised:
+---------+-------+--------+-----+------+
| GroupId | ID1 | ID2 | ID3 | ID4 |
+---------+-------+--------+-----+------+
| 1 | 11111 | AAAAA | BC1 | O111 |
| 1 | 11111 | BBBBBB | BC1 | O111 |
| 1 | 11111 | GGGGG | BC1 | O111 |
| 2 | | CCCCCC | | |
| 3 | 99999 | DDDDDD | | |
| 4 | | EEEEEE | | O222 |
| 4 | | EEEEEE | | O333 |
+---------+-------+--------+-----+------+
Normalized (probably better to work with):
+--------+----------+---------+
| IDType | IDNumber | GroupId |
+--------+----------+---------+
| ID1 | 11111 | 1 |
| ID2 | AAAAA | 1 |
| ID2 | BBBBBB | 1 |
| ID2 | GGGGG | 1 |
| ID3 | BC1 | 1 |
| ID4 | O111 | 1 |
| ID2 | CCCCCC | 2 |
| ID1 | 99999 | 3 |
| ID2 | DDDDDD | 3 |
| ID2 | EEEEEE | 4 |
| ID4 | O222 | 4 |
| ID4 | O333 | 4 |
+--------+----------+---------+
I am looking for SQL code to generate the output above or similar normalized structure. Thanks.
EDIT:
Here is some code to create data that matches the sample data in the table above.
DROP TABLE IF EXISTS #ID
CREATE TABLE #ID
(
RowId INT,
ID1 VARCHAR(100),
ID2 VARCHAR(100),
ID3 VARCHAR(100),
ID4 VARCHAR(100)
)
INSERT INTO #ID VALUES
(1,'11111',NULL,NULL,NULL),
(2,'11111',NULL,NULL,NULL),
(3,'11111','AAAAA',NULL,NULL),
(4,NULL,'BBBBBB','BC1',NULL),
(5,NULL,NULL,'BC1','O111'),
(6,NULL,'GGGGG','BC1',NULL),
(7,NULL,'AAAAA',NULL,'O111'),
(8,NULL,'CCCCCC',NULL,NULL),
(9,'99999',NULL,NULL,NULL),
(10,'99999','DDDDDD',NULL,NULL),
(11,NULL,NULL,NULL,'O222'),
(12,NULL,'EEEEEE',NULL,'O222'),
(13,NULL,'EEEEEE',NULL,'O333')

It is easy to get your normalized output.
I'm using my query from How to find all connected subgraphs of an undirected graph with minor modification to convert your data into pairs that define edges of a graph. The query treats the data as edges in a graph and traverses recursively all edges of the graph, stopping when the loop is detected. Then it puts all found loops in groups and gives each group a number.
Your source table has four IDs, but each row can have only two IDs, so we know that each row has a pair of IDs. My query expects this kind of data (pairs of IDs). It is easy to convert four IDs into a pair - use COALESCE.
For detailed explanation of how it works, see How to find all connected subgraphs of an undirected graph.
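As an illustration of the same idea outside SQL, here is a minimal Python sketch that treats every identifier as a graph node and every row as a link between its identifiers, then numbers the connected groups with a union-find structure. It is not the query below; the names `ROWS`, `find` and `group_identifiers` are made up for this sketch, and it keys each node by (IDType, value), which is an assumption (the SQL compares the values only).

```python
from collections import defaultdict

# Sample rows from the question: (RowId, ID1, ID2, ID3, ID4); None marks an empty cell.
ROWS = [
    (1, "11111", None, None, None),
    (2, "11111", None, None, None),
    (3, "11111", "AAAAA", None, None),
    (4, None, "BBBBBB", "BC1", None),
    (5, None, None, "BC1", "O111"),
    (6, None, "GGGGG", "BC1", None),
    (7, None, "AAAAA", None, "O111"),
    (8, None, "CCCCCC", None, None),
    (9, "99999", None, None, None),
    (10, "99999", "DDDDDD", None, None),
    (11, None, None, None, "O222"),
    (12, None, "EEEEEE", None, "O222"),
    (13, None, "EEEEEE", None, "O333"),
]
ID_TYPES = ("ID1", "ID2", "ID3", "ID4")


def find(parent, node):
    """Return the representative of node's set, shortening the path on the way."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def group_identifiers(rows):
    """Map every (IDType, IDNumber) pair to a group number."""
    parent = {}
    for _row_id, *ids in rows:
        present = [(t, v) for t, v in zip(ID_TYPES, ids) if v is not None]
        for node in present:
            parent.setdefault(node, node)
        # Identifiers on the same row belong together: union each with the first.
        for node in present[1:]:
            parent[find(parent, node)] = find(parent, present[0])
    members = defaultdict(list)
    for node in parent:
        members[find(parent, node)].append(node)
    # Number the groups by their smallest member so the output is stable.
    ordered = sorted(members.values(), key=min)
    return {node: gid for gid, group in enumerate(ordered, start=1) for node in group}
```

Calling `group_identifiers(ROWS)` returns a dictionary with one entry per distinct identifier, which corresponds to the normalized IDType / IDNumber / GroupId shape asked for in the question.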
Query
WITH
CTE_Idents
AS
(
SELECT ID1 AS Ident, 'ID1' AS IDType
FROM #ID
UNION
SELECT ID2 AS Ident, 'ID2' AS IDType
FROM #ID
UNION
SELECT ID3 AS Ident, 'ID3' AS IDType
FROM #ID
UNION
SELECT ID4 AS Ident, 'ID4' AS IDType
FROM #ID
)
,CTE_Pairs
AS
(
SELECT COALESCE(ID1, ID2, ID3, ID4) AS Ident1, COALESCE(ID4, ID3, ID2, ID1) AS Ident2
FROM #ID
UNION
SELECT COALESCE(ID4, ID3, ID2, ID1) AS Ident1, COALESCE(ID1, ID2, ID3, ID4) AS Ident2
FROM #ID
)
,CTE_Recursive
AS
(
SELECT
CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent
, Ident1
, Ident2
, CAST(',' + Ident1 + ',' + Ident2 + ',' AS varchar(8000)) AS IdentPath
, 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1
UNION ALL
SELECT
CTE_Recursive.AnchorIdent
, CTE_Pairs.Ident1
, CTE_Pairs.Ident2
, CAST(CTE_Recursive.IdentPath + CTE_Pairs.Ident2 + ',' AS varchar(8000)) AS IdentPath
, CTE_Recursive.Lvl + 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
WHERE
CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))
)
,CTE_RecursionResult
AS
(
SELECT AnchorIdent, Ident1, Ident2
FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
SELECT AnchorIdent, Ident1 AS Ident
FROM CTE_RecursionResult
UNION
SELECT AnchorIdent, Ident2 AS Ident
FROM CTE_RecursionResult
)
SELECT
CTE_Idents.IDType
,CTE_Idents.Ident
,CASE WHEN CA_Data.XML_Value IS NULL
THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers
,DENSE_RANK() OVER(ORDER BY
CASE WHEN CA_Data.XML_Value IS NULL
THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END
) AS GroupID
FROM
CTE_Idents
CROSS APPLY
(
SELECT CTE_CleanResult.Ident+','
FROM CTE_CleanResult
WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
ORDER BY CTE_CleanResult.Ident FOR XML PATH(''), TYPE
) AS CA_XML(XML_Value)
CROSS APPLY
(
SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)')
) AS CA_Data(XML_Value)
WHERE
CTE_Idents.Ident IS NOT NULL
ORDER BY GroupID, IDType, Ident;
Result
+--------+--------+------------------------------------+---------+
| IDType | Ident | GroupMembers | GroupID |
+--------+--------+------------------------------------+---------+
| ID1 | 11111 | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID2 | AAAAA | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID2 | BBBBBB | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID2 | GGGGG | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID3 | BC1 | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID4 | O111 | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID1 | 99999 | 99999,DDDDDD, | 2 |
| ID2 | DDDDDD | 99999,DDDDDD, | 2 |
| ID2 | CCCCCC | CCCCCC, | 3 |
| ID2 | EEEEEE | EEEEEE,O222,O333, | 4 |
| ID4 | O222 | EEEEEE,O222,O333, | 4 |
| ID4 | O333 | EEEEEE,O222,O333, | 4 |
+--------+--------+------------------------------------+---------+
This is what your data looks like as a graph:
I rendered this image using DOT from https://www.graphviz.org/.
How to convert this normalized output into the denormalized form? One way is to pivot it with the help of IDType, though it might get tricky if the graph can have several loops. You'd better ask another question specifically about converting a normalized dataset into a denormalized one.

Well, this was a real brain twister ;-) and my solution is only close... Try this:
General remarks:
I do not think that T-SQL is the right tool for this...
This structure is open to deeply nested chains. Although there are only four IDs, the references can lead to unlimited depth, circles and loops.
This is, in a way, a gaps-and-islands issue.
The query
WITH cte AS
(
SELECT RowId
,A.ID
,A.sourceId
,ROW_NUMBER() OVER(PARTITION BY RowId ORDER BY A.SourceId) AS IdCounter
FROM #ID
CROSS APPLY (VALUES('ID1',ID1),('ID2',ID2),('ID3',ID3),('ID4',ID4)) A(sourceId,ID)
WHERE A.ID IS NOT NULL
)
,AllIDs AS
(
SELECT RowId
,MAX(CASE WHEN IdCounter=1 THEN ID END) AS FirstId
,MAX(CASE WHEN IdCounter=1 THEN sourceId END) AS FirstSource
,MAX(CASE WHEN IdCounter=2 THEN ID END) AS SecondId
,MAX(CASE WHEN IdCounter=2 THEN sourceId END) AS SecondSource
FROM cte
GROUP BY RowId
)
,recCTE AS
(
SELECT RowId
,FirstId
,FirstSource
,SecondId
,SecondSource
,CAST(N'|' + FirstId AS NVARCHAR(MAX)) AS RunningPath
FROM AllIDs WHERE SecondId IS NULL
UNION ALL
SELECT ai.RowId
,ai.FirstId
,ai.FirstSource
,ai.SecondId
,ai.SecondSource
,r.RunningPath + CAST(N'|' + ai.FirstId AS NVARCHAR(MAX))
FROM AllIDs ai
INNER JOIN recCTE r ON ai.RowId<>r.RowId AND (ai.FirstId=r.FirstId OR ai.FirstId=r.SecondId OR ai.SecondId=r.FirstId OR ai.SecondId=r.SecondId )
WHERE r.RunningPath NOT LIKE CONCAT('%|',ai.FirstId,'|%')
)
,FindIslands AS
(
SELECT FirstId
,FirstSource
,SecondId
,SecondSource
,CONCAT(CanonicalPath,'|') AS CanonicalPath
FROM recCTE
CROSS APPLY(SELECT CAST('<x>' + REPLACE(CONCAT(RunningPath,'|',SecondId),'|','</x><x>') + '</x>' AS XML)) A(Casted)
CROSS APPLY(SELECT Casted.query('
for $x in distinct-values(/x[text()])
order by $x
return <x>{concat("|",$x)}</x>
').value('.','nvarchar(max)')) B(CanonicalPath)
)
,MaxPaths AS
(
SELECT fi.CanonicalPath
,x.CanonicalPath AS BestPath
,LEN(x.CanonicalPath) AS PathLength
,ROW_NUMBER() OVER(PARTITION BY fi.CanonicalPath ORDER BY LEN(x.CanonicalPath) DESC) AS SortIndex
FROM FindIslands fi
INNER JOIN FindIslands x ON LEN(x.CanonicalPath)>=LEN(fi.CanonicalPath) AND x.CanonicalPath LIKE CONCAT('%',fi.CanonicalPath,'%' )
--GROUP BY fi.CanonicalPath
)
,AlmostCorrect AS
(
SELECT *
FROM
(
SELECT mp.BestPath,fi.FirstId AS ID,FirstSource AS IDSource
FROM FindIslands fi
INNER JOIN MaxPaths mp On mp.SortIndex=1 AND fi.CanonicalPath=mp.CanonicalPath
UNION ALL
SELECT mp.BestPath,fi.SecondId,SecondSource
FROM FindIslands fi
INNER JOIN MaxPaths mp On mp.SortIndex=1 AND fi.CanonicalPath=mp.CanonicalPath
) t
WHERE ID IS NOT NULL
GROUP BY BestPath,ID,IDSource
)
SELECT * FROM AlmostCorrect;
The result
+--------------------------------+--------+----------+
| BestPath | ID | IDSource |
+--------------------------------+--------+----------+
| |11111|AAAAA|BBBBBB|BC1|GGGGG| | 11111 | ID1 |
+--------------------------------+--------+----------+
| |11111|AAAAA|BBBBBB|BC1|GGGGG| | AAAAA | ID2 |
+--------------------------------+--------+----------+
| |11111|AAAAA|BBBBBB|BC1|GGGGG| | BBBBBB | ID2 |
+--------------------------------+--------+----------+
| |11111|AAAAA|BBBBBB|BC1|GGGGG| | BC1 | ID3 |
+--------------------------------+--------+----------+
| |11111|AAAAA|BBBBBB|BC1|GGGGG| | GGGGG | ID2 |
+--------------------------------+--------+----------+
| |11111|AAAAA|BC1|GGGGG| | BC1 | ID3 |
+--------------------------------+--------+----------+
| |11111|AAAAA|BC1|GGGGG| | GGGGG | ID2 |
+--------------------------------+--------+----------+
| |11111|AAAAA|BC1|O111| | BC1 | ID3 |
+--------------------------------+--------+----------+
| |11111|AAAAA|BC1|O111| | O111 | ID4 |
+--------------------------------+--------+----------+
| |11111|AAAAA|O111| | AAAAA | ID2 |
+--------------------------------+--------+----------+
| |11111|AAAAA|O111| | O111 | ID4 |
+--------------------------------+--------+----------+
| |99999|DDDDDD| | 99999 | ID1 |
+--------------------------------+--------+----------+
| |99999|DDDDDD| | DDDDDD | ID2 |
+--------------------------------+--------+----------+
| |CCCCCC| | CCCCCC | ID2 |
+--------------------------------+--------+----------+
| |EEEEEE|O222|O333| | EEEEEE | ID2 |
+--------------------------------+--------+----------+
| |EEEEEE|O222|O333| | O222 | ID4 |
+--------------------------------+--------+----------+
| |EEEEEE|O222|O333| | O333 | ID4 |
+--------------------------------+--------+----------+
The idea behind:
You can see the result of each intermediate step simply by using SELECT * FROM [cte-name] as the last select (comment out the current last select).
The CTE "cte" will transform your side-by-side structure to a row-based set.
Following your statement that you have a maximum of two different ID Numbers per record, the second CTE "AllIDs" will transform this set to a set with two IDs, keeping knowledge of where each ID was taken from.
Now we go into recursion. We start with all rows where the second ID is NULL (WARNING: you might not catch all, the recursion anchor might need some more thinking) and find any linked row (either by its first or by its second ID). While traversing down we create a path of all visited IDs, and we stop if we re-visit one of them.
The CTE "FindIslands" will transform this path to XML and use XQuery's FLWOR in order to return the path alphabetically sorted.
The CTE "MaxPaths" will find the longest path of a group in order to find paths which are completely embedded within other paths.
The CTE "AlmostCorrect" will now re-transform this to a row-based set and pick the rows with the longest path.
What we have achieved:
All your IDs show the same "IDSource" as your own example.
You can see, how the IDs are linked with each other.
What we did not yet achieve:
The paths |11111|AAAAA|BBBBBB|BC1|GGGGG|, |11111|AAAAA|BC1|GGGGG|, |11111|AAAAA|BC1|O111|, |11111|AAAAA|O111| are treated as different, although their fragments are overlapping.
At the moment I'm too tired to think about this... Maybe I'll get an idea tomorrow ;-)
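One possible way to close that gap, shown here only as a Python sketch and not as part of the query above, is to merge any paths that share at least one ID until no two remaining sets overlap. The function name `merge_overlapping` and the list `PATHS` are made up for this illustration; the paths are the distinct BestPath values from the result table.

```python
def merge_overlapping(paths):
    """Merge pipe-delimited paths that share at least one ID into islands."""
    islands = []  # invariant: the sets in this list never overlap each other
    for path in paths:
        current = {part for part in path.split("|") if part}
        remaining = []
        for island in islands:
            if island & current:
                current |= island  # absorb every island that overlaps this path
            else:
                remaining.append(island)
        remaining.append(current)
        islands = remaining
    return sorted(sorted(island) for island in islands)


PATHS = [
    "|11111|AAAAA|BBBBBB|BC1|GGGGG|",
    "|11111|AAAAA|BC1|GGGGG|",
    "|11111|AAAAA|BC1|O111|",
    "|11111|AAAAA|O111|",
    "|99999|DDDDDD|",
    "|CCCCCC|",
    "|EEEEEE|O222|O333|",
]
```

With these paths, `merge_overlapping(PATHS)` collapses the four overlapping fragments into one island and leaves the other three untouched.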

I don't quite understand the structure of the expected result, but the key of your query is to assemble the nodes into subgraphs, while giving each subgraph an ID (you call it GroupId).
I leave the final rendering of the result to you since you probably understand in detail why you want to show it in that way. A few LEFT JOINs will do the trick.
Anyway, here's the query that produces the subgraphs:
with
p as (
select
row_id, row_id as min_id,
cast(concat(':', row_id, ':') as varchar(1000)) as walked,
case when id1 is null then ':' else cast(concat(':', id1, ':') as varchar(1000)) end as i1,
case when id2 is null then ':' else cast(concat(':', id2, ':') as varchar(1000)) end as i2,
case when id3 is null then ':' else cast(concat(':', id3, ':') as varchar(1000)) end as i3,
case when id4 is null then ':' else cast(concat(':', id4, ':') as varchar(1000)) end as i4
from t
union all
select
t.row_id, case when t.row_id < p.min_id then t.row_id else p.min_id end,
cast(concat(walked, t.row_id, ':') as varchar(1000)),
case when t.id1 is null then p.i1 else cast(concat(p.i1, id1, ':') as varchar(1000)) end,
case when t.id2 is null then p.i2 else cast(concat(p.i2, id2, ':') as varchar(1000)) end,
case when t.id3 is null then p.i3 else cast(concat(p.i3, id3, ':') as varchar(1000)) end,
case when t.id4 is null then p.i4 else cast(concat(p.i4, id4, ':') as varchar(1000)) end
from p
join t on p.i1 like concat('%:', t.id1, ':%')
or p.i2 like concat('%:', t.id2, ':%')
or p.i3 like concat('%:', t.id3, ':%')
or p.i4 like concat('%:', t.id4, ':%')
where p.walked not like concat('%:', t.row_id, ':%')
),
g as (
select min_id as min_id, min(walked) as nodes
from p
where not exists (
select 1
from t
where (p.i1 like concat('%:', t.id1, ':%')
or p.i2 like concat('%:', t.id2, ':%')
or p.i3 like concat('%:', t.id3, ':%')
or p.i4 like concat('%:', t.id4, ':%'))
and p.walked not like concat('%:', t.row_id, ':%')
)
group by min_id
)
select row_number() over(order by min_id) as group_id, nodes from g
Result:
group_id nodes
-------- ---------------
1 :1:2:3:7:5:4:6:
2 :8:
3 :10:9:
4 :11:12:13:
For reference, here's the data script I used to test:
create table t (
row_id int,
id1 int,
id2 varchar(10),
id3 varchar(10),
id4 varchar(10)
);
insert into t (row_id, id1, id2, id3, id4) values
(1, '11111', null, null, null),
(2, '11111', null, null, null),
(3, '11111', 'AAAAA', null, null),
(4, null, 'BBBBB', 'BC1', null),
(5, null, null, 'BC1', '0111'),
(6, null, 'GGGGG', 'BC1', null),
(7, null, 'AAAAA', null, '0111'),
(8, null, 'CCCCCC', null, null),
(9, '99999', null, null, null),
(10, '99999', 'DDDDD', null, null),
(11, null, null, null, '0222'),
(12, null, 'EEEEE', null, '0222'),
(13, null, 'EEEEE', null, '0333');
Note: I can imagine the performance of this query being quite slow. A solution in PostgreSQL would be much more performant since, unlike SQL Server, it implements UNION in recursive CTEs. This could remove entire tree branches much earlier in the graph walk compared to UNION ALL (the only choice in SQL Server).
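To illustrate why discarding already-seen rows early matters, here is a small Python sketch of the same row-level walk with an explicit visited set: each row is queued at most once, so no branch is explored twice. It uses the test data from the script above (with id1 kept as text for simplicity); `ROWS` and `row_groups` are names invented for this sketch.

```python
from collections import defaultdict, deque

# row_id -> (id1, id2, id3, id4), copied from the test script above.
ROWS = {
    1: ("11111", None, None, None),
    2: ("11111", None, None, None),
    3: ("11111", "AAAAA", None, None),
    4: (None, "BBBBB", "BC1", None),
    5: (None, None, "BC1", "0111"),
    6: (None, "GGGGG", "BC1", None),
    7: (None, "AAAAA", None, "0111"),
    8: (None, "CCCCCC", None, None),
    9: ("99999", None, None, None),
    10: ("99999", "DDDDD", None, None),
    11: (None, None, None, "0222"),
    12: (None, "EEEEE", None, "0222"),
    13: (None, "EEEEE", None, "0333"),
}


def row_groups(rows):
    """Group row ids that are linked by a shared value in the same ID column."""
    # Index: (column position, value) -> row ids carrying that value.
    index = defaultdict(list)
    for row_id, ids in rows.items():
        for col, value in enumerate(ids):
            if value is not None:
                index[(col, value)].append(row_id)
    visited = set()
    groups = []
    for start in sorted(rows):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        group = []
        while queue:
            row_id = queue.popleft()
            group.append(row_id)
            for col, value in enumerate(rows[row_id]):
                if value is None:
                    continue
                for other in index[(col, value)]:
                    if other not in visited:  # prune rows that were already reached
                        visited.add(other)
                        queue.append(other)
        groups.append(sorted(group))
    return groups
```

The groups come out in order of their smallest row id, which matches the group_id numbering in the result above.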

In questions like this, 2-3 different sets of sample data help us understand the pattern of the data.
That helps in writing a better query.
DROP TABLE IF EXISTS #ID
CREATE TABLE #ID
(
RowId INT,
ID1 VARCHAR(100),
ID2 VARCHAR(100),
ID3 VARCHAR(100),
ID4 VARCHAR(100)
)
INSERT INTO #ID VALUES
(1,'11111',NULL,NULL,NULL),
(2,'11111',NULL,NULL,NULL),
(3,'11111','AAAAA',NULL,NULL),
(4,NULL,'BBBBBB','BC1',NULL),
(5,NULL,NULL,'BC1','O111'),
(6,NULL,'GGGGG','BC1',NULL),
(7,NULL,'AAAAA',NULL,'O111'),
(8,NULL,'CCCCCC',NULL,NULL),
(9,'99999',NULL,NULL,NULL),
(10,'99999','DDDDDD',NULL,NULL),
(11,NULL,NULL,NULL,'O222'),
(12,NULL,'EEEEEE',NULL,'O222'),
(13,NULL,'EEEEEE',NULL,'O333')
;With CTE as
(
select distinct RowId, IDNumber,IDType
--,ROW_NUMBER()over(order by rowid)rn
from
(select * from #ID)p
unpivot(IDNumber for IDType in(ID1,ID2,ID3,ID4)) as unpvt
)
,CTE2 as
(
select c.*
,ROW_NUMBER()over(partition by rowid order by rowid desc)rn1
from CTE C
)
,CTE3 as
(
select *
,dense_rank()over( order by idnumber)rn3
--,1 rn3
from cte2 c
where rn1=1
and not exists(select 1 from cte2 c1
where c1.RowId=c.RowId and c1.rn1>c.rn1)
)
,CTE4 as
(
select RowId,IDNumber,IDType,rn3 as Groupid
,1 lvl
from cte3 c
where rowid>1
union all
select c.RowId,c.IDNumber,c.IDType,c1.GroupID
,lvl+1
from CTE2 C
inner join CTE4 C1 on (
(c.IDNumber=c1.IDNumber and c.RowId<>c1.RowId )
or (c.RowId=c1.RowId and c.IDNumber<>c1.IDNumber)
)
where lvl<=8
)
select distinct IDNumber,IDType,Groupid
--,RowId
from cte4
order by Groupid
In CTE I first unpivot the data.
In CTE2 and CTE3 together I create the GroupID beforehand, according to what I have understood.
CTE4 is recursive.
This script can be optimized after checking it against 2-3 different sets of sample data.
The CTE4 result can be pivoted again into your denormalized form.
I think this is an ideal situation to try a cursor with an optimized script.

Related

sum of columns and list difference between rows

I am trying to get the difference between rows based on group by SELL_ID on the below table,
table1 - (table formatting courtesy of GitHub)
+---------+---------+----------+----------+------------------+---------+
| seq_ID | REQ_ID | CALL_ID | SELL_ID | REGION | COUNT |
+---------+---------+----------+----------+------------------+---------+
| 1 | 123 | C001 | S1 | AGL | 510563 |
| 2 | 123 | C001 | S1 | USL | 122967 |
| 3 | 123 | C001 | S1 | VALIC | 614106 |
| 4 | 123 | C001 | S2 | Inforce | 1247636 |
| 5 | 123 | C001 | S2 | NB | 0 |
| 6 | 123 | C001 | S3 | Seriatim Summary | 1247636 |
+---------+---------+----------+----------+------------------+---------+
I am trying to get the results as below,
table2 -
+---------+---------+----------+----------+-------+
| seq_ID | REQ_ID | CALL_ID | Summary | COUNT |
+---------+---------+----------+----------+-------+
| 1 | 123 | C001 | S1_vs_S2 | 0 |
| 2 | 123 | C001 | S2_vs_S3 | 0 |
| 3 | 123 | C001 | S3_vs_s1 | 0 |
+---------+---------+----------+----------+-------+
S1_vs_S2 is the difference between (sum(count) from table1 where sell_id='S1') and (sum(count) from table1 where sell_id='S2')
Below is the code that I am using, but it does not return the results:
INSERT INTO table2 (SEQ_ID, REQ_ID,call_id,summary,count)
SELECT min(seq_id) seq_id
, req_id
, call_id
, S1_vs_S2
,((SELECT sum(c2) FROM TABLE_STG_CTRL WHERE source='S1')-
SELECT sum(c2) FROM TABLE_STG_CTRL WHERE source='S2'))
FROM table1
GROUP BY req_ID, Ctrl_ID, c1, source
ORDER BY SEQ_ID ;
Does this do what you want?
select req_id, call_id, sell_id,
lead(sell_id) over (partition by req_id, call_id order by seq_id) as next_sell_id,
(cnt -
lead(cnt) over (partition by req_id, call_id order by seq_id)
) as diff
from (select req_id, call_id, sell_id, sum(count) as cnt, min(seq_id) as seq_id
from t
group by req_id, call_id, sell_id
) t
First, group the data on sell_id, req_id and call_id. This is subquery t in my code. Then self-join this result and show the difference. The only difficulty is constructing the join condition carefully:
demo with your sample data
with t as (
select sell_id sid, req_id, call_id, sum(cnt) cnt
from table1
group by sell_id, req_id, call_id )
select case t1.sid when 'S1' then 1 when 'S2' then 2 when 'S3' then 3 end id,
t1.req_id, t1.call_id, t1.sid||'_vs_'||t2.sid summary, t1.cnt - t2.cnt diff
from t t1
join t t2 on t1.req_id = t2.req_id
and t1.call_id = t2.call_id
and (t1.sid, t2.sid) in (('S1', 'S2'), ('S2', 'S3'), ('S3', 'S1'))
order by id
BTW, COUNT is an Oracle keyword (the name of an aggregate function); please avoid such names when naming columns etc.
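The group-then-compare logic can also be checked outside the database. The following Python sketch is only an illustration: it sums COUNT per SELL_ID and then compares each SELL_ID with the next one, wrapping around at the end. The wrap-around generalization is an assumption of this sketch (the query above hard-codes the three pairs), and `TABLE1` and `cyclic_differences` are made-up names.

```python
from collections import defaultdict

# (seq_ID, REQ_ID, CALL_ID, SELL_ID, REGION, COUNT) from table1 in the question.
TABLE1 = [
    (1, 123, "C001", "S1", "AGL", 510563),
    (2, 123, "C001", "S1", "USL", 122967),
    (3, 123, "C001", "S1", "VALIC", 614106),
    (4, 123, "C001", "S2", "Inforce", 1247636),
    (5, 123, "C001", "S2", "NB", 0),
    (6, 123, "C001", "S3", "Seriatim Summary", 1247636),
]


def cyclic_differences(rows):
    """Sum COUNT per (REQ_ID, CALL_ID, SELL_ID), then diff each SELL_ID with the next."""
    totals = defaultdict(int)
    for _seq, req_id, call_id, sell_id, _region, count in rows:
        totals[(req_id, call_id, sell_id)] += count
    result = []
    seq = 0
    for req_id, call_id in sorted({(r, c) for r, c, _ in totals}):
        sells = sorted(s for r, c, s in totals if (r, c) == (req_id, call_id))
        for i, left in enumerate(sells):
            right = sells[(i + 1) % len(sells)]  # the last SELL_ID wraps to the first
            seq += 1
            diff = totals[(req_id, call_id, left)] - totals[(req_id, call_id, right)]
            result.append((seq, req_id, call_id, f"{left}_vs_{right}", diff))
    return result
```

For the sample data all three sums are 1247636, so every difference is 0, as in the expected table2.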

Five Columns to a single row

I have the following data
+--------+
| orders |
+--------+
| S1 |
| S2 |
| S3 |
| S4 |
| S5 |
| S6 |
| S7 |
| S8 |
| S9 |
| S10 |
| S11 |
| S12 |
+--------+
I am required to return the result as follows - fit five rows in one column:
+-----------------+
| Orders |
+-----------------+
| S1,S2,S3,S4,S5 |
| S6,S7,S8,S9,S10 |
| S11,S12 |
+-----------------+
There is nothing to group on or segregate these into rows. So I assigned a row_number and did mod 5 on the row_number. It almost works, but not quite.
Here is what I have tried:
;with mycte as (
select
'S1' as orders
union all select
'S2'
union all select
'S3'
union all select
'S4'
union all select
'S5'
union all select
'S6'
union all select
'S7'
union all select
'S8'
union all select
'S9'
union all select
'S10'
union all select
'S11'
union all select
'S12'
)
,mycte2 as (
Select
orders
,ROW_NUMBER() over( order by orders) %5 as rownum
from mycte
)
select distinct
STUFF((
SELECT ',' + mycte2.orders
FROM mycte2
where t1.rownum= mycte2.rownum
FOR XML PATH('')
), 1, 1, '') orders
, rownum
from mycte2 t1
the result is :
+-----------+--------+
| orders | rownum |
+-----------+--------+
| S1,S3,S8 | 1 |
| S10,S4,S9 | 2 |
| S11,S5 | 3 |
| S12,S6 | 4 |
| S2,S7 | 0 |
+-----------+--------+
Can someone please show me how to get to my desired result?
How about
CREATE TABLE T
([orders] varchar(3));
INSERT INTO T
([orders])
VALUES
('S1'),
('S2'),
('S3'),
('S4'),
('S5'),
('S6'),
('S7'),
('S8'),
('S9'),
('S10'),
('S11'),
('S12');
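WITH CTE AS
(
SELECT Orders,
-- Order by length, then text, so that S2 sorts before S10
(ROW_NUMBER() OVER(ORDER BY LEN(Orders), Orders) - 1) / 5 RN
FROM T
)
SELECT STRING_AGG(Orders, ',') WITHIN GROUP (ORDER BY LEN(Orders), Orders) AS Orders
FROM CTE
GROUP BY RN
ORDER BY RN;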
OR
SELECT STUFF(
(
SELECT ',' + Orders
FROM CTE
WHERE RN = TT.RN
FOR XML PATH('')
), 1, 1, ''
) Orders
FROM CTE TT
GROUP BY RN
ORDER BY RN;
You can use (SELECT 1) instead of LEN(Orders) if the order of the rows does not matter; the numbering is then arbitrary.
Returns:
+-----------------+
| Orders |
+-----------------+
| S1,S2,S3,S4,S5 |
| S6,S7,S8,S9,S10 |
| S11,S12 |
+-----------------+
Demo
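The difference between the modulo in the question and the integer division in the answer can be seen in a few lines of Python. This is only a sketch of the arithmetic; `natural_key` and `chunk_orders` are made-up names, and the length-then-text sort key is one assumed way of putting S2 before S10.

```python
def natural_key(order):
    """Sort 'S2' before 'S10' by comparing length first, then text."""
    return (len(order), order)


def chunk_orders(orders, size=5):
    """Join orders into comma-separated groups of `size` consecutive items."""
    ranked = sorted(orders, key=natural_key)
    buckets = {}
    for position, order in enumerate(ranked):  # position plays the role of ROW_NUMBER() - 1
        # Integer division keeps neighbours together; position % size would
        # instead put every fifth item into the same bucket.
        buckets.setdefault(position // size, []).append(order)
    return [",".join(buckets[b]) for b in sorted(buckets)]


ORDERS = [f"S{n}" for n in range(1, 13)]
```

`chunk_orders(ORDERS)` gives the three rows asked for in the question, regardless of the order in which the input arrives.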

Convert Rows to Columns in one SQL but not impact number rows

I have a table whose structure and data look like this:
I want a SQL query that converts it to this:
I really don't know how to write SQL that does this. I have read a lot of previous answers on this kind of topic, but I cannot find one that fits my case. Can anyone help me, please?
You can do it with a hierarchical query if you only have three levels to consider:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( LineItem_Name, LineItem_Id, parent_id, dept_name, product_name ) AS
SELECT 'ABC', 1, NULL, 'D1', 'P1' FROM DUAL UNION ALL
SELECT 'CDF', 2, 1, 'D2', 'P2' FROM DUAL UNION ALL
SELECT 'EFG', 3, 1, 'D3', 'P3' FROM DUAL UNION ALL
SELECT 'HIJ', 4, 2, 'D4', 'P4' FROM DUAL;
Query 1:
SELECT CONNECT_BY_ROOT( LineItem_Name) AS LineItem_Level1,
CASE LEVEL
WHEN 3 THEN PRIOR LineItem_Name
WHEN 2 THEN LineItem_Name
END AS LineItem_Level2,
CASE LEVEL
WHEN 3 THEN LineItem_Name
END AS LineItem_Level3,
dept_name,
product_name
FROM table_name
START WITH parent_id IS NULL
CONNECT BY PRIOR LineItem_ID = parent_id
Results:
| LINEITEM_LEVEL1 | LINEITEM_LEVEL2 | LINEITEM_LEVEL3 | DEPT_NAME | PRODUCT_NAME |
|-----------------|-----------------|-----------------|-----------|--------------|
| ABC | (null) | (null) | D1 | P1 |
| ABC | CDF | (null) | D2 | P2 |
| ABC | CDF | HIJ | D4 | P4 |
| ABC | EFG | (null) | D3 | P3 |
Query 2: This is an alternative using recursive sub-query factoring which will get the grandparent and parent of the current line item; which is slightly different to the previous query but for 3 levels would give you the same result.
WITH tree ( id, grandparent, parent, item, dept_name, product_name ) AS (
SELECT LineItem_id,
NULL,
NULL,
LineItem_name,
dept_name,
product_name
FROM table_name
WHERE parent_id IS NULL
UNION ALL
SELECT t.lineItem_id,
p.parent,
p.item,
t.lineItem_name,
t.dept_name,
t.product_name
FROM tree p
INNER JOIN
table_name t
ON ( p.id = t.parent_id )
)
SELECT COALESCE( grandparent, parent, item ) AS LineItem_Level1,
CASE
WHEN parent IS NULL THEN NULL
WHEN grandparent IS NULL THEN item
ELSE parent
END AS LineItem_Level2,
NVL2( grandparent, item, NULL ) AS LineItem_Level3,
dept_name,
product_name
FROM tree
Results:
| LINEITEM_LEVEL1 | LINEITEM_LEVEL2 | LINEITEM_LEVEL3 | DEPT_NAME | PRODUCT_NAME |
|-----------------|-----------------|-----------------|-----------|--------------|
| ABC | (null) | (null) | D1 | P1 |
| ABC | CDF | (null) | D2 | P2 |
| ABC | EFG | (null) | D3 | P3 |
| ABC | CDF | HIJ | D4 | P4 |
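The same level columns can be derived by walking the parent pointers directly. The Python sketch below is an illustration only, using the four rows from the schema setup; `ITEMS` and `level_columns` are made-up names, and it assumes the parent links contain no cycles. Chains deeper than `depth` are cut off after the first `depth` levels.

```python
# LineItem_Id -> (LineItem_Name, parent_id, dept_name, product_name)
ITEMS = {
    1: ("ABC", None, "D1", "P1"),
    2: ("CDF", 1, "D2", "P2"),
    3: ("EFG", 1, "D3", "P3"),
    4: ("HIJ", 2, "D4", "P4"),
}


def level_columns(items, depth=3):
    """Return one row per item: its ancestors from the root down, padded to `depth`."""
    rows = []
    for item_id in sorted(items):
        name, parent_id, dept, product = items[item_id]
        chain = [name]
        while parent_id is not None:
            # Look up the parent, then continue from the parent's own parent.
            parent_name, parent_id, _dept, _product = items[parent_id]
            chain.append(parent_name)
        chain.reverse()  # root first, current item last
        padded = chain + [None] * (depth - len(chain))
        rows.append((*padded[:depth], dept, product))
    return rows
```

For the sample data this produces the same four rows as the results tables above, in LineItem_Id order.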

Get one row of grouped objects

I have a table that contains Husband-to-Wife relations.
The table contains two rows for each relation (BTW, gender has no meaning; it could be Husband-Husband or Wife-Wife). That is, the table may show two rows for one "connection":
Wife--Husband and/or Husband--Wife
The table looks like this:
Id1 | Id2 | ConnectiondID | RelatedConnectionId
-----------------------------------------------------
123 | 333 | FF45 | F421
333 | 123 | F421 | FF45
456 | 987 | F333 | F321
987 | 456 | F321 | F333
My expected result is to have only one relation per group:
Id1 | Id2
----------
123 | 333
456 | 987
This is actually very simple assuming you only want couples and your ID values are all unique and numeric, and does not require any self joins, functions or grouping:
declare @t table(Id1 int,Id2 int,ConnectiondID nvarchar(5),RelatedConnectionId nvarchar(5));
insert into @t values(123,333,'FF45','F421'),(333,123,'F421','FF45'),(456,444,'FF46','F422'),(444,456,'F422','FF46'),(789,555,'FF47','F423'),(555,789,'F423','FF47');
select *
from @t
where Id1 < Id2
order by Id1
Output:
+-----+-----+---------------+---------------------+
| Id1 | Id2 | ConnectiondID | RelatedConnectionId |
+-----+-----+---------------+---------------------+
| 123 | 333 | FF45 | F421 |
| 444 | 456 | F422 | FF46 |
| 555 | 789 | F423 | FF47 |
+-----+-----+---------------+---------------------+
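The filter works because exactly one of the two mirrored rows has the smaller id first. A minimal Python sketch of that reasoning follows, using the pairs from the question; `unique_pairs` and `unique_pairs_canonical` are made-up names, and the second function is an assumed variant for data where only one direction of a couple might be stored.

```python
# (Id1, Id2) pairs from the question's table; each couple appears in both directions.
PAIRS = [(123, 333), (333, 123), (456, 987), (987, 456)]


def unique_pairs(rows):
    """Keep the one row per couple whose first id is the smaller (Id1 < Id2)."""
    return sorted((a, b) for a, b in rows if a < b)


def unique_pairs_canonical(rows):
    """Variant: normalise each pair to (smaller, larger) and drop duplicates."""
    return sorted({(min(a, b), max(a, b)) for a, b in rows})
```

Both functions return the two couples from the expected result; the second one also keeps a couple that is stored only as (larger, smaller).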
If I am understanding your question correctly, you need to perform a self-join on the table e.g. ON t1.id1 = t2.id2 or ON t1.ConnectionId = t2.RelatedConnectionID and obviously this is joining both ways.
To limit this to just one way add a condition on the join predicate such that one of the values is less than or greater than the other; e.g.
DECLARE @tbl table( Id1 smallint PRIMARY KEY, Id2 smallint,ConnectiondID char(5),RelatedConnectionId char(5));
INSERT @tbl(Id1,Id2,ConnectiondID,RelatedConnectionId)
VALUES(123,333,'FF45','F421'),
(333,123,'F421','FF45'),
(456,222,'FF45','F421'),
(222,456,'F421','FF45'),
(789,111,'FF45','F421'),
(111,789,'F421','FF45');
SELECT *
FROM @tbl t1
JOIN @tbl t2 ON t2.Id1 = t1.Id2 AND t2.Id1 > t1.Id1;
For example
DECLARE @T TABLE (id1 int, id2 int,ConnectiondID varchar(5),RelatedConnectionId varchar(5) )
INSERT INTO @T (Id1,Id2,ConnectiondID,RelatedConnectionId)
VALUES
(123 , 333 ,'FF45','F421'),
(333 , 123 , 'F421','FF45'),
(2123 , 2333 ,'2FF45','2F421'),
(2333 , 2123 , '2F421','2FF45'),
(3 , 2 , 'AAAA','BBB'),
(2 , 3 , 'BBB','AAAA')
SELECT
a.*
FROM
@T a
WHERE
CASE
WHEN ConnectiondID > RelatedConnectionId
THEN RelatedConnectionId
ELSE NULL
END IS NULL

SQL combine 2 table and pivot

I don't understand how PIVOT works in SQL. I have 2 tables and I would like to pivot 1 of them in order to get only 1 table with all the data together. I've attached an image with the tables I have and the result that I would like to get.
CREATE TABLE TABLE1
([serie_id] varchar(4), [Maturity] int, [Strategy] int, [Lifetime] varchar(4), [L_max] decimal(10, 5), [W_max] decimal(10, 5), [H_max] decimal(10, 5))
;
INSERT INTO TABLE1
([serie_id], [Maturity], [Strategy], [Lifetime], [L_max], [W_max], [H_max])
VALUES
('id_1', 3, 1, '2', 2.200, 1.400, 1.400),
('id_2', 3, 1, '2', 3.400, 1.800, 2.100),
('id_3', 3, 1, NULL, 24.500, 14.500, 15.000),
('id_4', 3, 1, NULL, 28.000, 24.500, 14.000)
;
CREATE TABLE TABLE2
([serie_id] varchar(4), [L_value] decimal(10, 5), [lrms] decimal(10, 5), [latTmax] decimal(10, 5), [Rdc] decimal(10, 5))
;
INSERT INTO TABLE2
([serie_id], [L_value], [lrms], [latTmax], [Rdc])
VALUES
('id_1', 67.000, 400.000, 400.000, 0.250),
('id_1', 90.000, 330.000, 330.000, 0.350),
('id_1', 120.000, 370.000, 370.000, 0.300),
('id_1', 180.000, 330.000, 300.000, 0.350),
('id_2', 260.000, 300.000, 300.000, 0.400),
('id_2', 360.000, 280.000, 280.000, 0.450),
('id_3', 90.000, 370.000, 370.000, 0.300),
('id_4', 160.000, 340.000, 340.000, 0.400)
;
SQLFiddle
If someone could help me with the SQL query I would appreciate it so much.
In order to get your final result, you are going to have to implement a variety of methods including unpivot, pivot, along with the use of a windowing function like row_number().
Since you have multiple columns in Table2 that need to be pivoted, then you will need to unpivot them first. This is the reverse of pivot, which converts your multiple columns into multiple rows. But before you unpivot, you need some value to identify the values of each row using row_number() - sounds complicated, right?
First, query table2 using the windowing function row_number(). This numbers the rows within each serie_id, so that each of the rows for id_1 (and for every other serie_id) can be told apart.
select serie_id, l_value, lrms, latTmax, Rdc,
rn = cast(row_number() over(partition by serie_id order by serie_id)
as varchar(10))
from table2;
See Demo. Once you've created this unique identifier, then you will unpivot the L_value, lrms, latTmax, and rdc. You can unpivot the data using several different methods, including the unpivot function, CROSS APPLY, or UNION ALL.
select serie_id,
col, value
from
(
select serie_id, l_value, lrms, latTmax, Rdc,
rn = cast(row_number() over(partition by serie_id order by serie_id)
as varchar(10))
from table2
) d
cross apply
(
select 'L_value_'+rn, L_value union all
select 'lrms_'+rn, lrms union all
select 'latTmax_'+rn, latTmax union all
select 'Rdc_'+rn, Rdc
) c (col, value)
See SQL Fiddle with Demo. The data from table2 is now in a completely different format that can be pivoted into the new columns:
| SERIE_ID | COL | VALUE |
|----------|-----------|-------|
| id_1 | L_value_1 | 67 |
| id_1 | lrms_1 | 400 |
| id_1 | latTmax_1 | 400 |
| id_1 | Rdc_1 | 0.25 |
| id_1 | L_value_2 | 90 |
| id_1 | lrms_2 | 330 |
| id_1 | latTmax_2 | 330 |
| id_1 | Rdc_2 | 0.35 |
The final step would be to PIVOT the data above into the final result:
select serie_id, maturity, strategy, lifetime, l_max, w_max, h_max,
L_value_1, lrms_1, latTmax_1, Rdc_1,
L_value_2, lrms_2, latTmax_2, Rdc_2,
L_value_3, lrms_3, latTmax_3, Rdc_3,
L_value_4, lrms_4, latTmax_4, Rdc_4
from
(
select t1.serie_id, t1.maturity, t1.strategy, t1.lifetime,
t1.l_max, t1.w_max, t1.h_max,
t2.col, t2.value
from table1 t1
inner join
(
select serie_id,
col, value
from
(
select serie_id, l_value, lrms, latTmax, Rdc,
rn = cast(row_number() over(partition by serie_id order by serie_id)
as varchar(10))
from table2
) d
cross apply
(
select 'L_value_'+rn, L_value union all
select 'lrms_'+rn, lrms union all
select 'latTmax_'+rn, latTmax union all
select 'Rdc_'+rn, Rdc
) c (col, value)
) t2
on t1.serie_id = t2.serie_id
) d
pivot
(
max(value)
for col in (L_value_1, lrms_1, latTmax_1, Rdc_1,
L_value_2, lrms_2, latTmax_2, Rdc_2,
L_value_3, lrms_3, latTmax_3, Rdc_3,
L_value_4, lrms_4, latTmax_4, Rdc_4)
) p;
See SQL Fiddle with Demo.
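The row-numbering and unpivot steps can be sketched in Python to make the naming scheme concrete. This is an illustration only: `TABLE2`, `VALUE_COLUMNS` and `unpivot` are made-up names, and the row number here follows the input order, whereas the SQL above orders by serie_id inside each partition and therefore does not guarantee a particular order.

```python
# (serie_id, L_value, lrms, latTmax, Rdc) rows from TABLE2 in the question.
TABLE2 = [
    ("id_1", 67.0, 400.0, 400.0, 0.25),
    ("id_1", 90.0, 330.0, 330.0, 0.35),
    ("id_1", 120.0, 370.0, 370.0, 0.30),
    ("id_1", 180.0, 330.0, 300.0, 0.35),
    ("id_2", 260.0, 300.0, 300.0, 0.40),
    ("id_2", 360.0, 280.0, 280.0, 0.45),
    ("id_3", 90.0, 370.0, 370.0, 0.30),
    ("id_4", 160.0, 340.0, 340.0, 0.40),
]
VALUE_COLUMNS = ("L_value", "lrms", "latTmax", "Rdc")


def unpivot(rows):
    """Turn each wide row into (serie_id, '<column>_<rn>', value) triples."""
    seen = {}  # serie_id -> number of rows met so far (the row_number)
    triples = []
    for serie_id, *values in rows:
        seen[serie_id] = seen.get(serie_id, 0) + 1
        rn = seen[serie_id]
        for name, value in zip(VALUE_COLUMNS, values):
            triples.append((serie_id, f"{name}_{rn}", value))
    return triples
```

The first triples returned by `unpivot(TABLE2)` correspond to the first rows of the unpivoted table shown earlier (L_value_1, lrms_1, latTmax_1, Rdc_1 for id_1).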
If you have an unknown number of values in Table2 then you will need to use dynamic SQL to create a sql string that will be executed. Converting the above code to dynamic sql is pretty easy once you have the logic correct. The code will be:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols
= STUFF((SELECT ',' + QUOTENAME(col+cast(rn as varchar(10)))
from
(
select rn = cast(row_number() over(partition by serie_id order by serie_id)
as varchar(10))
from table2
) d
cross apply
(
select 'L_value_', 0 union all
select 'lrms_', 1 union all
select 'latTmax_', 2 union all
select 'Rdc_', 3
) c (col, so)
group by col, rn, so
order by rn, so
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = N'SELECT serie_id, maturity, strategy, lifetime, l_max,
w_max, h_max,' + @cols + N'
from
(
select t1.serie_id, t1.maturity, t1.strategy, t1.lifetime,
t1.l_max, t1.w_max, t1.h_max,
t2.col, t2.value
from table1 t1
inner join
(
select serie_id,
col, value
from
(
select serie_id, l_value, lrms, latTmax, Rdc,
rn = cast(row_number() over(partition by serie_id order by serie_id)
as varchar(10))
from table2
) d
cross apply
(
select ''L_value_''+rn, L_value union all
select ''lrms_''+rn, lrms union all
select ''latTmax_''+rn, latTmax union all
select ''Rdc_''+rn, Rdc
) c (col, value)
) t2
on t1.serie_id = t2.serie_id
) x
pivot
(
max(value)
for col in (' + @cols + N')
) p '
exec sp_executesql @query
See SQL Fiddle with Demo
Both versions will give a result of:
| SERIE_ID | MATURITY | STRATEGY | LIFETIME | L_MAX | W_MAX | H_MAX | L_VALUE_1 | LRMS_1 | LATTMAX_1 | RDC_1 | L_VALUE_2 | LRMS_2 | LATTMAX_2 | RDC_2 | L_VALUE_3 | LRMS_3 | LATTMAX_3 | RDC_3 | L_VALUE_4 | LRMS_4 | LATTMAX_4 | RDC_4 |
|----------|----------|----------|----------|-------|-------|-------|-----------|--------|-----------|-------|-----------|--------|-----------|--------|-----------|--------|-----------|--------|-----------|--------|-----------|--------|
| id_1 | 3 | 1 | 2 | 2.2 | 1.4 | 1.4 | 67 | 400 | 400 | 0.25 | 90 | 330 | 330 | 0.35 | 120 | 370 | 370 | 0.3 | 180 | 330 | 300 | 0.35 |
| id_2 | 3 | 1 | 2 | 3.4 | 1.8 | 2.1 | 260 | 300 | 300 | 0.4 | 360 | 280 | 280 | 0.45 | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| id_3 | 3 | 1 | (null) | 24.5 | 14.5 | 15 | 90 | 370 | 370 | 0.3 | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| id_4 | 3 | 1 | (null) | 28 | 24.5 | 14 | 160 | 340 | 340 | 0.4 | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
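What the dynamic version adds is a column list derived from the data instead of a hard-coded one. The Python sketch below illustrates that idea under stated assumptions: `column_list` builds the names ordered by row number and then by value column, and `pivot` spreads (serie_id, col, value) triples over those columns, leaving None where a serie_id has fewer rows. All names here are made up for the illustration.

```python
VALUE_COLUMNS = ("L_value", "lrms", "latTmax", "Rdc")


def column_list(serie_ids):
    """Return pivot column names ordered by row number, then by value column."""
    counts = {}
    for serie_id in serie_ids:
        counts[serie_id] = counts.get(serie_id, 0) + 1
    max_rn = max(counts.values(), default=0)  # the largest row number in any serie_id
    return [f"{name}_{rn}" for rn in range(1, max_rn + 1) for name in VALUE_COLUMNS]


def pivot(triples, columns):
    """Spread (serie_id, col, value) triples into one dict of columns per serie_id."""
    wide = {}
    for serie_id, col, value in triples:
        # Start every serie_id with all columns set to None (the SQL result shows NULL).
        wide.setdefault(serie_id, dict.fromkeys(columns))[col] = value
    return wide
```

For the sample data, where id_1 has four rows, `column_list` yields sixteen names from L_value_1 to Rdc_4, which is the same column set as in the result table above.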