Find all possible pairs based on transitivity via T-SQL - sql

I have a concern that doesn't let me go for already several days.
The collective mind is the last resort I may rely on.
Assume we have a table with two columns. Actually, the values are GUIDs, but for the sake of simplicity let's take them as letters.
| a | b |
|---|---|
| x | y |
| y | x |
| y | z |
| z | y |
| m | n |
| m | z |
I need to create a T-SQL query that will present all the possible pairs out of trasitivity, i.e. if x=y, y=z, then x=z. Also, simmetry has to be there, i.e. if there is x=y, then there should be y=x as well.
In this particular case, I believe there is "full house", meaning that every letter is connected to all others through the intermediates. But I need a query that will show that.
All I did is here (SQLFiddle fails to run it):
WITH
t AS
(SELECT 'x' AS a, 'y' AS b
UNION ALL
SELECT 'y' AS a, 'x' AS b
UNION ALL
SELECT 'y' AS a, 'z' AS b
UNION ALL
SELECT 'z' AS a, 'y' AS b
UNION ALL
SELECT 'm' AS a, 'n' AS b
UNION ALL
SELECT 'm' AS a, 'z' AS b),
coupled_reflective AS --for reflective couples we take either of them
(SELECT t2.a, t2.b
FROM t t1
JOIN t t2 ON t1.a=t2.b
AND t1.b!=t2.a),
reversive_coupled_reflective AS --that's another half of the above couples (reversed)
(SELECT t2.b, t2.a
FROM t t1
JOIN t t2 ON t1.a=t2.b
AND t1.b!=t2.a),
rs AS -- reduce the initial set (t)
(SELECT *
FROM coupled_reflective
UNION
SELECT *
FROM t
EXCEPT
SELECT *
FROM reversive_coupled_reflective),
cte AS -- recursively iterate through the set to find transitive values (get linked by the left field)
(SELECT a, b
FROM rs
UNION ALL
SELECT rs.b, cte.b
FROM rs
JOIN cte ON rs.a=cte.a
AND rs.b!=cte.b),
cte2 AS -- recursively iterate through the set to find transitive values (get linked by the right field)
(SELECT a, b
FROM rs
UNION ALL
SELECT rs.a, cte.a
FROM rs
JOIN cte ON rs.b=cte.b
AND rs.a!=cte.a)
SELECT a, b FROM cte2
UNION
SELECT a, b FROM cte
UNION
SELECT a, b FROM t
UNION
SELECT b, a FROM t
But that doesn't do the trick, unfortunately.
The desired result should be
| a | b |
|---|---|
| x | y |
| y | x |
| y | z |
| z | y |
| m | n |
| m | z |
| n | m |
| z | m |
| x | z |
| z | x |
| x | m |
| m | x |
| x | n |
| n | x |
| y | m |
| m | y |
| y | n |
| n | y |
Is there a SQL-gifted buddy out there who can help me here, please?
Thanks.

You can use recursive CTEs, but you need a list of already visited nodes. You can implement that using a string:
with cte as (
select a, b, cast('{' + a + '}{' + b + '}' as varchar(max)) as visited
from t
union all
select cte.a, t.b,
(visited + '{' + t.b + '}')
from cte join
t
on cte.b = t.a
where cte.visited not like '%{' + t.b + '}%'
)
select distinct a, b
from cte;
Note:
The above follows the directed links in the graph. If you want undirected links, then include both:
with t as (
select a, b from yourtable
union
select b, a from yourtable
),
The rest of the logic follows using t.

Related

How to transpose table in presto?

I have a question, which results in
A | B | C | D | E
2. | 3 | 10| 25| 60
How to convert it to
Reason | Percentage
A. | 2
B. | 3
C. | 10
D. | 25
E. | 60
(ignore the pots)
Thanks
You can try to use UNNEST function.
SELECT v.*
FROM T
CROSS JOIN UNNEST(ARRAY[
ROW('A', A),
ROW('B', B),
ROW('C', C),
ROW('D', D),
ROW('E', E)
]) AS v(Reason, Percentage);
another way is using UNION ALL
SELECT 'A' Reason , A Percentage FROM T
UNION ALL
SELECT 'B' Reason , B Percentage FROM T
UNION ALL
SELECT 'C' Reason , C Percentage FROM T
//....

SQL cross match IDs to create new cross-platform ID -> how to optimize

I have a Redshift table with two columns which shows which ID's are connected, that is, belonging to the same person. I would like to make a mapping (extra column) with a unique person ID using SQL.
The problem is similar to this one: SQL: creating unique id for item with several ids
However in my case the ID's in both columns are of a different kind, and therefor the suggested joining solution (t1.epid = t2.pid, etc..) will not work.
In below example there are 4 individual persons using 9 IDs of type 1 and 10 IDs of type 2.
ID_type1 | ID_type2
---------+--------
1 | A
1 | B
2 | C
3 | C
4 | D
4 | E
5 | E
6 | F
7 | G
7 | H
7 | I
8 | I
8 | J
9 | J
9 | B
What I am looking for is an extra column with a mapping to a unique ID for the person. The difficulty is in correctly identifying the IDs related to persons like x & z which have multiple IDs of both types. The result could look something this:
ID_type1 | ID_type2 | ID_real
---------+---------------------
1 | A | z
1 | B | z
2 | C | y
3 | C | y
4 | D | x
4 | E | x
5 | E | x
6 | F | w
7 | G | z
7 | H | z
7 | I | z
8 | I | z
8 | J | z
9 | J | z
9 | B | z
I wrote below query which goes up to 4 loops and does the job for a small dataset, however is struggling with larger sets as the number of rows after joining increase very fast each loop. I am stuck in finding ways to do this more effective / efficient.
WITH
T1 AS(
SELECT DISTINCT
l1.ID_type1 AS ID_type1,
r1.ID_type1 AS ID_type1_overlap
FROM crossmatch_example l1
LEFT JOIN crossmatch_example r1 USING(ID_type2)
ORDER BY 1,2
),
T2 AS(
SELECT DISTINCT
l1.ID_type1,
r1.ID_type1_overlap
FROM T1 l1
LEFT JOIN T1 r1 on l1.ID_type1_overlap = r1.ID_type1
ORDER BY 1,2
),
T3 AS(
SELECT DISTINCT
l1.ID_type1,
r1.ID_type1_overlap
FROM T2 l1
LEFT JOIN T2 r1 on l1.ID_type1_overlap = r1.ID_type1
ORDER BY 1,2
),
T4 AS(
SELECT DISTINCT
l1.ID_type1,
r1.ID_type1_overlap
FROM T3 l1
LEFT JOIN T3 r1 on l1.ID_type1_overlap = r1.ID_type1
ORDER BY 1,2
),
mapping AS(
SELECT ID_type1,
min(ID_type1_overlap) AS mapped
FROM T4
GROUP BY 1
ORDER BY 1
),
output AS(
SELECT DISTINCT
l1.ID_type1::INT AS ID_type1,
l1.ID_type2,
FUNC_SHA1(r1.mapped) AS ID_real
FROM crossmatch_example l1
LEFT JOIN mapping r1 on l1.ID_type1 = r1.ID_type1
ORDER BY 1,2)
SELECT * FROM output
What you're trying to do is called Transitive Closure. There are articles about how to implement it in SQL.
This is an example in Spark linq-like dsl https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkTC.scala.
The solution to the problem is iterative, and to fully resolve the graph, you may need to apply more iterations. What can be optimised is the input for each iteration. I remember working on it once, but cannot recall the details.

SQL/HIVE - How do I query from a horizontal output to vertical output?

I have a query that returns two rows with the information needed
SELECT src_file_dt, a, b ,c FROM my_table WHERE src_file_dt IN ('1531675040', '1531675169');
it will return:
src_file_dt | a | b | c
1531675040 | 2 | 6 | 9
1531675169 | 8 | 2 | 0
Now, I need the data in the following layout, how do I get it like this output:
fields | prev (1531675040) | curr (1531675169)
a | 2 | 8
b | 6 | 2
c | 9 | 0
There should be easier way to achieve that.
I've build results using explode function and selecting separately data for prev and next columns:
select t1.keys, t1.vals as prev, t2.vals as next from (
SELECT explode(map('a', a, 'b', b, 'c',c)) as (keys, vals)
FROM my_table
WHERE src_file_dt = '1531675040'
) t1,
(
SELECT explode(map('a', a, 'b', b, 'c',c)) as (keys, vals)
FROM my_table
WHERE src_file_dt = '1531675169'
) t2
where t1.keys = t2.keys
;

Redshift create all the combinations of any length for the values in one column

How can we create all the combinations of any length for the values in one column and return the distinct count of another column for that combination?
Table:
+------+--------+
| Type | Name |
+------+--------+
| A | Tom |
| A | Ben |
| B | Ben |
| B | Justin |
| C | Ben |
+------+--------+
Output Table:
+-------------+-------+
| Combination | Count |
+-------------+-------+
| A | 2 |
| B | 2 |
| C | 1 |
| AB | 3 |
| BC | 2 |
| AC | 2 |
| ABC | 3 |
+-------------+-------+
When the combination is only A, there are Tom and Ben so it's 2.
When the combination is only B, 2 distinct names so it's 2.
When the combination is A and B, 3 distinct names: Tom, Ben, Justin so it's 3.
I'm working in Amazon Redshift. Thank you!
NOTE: This answers the original version of the question which was tagged Postgres.
You can generate all combinations with this code
with recursive td as (
select distinct type
from t
),
cte as (
select td.type, td.type as lasttype, 1 as len
from td
union all
select cte.type || t.type, t.type as lasttype, cte.len + 1
from cte join
t
on 1=1 and t.type > cte.lasttype
)
You can then use this in a join:
with recursive t as (
select *
from (values ('a'), ('b'), ('c'), ('d')) v(c)
),
cte as (
select t.c, t.c as lastc, 1 as len
from t
union all
select cte.type || t.type, t.type as lasttype, cte.len + 1
from cte join
t
on 1=1 and t.type > cte.lasttype
)
select type, count(*)
from (select name, cte.type, count(*)
from cte join
t
on cte.type like '%' || t.type || '%'
group by name, cte.type
having count(*) = length(cte.type)
) x
group by type
order by type;
There is no way to generate all possible combinations (A, B, C, AB, AC, BC, etc) in Amazon Redshift.
(Well, you could select each unique value, smoosh them into one string, send it to a User-Defined Function, extract the result into multiple rows and then join it against a big query, but that really isn't something you'd like to attempt.)
One approach would be to create a table containing all possible combinations — you'd need to write a little program to do that (eg using itertools in Python). Then, you could join the data against that reasonably easy to get the desired result (eg IF 'ABC' CONTAINS '%A%').

Hierarchy SQL server joining two different tables

I'm having a problem making a hierarchy query.
I have the following in MS SQL DB:
Table A - Orders with order code, article, qty:
OP | ART | QTY
A | X |100
B | Y |200
Table B with assembly references of articles, but articles CAN be made from other articles if there exist a child's reference (may need to go 3 levels deep):
ART | ART2 |QTY
X | U | 20
X | O | 10
X | Z | 30
Y | Q | 20
Y | W | 15
Y | E | 30
U | Z | 10
And I want to get something like this:
A.OP |LEVEL| ART | B.ART2 |QTY
A | 2 | X | Z |(100*20*10)=2000
A | 1 | X | O |(100*10) =1000
A | 1 | X | Z |(100*30) = 3000
B | 1 | Y | Q |(200*20) = 4000
B | 1 | Y | W |(200*15) = 3000
B | 1 | Y | E |(200*30) = 6000
B | 1 | Y | Z |(200*10) = 2000
I've already made one thing:
WITH X AS (
SELECT
firstlvl.ART,
1 AS LEVEL,
firstlvl.ART2,
firstlvl.QTY,
QTY AS PARENTQTY
FROM B AS firstlvl
WHERE firstlvl.ART='X'
UNION ALL
SELECT secondlevel.ART,
EL.LEVEL +1,
secondlevel.BDT_MLC,
secondlevel.ART2,
secondlevel.QTY,
EL.PARENTQTY AS PARENTQTY
FROM B AS secondlevel
INNER JOIN X AS EL
ON secondlevel.ART = EL.ART2)
SELECT * FROM X
But now I don't know how to join quantities with table A nor how to run this query for all items on the first table.
Can anyone help me please?
Many Thanks!
with AllData as
( select op as art, art as art2, qty from a
union all
select * from b),
Tree(RootLvl, Father, Child, qty, lvl) as (
select art, art, art2, qty, 0 from AllData
where art in ( 'A' , 'B')
union all
select C.RootLvl, C.Child, AllData.art2, AllData.qty * c.qty, C.lvl + 1 from Tree C
join AllData
on C.Child = AllData.art
)
select C1.RootLvl as OP, C1.Lvl, C1.Child, C1.Qty from Tree c1
left join Tree c2
on c1.child = c2.father and
c1.rootLvl = c2.RootLvl
where C2.RootLvl is null
order by 1, 2 desc
SQL Fiddle demo
The A and B tables are almost the same. I made one from them to just simplify it. After that we use recursion to create tree and at the end I use left join to get only leafs from tree.