SQL Query - Indirect joining of two tables - sql

I have two tables like the following
Table1
COL1 COL2 COL3
A 10 ABC
A 11 ABC
A 1 DEF
A 2 DEF
B 10 ABC
B 11 ABC
B 1 DEF
C 3 DEF
C 12 ABC
C 21 GHI
Table2
COL1 GHI ABC DEF
A1 21 10 1
A2 21 12 1
A3 21 10 1
A4 23 10 1
A5 25 11 3
A6 21 14 3
A7 25 11 1
A8 23 10 1
A9 29 10 2
A10 21 12 3
I have created another temporary table that returns all the distinct values from tbl1.col1
The values of col3 in tbl1 are columns in tbl2, which are populated by some values.
What I need is for each of these distinct values of table1.column1, (A, B, C) in this case, return a combination of table2.column1 and table1.column1 such that
the ABC value of table2.column1 matches any of the ABC value of the "group" from table1,
AND the DEF value of table2.column1 matches any of the DEF value of the "group" from table1,
AND IF THE GROUP CONTAINS GHI VALUES, the GHI value of table2.column1 matches any of the GHI value of the "group" from table1
So, I would need something like the following
Output Table
Table2.COL1 Table1.Col1
A1 A
A3 A
A4 A
A7 A
A8 A
A9 A
A1 B
A3 B
A4 B
A7 B
A8 B
A10 C
I tried something like this, but I'm not sure if this is the right approach:
select table2.col1, temp_distinct_table.column1
from table2, temp_distinct_table
where table2.def IN (SELECT col2
FROM table1
WHERE table1.col1 = temp_distinct_table.col1
AND table1.col3 = 'DEF')
AND table2.abc IN (SELECT col2
FROM table1
WHERE table1.col1 = temp_distinct_table.col1
AND table1.col3 = 'ABC')
AND (
table2.ghi IN (SELECT col2
FROM table1
WHERE table1.col1 = temp_distinct_table.col1
AND table1.col3 = 'GHI')
OR NOT EXISTS (SELECT col2
FROM table1
WHERE table1.col1 = temp_distinct_table.col1
AND table1.col3 = 'GHI')
)
where temp_distinct_table contains all the distinct values from table1.col1.
Could someone guide me on the matter?

Another approach, counting how many matches there are for each t1.col/t2.col combination after joining all the possible matches:
select distinct t2_col1, t1_col1
from (
select t2.col1 as t2_col1, t1.col1 as t1_col1, t1.ghi_count as t1_ghi_count,
count(case when t1.col3 = 'ABC' then 1 end)
over (partition by t1.col1, t2.col1) as abc_matches,
count(case when t1.col3 = 'DEF' then 1 end)
over (partition by t1.col1, t2.col1) as def_matches,
count(case when t1.col3 = 'GHI' then 1 end)
over (partition by t1.col1, t2.col1) as ghi_matches
from (
select t1.*,
count(case when t1.col3 = 'GHI' then 1 end)
over (partition by t1.col1) as ghi_count
from table1 t1
) t1
join table2 t2
on (t1.col3 = 'ABC' and t2.abc = t1.col2)
or (t1.col3 = 'DEF' and t2.def = t1.col2)
or (t1.col3 = 'GHI' and t2.ghi = t1.col2)
)
where abc_matches > 0
and def_matches > 0
and (t1_ghi_count = 0 or ghi_matches > 0)
order by t1_col1, t2_col1;
Which with your sample data gets:
T2_COL T1_COL
------ ------
A1 A
A3 A
A4 A
A7 A
A8 A
A9 A
A1 B
A3 B
A4 B
A7 B
A8 B
A10 C
Not sure if the efficiency of that will be significantly different to MTO's cross join with your real data.

This becomes quite simple when you use collections (and you only need to do one table scan for each table):
Oracle Setup:
CREATE TYPE intlist AS TABLE OF INT;
/
Query:
SELECT t2.col1 AS t2_col1,
t1.col1 AS t1_col1
FROM (
SELECT col1,
CAST( COLLECT( CASE col3 WHEN 'ABC' THEN col2 END ) AS INTLIST ) AS abc,
CAST( COLLECT( CASE col3 WHEN 'DEF' THEN col2 END ) AS INTLIST ) AS def,
CAST( COLLECT( CASE col3 WHEN 'GHI' THEN col2 END ) AS INTLIST ) AS ghi
FROM table1
GROUP BY col1
) t1
INNER JOIN table2 t2
ON ( t2.abc MEMBER OF t1.abc
AND t2.def MEMBER OF t1.def
AND ( t2.ghi MEMBER OF t1.ghi OR t1.ghi IS EMPTY ) );
Output:
t2_col1 t1_col1
------- -------
A1 A
A3 A
A4 A
A7 A
A8 A
A9 A
A1 B
A3 B
A4 B
A7 B
A8 B
A10 C
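The grouping-then-membership idea behind the collection query can be sketched outside the database. This is a plain Python illustration using sets, not Oracle collections; the function name match_groups and the tuple layouts are assumptions made for the example, and the rows are copied from the question.

```python
from collections import defaultdict

# Rows copied from the question: (col1, col2, col3)
table1 = [
    ("A", 10, "ABC"), ("A", 11, "ABC"), ("A", 1, "DEF"), ("A", 2, "DEF"),
    ("B", 10, "ABC"), ("B", 11, "ABC"), ("B", 1, "DEF"),
    ("C", 3, "DEF"), ("C", 12, "ABC"), ("C", 21, "GHI"),
]
# Rows copied from the question: (col1, ghi, abc, def)
table2 = [
    ("A1", 21, 10, 1), ("A2", 21, 12, 1), ("A3", 21, 10, 1), ("A4", 23, 10, 1),
    ("A5", 25, 11, 3), ("A6", 21, 14, 3), ("A7", 25, 11, 1), ("A8", 23, 10, 1),
    ("A9", 29, 10, 2), ("A10", 21, 12, 3),
]


def match_groups(t1_rows, t2_rows):
    """Hypothetical helper: pair each table2 row with every table1 group it satisfies."""
    # One pass over table1: a set of col2 values per (group, col3),
    # playing the role of the three COLLECT(...) columns.
    groups = defaultdict(lambda: {"ABC": set(), "DEF": set(), "GHI": set()})
    for col1, col2, col3 in t1_rows:
        groups[col1][col3].add(col2)

    result = []
    for group in sorted(groups):
        sets = groups[group]
        for t2_col1, ghi, abc, def_ in t2_rows:
            # Same condition as the MEMBER OF / IS EMPTY join predicate.
            if (abc in sets["ABC"] and def_ in sets["DEF"]
                    and (not sets["GHI"] or ghi in sets["GHI"])):
                result.append((t2_col1, group))
    return result


pairs = match_groups(table1, table2)
for t2_col1, t1_col1 in pairs:
    print(t2_col1, t1_col1)
```

On the sample data this yields the same twelve pairs as the output table above, which is a cheap way to sanity-check the matching rule before running it against real tables.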
Update
An alternative query without using collections (it is going to be more efficient than your query but probably less efficient than collections):
SELECT t2.col1,
t1.col1
FROM table1 t1
CROSS JOIN
table2 t2
GROUP BY t1.col1, t2.col1
HAVING COUNT( CASE WHEN t1.col2 = t2.abc AND t1.col3 = 'ABC' THEN 1 END ) > 0
AND COUNT( CASE WHEN t1.col2 = t2.def AND t1.col3 = 'DEF' THEN 1 END ) > 0
AND ( COUNT( CASE WHEN t1.col2 = t2.ghi AND t1.col3 = 'GHI' THEN 1 END ) > 0
OR COUNT( CASE t1.col3 WHEN 'GHI' THEN 1 END ) = 0 )
ORDER BY t1.col1, t2.col1;
Update 2:
Changed from CROSS JOIN to INNER JOIN:
SELECT t2.col1 AS t2_col1,
t1.col1 AS t1_col1
FROM (
SELECT t1.*,
COUNT( CASE col3 WHEN 'GHI' THEN 1 END )
OVER ( PARTITION BY col1 ) AS has_ghi
FROM table1 t1
) t1
INNER JOIN table2 t2
ON ( t1.col3 = 'ABC' AND t2.abc = t1.col2 )
OR ( t1.col3 = 'DEF' AND t2.def = t1.col2 )
OR ( t1.col3 = 'GHI' AND t2.ghi = t1.col2 )
GROUP BY t1.col1, t2.col1, t1.has_ghi
HAVING COUNT( CASE t1.col3 WHEN 'ABC' THEN 1 END ) > 0
AND COUNT( CASE t1.col3 WHEN 'DEF' THEN 1 END ) > 0
AND ( COUNT( CASE t1.col3 WHEN 'GHI' THEN 1 END ) > 0 OR has_ghi = 0 )
ORDER BY t1.col1, t2.col1;

Related

Hadoop - Hive - Impala - rewrite a query for performance

I have 2 tables with below columns
Table1
col1 col2 col3 val
11 221 38 10
null 90 null 989
78 90 null 77
table2
col1 col2 col3
12 221 78
23 null 67
78 90 null
I want to join these two tables in a cascade: first on col1, and if the values match, stop. If not, join on col2, and if that matches, stop. Otherwise join on col3. If any column matches, populate val, else leave it null, and record which column matched in a matchingcol column. So, the output should look like this:
col1 col2 col3 val matchingcol
11 221 38 10 col2
null 90 null null null
78 90 null 77 col1
I was able to do this using below query, but the performance is very slow. Please let me know if there is any better way of writing below for faster performance
select *
from table1 t1 left join
table2 t2_1
on t2_1.col1 = t1.col1 left join
table2 t2_2
on t2_2.col2 = t1.col2 and t2_1.col1 is null
left join table2 t2_3 on t2_3.col3 = t1.col3 and t2_2.col2 is null
ps: I asked same question before but there was no better answer
What you describe is:
select t1.col1, t1.col2, t1.col3,
(case when t2_1.col1 is not null or t2_2.col1 is not null or t2_3.col1 is not null then t1.val end) as val,
(case when t2_1.col1 is not null then 'col1'
when t2_2.col2 is not null then 'col2'
when t2_3.col3 is not null then 'col3'
end) as matching
from table1 t1 left join
table2 t2_1
on t2_1.col1 = t1.col1 left join
table2 t2_2
on t2_2.col2 = t1.col2 and t2_1.col1 is null left join
table2 t2_3
on t2_3.col3 = t1.col3 and t2_2.col2 is null;
This is probably the best approach.

SQL Compare Two tables with column value difference

I have 2 tables with exactly the same structure, and I would like to compare the column values and display the differences in a specific format. I'm new to SQL. I tried the MINUS operator, but it's not helping. The scenario is below:
Table 1
Key Col1 Col2
1 110 AAA
2 120 BBB
Table 2
Key Col1 Col2
1 111 CCC
2 120 DDD
I need output in below format
Key Field Table1 Table2
1 Col1 110 111
1 Col2 AAA CCC
2 Col2 BBB DDD
How can this be accomplished?
Thanks,
Milind
This is an arcane structure for bringing the tables together. I think this will work:
select t1.col1,
(case when t2.key is not null then 'col2' else 'col1' end) as field,
(case when t2.key is not null then t1.col2
when seqnum = 1 then t1.col1
when seqnum = 2 then t1.col2
end) as Table1,
(case when t2.key is not null then t2.col2
when seqnum = 1 then t2.col1
when seqnum = 2 then t2.col2
end) as Table2
from table1 t1 left join
table2 t2
on t1.key = t2.key and t1.col1 = t2.col1 left join
(select tt2.*, row_number() over (partition by tt2.key order by tt2.key) as seqnum
from table2 tt2
) tt2
on t1.key = tt2.key and t2.key is null;
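A different formulation from the query above, offered only as a sketch: join the two tables on the key once per compared column, keep the rows where that column differs, and stack the results with UNION ALL. The example below runs it in SQLite through Python's sqlite3 module rather than in the asker's database, and the aliases t1_value and t2_value are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 ("key" INTEGER, col1 INTEGER, col2 TEXT);
    CREATE TABLE table2 ("key" INTEGER, col1 INTEGER, col2 TEXT);
    INSERT INTO table1 VALUES (1, 110, 'AAA'), (2, 120, 'BBB');
    INSERT INTO table2 VALUES (1, 111, 'CCC'), (2, 120, 'DDD');
""")

# One SELECT per compared column; each keeps only the keys where that column differs.
# Note: <> never reports a difference when either side is NULL, so nullable
# columns would need extra handling.
diff_sql = """
    SELECT t1."key", 'Col1' AS field, t1.col1 AS t1_value, t2.col1 AS t2_value
    FROM table1 t1 JOIN table2 t2 ON t1."key" = t2."key"
    WHERE t1.col1 <> t2.col1
    UNION ALL
    SELECT t1."key", 'Col2', t1.col2, t2.col2
    FROM table1 t1 JOIN table2 t2 ON t1."key" = t2."key"
    WHERE t1.col2 <> t2.col2
    ORDER BY 1, 2
"""
diff_rows = conn.execute(diff_sql).fetchall()
for row in diff_rows:
    print(row)
```

With the sample rows this produces the three lines requested in the question. The cost is one join per compared column, so it suits tables with a small, fixed set of columns.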

Select Group data with one matching condition

Table:
Col1 Col2
1 2
1 3
1 4
2 2
2 3
First I need to find all rows with col2 = 4.
Then I need to select all rows that share the col1 value of those rows.
The result should be:
1 2
1 3
1 4
Off the top of my head
SELECT A.* FROM MyTable A JOIN MyTable B ON A.Col1 = B.Col1 WHERE B.Col2 = 4
I think you want this:
select t.*
from t
where t.col1 in (select t2.col1 from t t2 where t2.col2 = 4);
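A quick way to check the IN form against the sample rows, assuming SQLite through Python's sqlite3 module stands in for the real database (the table name t follows the query above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 INTEGER, col2 INTEGER);
    INSERT INTO t VALUES (1, 2), (1, 3), (1, 4), (2, 2), (2, 3);
""")

# Keep every row whose col1 group contains at least one row with col2 = 4.
rows = conn.execute("""
    SELECT t.col1, t.col2
    FROM t
    WHERE t.col1 IN (SELECT t2.col1 FROM t t2 WHERE t2.col2 = 4)
    ORDER BY t.col1, t.col2
""").fetchall()
print(rows)
```

The self-join version returns the same rows on this data, but it would repeat each row once per col2 = 4 row in its group, so the IN form (or an added DISTINCT) is the safer choice when such duplicates are possible.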
From what I can understand of your description, this query checks both columns: col2 = 4 and col1 = 1.
SELECT t1.col1, t1.col2 FROM Table t1
WHERE t1.col2 = 4
UNION
SELECT t2.col1, t2.col2 FROM Table t2
WHERE t2.col1 = 1

Oracle join view for best matches

I wish to create a view which joins two tables together.
T1 =
Col1 Col2
AA BB
EE FF
YY ZZ
11 00
T2 =
Col1 Col2 Col3
AA BB 1
AA CC 2
CC BB 3
GG FF 4
GG HH 5
EE HH 6
XX YY 7
XX WW 8
YY RR 9
The rules for this view are a Best match scenario based upon the following rules:
1. Return Col3 from T2 if T1.Col1 & T1.Col2 = T2.Col1 & T2.Col2
ELSE
2. Return Col3 if T1.Col2 = T2.Col2
ELSE
3. Return Col3 if T1.Col1 = T2.Col1
ELSE
4. Return NULL
So in these examples I would expect the final view to contain:
AA BB 1 (Rule 1 match)
EE FF 4 (Rule 2 match)
YY ZZ 9 (Rule 3 match)
11 00 NULL (Rule 4 match)
The difficulty I am having is in the cases where it hits multiple rules (e.g. Rows 1 and 3 where rules 1 and 2 are hit or Rows 4 and 6 where rules 2 and 3 are hit separately).
I realise in this example that Rule 3 is hit multiple times - this is fine as the idea is it will only hit rule 3 when the other rules aren't true which should only ever yield 1 result (like in example 3).
Is there a way to do a similar union to cater for these cascading rules, or will additional views need to be created with pre-filtering (such as having count < 2)?
A formula for this in excel would be:
=IF(AND(A3=$F$2,B3=$G$2),"Rule1",IF((B3=$G$2),"Rule 2",IF((A3=$F$2),"Rule 3","NULL")))
Where A3 = T2.Col1, B3 = T2.Col2 G2 = T1.Col2 and F2 = T1.Col1.
I'd do it like this:
with t1 as (select 'AA' col1, 'BB' col2 from dual union all
select 'EE' col1, 'FF' col2 from dual union all
select 'YY' col1, 'ZZ' col2 from dual union all
select '11' col1, '00' col2 from dual),
t2 as (select 'AA' col1, 'BB' col2, 1 col3 from dual union all
select 'AA' col1, 'CC' col2, 2 col3 from dual union all
select 'CC' col1, 'BB' col2, 3 col3 from dual union all
select 'GG' col1, 'FF' col2, 4 col3 from dual union all
select 'GG' col1, 'HH' col2, 5 col3 from dual union all
select 'EE' col1, 'HH' col2, 6 col3 from dual union all
select 'XX' col1, 'YY' col2, 7 col3 from dual union all
select 'XX' col1, 'WW' col2, 8 col3 from dual union all
select 'YY' col1, 'RR' col2, 9 col3 from dual),
res as (select t1.col1,
t1.col2,
t2.col3,
case when t1.col1 = t2.col1 and t1.col2 = t2.col2 then 1
when t1.col2 = t2.col2 then 2
when t1.col1 = t2.col1 then 3
end join_level,
min (case when t1.col1 = t2.col1 and t1.col2 = t2.col2 then 1
when t1.col2 = t2.col2 then 2
when t1.col1 = t2.col1 then 3
end) over (partition by t1.col1, t1.col2) min_join_level
from t1
left outer join t2 on (t1.col1 = t2.col1 or t1.col2 = t2.col2))
select col1,
col2,
col3
from res
where join_level = min_join_level
or join_level is null;
COL1 COL2 COL3
---- ---- ----------
11 00
AA BB 1
EE FF 4
YY ZZ 9
I.e., do the joins first (in this case, t1 left outer join t2 on (t2.col1 = t1.col1 or t2.col2 = t1.col2), which also includes rows where t1.col1 = t2.col1 and t1.col2 = t2.col2), and then filter the results based on which join condition takes precedence.
Here's a slightly different alternative, using aggregates instead of analytic functions like the above answer:
with t1 as (select 'AA' col1, 'BB' col2 from dual union all
select 'EE' col1, 'FF' col2 from dual union all
select 'YY' col1, 'ZZ' col2 from dual union all
select '11' col1, '00' col2 from dual),
t2 as (select 'AA' col1, 'BB' col2, 1 col3 from dual union all
select 'AA' col1, 'CC' col2, 2 col3 from dual union all
select 'CC' col1, 'BB' col2, 3 col3 from dual union all
select 'GG' col1, 'FF' col2, 4 col3 from dual union all
select 'GG' col1, 'HH' col2, 5 col3 from dual union all
select 'EE' col1, 'HH' col2, 6 col3 from dual union all
select 'XX' col1, 'YY' col2, 7 col3 from dual union all
select 'XX' col1, 'WW' col2, 8 col3 from dual union all
select 'YY' col1, 'RR' col2, 9 col3 from dual)
select t1.col1,
t1.col2,
min(t2.col3) keep (dense_rank first order by case when t1.col1 = t2.col1 and t1.col2 = t2.col2 then 1
when t1.col2 = t2.col2 then 2
when t1.col1 = t2.col1 then 3
end) col3
from t1
left outer join t2 on (t1.col1 = t2.col1 or t1.col2 = t2.col2)
group by t1.col1,
t1.col2;
COL1 COL2 COL3
---- ---- ----------
11 00
AA BB 1
EE FF 4
YY ZZ 9
N.B. These could return different results if there happened to be more than one row that met the highest priority available join condition. The first query would return each row with a (potentially) different col3, whereas the second query would return just one row, with the lowest available col3 value.
What would you expect to see if T2 contained:
COL1 COL2 COL3
---- ---- ----------
AA BB 1
AA CC 2
CC BB 3
GG FF 4
GG HH 5
EE HH 6
XX YY 7
XX WW 8
YY RR 9
YY SS 10
The first query will give you:
COL1 COL2 COL3
---- ---- ----------
11 00
AA BB 1
EE FF 4
YY ZZ 10
YY ZZ 9
The second query will give you:
COL1 COL2 COL3
---- ---- ----------
11 00
AA BB 1
EE FF 4
YY ZZ 9
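The "join first, then filter by precedence" logic, and the way the two queries differ on ties, can be sketched in plain Python. This is an illustration of the rule, not a translation of either query; the names join_level and best_matches and the single flag are assumptions made for the example.

```python
T1 = [("AA", "BB"), ("EE", "FF"), ("YY", "ZZ"), ("11", "00")]
T2 = [
    ("AA", "BB", 1), ("AA", "CC", 2), ("CC", "BB", 3),
    ("GG", "FF", 4), ("GG", "HH", 5), ("EE", "HH", 6),
    ("XX", "YY", 7), ("XX", "WW", 8), ("YY", "RR", 9),
]


def join_level(a, b):
    """Priority of the rule matching T1 row a and T2 row b, or None if no rule applies."""
    if a[0] == b[0] and a[1] == b[1]:
        return 1
    if a[1] == b[1]:
        return 2
    if a[0] == b[0]:
        return 3
    return None


def best_matches(t1, t2, single=False):
    """Hypothetical helper: keep only the T2 rows at the best available level per T1 row."""
    out = []
    for a in t1:
        # "Join first": every T2 row that satisfies any rule, tagged with its level.
        candidates = [(join_level(a, b), b[2]) for b in t2 if join_level(a, b) is not None]
        if not candidates:
            out.append((a[0], a[1], None))  # rule 4: no match at all
            continue
        best = min(level for level, _ in candidates)
        col3_values = sorted(c for level, c in candidates if level == best)
        if single:
            col3_values = col3_values[:1]  # lowest col3 only, like the aggregate query
        out.extend((a[0], a[1], c) for c in col3_values)
    return out


print(best_matches(T1, T2))

# With the extra tied row from the example above, the two variants diverge.
T2_TIED = T2 + [("YY", "SS", 10)]
print(best_matches(T1, T2_TIED))               # both YY rows kept
print(best_matches(T1, T2_TIED, single=True))  # only the lowest col3 kept
```

Which behaviour is right depends on whether a tie at the winning rule should surface as several rows or be collapsed to one.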
Perhaps this method, which chains together the result sets of three common table expressions, each of which implements a different join and checks whether the rowid of the row in T1 has already been projected from a successful join:
with
first_join as (
select t1.col1,
t1.col2,
t2.col3,
t1.rowid
from t1 join t2 on t1.col1 = t2.col1 and t1.col2 = t2.col2),
second_join as (
select t1.col1,
t1.col2,
t2.col3,
t1.rowid
from t1 join t2 on t1.col2 = t2.col2
where t1.rowid not in (select rowid from first_join)),
third_join as (
select t1.col1,
t1.col2,
t2.col3
from t1 join t2 on t1.col1 = t2.col1
where t1.rowid not in (select rowid from first_join union all
select rowid from second_join))
select col1, col2, col3 from first_join union all
select col1, col2, col3 from second_join union all
select col1, col2, col3 from third_join

Union two select statements while keeping distinct for not null column

My Oracle statement has two parts:
Select statement 1 is returning rows as:
a b c NULL
a x y NULL
Select statement 2 is returning rows as:
a b c d
e f g h
I want to union both selects so that, when a row in select 1 has the same columns (except the NULL column) as a row in select 2, only the row with the non-NULL value is returned.
Output:
a b c d
a x y NULL
e f g h
CHANGED REQUIREMENTS:
The requirements have changed a bit, and I now have a case like this:
Select statement 1 as:
a b c e NULL
a x y s NULL
Select statement 2 as:
a b c d text
e f g h text
Output:
a b c d text
a x y s NULL
e f g h text
I.e. in case of NULL field in last column, I need to fetch the row from "Select statement 2".
Considering that first three columns are not nullable you can use FULL OUTER JOIN:
with t1 as (
select 'a' c1, 'b' c2, 'c' c3, null c4 from dual
union all
select 'a', 'x', 'y', null from dual),
t2 as (
select 'a' c1, 'b' c2, 'c' c3, 'd' c4 from dual
union all
select 'e', 'f', 'g', 'h' from dual)
select c1, c2, c3, coalesce(t1.c4, t2.c4) c4
from t1 full outer join t2 using(c1, c2, c3);
C1 C2 C3 C4
-- -- -- --
a b c d
e f g h
a x y (NULL)
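What the FULL OUTER JOIN plus COALESCE does here can be sketched in plain Python: key every row on its first three columns and keep the first non-null fourth column. The function name merge_on_key is an assumption made for the example, and the sketch also assumes each key appears at most once per select, as in the sample data.

```python
def merge_on_key(rows1, rows2):
    """Hypothetical helper: full outer join on (c1, c2, c3), taking the first non-None c4."""
    by_key = {}
    for c1, c2, c3, c4 in rows1 + rows2:
        key = (c1, c2, c3)
        # Mirrors COALESCE(t1.c4, t2.c4): an existing non-None value wins,
        # otherwise the value from the later row is taken.
        if by_key.get(key) is None:
            by_key[key] = c4
    return [key + (by_key[key],) for key in sorted(by_key)]


select1 = [("a", "b", "c", None), ("a", "x", "y", None)]
select2 = [("a", "b", "c", "d"), ("e", "f", "g", "h")]

merged = merge_on_key(select1, select2)
for row in merged:
    print(row)
```

Rows that exist on only one side pass through unchanged, which is the part a plain UNION cannot express without also keeping the NULL duplicate.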
According to updated requirements:
with t1(c1, c2, c3, c4, c5) as (
select 'a', 'b', 'c', 'e', null from dual
union all
select 'a', 'x', 'y', 's', null from dual),
t2(c1, c2, c3, c4, c5) as (
select 'a', 'b', 'c', 'd', 'qwerty' from dual
union all
select 'e', 'f', 'g', 'h', 'asdfgh' from dual)
select c1,
c2,
c3,
nvl(nvl2(t1.c5, t1.c4, t2.c4), t1.c4) c4,
coalesce(t1.c5, t2.c5) c5
from t1
full outer join t2
using (c1, c2, c3);
C1 C2 C3 C4 C5
-- -- -- -- ------
a b c d qwerty
e f g h asdfgh
a x y s (NULL)
This is a quick and dirty hack. Although it works on the sample data you provided, it might behave unpredictably on your full dataset. It only gives you an idea of how to accomplish your goal, so I strongly recommend testing it thoroughly before use.
Suppose we have TABLE1 and TABLE2. In this query, TAB1 returns the rows that match between the two tables, filling each NULL column from the other table's row; TAB2 returns the rows of TABLE1 that have no match in TABLE2; TAB3 is the same as TAB2 with the tables swapped. Finally, TAB1, TAB2, and TAB3 are combined with UNION ALL:
SELECT * FROM
(SELECT
CASE WHEN T1.COL1 IS NULL THEN T2.COL1 ELSE T1.COL1 END COL1,
CASE WHEN T1.COL2 IS NULL THEN T2.COL2 ELSE T1.COL2 END COL2,
CASE WHEN T1.COL3 IS NULL THEN T2.COL3 ELSE T1.COL3 END COL3,
CASE WHEN T1.COL4 IS NULL THEN T2.COL4 ELSE T1.COL4 END COL4
FROM
TABLE2 T2 INNER JOIN TABLE1 T1 ON
(T1.COL1 = T2.COL1 OR T1.COL1 IS NULL OR T2.COL1 IS NULL) AND
(T1.COL2 = T2.COL2 OR T1.COL2 IS NULL OR T2.COL2 IS NULL) AND
(T1.COL3 = T2.COL3 OR T1.COL3 IS NULL OR T2.COL3 IS NULL) AND
(T1.COL4 = T2.COL4 OR T1.COL4 IS NULL OR T2.COL4 IS NULL))TAB1
UNION ALL
SELECT * FROM
(SELECT * FROM TABLE1
MINUS
SELECT T1.*
FROM
TABLE2 T2 INNER JOIN TABLE1 T1 ON
(T1.COL1 = T2.COL1 OR T1.COL1 IS NULL OR T2.COL1 IS NULL) AND
(T1.COL2 = T2.COL2 OR T1.COL2 IS NULL OR T2.COL2 IS NULL) AND
(T1.COL3 = T2.COL3 OR T1.COL3 IS NULL OR T2.COL3 IS NULL) AND
(T1.COL4 = T2.COL4 OR T1.COL4 IS NULL OR T2.COL4 IS NULL))TAB2
UNION ALL
SELECT * FROM
(SELECT * FROM TABLE2
MINUS
SELECT T2.*
FROM
TABLE2 T2 INNER JOIN TABLE1 T1 ON
(T1.COL1 = T2.COL1 OR T1.COL1 IS NULL OR T2.COL1 IS NULL) AND
(T1.COL2 = T2.COL2 OR T1.COL2 IS NULL OR T2.COL2 IS NULL) AND
(T1.COL3 = T2.COL3 OR T1.COL3 IS NULL OR T2.COL3 IS NULL) AND
(T1.COL4 = T2.COL4 OR T1.COL4 IS NULL OR T2.COL4 IS NULL))TAB3;