Oracle join to either of multiple columns - sql

I have a RELATION table
NUM1 | NUM2 | NUM3
-- --- -----
1 2 3
2 4 5
3 4 null
3 4 null
and the actual INFO table where NUM is primary key.
NUM | A_LOT_OF_OTHER_INFO
--- --------------------
1 asdff
2 werwr
3 erert
4 ghfgh
5 cvbcb
I want to create a view to see the count of the NUM that appeared in any of the NUM1, NUM2, NUM3 of the RELATION table.
MY_VIEW
NUM | A_LOT_OF_OTHER_INFO | TOTAL_COUNT
--- -------------------- ------------
1 asdff 1
2 werwr 2
3 erert 3
4 ghfgh 3
5 cvbcb 1
I can do this by doing three selects from RELATION table and UNION them, but I do not want to use UNION because the tables have a lot of records, MY_VIEW is already large enough and I am looking for a better way to join to the RELATION table in the view. Can you suggest a way?

What i would try is to unpivot the relation table.
After that join the info table on the values and count the number of times the val gets repeated.
create table relation(num1 int,num2 int, num3 int);
insert into relation values(1,2,3);
insert into relation values(2,4,5);
insert into relation values(3,4,null);
create table info(num int, a_lot_of_other_info varchar2(100));
insert into info
select 1,'asdff' from dual union all
select 2,'werwr' from dual union all
select 3,'erert' from dual union all
select 4,'ghfgh' from dual union all
select 5,'cvbcb' from dual
select a.num
,max(a_lot_of_other_info) as a_lot_of_other_info
,count(*) as num_of_times
from info a
join (select val
from relation a
unpivot(val for x in (num1,num2,num3))
)b
on a.num=b.val
group by a.num
order by 1

I would suggest a correlated subquery:
select i.*,
(select ((case when r.num1 = i.num then 1 else 0 end) +
(case when r.num2 = i.num then 1 else 0 end) +
(case when r.num3 = i.num then 1 else 0 end)
)
from relation r
where i.num in (r.num1, r.num2, r.num3)
) as total_count
from info i;
If performance is a consideration, it might be faster to use left joins:
select i.*,
((case when r1.num1 is not null then 1 else 0 end) +
(case when r2.num1 is not null then 1 else 0 end) +
(case when r3.num1 is not null then 1 else 0 end)
) as total_count
from info i left join
relation r1
on i.num = r1.num1 left join
relation r2
on i.num = r2.num2 left join
relation r3
on i.num = r3.num3;
In particular, this will make optimal use of three separate indexes on relation: relation(num1), relation(num2), and relation(num3).

It seems what you want is UNPIVOT. Perhaps easiest to do with a cross join in this case:
select NUM, count(*) as TOTAL_COUNT
from (
select decode(column_value, 1, NUM1, 2, NUM2, 3, NUM3) as NUM
from RELATION cross join table(sys.odcinumberlist(1,2,3))
)
group by NUM
;
Then join this to the second table; the join part is really irrelevant here.

Related

Hive: count matches between INNER JOIN on unique values of a column

I am trying to count matches between columns resulting from an INNER JOIN of two tables on unique values of a single column in one of the two tables. An example may make things more clear:
If I had the following two tables:
Table A
-------
id_A: info_A
1 'a'
2 'b'
3 'c'
3 'd'
Table B
-------
id_B: info_B
1 'a'
3 'c'
5 'b'
I want to find the unique id_A: [1,2,3] and the info_A associated with them: ['a','b','c','d'].
I want to create a table that looks like the following:
Table join of A+B
-----------------
id_A: info_A id_B info_B match_cnt
1 'a' 1 'a' 1
3 'c','d' 3 'c' 0.5
where match_cnt is the number of matches between info_A and info_B for a given id_A. FYI, the actual tables I'm working with have billions of rows.
A code chunk demonstrates what I've tried, plus variations (not shown below):
SELECT z.id_A, z.info_A, z.id_B, z.info_B
FROM(
SELECT u.id_A AS id_A, u.info_A AS info_A, y.id_B AS true_id_B, y.info_B AS true_info_B
FROM db.table_A u
WHERE EXISTS
( SELECT id_B, info_B
FROM table_B l
where l.id_B= u.id_A)
INNER JOIN table_B y
ON u.id_A = y.id_B
) z
You may use some thing like below :-
WITH T1 AS ( select ID_A ,count(1) as cnt from tableA inner join tableB on tableA.ID_A=tableB.ID_B and tableA.INFO_A=tableB.INFO_B group by ID_A,INFO_A)
select distinct tmp.ID_A,tmp.a,tmp.ID_B,tmp.b, (cnt/size(a)) from
(select ID_A ,collect_set(INFO_A) as a,ID_B,collect_set(INFO_B) as b from tableA inner join tableB on tableA.a=tableB.a group by tableA.a,tableB.a)
tmp join T1 on T1.ID_A=tmp.ID_A
select id
,collect_list (case when a=1 then info end) as info_a
,collect_list (case when b=1 then info end) as info_b
,count (case when a=1 and b=1 then 1 end) / count(*) as match_cnt
from (select id
,info
,min (case when tab = 'A' then 1 end) as a
,min (case when tab = 'B' then 1 end) as b
from ( select 'A' as tab ,id_A as id ,info_A as info from A
union all select 'B' as tab ,id_B as id ,info_B as info from B
) t
group by id
,info
) t
group by id
having min(a) = 1
and min(b) = 1
;
+----+-----------+--------+-----------+
| id | info_a | info_b | match_cnt |
+----+-----------+--------+-----------+
| 1 | ["a"] | ["a"] | 1.0 |
| 3 | ["c","d"] | ["c"] | 0.5 |
+----+-----------+--------+-----------+

Exclude value of a record in a group if another is present

In the example table below, I'm trying to figure out a way to sum amount over id for all marks where mark 'C' doesn't exist within an id. When mark 'C' does exist in an id, I want the sum of amounts over that id, excluding the amount against mark 'A'. As illustration, my desired output is at the bottom. I've considered using partitions and the EXISTS command, but I'm having trouble conceptualizing the solution. If any of you could take a look and point me in the right direction, it would be greatly appreciated :)
sample table:
id mark amount
------------------
1 A 1
2 A 3
2 B 2
3 A 2
4 A 1
4 B 3
5 A 1
5 C 3
6 A 2
6 C 2
desired output:
id sum(amount)
-----------------
1 1
2 5
3 2
4 4
5 3
6 2
select
id,
case
when count(case mark when 'C' then 1 else null end) = 0
then
sum(amount)
else
sum(case when mark <> 'A' then amount else 0 end)
end
from sampletable
group by id
Here is my effort:
select id, sum(amount) from table t where not t.id = 'A' group by id
having id in (select id from table t where mark = 'C')
union
select id, sum(amount) from table t where t.id group by id
having id not in (select id from table t where mark = 'C')
SELECT
id,
sum(amount) AS sum_amount
FROM atable t
WHERE mark <> 'A'
OR NOT EXISTS (
SELECT *
FROM atable
WHERE id = t.id
AND mark = 'C'
)
GROUP BY
id
;

Create indexed view

My table structure is below :
MyTable (ID Int, AccID1 Int, AccID2 Int, AccID3 int)
ID AccID1 AccID2 AccID3
---- -------- -------- --------
1 12 2 NULL
2 4 12 1
3 NULL NULL 5
4 7 NULL 1
I want to create indexed view with below output :
ID Level Value
---- ----- -------
1 1 12
1 2 2
2 1 4
2 2 12
2 3 1
3 3 5
4 1 7
4 3 1
EDIT :
My table is very huge and I want to have above output.
I can Get my query such as below :
Select ID,
Case StrLevel
When 'AccID1' Then 1
When 'AccID2' Then 2
Else 3
End AS [Level],
AccID as Value
From (
Select A.ID, A.AccID1, A.AccID2, A.AccID3
From MyTable A
)as p
UNPIVOT (AccID FOR [StrLevel] IN (AccID1, AccID2, AccID3)) AS unpvt
or
Select *
from (
select MyTable.ID,
num.n as [Level],
Case Num.n
When 1 Then MyTable.AccID1
When 2 Then MyTable.AccID2
Else MyTable.AccID3
End AS AccID
from myTable
cross join (select 1
union select 2
union select 3)Num(n)
)Z
Where Z.AccID IS NOT NULL
or
Select A.ID,
2 AS [Level],
A.AccID1 AS AccID
From MyTable A
Where A.AccID1 IS NOT NULL
Union
Select A.ID,
2 AS [Level],
A.AccID2
From MyTable A
Where A.AccID2 IS NOT NULL
Union
Select A.ID,
3 AS [Level],
A.AccID3
From MyTable A
Where A.AccID3 IS NOT NULL
But Above query is slow and I want to have indexed view to have better performance.
and in indexed view I can't use UNION or UNPIVOT or CROSS JOIN in indexed view.
What if you created a Numbers table to essentially do the work of your illegal CROSS JOIN?
Create Table Numbers (number INT NOT NULL PRIMARY KEY)
Go
Insert Numbers
Select top 30000 row_number() over (order by (select 1)) as rn
from sys.all_objects s1 cross join sys.all_objects s2
go
Create view v_unpivot with schemabinding
as
Select MyTable.ID,
n.number as [Level],
Case n.number
When 1 Then MyTable.AccID1
When 2 Then MyTable.AccID2
Else MyTable.AccID3
End AS AccID
From dbo.Mytable
Join dbo.Numbers n on n.number BETWEEN 1 AND 3
go
Create unique clustered index pk_v_unpivot on v_unpivot (ID, [Level])
go
Select
ID,
[Level],
AccID
From v_unpivot with (noexpand)
Where AccID IS NOT NULL
Order by ID, [Level]
The WHERE AccID IS NOT NULL must be part of the query because derived tables are not allowed in indexed views.

SQL query to group based on sum

I have a simple table with values that I want to chunk/partition into distinct groups based on the sum of those values (up to a certain limit group sum total).
e.g.,. imagine a table like the following:
Key Value
-----------
A 1
B 4
C 2
D 2
E 5
F 1
And I would like to group into sets such that no one grouping's sum will exceed some given value (say, 5).
The result would be something like:
Group Key Value
-------------------
1 A 1
B 4
--------
Total: 5
2 C 2
D 2
--------
Total: 4
3 E 5
--------
Total: 5
4 F 1
--------
Total: 1
Is such a query possible?
While I am inclined to agree with the comments that this is best done outside of SQL, here is some SQL which would seem to do roughly what you're asking:
with mytable AS (
select 'A' AS [Key], 1 AS [Value] UNION ALL
select 'B', 4 UNION ALL
select 'C', 2 UNION ALL
select 'D', 2 UNION ALL
select 'E', 5 UNION ALL
select 'F', 1
)
, Sums AS (
select T1.[Key] AS T1K
, T2.[Key] AS T2K
, (SELECT SUM([Value])
FROM mytable T3
WHERE T3.[Key] <= T2.[Key]
AND T3.[Key] >= T1.[Key]) AS TheSum
from mytable T1
inner join mytable T2
on T2.[Key] >= T1.[Key]
)
select S1.T1K AS StartKey
, S1.T2K AS EndKey
, S1.TheSum
from Sums S1
left join Sums S2
on (S1.T1K >= S2.T1K and S1.T2K <= S2.T2K)
and S2.TheSum > S1.TheSum
and S2.TheSum <= 5
where S1.TheSum <= 5
AND S2.T1K IS NULL
When I ran this code on SQL Server 2008 I got the following results:
StartKey EndKey Sum
A B 5
C D 4
E E 5
F F 1
It should be straightforward to construct the required groups from these results.
If you want to have only two members or less in each set, you can use the following query:
Select
A.[Key] as K1 ,
B.[Key] as K2 ,
isnull(A.value,0) as V1 ,
isnull(B.value,0) as V2 ,
(A.value+B.value)as Total
from Table_1 as A left join Table_1 as B
on A.value+B.value<=5 and A.[Key]<>B.[Key]
For finding sets having more members, you can continue to use joins.

little help with some tsql

Given following table:
rowId AccountId Organization1 Organization2
-----------------------------------------------
1 1 20 10
2 1 10 20
3 1 40 30
4 2 15 10
5 2 20 15
6 2 10 20
How do I identify the records where Organization2 doesn't exist in Organization1 for a particular account
for instance, in the given data above my results will be a single record which will be AccountId 1 because row3 organization2 value 30 doesn't exist in organization1 for that particular account.
SELECT rowId, AccountId, Organization1, Organization2
FROM yourTable yt
WHERE NOT EXISTS (SELECT 1 FROM yourTable yt2 WHERE yt.AccountId = yt2.AccountId AND yt.Organization1 = yt2.Organization2)
There are two possible interpretations of your question. The first (where the Organization1 and Organization2 columns are not equal) is trivial:
SELECT AccountID FROM Table WHERE Organization1 <> Organization2
But I suspect you're asking the slightly more difficult interpretation (where Organization2 does not appear in ANY Organization1 value for the same account):
SELECT AccountID From Table T1 WHERE Organization2 NOT IN
(SELECT Organization1 FROM Table T2 WHERE T2.AccountID = T1.AccountID)
Here is a how you could do it:
Test data:
CREATE TABLE #T(rowid int, acc int, org1 int, org2 int)
INSERT #T
SELECT 1,1,10,10 UNION
SELECT 2,1,20,20 UNION
SELECT 3,1,40,30 UNION
SELECT 4,2,10,10 UNION
SELECT 5,2,15,15 UNION
SELECT 6,2,20,20
Then perform a self-join to discover missing org2:
SELECT
*
FROM #T T1
LEFT JOIN
#T T2
ON t1.org1 = t2.org2
AND t1.acc = t2.acc
WHERE t2.org1 IS NULL
SELECT
*
FROM
[YorTable]
WHERE
[Organization1] <> [Organization2] -- The '<>' is read "Does Not Equal".
Use left join as Noel Abrahams presented.