Avoid multiple rows in join - sql

Suppose that I have three tables, A, B and C:
Table A:
C1 C2 Dt
-------------
1 2 8 pm
1 2 10 pm
Table B:
C1 C2 Ind
-------------
1 2 123
1 2 456
Table C:
C1 C2 C3 C4 Ind
-------------------
1 2 a b 123
1 2 c d 123
1 2 e f 123
1 2 g h 456
As you can see, tables B and C share a matching Ind column, while A doesn't. How can I join the three tables so that the first row of A (ordered by Dt) matches only the rows in C whose Ind is the first in B (ordered by Ind)? The same should apply to the other rows.
My first attempt was a simple join:
SELECT *
FROM A JOIN B
ON A.C1 = B.C1
AND A.C2 = B.C2
JOIN C ON A.C1 = C.C1
AND A.C2 = C.C2
AND B.IND = C.IND
I know this doesn't work: each row in A matches every row in B (they share C1 and C2), and each of those A-B pairs then matches every row in C with the same Ind. In other words, there is no unique match.
Another approach I thought of uses two subqueries:
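To see the multiplication concretely, here is a small sketch that runs the simple join above on the sample data. It uses Python's built-in sqlite3 module as a stand-in database (an assumption; the question doesn't name one) and stores Dt as 'HH:MM' text (also an assumption) so that it sorts chronologically:

```python
import sqlite3

# In-memory stand-in for the real database, loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (C1 INT, C2 INT, Dt TEXT);
    CREATE TABLE B (C1 INT, C2 INT, Ind INT);
    CREATE TABLE C (C1 INT, C2 INT, C3 TEXT, C4 TEXT, Ind INT);
    INSERT INTO A VALUES (1, 2, '20:00'), (1, 2, '22:00');
    INSERT INTO B VALUES (1, 2, 123), (1, 2, 456);
    INSERT INTO C VALUES (1, 2, 'a', 'b', 123), (1, 2, 'c', 'd', 123),
                         (1, 2, 'e', 'f', 123), (1, 2, 'g', 'h', 456);
""")

# The simple join from the question.
rows = conn.execute("""
    SELECT A.Dt, B.Ind, C.C3
    FROM A JOIN B ON A.C1 = B.C1 AND A.C2 = B.C2
    JOIN C ON A.C1 = C.C1 AND A.C2 = C.C2 AND B.Ind = C.Ind
""").fetchall()

# Each A row pairs with both B rows (2 x 2 = 4 pairs); each pair then picks up
# the 3 C rows for Ind 123 or the 1 C row for Ind 456: 2 x (3 + 1) = 8 rows.
print(len(rows))
```

Every A row ends up with 4 result rows instead of being tied to a single Ind.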
SELECT *
FROM B JOIN (
SELECT C1, C2, C3, C4, Ind,
row_number() OVER (PARTITION BY C1, C2, ind ORDER BY C1, C2, ind) AS num_row
FROM C
) table_c
ON B.IND = table_c.IND
AND B.C1 = table_c.C1
AND B.C2 = table_c.C2
JOIN (
SELECT C1, C2, DT, row_number() OVER (ORDER BY DT) AS num_row
FROM A
) table_a
ON table_a.num_row = table_c.num_row
AND table_a.C1 = table_c.C1
AND table_a.C2 = table_c.C2
But these tables are very big, and every approach I have tried uses multiple selects and is very slow. What would be the best way to do this?

Tables A and B have a one-to-one relationship, so joining them on a unique id based on the order of each should solve the first part of the problem.
-- row_number() follows its own ORDER BY; Oracle's rownum is assigned before ORDER BY is applied, so it would not follow dt/ind order
create table newA as select row_number() over (order by dt) as uniq_id, A.* from A;
create table newB as select row_number() over (order by ind) as uniq_id, B.* from B;
select * from newA inner join newB on newA.uniq_id = newB.uniq_id;
Then join the result with C:
select *
from C
inner join (
    select newB.Ind as ind
    from newA
    inner join newB on newA.uniq_id = newB.uniq_id
) sub on C.ind = sub.ind
This could also be done with temporary tables or in a single SQL statement; which is better depends on your database.
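A minimal sketch of the uniq_id approach on the question's sample data, using Python's sqlite3 as a stand-in database (an assumption; SQLite 3.25 or later is needed for ROW_NUMBER). Dt is stored as 'HH:MM' text, also an assumption. The numbering uses ROW_NUMBER() so that it is guaranteed to follow the ORDER BY:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (C1 INT, C2 INT, Dt TEXT);
    CREATE TABLE B (C1 INT, C2 INT, Ind INT);
    CREATE TABLE C (C1 INT, C2 INT, C3 TEXT, C4 TEXT, Ind INT);
    INSERT INTO A VALUES (1, 2, '20:00'), (1, 2, '22:00');
    INSERT INTO B VALUES (1, 2, 123), (1, 2, 456);
    INSERT INTO C VALUES (1, 2, 'a', 'b', 123), (1, 2, 'c', 'd', 123),
                         (1, 2, 'e', 'f', 123), (1, 2, 'g', 'h', 456);
    -- Number A by Dt and B by Ind.
    CREATE TABLE newA AS
        SELECT ROW_NUMBER() OVER (ORDER BY Dt) AS uniq_id, A.* FROM A;
    CREATE TABLE newB AS
        SELECT ROW_NUMBER() OVER (ORDER BY Ind) AS uniq_id, B.* FROM B;
""")

# Pair the nth A row with the nth B row, then fetch the C rows for that Ind.
rows = conn.execute("""
    SELECT sub.Dt, sub.Ind, C.C3, C.C4
    FROM C
    JOIN (SELECT newA.Dt, newB.Ind
          FROM newA JOIN newB ON newA.uniq_id = newB.uniq_id) sub
      ON C.Ind = sub.Ind
    ORDER BY sub.Dt, C.C3
""").fetchall()
print(rows)
```

Note that this numbering is global, so it only lines up correctly when there is a single (C1, C2) group, as in the sample data.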

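You can use ROW_NUMBER effectively here: number the rows of A (by Dt) and of B (by Ind) within each (C1, C2) group, join A to B on that number, and then join C on Ind.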
WITH arn AS (
    SELECT a.c1, a.c2, "Dt",
           Row_number() OVER (PARTITION BY a.c1, a.c2 ORDER BY "Dt") rn
    FROM a
),
brn AS (
    SELECT b.c1, b.c2, b."Ind",
           Row_number() OVER (PARTITION BY b.c1, b.c2 ORDER BY b."Ind") rn
    FROM b
)
SELECT *
FROM arn a
INNER JOIN brn b
    ON a.c1 = b.c1
   AND a.c2 = b.c2
   AND a.rn = b.rn
INNER JOIN c
    ON b.c1 = c.c1
   AND b.c2 = c.c2
   AND b."Ind" = c."Ind"
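The same CTE can be checked against the question's sample data. This sketch uses Python's sqlite3 as a stand-in database (an assumption; SQLite 3.25 or later is needed for ROW_NUMBER), stores Dt as 'HH:MM' text, and leaves the identifiers unquoted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (C1 INT, C2 INT, Dt TEXT);
    CREATE TABLE B (C1 INT, C2 INT, Ind INT);
    CREATE TABLE C (C1 INT, C2 INT, C3 TEXT, C4 TEXT, Ind INT);
    INSERT INTO A VALUES (1, 2, '20:00'), (1, 2, '22:00');
    INSERT INTO B VALUES (1, 2, 123), (1, 2, 456);
    INSERT INTO C VALUES (1, 2, 'a', 'b', 123), (1, 2, 'c', 'd', 123),
                         (1, 2, 'e', 'f', 123), (1, 2, 'g', 'h', 456);
""")

rows = conn.execute("""
    WITH arn AS (   -- A numbered by Dt within each (C1, C2) group
        SELECT C1, C2, Dt,
               ROW_NUMBER() OVER (PARTITION BY C1, C2 ORDER BY Dt) AS rn
        FROM A),
    brn AS (        -- B numbered by Ind within each (C1, C2) group
        SELECT C1, C2, Ind,
               ROW_NUMBER() OVER (PARTITION BY C1, C2 ORDER BY Ind) AS rn
        FROM B)
    SELECT ra.Dt, rb.Ind, rc.C3, rc.C4
    FROM arn ra
    JOIN brn rb ON ra.C1 = rb.C1 AND ra.C2 = rb.C2 AND ra.rn = rb.rn
    JOIN C rc ON rb.C1 = rc.C1 AND rb.C2 = rc.C2 AND rb.Ind = rc.Ind
    ORDER BY ra.Dt, rc.C3
""").fetchall()
print(rows)
```

The first A row (20:00) picks up only the three C rows with Ind 123, and the second (22:00) only the row with Ind 456. Because the numbering is partitioned by (C1, C2), each group is paired independently.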

Select multiple count(*) in multiple tables with single query

I have 3 tables:
Basic
id  name   description
2   Name1  description2
3   Name2  description3

LinkA
id  linkA_ID
2   344
3   3221
2   6642
3   2312
2   323

LinkB
id  linkB_ID
2   8287
3   42466
2   616422
3   531
2   2555
2   8592
3   1122
2   33345
I want to get this result:
id  name   description   linkA_count  linkB_count
2   Name1  description2  3            5
3   Name2  description3  2            3
My query:
SELECT
a.id
,a.name
,a.description
,COUNT(b.linkA_ID) AS linkA_count
,COUNT(c.linkB_ID) AS linkb_count
FROM
basic a
JOIN linkA b on (a.id = b.id)
JOIN linkb c on (a.id = c.id)
GROUP BY
a.id
,a.name
,a.description
With this query, linkA_count is always the same as linkB_count.
A more traditional approach is to use "derived tables" (subqueries) so that the counts are computed before the joins multiply the rows. Using left joins allows all ids in basic to be returned even if there are no related rows in either joined table.
select
basic.id
, coalesce(a.LinkACount,0) LinkACount
, coalesce(b.linkBCount,0) linkBCount
from basic
left join (
select id, Count(linkA_ID) LinkACount from LinkA group by id
) as a on a.id=basic.id
left join (
select id, Count(linkB_ID) LinkBCount from LinkB group by id
) as b on b.id=basic.id
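A quick check of the derived-table query on the question's data, sketched with Python's sqlite3 as a stand-in database (an assumption; the question doesn't name one). The name and description columns are added to the select list to match the desired output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE basic (id INT, name TEXT, description TEXT);
    CREATE TABLE LinkA (id INT, linkA_ID INT);
    CREATE TABLE LinkB (id INT, linkB_ID INT);
    INSERT INTO basic VALUES (2, 'Name1', 'description2'), (3, 'Name2', 'description3');
    INSERT INTO LinkA VALUES (2, 344), (3, 3221), (2, 6642), (3, 2312), (2, 323);
    INSERT INTO LinkB VALUES (2, 8287), (3, 42466), (2, 616422), (3, 531),
                             (2, 2555), (2, 8592), (3, 1122), (2, 33345);
""")

# Each link table is counted per id first, so the joins never multiply rows.
rows = conn.execute("""
    SELECT basic.id, basic.name, basic.description,
           COALESCE(a.LinkACount, 0) AS LinkACount,
           COALESCE(b.LinkBCount, 0) AS LinkBCount
    FROM basic
    LEFT JOIN (SELECT id, COUNT(linkA_ID) AS LinkACount FROM LinkA GROUP BY id) a
      ON a.id = basic.id
    LEFT JOIN (SELECT id, COUNT(linkB_ID) AS LinkBCount FROM LinkB GROUP BY id) b
      ON b.id = basic.id
    ORDER BY basic.id
""").fetchall()
print(rows)
```

This prints one row per id with counts 3 and 5 for id 2, and 2 and 3 for id 3.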
Try this (using correlated subqueries):
SELECT
basic.id
,basic.name
,basic.description
,(select Count(linkA_ID) from LinkA where LinkA.id=basic.id) as LinkACount
,(select Count(linkB_ID) from LinkB where LinkB.id=basic.id) as LinkBCount
FROM basic
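Method 2 (using a CTE):
with a as (select id, Count(linkA_ID) LinkACount from LinkA group by id)
, b as (select id, Count(linkB_ID) LinkBCount from LinkB group by id)
select basic.id, coalesce(a.LinkACount, 0) LinkACount, coalesce(b.LinkBCount, 0) LinkBCount
from basic
-- left joins keep ids that have no rows in LinkA or LinkB (count 0)
left join a on (a.id=basic.id)
left join b on (b.id=basic.id)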
If you select every column from the joined tables for a single id, you can see why your query cannot work:
SELECT
*
FROM
basic a
JOIN linkA b on (a.id = b.id)
JOIN linkb c on (a.id = c.id)
WHERE a.ID = 3
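Each LinkA row is repeated once for every matching LinkB row (and vice versa), so both counts equal the product of the two. As long as linkA_ID and linkB_ID are unique within each id, you can just use DISTINCT in your counts: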
SELECT
a.id
,a.name
,a.description
,COUNT(DISTINCT(b.linkA_ID)) AS linkA_count
,COUNT(DISTINCT(c.linkB_ID)) AS linkb_count
FROM
basic a
JOIN linkA b on (a.id = b.id)
JOIN linkb c on (a.id = c.id)
GROUP BY
a.id
,a.name
,a.description
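A sketch comparing the original query with the DISTINCT version on the question's data, using Python's sqlite3 as a stand-in database (an assumption). The name and description columns are dropped from the select list for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE basic (id INT, name TEXT, description TEXT);
    CREATE TABLE LinkA (id INT, linkA_ID INT);
    CREATE TABLE LinkB (id INT, linkB_ID INT);
    INSERT INTO basic VALUES (2, 'Name1', 'description2'), (3, 'Name2', 'description3');
    INSERT INTO LinkA VALUES (2, 344), (3, 3221), (2, 6642), (3, 2312), (2, 323);
    INSERT INTO LinkB VALUES (2, 8287), (3, 42466), (2, 616422), (3, 531),
                             (2, 2555), (2, 8592), (3, 1122), (2, 33345);
""")

def counts(count_expr_a, count_expr_b):
    # Same joins as the question; only the aggregate expressions differ.
    return conn.execute(f"""
        SELECT a.id, {count_expr_a}, {count_expr_b}
        FROM basic a
        JOIN LinkA b ON a.id = b.id
        JOIN LinkB c ON a.id = c.id
        GROUP BY a.id
        ORDER BY a.id
    """).fetchall()

# Original query: every LinkA row is paired with every LinkB row of the same id,
# so id 2 yields 3 x 5 = 15 joined rows and id 3 yields 2 x 3 = 6.
naive = counts("COUNT(b.linkA_ID)", "COUNT(c.linkB_ID)")
# DISTINCT collapses the repeated ids back to the real counts.
distinct = counts("COUNT(DISTINCT b.linkA_ID)", "COUNT(DISTINCT c.linkB_ID)")
print(naive)
print(distinct)
```

The helper `counts` is purely illustrative. DISTINCT only gives the right answer here because no linkA_ID or linkB_ID value repeats within an id.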

hive - how to select top N elements for each match

Consider the Hive table TableA shown below.
This basic SQL syntax works fine when we want to get "all" the rows that match the condition in the WHERE clause. I want to limit the returned rows to a number, say N, for each match of the WHERE clause.
Let me explain with an example:
(1)
Consider this table:
TableA
c1 c2
1 a
1 b
1 c
2 d
2 e
2 f
(2) Consider this query:
SELECT c1, c2
FROM TableA
WHERE c1 in (1,2)
(3) As you can imagine, it would produce this result:
Actual Results:
c1 c2
1 a
1 b
1 c
2 d
2 e
2 f
(4)
Desired Result:
c1 c2
1 a
1 b
2 d
2 e
Question: How do I modify the query in (2) to get the desired output shown in (4)?
You can use the row_number function to do this:
select c1,c2
from (SELECT c1, c2, row_number() over(partition by c1 order by c2) as rnum
FROM TableA
--add a where clause as needed
) t
where rnum <= 2
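The row_number query can be tried outside Hive. This sketch runs it with Python's sqlite3 as a stand-in (an assumption; SQLite 3.25 or later supports the same ROW_NUMBER() OVER (PARTITION BY ...) syntax), with the question's WHERE clause filled in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (c1 INT, c2 TEXT);
    INSERT INTO TableA VALUES (1, 'a'), (1, 'b'), (1, 'c'), (2, 'd'), (2, 'e'), (2, 'f');
""")

# Number the rows 1, 2, 3, ... within each c1 (ordered by c2) and keep the first 2.
rows = conn.execute("""
    SELECT c1, c2
    FROM (SELECT c1, c2,
                 ROW_NUMBER() OVER (PARTITION BY c1 ORDER BY c2) AS rnum
          FROM TableA
          WHERE c1 IN (1, 2)) t
    WHERE rnum <= 2
    ORDER BY c1, c2
""").fetchall()
print(rows)
```

This returns the desired result from (4). Changing `rnum <= 2` to `rnum <= N` gives the top N per c1.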
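If there are only 2 values of c1, you can also combine two LIMIT queries with UNION ALL (each ORDER BY ... LIMIT goes in its own subquery):
SELECT * FROM (SELECT c1, c2 FROM TableA WHERE c1 = 1 ORDER BY c2 LIMIT 2) t1
UNION ALL
SELECT * FROM (SELECT c1, c2 FROM TableA WHERE c1 = 2 ORDER BY c2 LIMIT 2) t2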
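For more than 2 values of c1, use rank(). Note that rank() gives tied rows the same number, so it can return more than 2 rows per c1 when c2 values repeat: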
select c1,c2 from
(
select c1,c2,rank() over (partition by c1 order by c2) as rank
from TableA
) t
where rank < 3;
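To illustrate the difference between rank() and row_number() when there are ties, here is a sketch with Python's sqlite3 as a stand-in for Hive (an assumption). The data is hypothetical: it differs from the question's table by having a duplicated c2 value 'b' for c1 = 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data with a tie: c1 = 1 has c2 = 'b' twice.
    CREATE TABLE TableA (c1 INT, c2 TEXT);
    INSERT INTO TableA VALUES (1, 'a'), (1, 'b'), (1, 'b'), (2, 'd'), (2, 'e'), (2, 'f');
""")

def top_two(func):
    # Hypothetical helper: keep rows whose position within their c1 group is < 3.
    return conn.execute(f"""
        SELECT c1, c2 FROM (
            SELECT c1, c2, {func}() OVER (PARTITION BY c1 ORDER BY c2) AS pos
            FROM TableA) t
        WHERE pos < 3
        ORDER BY c1, c2
    """).fetchall()

by_rank = top_two("RANK")              # positions for c1 = 1 are 1, 2, 2
by_row_number = top_two("ROW_NUMBER")  # positions for c1 = 1 are 1, 2, 3
print(by_rank)
print(by_row_number)
```

With rank(), both tied 'b' rows share position 2, so c1 = 1 returns three rows. row_number() always returns exactly two rows per c1 (when available), picking one of the tied rows arbitrarily.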

Selecting the last row in a partition in HIVE

I have a table t1:
c1 | c2 | c3| c4
1 1 1 A
1 1 2 B
1 1 3 C
1 1 4 D
1 1 4 E
1 1 4 F
2 2 1 A
2 2 2 A
2 2 3 A
I want to select the last row of each (c1, c2) pair, which in this case is (1,1,4,F) and (2,2,3,A). My idea is to do something like this:
create table t2 as
select *, row_number() over (partition by c1, c2 order by c3) as rank
from t1;
create table t3 as
select a.c1, a.c2, a.c3, a.c4
from t2 a
inner join
(select c1, c2, max(rank) as maxrank
from t2
group by c1, c2
) b
on a.c1=b.c1 and a.c2=b.c2
where a.rank=b.maxrank
Would this work? (I'm having environment issues, so I can't test it myself.)
Just use a subquery:
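select t1.*
from (select t1.*, row_number() over (partition by c1, c2 order by c3 desc, c4 desc) as rank
from t1
) t1
where rank = 1;
Note the use of desc in the order by. Ordering by c4 desc as well breaks the tie among the three rows with c3 = 4, so F is picked deterministically.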
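A sketch of the descending row_number approach on the question's t1, using Python's sqlite3 as a stand-in for Hive (an assumption). It also orders by c4 desc, on the assumption that "last" among rows with equal c3 means the highest c4, which is what makes F the answer for (1, 1):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c1 INT, c2 INT, c3 INT, c4 TEXT);
    INSERT INTO t1 VALUES (1, 1, 1, 'A'), (1, 1, 2, 'B'), (1, 1, 3, 'C'),
                          (1, 1, 4, 'D'), (1, 1, 4, 'E'), (1, 1, 4, 'F'),
                          (2, 2, 1, 'A'), (2, 2, 2, 'A'), (2, 2, 3, 'A');
""")

# Number each (c1, c2) group from the last row backwards and keep number 1.
rows = conn.execute("""
    SELECT c1, c2, c3, c4
    FROM (SELECT t1.*,
                 ROW_NUMBER() OVER (PARTITION BY c1, c2
                                    ORDER BY c3 DESC, c4 DESC) AS rn
          FROM t1) t
    WHERE rn = 1
    ORDER BY c1, c2
""").fetchall()
print(rows)
```

A single pass with one window function replaces the intermediate tables t2 and t3 and the self-join on max(rank).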

SQL Server, selecting from 2 columns from different tables

I have these columns from 2 tables
Table1 Table2
Code ID Code ID
A 1 A 1
B 1 B 1
C 1 C 1
D 1
E 1
My query:
Select
a.id, a.code, b.code
from
Table1 a, Table2 b
where
a.id = '1' and a.id = b.id
What I expected
ID code code
1 A A
1 B B
1 C C
1 D NULL
1 E NULL
What I got
ID code code
1 A A
1 B A
1 C A
1 D A
1 E A
1 A B
1 B B
1 C B
....
Any ideas? DISTINCT didn't help.
Thanks
Well, all the IDs in both tables are 1, so joining on ID gives you the Cartesian product of both tables (5 × 3 = 15 rows).
Instead, you'll need to do a left outer join based on Table1.Code:
Select a.id, a.code, b.code
from Table1 a LEFT OUTER JOIN Table2 b
on a.code = b.code
where a.id = '1';
You need to do a LEFT OUTER JOIN on Code instead of a Cartesian product:
SELECT a.Id, a.Code, b.Code FROM Table1 a
LEFT OUTER JOIN Table2 b ON a.Code = b.Code
WHERE a.Id = '1'
A LEFT OUTER JOIN returns all rows from the left-hand side of the join (in this case Table1) regardless of whether there is a matching record in the table on the right-hand side of the join (in this case Table2). Where there is no match, a NULL is returned for b.Code, as per your requirements.
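Both behaviours can be reproduced with a small sketch. It uses Python's sqlite3 as a stand-in for SQL Server (an assumption; the join semantics are the same) and stores ID as an integer rather than comparing with the string '1':

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Code TEXT, ID INT);
    CREATE TABLE Table2 (Code TEXT, ID INT);
    INSERT INTO Table1 VALUES ('A', 1), ('B', 1), ('C', 1), ('D', 1), ('E', 1);
    INSERT INTO Table2 VALUES ('A', 1), ('B', 1), ('C', 1);
""")

# The original query: joining on ID (always 1) pairs every row with every row.
cartesian = conn.execute("""
    SELECT COUNT(*) FROM Table1 a, Table2 b WHERE a.ID = 1 AND a.ID = b.ID
""").fetchone()[0]

# LEFT OUTER JOIN on Code keeps all Table1 rows; unmatched ones get NULL (None).
rows = conn.execute("""
    SELECT a.ID, a.Code, b.Code
    FROM Table1 a LEFT OUTER JOIN Table2 b ON a.Code = b.Code
    WHERE a.ID = 1
    ORDER BY a.Code
""").fetchall()
print(cartesian)
print(rows)
```

The first query yields 15 rows (5 × 3); the second yields exactly the five expected rows, with None for D and E.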

How can I get the exact match join for the below scenario?

How can I join the tables below
TableA
ID ID_C
1  1

TableB
ID ID_A Value
1  1    a
2  1    b

TableC
ID
1

TableD
ID ID_C Value
1  1    a
2  1    b
in order to get a result like this:
Result
ID ID_B Value ID_C ID_D Value
1 1 a 1 1 a
1 2 b 1 2 b
My result shouldn't contain the row 1 2 b 1 1 b. The two Value columns do not always hold the same values, so they cannot be used in the join condition.
To make it simpler:
Resultant Table
ID Value
1  a
1  b
2  a
3  c

TableA
ID Value
1  a
2  g
3  d

TableB
ID ID_A
1  1
2  1
3  2
4  3
Now I need to join the Resultant Table with TableA and TableB to get some columns from each, using ResultantTable.ID = TableA.ID and TableB.ID_A = TableA.ID, since ID_A is a foreign key.
Joining TableB produces duplicates: since ID = 1 occurs twice, I get 4 records where ID = 1 when there are only 2. DISTINCT or GROUP BY could remove them, but I also need other columns to be displayed. How do I do both?
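The duplication can be reproduced with the simplified tables. This sketch uses Python's sqlite3 as a stand-in database (an assumption), with the Resultant Table named ResultantTable as in the join condition above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ResultantTable (ID INT, Value TEXT);
    CREATE TABLE TableA (ID INT, Value TEXT);
    CREATE TABLE TableB (ID INT, ID_A INT);
    INSERT INTO ResultantTable VALUES (1, 'a'), (1, 'b'), (2, 'a'), (3, 'c');
    INSERT INTO TableA VALUES (1, 'a'), (2, 'g'), (3, 'd');
    INSERT INTO TableB VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# Two Resultant rows with ID 1 times two TableB rows with ID_A 1 gives 2 x 2 = 4 rows.
rows = conn.execute("""
    SELECT r.ID, r.Value, a.Value, b.ID
    FROM ResultantTable r
    JOIN TableA a ON r.ID = a.ID
    JOIN TableB b ON b.ID_A = a.ID
    WHERE r.ID = 1
    ORDER BY r.Value, b.ID
""").fetchall()
print(rows)
```

Each Resultant row for ID 1 is paired with both TableB rows that reference TableA ID 1. Nothing in these join conditions says which TableB row belongs to which Resultant row.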
SELECT A.ID, B.ID, B.Value, C.ID, D.ID, D.Value
FROM TableA A
INNER JOIN TableB B ON A.ID = B.ID_A
INNER JOIN TableC C ON A.ID_C = C.ID
INNER JOIN TableD D ON B.ID = D.ID AND C.ID = D.ID_C
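A check of this four-table join on the sample data from the question, sketched with Python's sqlite3 as a stand-in database (an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INT, ID_C INT);
    CREATE TABLE TableB (ID INT, ID_A INT, Value TEXT);
    CREATE TABLE TableC (ID INT);
    CREATE TABLE TableD (ID INT, ID_C INT, Value TEXT);
    INSERT INTO TableA VALUES (1, 1);
    INSERT INTO TableB VALUES (1, 1, 'a'), (2, 1, 'b');
    INSERT INTO TableC VALUES (1);
    INSERT INTO TableD VALUES (1, 1, 'a'), (2, 1, 'b');
""")

# B.ID = D.ID ties each TableB row to exactly one TableD row.
rows = conn.execute("""
    SELECT A.ID, B.ID, B.Value, C.ID, D.ID, D.Value
    FROM TableA A
    JOIN TableB B ON A.ID = B.ID_A
    JOIN TableC C ON A.ID_C = C.ID
    JOIN TableD D ON B.ID = D.ID AND C.ID = D.ID_C
    ORDER BY B.ID
""").fetchall()
print(rows)
```

On this data the query returns exactly the two expected rows and not the unwanted 1 2 b 1 1 b combination. Whether B.ID = D.ID is the right pairing rule for the real tables is an assumption the question would need to confirm.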
Are you saying that the Value field in TableB should not differ from the Value field in TableD? If so, could replacing B.ID = D.ID with B.Value = D.Value solve your problem?
Are you sure that is the way it is supposed to work?
Try:
SELECT A.ID, B.ID ID_B, B.Value Value_B, C.ID ID_C, D.ID ID_D, D.Value Value_D
FROM TableA A
JOIN TableB B ON A.ID = B.ID_A
JOIN TableC C ON A.ID_C = C.ID
JOIN TableD D ON B.Value = D.Value AND C.ID = D.ID_C