Join on two tables gives duplicate results - sql

i have a table with data that I want to join unto another table. Problem is that the join can happen on two columns of the same table, where I want to get the first join to work and if this Fails i want the second join to give me a valid result.
Base table:
| ID1 | ID2 | Value |
| a1 | a2 | val_1 |
| b1 | b2 | val_2 |
| c1 | c2 | val_3 |
join Table:
| ID1 | ID2 | Join_Value |
| | a2 | join_val_1 |
| b1 | | join_val_2 |
| c1 | c2 | join_val_3 |
What i tried was this:
select base.id1, base.id2, Value, isnull(j1.Join_value,j2.Join_value) Join_Value from base
left join Join j1 on j1.id1 = base.id1
left join Join j2 on j2.id2 = base.id2
The Result is this:
| ID1 | ID2 | Value | Join_Value |
| a1 | a2 | val_1 | join_val_1 |
| b1 | b2 | val_2 | join_val_2 |
| c1 | c2 | val_3 | join_val_3 |
| c1 | c2 | val_3 | join_val_3 |
What i want is this:
| ID1 | ID2 | Value | Join_Value |
| a1 | a2 | val_1 | join_val_1 |
| b1 | b2 | val_2 | join_val_2 |
| c1 | c2 | val_3 | join_val_3 |
I hope i made my Problem clear.

You don't need to join the same table twice. Just specify the condition in the ON
select b.ID1, b.ID2, b.[Value], j.Join_Value
from [base] b
inner join [join] j on b.ID1 = j.ID1
or (
j.ID1 = ''
and b.ID2 = j.ID2
)

You are going to get duplicate rows for for the c1 and c2 rows because they match on both of your Join table joins (j1 and j2).
A quick fix is to add a DISTINCT to your query:
select DISTINCT base.id1, base.id2, Value, isnull(j1.Join_value,j2.Join_value) Join_Value
from base
left join Join j1 on j1.id1 = base.id1
left join Join j2 on j2.id2 = base.id2
A better fix, depending on your DBMS is to use a window function:
select id1, id2, Value, Join_Value
FROM (
select base.id1, base.id2, Value, isnull(j1.Join_value,j2.Join_value) Join_Value,
ROW_NUMBER() OVER(
PARTITION BY base.id1, base.id2 -- Group rows based on (id1, id2) combination
ORDER BY j1.id1 -- If more than one row, give priority to row with "id1" value
) AS RowNum
from base
left join Join j1 on j1.id1 = base.id1
left join Join j2 on j2.id2 = base.id2
) src
WHERE RowNum = 1 -- Only return one row
This will make sure you always one row maximum per (id1, id2) combination.

Try:
select *
from base b
join [join] j on b.id1 = j.id1 or b.id2 = j.id2

First, your version does exactly what you want. Here is a db<>fiddle.
Second, for more control over the matching, you can use a lateral join. This allows you to choose only one matching row -- say the one where both ids match:
select b.id1, b.id2, b.value, jt.join_value
from base b cross apply
(select top (1) jt.*
from jointable jt
where b.id1 = jt.id1 or
b.id2 = jt.id2
order by (case when b.id1 = jt.id1 then 1 else 0 end) +
(case when b.id2 = jt.id2 then 1 else 0 end) desc
) jt ;

Related

SQL Query - Check for Two Distinct Values

Given the below data set I want to run a query to highlight any 'pairs' that do not consist of a 'left' and 'right'.
+---------+-----------+---------------+----------------------+
| Pair_Id | Pair_Name | Individual_Id | Individual_Direction |
+---------+-----------+---------------+----------------------+
| 1 | A | A1 | Left |
| 1 | A | A2 | Right |
| 2 | B | B1 | Right |
| 2 | B | B2 | Left |
| 3 | C | C1 | Left |
| 3 | C | C2 | Left |
| 4 | D | D1 | Right |
| 4 | D | D2 | Left |
| 5 | E | E1 | Left |
| 5 | E | E2 | Right |
+---------+-----------+---------------+----------------------+
In this instance Pair 3 'C' has two lefts. Therefore, I would look to display the following:
+---------+-----------+---------------+----------------------+
| Pair_Id | Pair_Name | Individual_Id | Individual_Direction |
+---------+-----------+---------------+----------------------+
| 3 | C | C1 | Left |
| 3 | C | C2 | Left |
+---------+-----------+---------------+----------------------+
You can simply use not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.pair_id = t.pair_id and
t2.Individual_Direction <> t.Individual_Direction
) ;
With an index on (pair_id, Individual_Direction), this should not only be the most concise solution but also the fastest.
If you want to be sure that there are pairs (the above returns singletons):
select t.*
from t
where not exists (select 1
from t t2
where t2.pair_id = t.pair_id and
t2.Individual_Direction <> t.Individual_Direction
) and
exists (select 1
from t t2
where t2.pair_id = t.pair_id and
t2.Individual_ID <> t.Individual_ID
);
You can also do this using window functions:
select t.*
from (select t.*,
count(*) over (partition by pair_id) as cnt,
min(status) over (partition by pair_id) as min_status,
max(status) over (partition by pair_id) as max_status
from t
) t
where cnt > 1 and min_status <> max_status;
One option uses aggregation:
WITH cte AS (
SELECT Pair_Name
FROM yourTable
WHERE Individual_Direction IN ('Left', 'Right')
GROUP BY Pair_Name
HAVING MIN(Individual_Direction) = MAX(Individual_Direction)
)
SELECT *
FROM yourTable
WHERE Pair_Name IN (SELECT Pair_Name FROM cte);
The HAVING clause used above asserts that a matching pair has both a minimum and maximum direction which are the same. This implies that such a pair only has one direction.
As is the case with Gordon's answer, an index on (Pair_Name, Individual_Direction) might help performance:
CREATE INDEX idx ON yourTable (Pair_Name, Individual_Direction);
There should be an elegant way of using window function than what I wrote:
WITH ranked AS
(
SELECT *, RANK() OVER(ORDER BY Pair_Id, Pair_Name, Individual_Direction) AS r
FROM pairs
),
counted AS
(
SELECT Pair_Id, Pair_Name, Individual_Direction,r, COUNT(r) as times FROM ranked
GROUP BY Pair_Id, Pair_Name, Individual_Direction, r
HAVING COUNT(r) > 1
)
SELECT ranked.Pair_Id, ranked.Pair_Name, ranked.Individual_Id, ranked.Individual_Direction FROM ranked
RIGHT JOIN counted
ON ranked.Pair_Id=counted.Pair_Id
AND ranked.Pair_Name=counted.Pair_Name
AND ranked.Individual_Direction=counted.Individual_Direction

Oracle SQL: Exclude IDs from another table without subquery join

I would like to know if the following is possible without joining the same table twice:
Table A:
+----+------+
| ID | ColA |
+----+------+
| 1 | A1 |
| 2 | A2 |
| 3 | A3 |
| 4 | A4 |
+----+------+
Table B:
+----+------+
| ID | ColB |
+----+------+
| 1 | B1 |
| 2 | B2 |
| 3 | B3 |
| 4 | B4 |
| 5 | B5 |
| 6 | B6 |
+----+------+
Table C:
+----+
| ID |
+----+
| 1 |
| 2 |
+----+
Desired result: (A LEFT JOIN B WITHOUT C)
+----+------+------+
| ID | ColA | ColB |
+----+------+------+
| 3 | A3 | B3 |
| 4 | A4 | B4 |
+----+------+------+
So basically I need to add Column B to Table A, hence left join, and exclude all IDs which occur in Table C.
Current solution:
SELECT a.id, a.ColA, b.ColB
FROM tableA a
LEFT JOIN tableB b ON a.id = b.id
WHERE a.id NOT IN(
SELECT a2.id FROM tableA a2
LEFT JOIN tableC c on a2.id = c.id)
What's irritating me is, that the exclusion of table C requires an additional left join of table A with table C. Isn't there a more straight-forward approach, without having to join table A again as part of the subquery, if all I want to do is to exclude IDs which occur in table C from the resultset?.
Thanks
Use a not exists:
SELECT a.id, a.ColA, b.ColB
FROM tableA a
LEFT JOIN tableB b ON a.id = b.id
where not exists(select 1 from tablec c where a.id = c.id)
The issue with using a not in with a select in Oracle is that:
a) it has to return the whole subquery dataset
b) if there are nulls, it breaks
TOM link regarding these 2 issues
won't this work?
SELECT a.id, a.ColA, b.ColB
FROM tableA a
JOIN tableB b ON a.id = b.id
WHERE a.id NOT IN (SELECT c.Id FROM tableC c)
this can also be done in a join
SELECT a.id, a.ColA, b.ColB
FROM tableA a
JOIN tableB b ON a.id = b.id
LEFT JOIN tableC C ON a.id = c.id
WHERE c.Id is null

MS SQL Server - Select group of max n results of multiple columns in join query result set

Table 1
| Customer_ID | Template_ID
---------------------
| C1 | T1 |
| C1 | T2 |
| C2 | T100 |
| C2 | T5 |
---------------------
Table 2
---------------------
| Template_ID | Product_ID
---------------------
| T1 | P1 |
| T1 | P2 |
| T1 | P5 |
| T2 | P10 |
| T2 | P45 |
| T100 | P98 |
| T100 | P78 |
| T5 | P7777 |
| T5 | P9 |
| T5 | P10 |
| T5 | P1 |
Join query result:
------------------------------------------
| Customer_ID | Template_ID | Product_ID
------------------------------------------
| C1 | T1 | P1
| C1 | T1 | P2
| C1 | T1 | P5
| C1 | T2 | P10
| C1 | T2 | P45
| C2 | T100 | P98
| C2 | T100 | P78
| C2 | T5 | P7777
.
.
I have an existing join query which returns all the matches for Customer_ID & Template_ID, I want to restrict to get only the latest 'Templates for customers - Customer_ID & Template_ID '.
Expected output:
Customer_ID Template_ID Product ID
C1 T1 P1
C1 T1 P2
C1 T1 P5
C2 T100 P98
C2 T100 P78
PS: Actually I want the latest 10, for easier understanding I mention as only the recent Customer_ID & Template_ID combination . I have a date column in Table1 , and I got 'order by SAVED_DATE DESC' , so in the result set I want to get the first one. I have other tables as part of join, which I haven't provided to keep it simple.
You can try to make count window function by Customer_ID and Template_ID columns in t CTE result.
then use a correlated subquery exists to get the max count from etc.
;with cte as (
SELECT t1.Template_ID,
t1.Customer_ID,
t2.Product_ID,
COUNT(*) OVER(PARTITION BY Customer_ID,t2.Template_ID) cnt
FROM t1
join t2 on t1.Template_ID = t2.Template_ID
)
select Template_ID,
Customer_ID,
Product_ID
from cte c1
where exists (
select 1
from cte cc
where c1.Customer_ID = cc.Customer_ID
having max(cc.cnt) = c1.cnt
)
sqlfiddle
Sounds like you need a RANK on table1:
;with cte as
(
select *,
-- assign a ranking for each customer
RANK() OVER(PARTITION BY Customer_ID ORDER BY SAVED_DATE DESC) AS rnk
from table1
)
select ...
from cte
join table2 as t2
on cte.Template_ID = t2.Template_ID
WHERE cte.rnk <= 10 -- no filter for the n latest rows per curomer

select most recent rows from one table 1 and join to get rows from table 2

I need N number of recent data selecting some columns from table 1 and some from table 2.
For example, I need 2 most recent rows from table 1 and table 2.
Table 1
id | Fname | LName
------------------------
1 | F1 | L1
2 | F2 | L2
3 | F3 | L3
4 | F4 | L4
Table 2
id | City | Date
---+----------------------
1 | C1 | 02/23/2014
2 | C2 | 02/01/2014
3 | C3 | 02/20/2014
4 | C4 | 02/19/2014
Desired Result
Fname| City | Date
----------------------------
F1 | C1 | 02/23/2014
F3 | C3 | 02/20/2014
I suppose that you have the same IDs in both T1 and T2 (why two different tables?).
If so:
SELECT FNAME, CITY, MY_DATE
FROM (SELECT T1.FNAME, T2.CITY, T2.MY_DATE
FROM T2, T1
WHERE T2.ID = T1.ID
ORDER BY T2.MY_DATE DESC)
WHERE ROWNUM <= 2;
Otherwise, please explain what's the difference between T1 and T2...

Sql Server get first matching value

I have two tables History and Historyvalues:
History
HID(uniqeidentifier) | Version(int)
a1 | 1
a2 | 2
a3 | 3
a4 | 4
Historyvalues
HVID(uniqeidentifier) | HID(uniqeidentifier) | ControlID(uniqeidentifier) | Value(string)
b1 | a1 | c1 | value1
b2 | a2 | c1 | value2
b3 | a2 | c2 | value3
Now I Need a query where I can get a list with the last historyvalue of each control from a specific Version like:
Get the last values from Version 3 -> receiving ->
HVID | ControlID | Value
b2 | c1 | value2
b3 | c2 | value3
I tried something like this:
Select HVID, ControlId, max(Version), Value from
(
Select HVID, ControlId, Version, Value
from History inner JOIN
Historyvalues ON History.HID = Historyvalues.HID
where Version <= 3
) as a
group by ControlId
order by Version desc
but this does not work.
Are there any ideas?
Thank you very much for your help.
Best regards
Latest version from each control with your specific Version (WHERE t1.Version <= 3)
Query:
SQLFIDDLEExample
SELECT HVID, ControlId, Version, Value
FROM
(
SELECT t2.HVID, t2.ControlId, t1.Version, t2.Value,
ROW_NUMBER() OVER(PARTITION BY t2.ControlId ORDER BY t1.Version DESC) as rnk
FROM History t1
JOIN Historyvalues t2
ON t1.HID = t2.HID
WHERE t1.Version <= 3
) AS a
WHERE a.rnk = 1
ORDER BY a.Version desc
Result:
| HVID | CONTROLID | VERSION | VALUE |
|------|-----------|---------|--------|
| b2 | c1 | 2 | value2 |
| b3 | c2 | 2 | value3 |
here is your solution
Select Historyvalues.HVID,Historyvalues.ControlID,Historyvalues.Value
from Historyvalues
inner join History on Historyvalues.hid=History.hid
where Historyvalues.hvid in (
select MAX(Historyvalues.hvid) from Historyvalues
inner join History on Historyvalues.hid=History.hid
group by ControlID)