SQL inner join duplicates - sql

The query returns duplicates on columns when adding the third table to the query
TABLE A
A.AccId
A.AccNr
A.EntId
TABLE B
B.EntName
B.EntId
TABLE C
C.AccNr
C.CustomerNr
SELECT A.AccID, A.AccNr, A.EntId, B.EntName, B.EntId, C.AccNr, C.CustomerNr, C.EntName
FROM ((Cat.dbo.A
INNER JOIN Cat.dbo.B ON A.EntId = B.EntId)
INNER JOIN Dog.dbo.C ON B.EntName = C.EntName)

Your query needs to be updated first: I don't see an EntName column in table C.
Duplicates are possible. If several rows in table C match, the rows from tables A and B are repeated once per match when joining with table C.
So if many records in C match on EntName, you will get duplicates. If you don't want duplicates, apply DISTINCT:
SELECT DISTINCT A.AccID, A.AccNr, A.EntId, B.EntName, B.EntId, C.AccNr, C.CustomerNr
FROM Cat.dbo.A
INNER JOIN Cat.dbo.B ON A.EntId = B.EntId
INNER JOIN Dog.dbo.C ON B.EntName = C.EntName
Or, if the duplicates are caused by duplicate records in table C, you can filter them out before the JOIN:
SELECT A.AccID, A.AccNr, A.EntId, B.EntName, B.EntId, C.AccNr, C.CustomerNr
FROM Cat.dbo.A
INNER JOIN Cat.dbo.B ON A.EntId = B.EntId
INNER JOIN (SELECT DISTINCT AccNr, CustomerNr, EntName FROM Dog.dbo.C) AS C ON B.EntName = C.EntName;
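To see the row multiplication concretely, here is a minimal sketch using Python's built-in sqlite3 module. The sample data, the simplified table names (A, B, C without the Cat/Dog database prefixes) and the EntName column on C are all assumptions made for illustration. It compares the plain join, the DISTINCT variant and the pre-filtered subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (AccId INTEGER, AccNr TEXT, EntId INTEGER);
    CREATE TABLE B (EntId INTEGER, EntName TEXT);
    -- EntName on C is an assumption; the table listing in the question omits it.
    CREATE TABLE C (AccNr TEXT, CustomerNr TEXT, EntName TEXT);

    INSERT INTO A VALUES (1, 'ACC-1', 10);
    INSERT INTO B VALUES (10, 'Acme');
    -- Three C rows match 'Acme'; the second is an exact copy of the first.
    INSERT INTO C VALUES ('ACC-1', 'CUST-1', 'Acme');
    INSERT INTO C VALUES ('ACC-1', 'CUST-1', 'Acme');
    INSERT INTO C VALUES ('ACC-2', 'CUST-2', 'Acme');
""")

# Plain join: the single A/B row is repeated once per matching C row.
plain_rows = conn.execute("""
    SELECT A.AccId, B.EntName, C.CustomerNr
    FROM A
    INNER JOIN B ON A.EntId = B.EntId
    INNER JOIN C ON B.EntName = C.EntName
""").fetchall()

# DISTINCT collapses rows that are identical in every selected column.
distinct_rows = conn.execute("""
    SELECT DISTINCT A.AccId, B.EntName, C.CustomerNr
    FROM A
    INNER JOIN B ON A.EntId = B.EntId
    INNER JOIN C ON B.EntName = C.EntName
    ORDER BY C.CustomerNr
""").fetchall()

# De-duplicating C before the join gives the same rows here.
prefiltered_rows = conn.execute("""
    SELECT A.AccId, B.EntName, C.CustomerNr
    FROM A
    INNER JOIN B ON A.EntId = B.EntId
    INNER JOIN (SELECT DISTINCT CustomerNr, EntName FROM C) AS C
        ON B.EntName = C.EntName
    ORDER BY C.CustomerNr
""").fetchall()

print(len(plain_rows), len(distinct_rows), len(prefiltered_rows))
```

Note that neither variant collapses the CUST-1 and CUST-2 rows: they differ in a selected column, so they are not duplicates in the SQL sense.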

Related

Why does my SQL inner join return much more data than table 1?

I need to join three tables to get all the info I need. Table a has 70 million rows; after joining a with b, I get 40 million rows. But after I join table c, which has only 1.7 million rows, the result grows to 300 million rows.
In table c, the same pt_id and fi_id pair appears more than once. One pt_id can connect to many different fi_id values, but each fi_id connects to only one pt_id.
I'm wondering if there is any way to get rid of the duplicate rows, because I join table c only to get the pt_id.
Thanks for any help!
select c.pt_id,b.fi_id,a.zq_id
from a
inner join (select zq_id, fi_id from b) b
on a.zq_id = b.zq_id
inner join (select fi_id,pt_id from c) c
on b.fi_id = c.fi_id
You can use GROUP BY:
select c.pt_id,b.fi_id,a.zq_id
from a
inner join (select zq_id, fi_id from b) b
on a.zq_id = b.zq_id
inner join (select fi_id,pt_id from c) c
on b.fi_id = c.fi_id
group by c.pt_id,b.fi_id,a.zq_id
This removes all duplicate rows, as described in the question below:
How do I (or can I) SELECT DISTINCT on multiple columns?
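As a rough check of the GROUP BY suggestion, the following sketch (Python's sqlite3 module, with made-up sample rows) shows that grouping by every selected column returns the same rows as SELECT DISTINCT. It also tries a third variant that is not part of the answer above and is only an assumption about what may help at this scale: de-duplicating c before the join, so the duplicated rows are never produced in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (zq_id INTEGER);
    CREATE TABLE b (zq_id INTEGER, fi_id INTEGER);
    CREATE TABLE c (fi_id INTEGER, pt_id TEXT);

    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 100), (2, 200);
    -- The (100, 'p1') pair is stored three times in c.
    INSERT INTO c VALUES (100, 'p1'), (100, 'p1'), (100, 'p1'), (200, 'p1');
""")

join_sql = """
    FROM a
    INNER JOIN b ON a.zq_id = b.zq_id
    INNER JOIN {c_source} ON b.fi_id = c.fi_id
"""

# Plain join: one output row per matching row in c.
joined = conn.execute(
    "SELECT c.pt_id, b.fi_id, a.zq_id" + join_sql.format(c_source="c")
).fetchall()

# GROUP BY over all selected columns.
grouped = conn.execute(
    "SELECT c.pt_id, b.fi_id, a.zq_id" + join_sql.format(c_source="c")
    + " GROUP BY c.pt_id, b.fi_id, a.zq_id ORDER BY a.zq_id"
).fetchall()

# SELECT DISTINCT over the same columns.
distinct = conn.execute(
    "SELECT DISTINCT c.pt_id, b.fi_id, a.zq_id" + join_sql.format(c_source="c")
    + " ORDER BY a.zq_id"
).fetchall()

# Variant (assumption, not from the answer): de-duplicate c before joining.
prefiltered = conn.execute(
    "SELECT c.pt_id, b.fi_id, a.zq_id"
    + join_sql.format(c_source="(SELECT DISTINCT fi_id, pt_id FROM c) AS c")
    + " ORDER BY a.zq_id"
).fetchall()

print(len(joined), grouped)
```

Whether the pre-filtered form is actually faster on tables of this size depends on the database and its indexes, so it is worth comparing query plans before choosing.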

SQL Table Joining

I'm joining these three tables, but the same information gets displayed 3 times ... Any idea how to display only the unique rows, as determined by unique shipment IDs?
SELECT S.SHIPMENT_ID, S.CREATION_DATE, S.BUSINESS_ID, B.BUS_ID, S.SHIPMENT_STATUS, S.BUSINESS_NAME, S.SHIPMENT_MODES, S.CUSTOMER_NAME
FROM "SHIPMENT" S
INNER JOIN "BUSINESS" B ON S.BUSINESS_ID=B.BUS_ID
INNER JOIN "SHIPMENT_GROUP" SG ON S.SHIPMENT_ID=SG.SHIPMENT_ID
INNER JOIN "DATA_GROUP" DG ON DG.ID=SG.GROUP_ID
Try SELECT DISTINCT:
SELECT DISTINCT column1, column2, ...
FROM table_name;
w3schools
You are selecting rows from the first table only, so this suggests that you are using the joins for filtering.
If so, you can rewrite this with exists, which will avoid duplicates if there are multiple matches. Starting from your existing query, the logic would be:
select s.*
from shipment s
where
exists (
select 1
from business b
where b.bus_id = s.business_id
) and exists (
select 1
from shipment_group sg
inner join data_group dg on dg.id = sg.group_id
where sg.shipment_id = s.shipment_id
)
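The difference between joining and filtering with EXISTS can be sketched as follows, again with Python's sqlite3 module. The table contents are invented for illustration, and only the columns needed for the join conditions are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipment (shipment_id INTEGER, business_id INTEGER);
    CREATE TABLE business (bus_id INTEGER);
    CREATE TABLE shipment_group (shipment_id INTEGER, group_id INTEGER);
    CREATE TABLE data_group (id INTEGER);

    INSERT INTO business VALUES (7);
    INSERT INTO shipment VALUES (1, 7), (2, 7);
    INSERT INTO data_group VALUES (1), (2), (3);
    -- Shipment 1 belongs to three groups, shipment 2 to one.
    INSERT INTO shipment_group VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# Joins used only for filtering: shipment 1 appears once per group.
joined = conn.execute("""
    SELECT s.shipment_id
    FROM shipment s
    INNER JOIN business b ON s.business_id = b.bus_id
    INNER JOIN shipment_group sg ON s.shipment_id = sg.shipment_id
    INNER JOIN data_group dg ON dg.id = sg.group_id
    ORDER BY s.shipment_id
""").fetchall()

# EXISTS only asks whether at least one match exists, so each
# shipment is returned at most once.
filtered = conn.execute("""
    SELECT s.shipment_id
    FROM shipment s
    WHERE EXISTS (
        SELECT 1 FROM business b WHERE b.bus_id = s.business_id
    ) AND EXISTS (
        SELECT 1
        FROM shipment_group sg
        INNER JOIN data_group dg ON dg.id = sg.group_id
        WHERE sg.shipment_id = s.shipment_id
    )
    ORDER BY s.shipment_id
""").fetchall()

print(joined, filtered)
```

The EXISTS form only works when no columns from the joined tables are needed in the output; if they are, the duplicates have to be resolved by choosing which matching row to keep.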

Joining 3 tables on 2 columns?

I've created 3 views with identical columns- Quantity, Year, and Variety. I want to join all three tables on year and variety in order to do some calculations with quantities.
The problem is that a particular year/variety combo does not occur on every view.
I've tried queries like :
SELECT
*
FROM
a
left outer join
b
on a.variety = b.variety
left outer join
c
on a.variety = c.variety or b.variety = c.variety
WHERE
a.year = '2015'
and b.year = '2015'
and c.year = '2015'
Obviously this isn't the right solution. Ideally I'd like to join on both year and variety and not use a where statement at all.
The desired output would be put all quantities of matching year and variety on the same line, regardless of null values on a table.
I really appreciate the help, thanks.
You want a full outer join, not a left join, like so:
Select coalesce(a.year, b.year, c.year) as Year
, coalesce(a.variety, b.variety, c.variety) as Variety
, a.Quantity, b.Quantity, c.Quantity
from tableA a
full outer join tableB b
on a.variety = b.variety
and a.year = b.year
full outer join tableC c
on isnull(a.variety, b.variety) = c.variety
and isnull(a.year, b.year) = c.year
where coalesce(a.year, b.year, c.year) = 2015
The left join you are using won't pick up values from b or c that aren't in a. Additionally, your where clause is dropping rows that don't have values in all three tables (because the year in those rows is null, which is not equal to 2015). The full outer join will grab rows from either table in the join, regardless of whether the other table contains a match.
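The intended result shape can be tried out with Python's sqlite3 module. Because SQLite versions before 3.39 do not support FULL OUTER JOIN, this sketch swaps in a different technique: it builds the set of all (year, variety) keys with UNION and then LEFT JOINs each view onto it. The sample rows are invented, and the sketch assumes each (year, variety) pair occurs at most once per view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (year INTEGER, variety TEXT, quantity INTEGER);
    CREATE TABLE b (year INTEGER, variety TEXT, quantity INTEGER);
    CREATE TABLE c (year INTEGER, variety TEXT, quantity INTEGER);

    INSERT INTO a VALUES (2015, 'Gala', 10), (2015, 'Fuji', 5);
    INSERT INTO b VALUES (2015, 'Gala', 20), (2015, 'Honeycrisp', 7);
    INSERT INTO c VALUES (2015, 'Honeycrisp', 3), (2014, 'Gala', 99);
""")

rows = conn.execute("""
    WITH keys AS (
        -- Every year/variety combination that occurs in any of the views.
        SELECT year, variety FROM a
        UNION
        SELECT year, variety FROM b
        UNION
        SELECT year, variety FROM c
    )
    SELECT k.year, k.variety, a.quantity, b.quantity, c.quantity
    FROM keys k
    LEFT JOIN a ON a.year = k.year AND a.variety = k.variety
    LEFT JOIN b ON b.year = k.year AND b.variety = k.variety
    LEFT JOIN c ON c.year = k.year AND c.variety = k.variety
    WHERE k.year = 2015
    ORDER BY k.variety
""").fetchall()

for row in rows:
    print(row)
```

Each year/variety combination ends up on one line, with None (NULL) wherever a view has no row for it, which matches the output described in the question.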

joining inner join and outer join in one query in oracle

I am trying to join 7 tables in a select query, four with inner joins and two with outer joins.
Can I combine outer and inner joins in the same query? When I do so, I am not getting proper results. I tried both ANSI joins (INNER JOIN, LEFT OUTER JOIN) and the + sign as well. I am also wondering whether the order of joining is important in ANSI joins.
So here is the scenario:
table a
table b
table c
table e
table f
table g
table h
inner join ( a , b, c )
inner join ( a , e , f)
left outer join ( f , g)
left outer join ( f , h)
My query (which looks wrong):
FROM a inner join b on (a.col_1 = b.col_1)
inner join c on (b.y = c.y)
inner join e on ( a.col_1 = e.col_1)
inner join f on (e.col_4 = f.col_4)
left outer join g on (g.col_5= f.col_5)
left outer join h on (h.col_6 = f.col_6)
Could anyone please help me with the correct join query?
Any lead would be highly appreciated.
You can always combine INNER and OUTER JOINs in the same query.
Your example is not clear, though, and when you write a query it is very important to know your goal.
INNER JOIN: use it when you want to extract rows across two (or more) tables and the matching values must be present in both tables.
OUTER JOIN: use it when you want to extract rows from a main table regardless of whether a corresponding row is present in the linked table.
Let me give an example:
I have a table (PERSON) with a list of persons. This table has a foreign key pointing to a table (COUNTRY) that holds information about the birth place. I have another table (BANK_ACCOUNT) that stores the bank account of every person (if the person has one).
I want the result to contain all person information (including the birth place name) and, if a person has a bank account, the account details.
The query:
SELECT p.*, b.name, b.account_no
FROM person p
INNER JOIN country c -- here I apply an INNER JOIN
ON p.fk_country = c.id
LEFT OUTER JOIN bank_account b -- here I apply an OUTER JOIN
ON b.fk_person = p.id
This is why knowing the goal is so important: for a different goal, the query above could be wrong.
About the order of the joins: the order in which you write inner joins does not matter, but the join type does.
INNER JOIN is commutative. Given tables A and B,
A INNER JOIN B is the same as B INNER JOIN A.
OUTER JOIN is not commutative. Given tables A and B, the following queries are different:
A LEFT OUTER JOIN B
B LEFT OUTER JOIN A
The first says: get all rows of A and, where a corresponding row exists in B, give me its data; otherwise return NULL values.
The second says: get all rows of B and, where a corresponding row exists in A, give me its data; otherwise return NULL values.
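The commutativity point can be checked with a small experiment. This is a minimal sketch using Python's sqlite3 module; the two single-column tables and their values are invented, and the `run` helper is only a convenience for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER);
    INSERT INTO A VALUES (1), (2);   -- 1 exists only in A
    INSERT INTO B VALUES (2), (3);   -- 3 exists only in B
""")

def run(from_clause):
    """Return the (A.id, B.id) pairs produced by the given FROM clause as a set."""
    sql = f"SELECT A.id, B.id FROM {from_clause}"
    return set(conn.execute(sql).fetchall())

inner_ab = run("A INNER JOIN B ON A.id = B.id")
inner_ba = run("B INNER JOIN A ON A.id = B.id")
left_ab = run("A LEFT OUTER JOIN B ON A.id = B.id")
left_ba = run("B LEFT OUTER JOIN A ON A.id = B.id")

print(inner_ab == inner_ba)  # same rows either way round
print(left_ab)               # keeps A's unmatched row, B side is None
print(left_ba)               # keeps B's unmatched row, A side is None
```

Swapping the operands of the inner join changes nothing, while swapping them in the left join changes which table's unmatched rows survive.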
Some of your INNER JOINs (= "requirements") probably aren't matching anything.
An inner join returns nothing from the source table (a) if the join condition can't be fulfilled. A left join returns the rows from table a and fills the joined row's columns with NULLs if no match is found. In both cases, if there are multiple matches, multiple rows are returned.
Example with one row in each table:
table A values (col_1, ..., col_4) = (1, 2, 3, 4)
table b values (col_1, x, y, z) = (1, 3, 5, 7)
table c values (col_1, x, y, z) = (1, 3, 5, 7)
table e values (col_1, ..., col_4) = (1, ..., 6)
table f values (col_1, ..., col_6) = (8, ..., 7, 4, 3)
table g values (col_1, ..., col_6) = (7, ..., 4, 6)
table h values (col_1, ..., col_6) = (..., 9)
And our query:
FROM a
inner join b on (a.col_1 = b.col_1) -- requirement 1
inner join c on (b.y = c.y) -- requirement 2
inner join e on ( a.col_1 = e.col_1) -- requirement 3
inner join f on (e.col_4 = f.col_4) -- requirement 4
left outer join g on (g.col_5= f.col_5) -- optional 1
left outer join h on (h.col_6 = f.col_6) -- optional 2
So do we return anything?
Requirement 1: a.col_1 = b.col_1; 1 = 1 -> OK
Requirement 2: b.y = c.y; 5 = 5 -> OK
Requirement 3: a.col_1 = e.col_1; 1 = 1 -> OK
Requirement 4: e.col_4 = f.col_4; 6 != 7 -> NOT OK
At this point the query already returns nothing, and we don't even need to check the left joins (there is nothing to join on).
If f.col_4 had been 6 instead of 7, the example row would be returned. Then we would also join the row(s) from g where the condition matches (g.col_5 = f.col_5; 4 = 4 -> OK). In this example, the selected columns from table h would all be NULL, because the condition (optional 2) isn't met.
I hope this helps you find the issues. It's really hard to see the actual problem without valid data. In the future, consider using, for example, SQL Fiddle with your questions.
PS. OUTER and INNER are optional words and don't make any difference in the query. So LEFT OUTER JOIN is same as LEFT JOIN and INNER JOIN is same as JOIN.
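The walk-through above can be replayed with Python's sqlite3 module. This sketch creates only the columns that take part in the join conditions (an assumption, since the example elides the others with "...") and uses the values from the example rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col_1 INTEGER);
    CREATE TABLE b (col_1 INTEGER, y INTEGER);
    CREATE TABLE c (y INTEGER);
    CREATE TABLE e (col_1 INTEGER, col_4 INTEGER);
    CREATE TABLE f (col_4 INTEGER, col_5 INTEGER, col_6 INTEGER);
    CREATE TABLE g (col_5 INTEGER, col_6 INTEGER);
    CREATE TABLE h (col_6 INTEGER);

    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (1, 5);
    INSERT INTO c VALUES (5);
    INSERT INTO e VALUES (1, 6);
    INSERT INTO f VALUES (7, 4, 3);
    INSERT INTO g VALUES (4, 6);
    INSERT INTO h VALUES (9);
""")

query = """
    SELECT a.col_1, f.col_4, g.col_5, h.col_6
    FROM a
    INNER JOIN b ON a.col_1 = b.col_1        -- requirement 1
    INNER JOIN c ON b.y = c.y                -- requirement 2
    INNER JOIN e ON a.col_1 = e.col_1        -- requirement 3
    INNER JOIN f ON e.col_4 = f.col_4        -- requirement 4
    LEFT OUTER JOIN g ON g.col_5 = f.col_5   -- optional 1
    LEFT OUTER JOIN h ON h.col_6 = f.col_6   -- optional 2
"""

# Requirement 4 fails (6 != 7), so nothing is returned.
before = conn.execute(query).fetchall()

# Make f.col_4 match e.col_4 and run the same query again.
conn.execute("UPDATE f SET col_4 = 6")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

After the update the row comes back with g's value filled in and h's column as None, which is the behaviour described for the two optional joins.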

When would you INNER JOIN a LEFT JOINed table?

I came across the code below today.
SELECT StaffGroup.*
FROM StaffGroup
LEFT OUTER JOIN StaffByGroup
ON StaffByGroup.StaffGroupId = StaffGroup.StaffGroupId
INNER JOIN StaffMember
ON StaffMember.StaffMemberId = StaffByGroup.StaffMemberId
WHERE StaffByGroup.StaffGroupId IS NULL
The main table StaffGroup is being LEFT JOINed with StaffByGroup table and then StaffByGroup table is being INNER JOINed with StaffMember table.
I thought the INNER JOIN is trying to filter out the records which exist in StaffGroup and StaffByGroup but do not exist in StaffMember.
But this is not how it is working. The query does not return any records.
Am I missing something in understanding the logic of the query? Have you ever used INNER JOIN with a table that was LEFT JOINed in an earlier part of the query?
Actually, you are missing one concept:
The main table StaffGroup is LEFT JOINed with the StaffByGroup table, which creates a virtual table, say VT1, with all records from StaffGroup and the matching records from StaffByGroup, based on the match/filter condition in the ON predicate. It is then this VT1, not the StaffByGroup table, that is INNER JOINed with the StaffMember table, based on the match/filter condition in its ON predicate.
So basically the inner join filters out the records from StaffGroup, and hence from StaffByGroup, that do not have a StaffMemberId.
Your WHERE condition then adds a final filter: from the final virtual table created by all the joins above, it keeps only the records that have no StaffByGroup.StaffGroupId. That in turn removes all the rows collected so far, since all of them have some value for StaffGroupId.
To get all records from StaffGroup which have no StaffGroupId, along with the details from StaffMember for all such records, you can add the condition to the ON predicate:
SELECT StaffGroup.*
FROM StaffGroup
LEFT OUTER JOIN StaffByGroup
ON StaffByGroup.StaffGroupId = StaffGroup.StaffGroupId and StaffByGroup.StaffGroupId IS NULL
INNER JOIN StaffMember
ON StaffMember.StaffMemberId = StaffByGroup.StaffMemberId
This query looks fundamentally flawed - I guess what was originally intended is
SELECT StaffGroup.*
FROM StaffGroup
LEFT OUTER JOIN
(SELECT StaffByGroup.* FROM StaffByGroup
INNER JOIN StaffMember
ON StaffMember.StaffMemberId = StaffByGroup.StaffMemberId) StaffByGroup
ON StaffByGroup.StaffGroupId = StaffGroup.StaffGroupId
WHERE StaffByGroup.StaffGroupId IS NULL
which returns all groups from StaffGroup that don't have existing staff members assigned to them (the INNER JOIN with StaffMember filters out the rows from StaffByGroup that don't have a matching row in StaffMember, probably because no foreign key exists between them).
You are getting 0 records because of your WHERE clause:
where StaffByGroup.StaffGroupId is null
The left join links all the records from table A that are contained in table B. Since you specified StaffGroupId as your key and then looked for NULL values in that key, it is clear that you will end up with no results.
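To compare the behaviour of the original query with the nested-join rewrite, here is a small sketch using Python's sqlite3 module. The three sample groups, the member IDs and the derived-table alias ValidAssignment are all assumptions made for illustration; group 3 points at a StaffMemberId that has no row in StaffMember.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StaffGroup (StaffGroupId INTEGER, Name TEXT);
    CREATE TABLE StaffByGroup (StaffGroupId INTEGER, StaffMemberId INTEGER);
    CREATE TABLE StaffMember (StaffMemberId INTEGER);

    INSERT INTO StaffGroup VALUES (1, 'Admins'), (2, 'Empty'), (3, 'Orphaned');
    INSERT INTO StaffMember VALUES (100);
    -- Group 1 has a real member; group 3 references a member that does not exist.
    INSERT INTO StaffByGroup VALUES (1, 100), (3, 999);
""")

# Query from the question: the INNER JOIN discards every row whose
# StaffByGroup side is NULL, so the WHERE clause has nothing left to keep.
original = conn.execute("""
    SELECT StaffGroup.StaffGroupId
    FROM StaffGroup
    LEFT OUTER JOIN StaffByGroup
        ON StaffByGroup.StaffGroupId = StaffGroup.StaffGroupId
    INNER JOIN StaffMember
        ON StaffMember.StaffMemberId = StaffByGroup.StaffMemberId
    WHERE StaffByGroup.StaffGroupId IS NULL
""").fetchall()

# Nested form: resolve the INNER JOIN first, then LEFT JOIN its result.
nested = conn.execute("""
    SELECT StaffGroup.StaffGroupId
    FROM StaffGroup
    LEFT OUTER JOIN (
        SELECT StaffByGroup.StaffGroupId
        FROM StaffByGroup
        INNER JOIN StaffMember
            ON StaffMember.StaffMemberId = StaffByGroup.StaffMemberId
    ) AS ValidAssignment
        ON ValidAssignment.StaffGroupId = StaffGroup.StaffGroupId
    WHERE ValidAssignment.StaffGroupId IS NULL
    ORDER BY StaffGroup.StaffGroupId
""").fetchall()

print(original, nested)
```

With this data the original query returns no rows, while the nested form returns groups 2 and 3, the groups without any existing staff member.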