Multiple inner joins in BigQuery result in duplicated rows

I'm trying to join 6 tables in BigQuery named
T0, T1, T2, T3, T4, T5.
The tables whose results I'm interested in are T0 and T1.
After querying these two tables I get 43 matches:
SELECT
T1.F1,
T0.F2,
T0.F3,
T0.F4,
T1.F5,
T1.F6,
T1.F7,
T1.F8,
T0.F9
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 on T1.F1= T0.F1
WHERE T0.F1 = "010001476713"
AND T0.F2 = T1.F2
ORDER BY T0.F4
But when I run this with multiple INNER JOINs I get 800 results instead of 43; the rows are duplicated:
SELECT
T2.F11,
T3.F15,
T2.F12,
T3.F16,
T3.F17,
T1.F1,
T2.F13,
T3.F17,
T5.F18,
T5.F19,
T5.F20,
T2.F14,
T0.F9,
T1.F10,
T4.F3,
T4.F21,
T4.F22,
T0.F2,
T3.F23,
T0.F3,
T0.F4,
T1.F5,
T1.F6,
T1.F7,
T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1
INNER JOIN `TABLE3` T3 ON T3.F1=T1.F1
INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713"
AND T0.F2 = T1.F2
ORDER BY T0.F4

When I get duplicate rows, I solve it like this:
You get 43 results on your inner join of table T0 & T1. So far so good.
Now comment out everything related to tables T2, T4, and T5 (I've placed the commas at the beginning of each row for easier commenting out), like this:
SELECT
--T2.F11,
T3.F15
--,T2.F12
,T3.F16
,T3.F17
,T1.F1
--,T2.F13
,T3.F17
--,T5.F18
--,T5.F19
--,T5.F20
--,T2.F14
,T0.F9
,T1.F10
--,T4.F3
--,T4.F21
--,T4.F22
,T0.F2
,T3.F23
,T0.F3
,T0.F4
,T1.F5
,T1.F6
,T1.F7
,T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1 and T0.F2 = T1.F2
INNER JOIN `TABLE3` T3 ON T3.F1=T1.F1
--INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
--INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
--INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713"
ORDER BY T0.F4
I've moved the AND T0.F2 = T1.F2 condition from the WHERE clause to the ON clause of the inner join. When you run this query, do you still get 43 rows, or more? If more, you need to figure out what it is double matching on, and either add that to your ON clause if it really is a 1-1 relationship, or group the results if you don't want multiple matches. You may need to comment out your select list and select all columns to really figure it out, like this:
SELECT *
/*
--T2.F11,
T3.F15
--,T2.F12
,T3.F16
,T3.F17
,T1.F1
--,T2.F13
,T3.F17
--,T5.F18
--,T5.F19
--,T5.F20
--,T2.F14
,T0.F9
,T1.F10
--,T4.F3
--,T4.F21
--,T4.F22
,T0.F2
,T3.F23
,T0.F3
,T0.F4
,T1.F5
,T1.F6
,T1.F7
,T1.F8
*/
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1 and T0.F2 = T1.F2
INNER JOIN `TABLE3` T3 ON T3.F1=T1.F1
--INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
--INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
--INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713"
ORDER BY T0.F4
Once you figure out what rows are causing the duplication, you either group the results or add an 'and' statement to the on clause to make it a 1-1, and then move on. You then uncomment the parts of the query related to T2 and do the same thing, then T4 and then T5. If you send me the results of the query above, I can help you figure out what your on clause needs to be to keep it from duplicating.
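To make the "figure out what it is double matching on" step concrete, here is a minimal sketch. It uses Python's built-in sqlite3 rather than BigQuery, and the two miniature tables and their values are hypothetical, since the real schemas are not shown in the thread. The idea is to compare row counts before and after one join, then list the join keys that occur more than once on the joined side.

```python
import sqlite3

# Hypothetical miniature tables; the real T0/T4 contents are not shown in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T0 (F1 TEXT, F3 TEXT);
    CREATE TABLE T4 (F3 TEXT, F21 TEXT);
    INSERT INTO T0 VALUES ('010001476713', 'a'), ('010001476713', 'b');
    -- 'a' appears twice in T4, so F3 is not unique on that side
    INSERT INTO T4 VALUES ('a', 'x'), ('a', 'y'), ('b', 'z');
""")

# Row count before the join and after it.
base_rows = conn.execute("SELECT COUNT(*) FROM T0").fetchone()[0]
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM T0 INNER JOIN T4 ON T4.F3 = T0.F3"
).fetchone()[0]

# Join keys that occur more than once in T4 are the ones that multiply rows.
repeated_keys = conn.execute(
    "SELECT F3, COUNT(*) AS n FROM T4 GROUP BY F3 HAVING COUNT(*) > 1"
).fetchall()

print(base_rows, joined_rows, repeated_keys)
```

The same GROUP BY ... HAVING COUNT(*) > 1 check can be run against each joined table in turn, on whichever column appears in its ON clause.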

Thank you #jenstretman. I found that table 4 was duplicating matches because the join uses a foreign key against a non-primary-key column, which creates duplicates. The solution was to use DISTINCT so that only the distinct matched rows are selected.
SELECT DISTINCT
T2.F11,
T3.F15,
T2.F12,
T3.F16,
T3.F17,
T1.F1,
T2.F13,
T3.F17,
T5.F18,
T5.F19,
T5.F20,
T2.F14,
T0.F9,
T1.F10,
T4.F3,
T4.F21,
T4.F22,
T0.F2,
T3.F23,
T0.F3,
T0.F4,
T1.F5,
T1.F6,
T1.F7,
T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1
INNER JOIN `TABLE3` T3 ON T3.F1=T1.F1
INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
INNER JOIN (SELECT DISTINCT T4.F3, T4.F21, T4.F22, FROM `TABLE4` T4)T4 ON T4.F3 = T0.F3
INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713"
AND T0.F2 = T1.F2
ORDER BY T0.F4
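A small sketch of why de-duplicating T4 before the join helps, again using sqlite3 as a stand-in for BigQuery. The table contents and the extra load_id column are hypothetical and only serve to create rows that differ in a column the query does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T0 (F1 TEXT, F3 TEXT);
    -- load_id is a made-up column that makes otherwise identical rows differ
    CREATE TABLE T4 (F3 TEXT, F21 TEXT, load_id INTEGER);
    INSERT INTO T0 VALUES ('k', 'a'), ('k', 'b');
    INSERT INTO T4 VALUES ('a', 'x', 1), ('a', 'x', 2), ('b', 'z', 3);
""")

# Joining the raw table repeats the T0 row for key 'a'.
plain = conn.execute("""
    SELECT T0.F3, T4.F21
    FROM T0
    INNER JOIN T4 ON T4.F3 = T0.F3
    ORDER BY T0.F3
""").fetchall()

# De-duplicating the needed columns first gives one match per key.
deduped = conn.execute("""
    SELECT T0.F3, T4.F21
    FROM T0
    INNER JOIN (SELECT DISTINCT F3, F21 FROM T4) AS T4 ON T4.F3 = T0.F3
    ORDER BY T0.F3
""").fetchall()

print(plain)
print(deduped)
```

Note that DISTINCT in the subquery only collapses rows that are identical in every selected column. If F21 differed between the two 'a' rows, both would survive and the join would still fan out.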

Related

BigQuery Deduplicate rows - No unique columns

I have a bigquery table which is a result of multiple left join tables.
The results are duplicated because of the left join (cartesian product)
How do I de-duplicate the rows so I only see one record?
SELECT T1.Col1,T1.Col2,........
T2.Col1,T2.Col2,........
T3.Col1,T3.Col3,........
T5.Col1,T5.Col2,........
T7.Col1.......
FROM `TABLE1` as T1
LEFT JOIN
`TABLE`as T2 ON T1.CUSTOMER_CODE = T2.CUSTOMER_CODE
LEFT JOIN
`TABLE3` as T3 ON (T1.MIAL_CODE) = T3.MIAL_CODE
LEFT JOIN
`TABLE5` as T5
ON T1.WORK_CODE = T5.WORK_CODE
LEFT JOIN
`TABLE7` as T7
ON T1.CA_DATE = T7.date
ORDER BY CA_DATE
Use DISTINCT:
select distinct *
from mytable
or create a new table:
create or replace table my_new_table
as
select distinct *
from mytable
I used GROUP BY on all the selected columns and it worked fine to get rid of the duplicates:
SELECT T1.Col1,T1.Col2,........
T2.Col1,T2.Col2,........
T3.Col1,T3.Col3,........
T5.Col1,T5.Col2,........
T7.Col1.......
FROM `TABLE1` as T1
LEFT JOIN
`TABLE`as T2 ON T1.CUSTOMER_CODE = T2.CUSTOMER_CODE
LEFT JOIN
`TABLE3` as T3 ON (T1.MIAL_CODE) = T3.MIAL_CODE
LEFT JOIN
`TABLE5` as T5
ON T1.WORK_CODE = T5.WORK_CODE
LEFT JOIN
`TABLE7` as T7
ON T1.CA_DATE = T7.date
GROUP BY T1.Col1,T1.Col2,........
T2.Col1,T2.Col2,........
T3.Col1,T3.Col3,........
T5.Col1,T5.Col2,........
T7.Col1.......
ORDER BY CA_DATE
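Both suggestions in this thread do the same job when every selected column is listed in the GROUP BY. The sketch below checks that on a tiny, hypothetical table (called joined here, standing in for the output of the left joins), using sqlite3 instead of BigQuery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'joined' is a hypothetical stand-in for the duplicated join output
    CREATE TABLE joined (customer_code TEXT, work_code TEXT);
    INSERT INTO joined VALUES ('C1', 'W1'), ('C1', 'W1'), ('C2', 'W9');
""")

distinct_rows = conn.execute(
    "SELECT DISTINCT customer_code, work_code FROM joined ORDER BY customer_code"
).fetchall()

grouped_rows = conn.execute(
    "SELECT customer_code, work_code FROM joined "
    "GROUP BY customer_code, work_code ORDER BY customer_code"
).fetchall()

print(distinct_rows == grouped_rows)
```

Either form only removes rows that are identical in every selected column, so it hides a fan-out rather than fixing the join that causes it.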

How to replace an OR statement from a join in sql server

I have the following query that uses an OR condition on a join, so if one condition on the join isn't met it must check the next condition. The problem is that with the OR condition the query takes a really long time to run, but when I remove one of the OR branches it runs instantly. Is there a better way to apply both conditions without using OR, so that the query runs faster?
select t5.TransactionNumber
,t4.ID
,t3.[Entry] AS Amount
,t2.Address AS AddressDetail
,t1.PhoneNumber AS ContactNumber
FROM Table1 t1 (NOLOCK)
JOIN Table2 t2 (NOLOCK) ON t2.FicaID = t1.FicaId
inner join Table3 t3 (NOLOCK) ON (t3.ID = t2.ID AND t3.Code = t2.Code) or (t3.TypeID = t2.TypeID) -- this join has an OR: if one condition isn't met it must check the next condition
LEFT JOIN Table4 t4 (NOLOCK) ON t4.Result = t3.Result
LEFT JOIN Table5 t5 (NOLOCK) ON t5.AccNum = t3.AccNum
where t1.date>'2018-09-01' and t1.date<'2018-09-30'
By the distributive law in logic,
P OR (Q AND R) can be written as
(P OR Q) AND (P OR R). Maybe that helps?
You could try using a left join and the COALESCE function:
select t5.TransactionNumber
,t4.ID
,COALESCE(t3.[Entry],t33.[Entry]) AS Amount
,t2.Address AS AddressDetail
,t1.PhoneNumber AS ContactNumber
FROM Table1 t1 (NOLOCK)
JOIN Table2 t2 (NOLOCK) ON t2.FicaID = t1.FicaId
left join Table3 t3 (NOLOCK) ON (t3.ID = t2.ID AND t3.Code = t2.Code)
left join Table3 t33 (NOLOCK) ON (t33.TypeID = t2.TypeID) -- I moved it to a left join
LEFT JOIN Table4 t4 (NOLOCK) ON t4.Result = t3.Result
LEFT JOIN Table5 t5 (NOLOCK) ON t5.AccNum = t3.AccNum
where t1.date>'2018-09-01' and t1.date<'2018-09-30'
You can try the query below. Note that it joins Table3 in two steps, so it only returns rows where both conditions find a match, which is stricter than the original OR:
select t5.TransactionNumber
,t4.ID
,t3.[Entry] AS Amount
,A.AddressDetail
,A.ContactNumber
from
(
select t2.TypeID
,t2.Address AS AddressDetail
,t1.PhoneNumber AS ContactNumber
FROM Table1 t1 (NOLOCK)
JOIN Table2 t2 (NOLOCK) ON t2.FicaID = t1.FicaId
inner join Table3 t3 (NOLOCK) ON (t3.ID = t2.ID AND t3.Code = t2.Code)
where t1.date>'2018-09-01' and t1.date<'2018-09-30'
)A
join Table3 t3 (NOLOCK) ON (A.TypeID = t3.TypeID)
LEFT JOIN Table4 t4 (NOLOCK) ON t4.Result = t3.Result
LEFT JOIN Table5 t5 (NOLOCK) ON t5.AccNum = t3.AccNum
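Another rewrite that is often tried for an OR in a join, and that is not one of the answers in this thread, is to split the join into two branches combined with UNION ALL, excluding from the second branch the rows the first branch already matched. The sketch below only checks that the two forms return the same rows; it uses sqlite3 rather than SQL Server, the table contents are hypothetical, and it assumes the join columns contain no NULLs. Whether it is actually faster depends on the indexes and the optimizer, which this example does not measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (ID INTEGER, Code TEXT, TypeID INTEGER);
    CREATE TABLE Table3 (ID INTEGER, Code TEXT, TypeID INTEGER, Entry INTEGER);
    INSERT INTO Table2 VALUES (1, 'A', 10), (2, 'B', 20);
    INSERT INTO Table3 VALUES
        (1, 'A', 10, 100),  -- matches Table2 row 1 on both conditions
        (9, 'Z', 20, 200),  -- matches Table2 row 2 on TypeID only
        (2, 'B', 99, 300),  -- matches Table2 row 2 on ID and Code only
        (7, 'Q', 77, 400);  -- matches nothing
""")

# Original shape: one join with an OR in the ON clause.
or_rows = sorted(conn.execute("""
    SELECT t2.ID, t3.Entry
    FROM Table2 t2
    JOIN Table3 t3
      ON (t3.ID = t2.ID AND t3.Code = t2.Code) OR (t3.TypeID = t2.TypeID)
""").fetchall())

# Rewritten shape: one branch per condition; the second branch skips pairs
# that the first branch already returned (assumes no NULLs in these columns).
union_rows = sorted(conn.execute("""
    SELECT t2.ID, t3.Entry
    FROM Table2 t2
    JOIN Table3 t3 ON t3.ID = t2.ID AND t3.Code = t2.Code
    UNION ALL
    SELECT t2.ID, t3.Entry
    FROM Table2 t2
    JOIN Table3 t3
      ON t3.TypeID = t2.TypeID
     AND NOT (t3.ID = t2.ID AND t3.Code = t2.Code)
""").fetchall())

print(or_rows == union_rows)
```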

Joining tbl1 to select statement twice with join to tbl2 that also joins to tbl3

I'm using SQL Server Manager.
I have 3 tables.
I need a query that pulls T1 and adds an Origin Basin and a Destination Basin.
So far I have the following:
select T1.[Country (destination)], T3.AreaName
From T1
left outer join T2 on
T1.[Country (destination)] = T2.CountryName
inner join T3 on
T2.AreaID = T3.AreaID
Which returns:
Country | Area
However, I'm having trouble doing this for the second country column. I believe you use aliases. I've tried:
select (select AreaName
FROM T3
where T3.AreaID = T2.AreaID) as 'Area Imp',
(select AreaID
From T2
where T2.CountryName = T1.[Country (origin)]) as 'x',
(select AreaID
From T2
where T2.CountryName = T1.[Country (destination)]) as 'y'
FROM T1
But I can't get it to work.
This is what you need to do:
select t1.date, t1.country_destination, t1.country_origin, destination_area.AreaName as area_destination, origin_area.AreaName as area_origin
from t1 as t1 join t2 as destination on t1.country_destination = destination.countryname
join t2 as origin on t1.country_origin = origin.countryname
join t3 as destination_area on destination.areaid = destination_area.areaid
join t3 as origin_area on origin.areaid = origin_area.areaid
You will need to join with the same table twice, both for t2 and t3 so that you get the matching records for your needs.
It helps usually to put aliases that match the purpose of the join (in this case, destination and origin) when writing the query.
I think what you're trying to do is something like this:
select T1.*, T3dest.AreaName, T3orig.AreaName
From
T1
inner join
T2 T2dest on
T1.[Country (destination)] = T2dest.CountryName
inner join
T3 T3dest on
T2dest.AreaID = T3dest.AreaID
inner join
T2 T2orig on
T1.[Country (origin)] = T2orig.CountryName
inner join
T3 T3orig on
T2orig.AreaID = T3orig.AreaID
Note that I've switched to inner joins throughout, at the moment. If you do want left join semantics, you either need to use those for all of the joins to the T2 and T3 tables or you need to change the join order (so that the relevant T3 joins to the T2 tables occur before the attempted join with T1). It's not clear from the sample data if that's required, however.
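The aliasing pattern from the answers above can be tried end to end with a tiny example. This is a sketch using sqlite3 rather than SQL Server, and the country and area values are made up, since the question's sample tables are not included in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data; only the column names follow the question.
    CREATE TABLE T1 ([Country (origin)] TEXT, [Country (destination)] TEXT);
    CREATE TABLE T2 (CountryName TEXT, AreaID INTEGER);
    CREATE TABLE T3 (AreaID INTEGER, AreaName TEXT);
    INSERT INTO T1 VALUES ('Norway', 'Japan');
    INSERT INTO T2 VALUES ('Norway', 1), ('Japan', 2);
    INSERT INTO T3 VALUES (1, 'Atlantic'), (2, 'Pacific');
""")

# T2 and T3 are each joined twice under different aliases:
# once for the origin country and once for the destination country.
rows = conn.execute("""
    SELECT T1.[Country (origin)], T3orig.AreaName,
           T1.[Country (destination)], T3dest.AreaName
    FROM T1
    INNER JOIN T2 AS T2orig ON T1.[Country (origin)] = T2orig.CountryName
    INNER JOIN T3 AS T3orig ON T2orig.AreaID = T3orig.AreaID
    INNER JOIN T2 AS T2dest ON T1.[Country (destination)] = T2dest.CountryName
    INNER JOIN T3 AS T3dest ON T2dest.AreaID = T3dest.AreaID
""").fetchall()

print(rows)
```

Each alias behaves like an independent copy of the table, which is why the origin lookup and the destination lookup do not interfere with each other.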
Try this. You would still want to join on the area IDs:
select T1.Date,T1.[Country (destination)], null [Country (origin)], T3.AreaName [AreaName(Destination)], null [AreaName(Origin)]
From T1
left outer join T2 on
T1.[Country (destination)] = T2.CountryName
inner join T3 on
T2.AreaID = T3.AreaID
union all
select T1.Date, null [Country (destination)], T1.[Country (origin)], null [AreaName(Destination)], T3.AreaName [AreaName(Origin)]
From T1
left outer join T2 on
T1.[Country (Origin)] = T2.CountryName
inner join T3 on
T2.AreaID = T3.AreaID

Joining multiple tables by skipping On condition for few tables and join them alone

I want to understand how a join works when the ON condition is skipped. Let me explain with an example. In the query below there is no ON condition for T3; I just join that table and then join T4 to it.
Question: How is the result set for T3 built, and how is it combined with the other result sets?
SELECT * FROM T1
INNER JOIN T2 ON T1.ID = T2.ID
INNER JOIN T3
LEFT JOIN T4 ON T3.ID = T4.ID
LEFT JOIN T5 ON T1.ID = T5.ID
Well, when a join has no ON condition, you are effectively performing a CROSS JOIN (in databases that accept this syntax, such as MySQL). Your query is equivalent to:
SELECT * FROM T1
INNER JOIN T2 ON T1.ID = T2.ID
CROSS JOIN (T3
LEFT JOIN T4 ON T3.ID = T4.ID)
LEFT JOIN T5 ON T1.ID = T5.ID
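The cross-join behaviour can be checked by counting rows. The sketch below uses sqlite3, which, like MySQL, accepts INNER JOIN without an ON clause; the thread does not say which database is in use, and engines such as SQL Server or PostgreSQL reject that syntax. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER);
    CREATE TABLE T2 (ID INTEGER);
    CREATE TABLE T3 (ID INTEGER);
    CREATE TABLE T4 (ID INTEGER);
    CREATE TABLE T5 (ID INTEGER);
    INSERT INTO T1 VALUES (1), (2);
    INSERT INTO T2 VALUES (1), (2);
    INSERT INTO T3 VALUES (10), (20), (30);
    INSERT INTO T4 VALUES (10);
    INSERT INTO T5 VALUES (1);
""")

# The query from the question: T3 is joined with no ON condition.
implicit = conn.execute("""
    SELECT COUNT(*) FROM T1
    INNER JOIN T2 ON T1.ID = T2.ID
    INNER JOIN T3
    LEFT JOIN T4 ON T3.ID = T4.ID
    LEFT JOIN T5 ON T1.ID = T5.ID
""").fetchone()[0]

# The same query with the cross join written out explicitly.
explicit = conn.execute("""
    SELECT COUNT(*) FROM T1
    INNER JOIN T2 ON T1.ID = T2.ID
    CROSS JOIN T3
    LEFT JOIN T4 ON T3.ID = T4.ID
    LEFT JOIN T5 ON T1.ID = T5.ID
""").fetchone()[0]

print(implicit, explicit)
```

With 2 rows surviving the T1/T2 join and 3 rows in T3, both queries return 2 x 3 = 6 rows; the LEFT JOINs to T4 and T5 only attach columns here and do not change the count.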

SQL Join Syntax with three tables with a left join

I have three tables T1, T2, T3. The relations are:
T1 one to many on T2 (Field T2.ID1 -> T1.ID)
T3 one to many on T2 (Field T2.ID2 -> T3.ID)
What I need are all records from T1, only those records from T2 where T2.ID1 is equal to T1.ID, and the record from T3 where T2.ID2 = T3.ID.
The situation is that T1 is a list of parts, T2 a list of order lines, and T3 is the order header. The first relation is that any part (T1) can appear on many order lines (T2); the second relation is that for any order line (T2) there is only one order header (T3), but an order header could have many lines.
What I have got so far is:
SELECT ar.customer_id,
ar.invnumber,
ar.transdate,
invoice.qty,
parts.partnumber,
parts.description,
parts.rop,
parts.bin,
parts.obsolete,
parts.partsgroup_id,
parts.onhand
FROM (parts LEFT JOIN invoice ON
parts.id = invoice.parts_id)
INNER JOIN ar ON invoice.trans_id = ar.id
But this is not giving me any of the parts that are not on any order lines at all. PARTS = T1, ORDER LINES = T2, AR = T3
I believe the problem is that you're doing the inner join after the left join. The inner join is then discarding the results of the left join that don't have a match on T3.
Try this FROM clause and let me know:
FROM (invoice INNER JOIN ar ON invoice.trans_id = ar.ID)
RIGHT JOIN parts ON parts.id = invoice.parts_id
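The effect described above, where the later inner join discards the unmatched rows kept by the left join, can be reproduced on a tiny data set. This is a sketch using sqlite3 with hypothetical rows. Because older SQLite builds lack RIGHT JOIN, the corrected query is written as a LEFT JOIN from parts to a derived table (named lines here, an illustrative alias) that performs the invoice/ar inner join first, which keeps the same rows as the suggested FROM clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (id INTEGER, partnumber TEXT);
    CREATE TABLE invoice (parts_id INTEGER, trans_id INTEGER, qty INTEGER);
    CREATE TABLE ar (id INTEGER, invnumber TEXT);
    INSERT INTO parts VALUES (1, 'P1'), (2, 'P2');   -- P2 is on no order line
    INSERT INTO invoice VALUES (1, 100, 5);
    INSERT INTO ar VALUES (100, 'INV-1');
""")

# Left join first, inner join afterwards: the P2 row has NULL trans_id,
# so the inner join to ar removes it again.
left_then_inner = conn.execute("""
    SELECT parts.partnumber, invoice.qty, ar.invnumber
    FROM parts
    LEFT JOIN invoice ON parts.id = invoice.parts_id
    INNER JOIN ar ON invoice.trans_id = ar.id
    ORDER BY parts.partnumber
""").fetchall()

# Inner join invoice to ar first, then outer join that result to parts.
inner_first = conn.execute("""
    SELECT parts.partnumber, lines.qty, lines.invnumber
    FROM parts
    LEFT JOIN (
        SELECT invoice.parts_id, invoice.qty, ar.invnumber
        FROM invoice
        INNER JOIN ar ON invoice.trans_id = ar.id
    ) AS lines ON parts.id = lines.parts_id
    ORDER BY parts.partnumber
""").fetchall()

print(left_then_inner)
print(inner_first)
```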
You should INNER JOIN the order lines with AR first, and then LEFT JOIN that result from Parts so that parts without any order lines are kept:
SELECT ar.customer_id,
ar.invnumber,
ar.transdate,
invoice.qty,
parts.partnumber,
parts.description,
parts.rop,
parts.bin,
parts.obsolete,
parts.partsgroup_id,
parts.onhand
FROM parts
LEFT JOIN (invoice INNER JOIN ar ON invoice.trans_id = ar.id)
ON parts.id = invoice.parts_id