BigQuery Deduplicate rows - No unique columns - google-bigquery

I have a bigquery table which is a result of multiple left join tables.
The results are duplicated because of the left join (cartesian product)
How do I de-duplicate the rows so I only see one record?
SELECT T1.Col1,T1.Col2,........
T2.Col1,T2.Col2,........
T3.Col1,T3.Col3,........
T5.Col1,T5.Col2,........
T7.Col1.......
FROM `TABLE1` as T1
LEFT JOIN
`TABLE`as T2 ON T1.CUSTOMER_CODE = T2.CUSTOMER_CODE
LEFT JOIN
`TABLE3` as T3 ON (T1.MIAL_CODE) = T3.MIAL_CODE
LEFT JOIN
`TABLE5` as T5
ON T1.WORK_CODE = T5.WORK_CODE
LEFT JOIN
`TABLE7` as T7
ON T1.CA_DATE = T7.date
ORDER BY CA_DATE

Use DISTINCT:
select distinct *
from mytable
or create a new table:
create or replace table my_new_table
as
select distinct *
from mytable

I used GROUP BY and it worked fine to get rid of duplicates
SELECT T1.Col1,T1.Col2,........
T2.Col1,T2.Col2,........
T3.Col1,T3.Col3,........
T5.Col1,T5.Col2,........
T7.Col1.......
FROM `TABLE1` as T1
LEFT JOIN
`TABLE`as T2 ON T1.CUSTOMER_CODE = T2.CUSTOMER_CODE
LEFT JOIN
`TABLE3` as T3 ON (T1.MIAL_CODE) = T3.MIAL_CODE
LEFT JOIN
`TABLE5` as T5
ON T1.WORK_CODE = T5.WORK_CODE
LEFT JOIN
`TABLE7` as T7
ON T1.CA_DATE = T7.date
GROUP BY T1.Col1,T1.Col2,........
T2.Col1,T2.Col2,........
T3.Col1,T3.Col3,........
T5.Col1,T5.Col2,........
T7.Col1.......
ORDER BY CA_DATE

Related

multiple inner joins in bigquery result in duplicated rows

Im tryin to join 6 tables in bigquery named
T0, T1, T2, T3, T4, T5
The tables result im interested are T0 and T1
after query this tables I got 43 matches
SELECT
T1.F1,
T0.F2,
T0.F3,
T0.F4,
T1.F5,
T1.F6,
T1.F7,
T1.F8
T0.F9
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 on T1.F1= T0.F1
WHERE T0.F1 = "010001476713"
AND T0.F2 = T1.F2
ORDER BY T0.F4
But when I run this with multiple INNER JOIN I got 800 results not the 43, results are duplicated
SELECT
T2.F11,
T3.F15,
T2.F12,
T3.F16,
T3.F17,
T1.F1,
T2.F13,
T3.F17,
T5.F18,
T5.F19,
T5.F20,
T2.F14,
T0.F9,
T1.F10,
T4.F3,
T4.F21,
T4.F22,
T0.F2,
T3.F23,
T0.F3,
T0.F4,
T1.F5,
T1.F6,
T1.F7,
T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1
INNER JOIN `TABLE3` T3 ON T3.F1=T1.F1
INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713"
AND T0.F2 = T1.F2
ORDER BY T0.F4
When I get duplicate rows, I solve it like this:
You get 43 results on your inner join of table T0 & T1. So far so good.
Now comment out everything related to table T2, T4, & T5 (I've placed the commas at the beginning of the row for easier commenting out) like this
SELECT
--T2.F11,
T3.F15
--,T2.F12
,T3.F16
,T3.F17
,T1.F1
--,T2.F13
,T3.F17
--,T5.F18
--,T5.F19
--,T5.F20
--,T2.F14
,T0.F9
,T1.F10
--,T4.F3
--,T4.F21
--,T4.F22
,T0.F2
,T3.F23
,T0.F3
,T0.F4
,T1.F5
,T1.F6
,T1.F7
,T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1 and T0.F2 = T1.F2
INNER JOIN `TABLE3` T3 ON T3.F1=T1.F1
--INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
--INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
--INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713"
ORDER BY T0.F4
I've moved the and T0.F2 = T1.F2 from the where to on in the inner join. When you run this query, do you still get 43 rows, or more? If more, you need to figure out what it is double matching on, and add that to your on statement of it really is a 1-1 relationship, or perhaps group the results if you don't want multiple matches. You may need to comment out your select statement and select all to really figure it out, like this:
SELECT *
/*
--T2.F11,
T3.F15
--,T2.F12
,T3.F16
,T3.F17
,T1.F1
--,T2.F13
,T3.F17
--,T5.F18
--,T5.F19
--,T5.F20
--,T2.F14
,T0.F9
,T1.F10
--,T4.F3
--,T4.F21
--,T4.F22
,T0.F2
,T3.F23
,T0.F3
,T0.F4
,T1.F5
,T1.F6
,T1.F7
,T1.F8
*/
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1 and T0.F2 = T1.F2
INNER JOIN `TABLE3` T3 ON T3.F1=T1.F1
--INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
--INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
--INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713"
ORDER BY T0.F4
Once you figure out what rows are causing the duplication, you either group the results or add an 'and' statement to the on clause to make it a 1-1, and then move on. You then uncomment the parts of the query related to T2 and do the same thing, then T4 and then T5. If you send me the results of the query above, I can help you figure out what your on clause needs to be to keep it from duplicating.
thank you #jenstretman, I find table 4 to be duplicating matches by using a foreign Key with non-primary Key creating duplicates, the solution was to use a DISTINCT to only select specifically matched rows.
SELECT DISTINCT
T2.F11,
T3.F15,
T2.F12,
T3.F16,
T3.F17,
T1.F1,
T2.F13,
T3.F17,
T5.F18,
T5.F19,
T5.F20,
T2.F14,
T0.F9,
T1.F10,
T4.F3,
T4.F21,
T4.F22,
T0.F2,
T3.F23,
T0.F3,
T0.F4,
T1.F5,
T1.F6,
T1.F7,
T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1
INNER JOIN `TABLE3` T3 ON T3.F1=T1.F1
INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
INNER JOIN (SELECT DISTINCT T4.F3, T4.F21, T4.F22, FROM `TABLE4` T4)T4 ON T4.F3 = T0.F3
INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713"
AND T0.F2 = T1.F2
ORDER BY T0.F4

Inner join the tables where the value of one column is different. But the meaning of the values are same

Table t1
Table t2
I tried running the query
select CountyCode, ContactPerson
from table1 t1
inner join select * from table 2 t2 on t1.CountyCode[1] = Code[0]
Any one help me please.
Use this syntax:
select CountyCode, ContactPerson
from t1
inner join t2 on t1.CountyCode = t2.Code

Combine column after left join

How do I combine 3 returned value columns into one column based on which column is not null?
My query is:
SELECT val1,val2,val3
FROM db.table1 t1
LEFT JOIN db.table2 t2 ON t2.pk = t1.t2_fk
LEFT JOIN db.table3 t3 ON t3.pk = t1.t3_fk
use coalesce()
SELECT coalesce(val1,val2,val3) as va
FROM db.table1 t1
left join db.table2 t2
on t2.pk = t1.t2_fk
left join db.table3 t3
on t3.pk = t1.t3_fk

Joining tbl1 to select statement twice with join to tbl2 that also joins to tbl3

I'm using SQL server manger.
I have 3 tables
I need a query that pulls t1 ands add an Origin Basin and a Destination Basin.
So far I have the following:
select T1.[Country (destination)], T3.AreaName
From T1
left outer join T2 on
T1.[Country (destination)] = T2.CountryName
inner join T3 on
T2.AreaID = T3.AreaID
inner join T3 on
T2.AreaID = T3.AreaID
Which returns:
Country | Area
However, I'm having trouble doing this for the second country column. I believe you use aliases. I've tried:
select (select AreaName
FROM T3
where T3.AreaID = T2.AreaID) as 'Area Imp',
(select AreaID
From T2
where T2.CountryName = T1.[Country (origin)]) as 'x',
(select AreaID
From T2
where T2.CountryName = T1.[Country (destination)]) as 'y'
FROM T1
But I can't get it to work.
This is what you need to do:
select t1.date, t1.country_destination, t1.country_origin, destination_area.AreaName as area_destination, origin_area.AreaName as area_origin
from t1 as t1 join t2 as destination on t1.country_destination = destination.countryname
join t2 as origin on t1.country_origin = origin.countryname
join t3 as destination_area on t2.areaid = destination_area.areaid
join t3 as origin_area on t2.areaid = origin_area.areaid
You will need to join with the same table twice, both for t2 and t3 so that you get the matching records for your needs.
It helps usually to put aliases that match the purpose of the join (in this case, destination and origin) when writing the query.
I think what you're trying to do is something like this:
select T1.*, T3dest.AreaName, T3orig.AreaName
From
T1
inner join
T2 T2dest on
T1.[Country (destination)] = T2dest.CountryName
inner join
T3 T3dest on
T2dest.AreaID = T3dest.AreaID
inner join
T2 T2orig on
T1.[Country (origin)] = T2orig.CountryName
inner join
T3 T3orig on
T2orig.AreaID = T3orig.AreaID
Note that I've switched to inner joins throughout, at the moment. If you do want left join semantics, you either need to use those for all of the joins to the T2 and T3 tables or you need to change the join order (so that the relevant T3 joins to the T2 tables occur before the attempted join with T1). It's not clear from the sample data if that's required, however.
Try this, You would still want to join on area id's
select T1.Date,T1.[Country (destination)], null [Country (origin)], T3.AreaName [AreaName(Destination)], null [AreaName(Origin)]
From T1
left outer join T2 on
T1.[Country (destination)] = T2.CountryName
inner join T3 on
T2.AreaID = T3.AreaID
union all
select T1.Date,null [Country (destination)], t1.[Country (origin)], Null [AreaName(Destination)], t3. [AreaName(Origin)]
From T1
left outer join T2 on
T1.[Country (Origin)] = T2.CountryName
inner join T3 on
T2.AreaID = T3.AreaID

SQL join order and conditions

I've got 3 tables that I want to join and filter on some conditions.
I've first wrote this query:
select * from table1 t1
left join (select * from table2 where table2.fieldX=...) t2
on t1.id_12=t2.id_12
left join table3
on t2.id_23=t3.id_23
where t1.fieldY=...
Then I wanted to make it looks like more canonical by rewritting it like that:
select * from table1 t1
left join table2 t2
on t1.id_12=t2.id_12
left join table3
on t2.id_23=t3.id_23
where table2.fieldX=...
and t1.fieldY=...
But it does not give the same result.
I dont't understand why...
Do you?
Thanks in advance.
When you put table2.fieldX=... in the where clause you eliminate all rows from table1 that do not have a corresponding row in table2. Effectively you are changing the left join into an inner join.
Instead, you can apply the table2 filter in the join itself:
SELECT
*
FROM
Table1 t1
LEFT JOIN Table2 t2 ON t1.id_12 = t2.id_12 AND t2.fieldX = ...
LEFT JOIN Table3 t3 ON t2.id_23 = t3.id_23
WHERE
t1.fieldY = ...
Did you add the inner select where on the second query?
SELECT *
FROM table1 t1
LEFT JOIN table2 t2
on t1.id_12=t2.id_12
LEFT JOIN table3
on t2.id_23=t3.id_23
WHERE t1.fieldY=...
and t2.fieldX=...