Multiple table join query - IDs and data tables - hive

SOLVED: It turned out my WHERE clause was throwing off the results. I removed it and moved the condition into the ON clause of the join.
I need some help.
I have a table with 25 million IDs and 4 tables with IDs and data. I need to create a new table with these 25 million IDs as well as the associated data from the 4 tables. No single data table contains all 25 million IDs. For example:
ID Table:
ID
A
B
Table 1:
ID  measure_a  measure_b
B   1          3
Table 2:
ID  measure_f  measure_g
A   3          4
etc.
Expected output:
ID  measure_a  measure_b  measure_f  measure_g
A   NULL       NULL       3          4
B   1          3          NULL       NULL
The most important thing is that all 25 million IDs are in the final table. I've tried multiple joins but end up with a hugely reduced number of IDs, which I believe is because IDs that don't match the join condition are being filtered out.
Any help is greatly appreciated.

You would use left joins:
select ids.id, t1.measure_a, t1.measure_b, t2.measure_f, t2.measure_g
from ids
left join table1 t1 on ids.id = t1.id
left join table2 t2 on ids.id = t2.id;
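The asker's SOLVED note (a WHERE filter on a joined table drops the unmatched IDs, while the same condition in the ON clause keeps them) can be reproduced on the two-row example. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for Hive, and the predicate `measure_a > 0` is a hypothetical filter, not the one from the asker's real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ids    (id TEXT);
    CREATE TABLE table1 (id TEXT, measure_a INTEGER, measure_b INTEGER);
    CREATE TABLE table2 (id TEXT, measure_f INTEGER, measure_g INTEGER);
    INSERT INTO ids    VALUES ('A'), ('B');
    INSERT INTO table1 VALUES ('B', 1, 3);
    INSERT INTO table2 VALUES ('A', 3, 4);
""")

# Filter in WHERE: for ID 'A' there is no table1 row, so t1.measure_a is NULL,
# the predicate is not true, and the whole row is dropped after the join.
where_rows = conn.execute("""
    SELECT ids.id, t1.measure_a, t2.measure_f
    FROM ids
    LEFT JOIN table1 t1 ON ids.id = t1.id
    LEFT JOIN table2 t2 ON ids.id = t2.id
    WHERE t1.measure_a > 0
    ORDER BY ids.id
""").fetchall()

# Same filter in ON: it only limits which table1 rows may match,
# so every row of ids survives.
on_rows = conn.execute("""
    SELECT ids.id, t1.measure_a, t2.measure_f
    FROM ids
    LEFT JOIN table1 t1 ON ids.id = t1.id AND t1.measure_a > 0
    LEFT JOIN table2 t2 ON ids.id = t2.id
    ORDER BY ids.id
""").fetchall()

print(where_rows)
print(on_rows)
```

With the filter in WHERE only ID B comes back; with the filter in ON both A and B are returned, with NULLs where a data table has no row for that ID.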

Related

Duplicate rows in left join

I have 2 tables. One column in the first table holds about 200,000 values in total, of which about 100,000 are null and the rest are integers. The second table has only integer values in that column. When I left join on this column, I get a lot of duplicate rows. Is it OK to use a left join here?
Table 1:
Column 1
2
3
5
null
null
Table 2:
Column 1
1
2
3
so on
Your example is really odd. Why would anyone have null values in an ID field? But anyway.
If you need fields from table 2 in the result set, as you say above, and only for rows that actually have a match there, use an INNER JOIN rather than a LEFT JOIN.
Something like:
SELECT DISTINCT a.id, a.name, b.someOtherField
FROM Table1 a
INNER JOIN Table2 b ON a.id = b.id
Please note: rows from table 1 whose id IS NULL will not be selected, because a NULL never compares equal to anything in a join condition and so has no equivalent in table 2. Adding the DISTINCT keyword removes any duplicates this query would otherwise produce.
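To see where the duplicates can come from, here is a small sketch using Python's built-in sqlite3 module. The column name `col1` and the repeated value 2 in table2 are assumptions added for illustration; the question does not say whether table 2 contains repeated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER);
    CREATE TABLE table2 (col1 INTEGER);
    INSERT INTO table1 VALUES (2), (3), (5), (NULL), (NULL);
    -- the second 2 is a hypothetical duplicate, added for illustration
    INSERT INTO table2 VALUES (1), (2), (2), (3);
""")

# LEFT JOIN keeps every table1 row, including the NULL ones (once each),
# and repeats a table1 row for every matching table2 row.
left_rows = conn.execute("""
    SELECT a.col1, b.col1
    FROM table1 a
    LEFT JOIN table2 b ON a.col1 = b.col1
""").fetchall()

# INNER JOIN keeps only matched rows; NULL keys never match.
inner_rows = conn.execute("""
    SELECT a.col1, b.col1
    FROM table1 a
    INNER JOIN table2 b ON a.col1 = b.col1
""").fetchall()

null_rows = [row for row in left_rows if row[0] is None]
print(len(left_rows), len(inner_rows), len(null_rows))
```

In this sketch the NULL rows are not what multiplies the result: each appears exactly once in the left join. The extra rows come from the value 2, which is matched twice.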

how to fetch distinct records from 3 tables which should contain all the values from 1 table

I have a query which should fetch all the matching records from 3 tables. For records which are not common to all three tables, it should still union in the records from the first table.
EX:
select a.x,a.y,a.z
from table1 a,table2 b,table3 c
where a.x=b.x
and b.x=c.x;
The above query fetches the records common to all 3 tables.
I need to add to my result set the records which are not present in table2 or table3.
Records should come like below:
1 abc acd
2 xyz xzy
3 pqr prq
4 null null -- in case 4 is not present in either table2 or table3
You need to use an outer join instead:
select a.x,a.y,a.z
from table1 a
left join table2 b on a.x=b.x
left join table3 c on b.x=c.x;
See also: A Visual Explanation of SQL Joins
You state:
"other records which are not common in three tables, it should union the records from first table."
Then you state:
"I need to add the records to my result set which are not present in table2 or table3."
For the first case, use a full outer join, which returns all rows from all tables. For the second case, use left joins, which keep every row of table1 and leave out rows of table2 and table3 that have no match in table1.
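The left join case can be checked on a toy dataset. This is a minimal sketch using Python's built-in sqlite3 module; the row for x = 4 and its y/z values are made up for illustration, and b.x and c.x are added to the select list only to make the NULLs visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, y TEXT, z TEXT);
    CREATE TABLE table2 (x INTEGER);
    CREATE TABLE table3 (x INTEGER);
    -- x = 4 exists only in table1 (hypothetical values)
    INSERT INTO table1 VALUES (1, 'abc', 'acd'), (2, 'xyz', 'xzy'),
                              (3, 'pqr', 'prq'), (4, 'lmn', 'lnm');
    INSERT INTO table2 VALUES (1), (2), (3);
    INSERT INTO table3 VALUES (1), (2), (3);
""")

# Inner joins: only the rows present in all three tables.
inner_rows = conn.execute("""
    SELECT a.x, a.y, a.z
    FROM table1 a
    JOIN table2 b ON a.x = b.x
    JOIN table3 c ON b.x = c.x
    ORDER BY a.x
""").fetchall()

# Left joins: every table1 row, with NULLs where table2/table3 have no match.
left_rows = conn.execute("""
    SELECT a.x, a.y, a.z, b.x, c.x
    FROM table1 a
    LEFT JOIN table2 b ON a.x = b.x
    LEFT JOIN table3 c ON b.x = c.x
    ORDER BY a.x
""").fetchall()

print(inner_rows)
print(left_rows)
```

The inner join returns three rows; the left join returns four, with NULL in the table2 and table3 columns for x = 4.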

SQL - using JOIN while filling missing values with NULL

The most related question I looked into was this but sadly I did not get a solution for my problem there.
I have two tables, both have a similar column. The only difference is that one column is missing a few values. I want to join the tables, so that for the missing value in one column, the join will show the missing values.
I'll provide an example, since this might be confusing:
table 1         table 2
ID  count       ID  count
1   9           1   2
2   2           2   1
3   1
I want the result to be
table 3
ID  count2  count1
1   2       9
2   1       2
3   NULL    1
However, using LEFT OUTER JOIN I could only achieve the table "table 3" without the row for id 3, because it has no representation in table 2.
Can you help me with my problem?
A left join would work for your sample data. I'm guessing you want to know what to do if the row with id 3 were in table 2 instead, so that your query still shows all ids. To show all rows from both tables, use a FULL OUTER JOIN:
SELECT CASE WHEN t1.id IS NULL THEN t2.id ELSE t1.id END AS id,
t2.count as count2, t1.count as count1
FROM t1
FULL OUTER JOIN t2 ON t2.id = t1.id
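A runnable sketch of the same idea, using Python's built-in sqlite3 module. SQLite only gained FULL OUTER JOIN in version 3.39, so to stay portable this sketch swaps in a common emulation (a left join plus the unmatched rows of the other table) rather than the query above; on an engine that supports FULL OUTER JOIN the query above can be used directly. The row with id 4 in t2 is a hypothetical addition so that each table has an id the other lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, count INTEGER);
    CREATE TABLE t2 (id INTEGER, count INTEGER);
    INSERT INTO t1 VALUES (1, 9), (2, 2), (3, 1);
    -- id 4 is a hypothetical row that exists only in t2
    INSERT INTO t2 VALUES (1, 2), (2, 1), (4, 5);
""")

# Emulated full outer join:
#   part 1: all t1 rows with their t2 match (or NULL)
#   part 2: the t2 rows that have no match in t1
rows = conn.execute("""
    SELECT t1.id AS id, t2.count AS count2, t1.count AS count1
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.id
    UNION ALL
    SELECT t2.id, t2.count, NULL
    FROM t2
    LEFT JOIN t1 ON t1.id = t2.id
    WHERE t1.id IS NULL
    ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```

Ids 1 and 2 carry both counts, id 3 has NULL for count2, and id 4 has NULL for count1.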

SQL STATEMENT QUERY

I have table1Name with data populated and table2 with no data populated.
select * from [database1Name].dbo.table1Name
join [database1Name].dbo.table2Name
on [database1Name].dbo.table1Name.fieldName like value;
After running the above sql statement it joins the tables but does not show any populated data from the table 'table1Name'.
Why does this happen?
Using JOIN, which is an INNER JOIN, means you get only the rows where the condition matches. So if the second table has no data, the condition is never met and you get no rows in return.
In your case you need a LEFT JOIN. This will get all the rows from the left table (table1Name in your case) and the corresponding values from the right table when the condition is met.
SELECT *
FROM [database1Name].dbo.table1Name
LEFT JOIN [database1Name].dbo.table2Name
ON [database1Name].dbo.table1Name.fieldName like [database1Name].dbo.table2Name.fieldName;
Note that with joins, a single row from one table may appear multiple times in the result. For instance, since you have a LIKE condition, if fieldName of Table 1 matches fieldName in 2 rows from Table 2, then you will get two rows, each containing the same row from Table 1 paired with one of the two rows from Table 2:
Example:
Table1
FieldName
1
2
Table2
FieldName  OtherField
1          1
1          2
Result of LEFT JOIN
T1FieldName  T2FieldName  T2OtherField
1            1            1
1            1            2
2            NULL         NULL
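Both points (an empty right-hand table and the repeated left-hand row) can be tried out with a short sketch. It uses Python's built-in sqlite3 module instead of SQL Server, drops the database and schema prefix, and assumes text columns named fieldName and otherField as in the example tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1Name (fieldName TEXT);
    CREATE TABLE table2Name (fieldName TEXT, otherField INTEGER);
    INSERT INTO table1Name VALUES ('1'), ('2');
""")

SELECT_LIST = "SELECT t1.fieldName, t2.fieldName, t2.otherField FROM table1Name t1 "
ON_CLAUSE = "ON t1.fieldName LIKE t2.fieldName ORDER BY t1.fieldName, t2.otherField"

inner_sql = SELECT_LIST + "JOIN table2Name t2 " + ON_CLAUSE
left_sql = SELECT_LIST + "LEFT JOIN table2Name t2 " + ON_CLAUSE

# table2Name is still empty: the inner join finds nothing,
# the left join returns every table1Name row padded with NULLs.
inner_empty = conn.execute(inner_sql).fetchall()
left_empty = conn.execute(left_sql).fetchall()

# Populate table2Name as in the example: two rows match fieldName '1'.
conn.execute("INSERT INTO table2Name VALUES ('1', 1), ('1', 2)")
left_filled = conn.execute(left_sql).fetchall()

print(inner_empty)
print(left_empty)
print(left_filled)
```

The last query reproduces the "Result of LEFT JOIN" table: the row for 1 appears twice, and the row for 2 appears once with NULLs.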

left join on MS SQL 2008 R2

I'm trying to left join two tables. Table A contains 100 unique records with field_a_1, field_a_2, field_a_3. The combination of field_a_1 and field_a_2 is unique.
Table B has several million records with multiple fields. field_b_1 is the same as field_a_1 and field_b_2 is the same as field_a_2.
I join the two tables together like this:
select a.*, b.*
from a
left join b
on field_a_1 = field_b_1
and field_a_2 = field_b_2
Instead of getting 100 records, I get multi-million records. Why is this?
Because table B has multiple rows for each table A entry.
For example:
TableA (ID)
1
2
3
TableB (ID, data)
1 hello
1 world
1 foo
1 bar
2 data
2 words
2 more
3 words
3 boring
If you left join from TableA to TableB, you will get a row for every TableB record that matches a TableA record, i.e. all of them.
Can you explain what results you are looking for?
Because a left join returns all of the rows from the first table + all of the matching rows from the second table. Which of the millions of matching rows did you expect to get?
Whether you use a left join or an inner join doesn't really make a difference here. A JOIN returns all rows that match the join condition, so if table b has millions of rows that match the join criteria, all of those rows will be returned.
Depending on what you wish to accomplish, consider using the DISTINCT keyword, or GROUP BY with aggregate functions.
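As a rough illustration of the row multiplication and of the GROUP BY suggestion, here is a sketch using Python's built-in sqlite3 module with the TableA/TableB example from above. Counting the matches per TableA row is just one possible aggregate, chosen here as an assumption about what the asker might want.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER);
    CREATE TABLE TableB (id INTEGER, data TEXT);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES
        (1, 'hello'), (1, 'world'), (1, 'foo'), (1, 'bar'),
        (2, 'data'), (2, 'words'), (2, 'more'),
        (3, 'words'), (3, 'boring');
""")

# Plain left join: one output row per matching TableB row.
joined = conn.execute("""
    SELECT a.id, b.data
    FROM TableA a
    LEFT JOIN TableB b ON a.id = b.id
""").fetchall()

# GROUP BY collapses the result back to one row per TableA record.
per_a = conn.execute("""
    SELECT a.id, COUNT(b.id) AS matches
    FROM TableA a
    LEFT JOIN TableB b ON a.id = b.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

print(len(joined))
print(per_a)
```

The plain join returns 9 rows for the 3 TableA records, while the grouped query returns one row per TableA record with the number of TableB matches.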