Inconsistent Hive Left Join Results - sql

I composed a simple left out join hiveql
select * from a left outer join b on (a.f1=b.f1 and a.f2=b.f2)
The total count of above query result is 798,608.
However, the total number of records in table a is 780,499, which doesn't match.
I tried to find all records that only exist in the left join results but not in table a; the results returned blank.
I even tried to create 2 small tables (a' and b') with a few records and the count of the left join result matches the count of table a' records, as expected.
What could cause the inconsistent results?

Thanks to David Lee. There are 1 to many situation in table b.
Problem solved.

Related

Get full join in google big query keeping the all frequecy combination in bigquery giving me only left join for all kind of join

I am trying to join 2 table using Id such that the frequency of all combination are present. But when I am using the join (left, right) I am still getting the inner join or left join output.
these are the table a
b
I am expecting output
I tried the actual code
select
act.action,
dvr.dateofdel,
dvr.output
FROM internal.actions as act
Right join internal.deliveries as dvr
ON dvr.id= act.id
I tried multiple joins but still same outcome ..
Your query does not match your sample data.
But based on your problem statement, I suspect that you want:
select
act.ID_log,
act.ID_send_message,
act.action_date,
act.action,act.ID_email,
dvr.delivery_date,
act.email
from internal.actions as act
left join internal.deliveries as dvr
on dvr.ID_send_message= act.ID_send_message
and dvr.delivery_date >= '2017-01-01'
and dvr.delivery_date < '2018-01-01'
where act.ID_send_message != 0
This will bring all records from act that satisfy the condition in the where clause, along with information coming from dvr; when there is no match in dvr, the corresponding columns will show null values. The important part in the query is that all conditions on the left joined table should be listed in the on clause of the join (rather than in the where clause).

Getting way more results than expected in SQL left join query

My code is such:
SELECT COUNT(*)
FROM earned_dollars a
LEFT JOIN product_reference b ON a.product_code = b.product_code
WHERE a.activity_year = '2015'
I'm trying to match two tables based on their product codes. I would expect the same number of results back from this as total records in table a (with a year of 2015). But for some reason I'm getting close to 3 million.
Table a has about 40,000,000 records and table b has 2000. When I run this statement without the join I get 2,500,000 results, so I would expect this even with the left join, but somehow I'm getting 300,000,000. Any ideas? I even refered to the diagram in this post.
it means either your left join is using only part of foreign key, which causes row multiplication, or there are simply duplicate rows in the joined table.
use COUNT(DISTINCT a.product_code)
What is the question are are trying to answer with the tsql?
instead of select count(*) try select a.product_code, b.product_code. That will show you which records match and which don't.
Should also add a where b.product_code is not null. That should exclude the records that don't match.
b is the parent table and a is the child table? try a right join instead.
Or use the table's unique identifier, i.e.
SELECT COUNT(a.earned_dollars_id)
Not sure what your datamodel looks like and how it is structured, but i'm guessing you only care about earned_dollars?
SELECT COUNT(*)
FROM earned_dollars a
WHERE a.activity_year = '2015'
and exists (select 1 from product_reference b ON a.product_code = b.product_code)

When joining 2 tables one table comes up null

I am joining 2 tables on the first table I get all the relevant data on the second table I only get nulls. There are no nulls in either table Can any one tell me why this is happening?
select * from apmast
left join apitem
on apmast.fvendno + apmast.fccompany = apitem.fcinvkey
There is a problem with your ON that's resulting in you not getting matching records. A LEFT JOIN means that you should get all data from the left table and only the matching records from the right table, or else NULL where there are no matching records. The key to the join, however, is the ON statement. Make sure that apmast.fvendno + apmast.fccompany is actually equal to apitem.fcinvkey.
here is a explanation on the types of joins just incase you get stuck in the future.
INNER JOIN this will get only the rows that match in both the FROM clause and the JOINING table.
LEFT OUTER JOIN this gets all the rows from the table specified in the FROM clause and only the rows that match in the JOINING table.
RIGHT OUTER JOIN this gets all the rows from the table specified in the JOIN clause and only the rows that match in the FROM clause.
FULL OUTER JOIN this will get all the rows from both tables.
SELF JOIN this is used when you need to join the table back to its self to return data.

Inner join between two tables with same count values

I have been working on this issue since 2 days now.
I have two tables created by using SQL Select statements
SELECT (
) Target
INNER JOIN
SELECT (
) Source
ON Join condition 1
AND Join condition 2
AND Join condition 3
AND Join condition 4
AND Join condition 5
The target table has count value of 10,000 records.
The source table has count value of 10,000 records.
but when I do an inner join between the two tables on the 5 join conditions
I get 9573 records.
I am basically trying to find a one to one match between source and target table. I feel every field from target matches every field in source.
Questions:
Why does my inner join give less records even if there are same value of records in both tables?
If it is expected, how can I make sure I get the exact 10,000 records after the join condition?
1) An INNER JOIN only outputs the rows from the JOINING of two tables where their joining columns match. So in your case, Join Condition1 may not exist in rows in both tables and therefore some rows are filtering out.
2) As the other poster mentioned a left join is one way. You need to look which table source or target you want to use as your master i.e. start from and return all those rows. You then left join the remaining table based on your conditions to add all the columns where you join conditions match.
It's probably better if you give us the tables you are working on and the query\results you are trying to achieve.
There's some really good articles about the different joins out there. But it looks like you'd be interested in left joins. So if it exists in Target, but not in Source, it will not drop the record.
So, it would be:
SELECT(...) Target
LEFT OUTER JOIN
SELECT(...) Source
ON cond1 and cond2 and cond3 and cond4 and cond5
Give that a shot and let me know how it goes!
Sometime you need to rely on logical analysis rather than feelings. Use this query to find the fields that do not match and then work out your next steps
SELECT
Target.Col1,Source.Col1,
Target.Col2,Source.Col2,
Target.Col3,Source.Col3
FROM
(
) Target
FULL OUTER JOIN
(
) Source
ON Target.Col1=Source.Col1
AND Target.Col2=Source.Col2
AND Target.Col3=Source.Col3
WHERE (
Target.Col1 IS NULL
OR Source.Col1 IS NULL
OR Target.Col2 IS NULL
OR Source.Col2 IS NULL
OR Target.Col3 IS NULL
OR Source.Col3 IS NULL
)

Duplicate columns with inner Join

SELECT
dealing_record.*
,shares.*
,transaction_type.*
FROM
shares
INNER JOIN shares ON shares.share_ID = dealing_record.share_id
INNER JOIN transaction_type ON transaction_type.transaction_type_id = dealing_record.transaction_type_id;
The above SQL code produces the desired output but with a couple of duplicate columns. Also, with incomplete display of the column headers. When I change the
linesize 100
the headers shows but data displayed overlaps
I have checked through similar questions but I don't seem to get how to solve this.
You have duplicate columns, because, you're asking to the SQL engine for columns that they will show you the same data (with SELECT dealing_record.* and so on) , and then duplicates.
For example, the transaction_type.transaction_type_id column and the dealing_record.transaction_type_id column will have matching rows (otherwise you won't see anything with an INNER JOIN) and you will see those duplicates.
If you want to avoid this problem or, at least, to reduce the risk of having duplicates in your results, improve your query, using only the columns you really need, as #ConradFrix already said. An example would be this:
SELECT
dealing_record.Name
,shares.ID
,shares.Name
,transaction_type.Name
,transaction_type.ID
FROM
shares
INNER JOIN shares ON shares.share_ID = dealing_record.share_id
INNER JOIN transaction_type ON transaction_type.transaction_type_id = dealing_record.transaction_type_id;
Try to join shares with dealing_record, not shares again:
select dealing_record.*,
shares.*,
transaction_type.*
FROM shares inner join dealing_record on shares.share_ID = dealing_record.share_id
inner join transaction_type on transaction_type.transaction_type_id=
dealing_record.transaction_type_id;