Hive Query is not working as expected - sql

I am trying a left join in a Hive query, but it does not seem to work. It returns columns only from the left table:
create table mb.spt_new_var as select distinct customer_id ,target from mb.spt_201603 A
left outer join mb.temp B
on (A.customer_id=B.cust_id);
I tried selecting a few records from table B using some random customer_id values from table A, and that returns records. But when I run the left join on table A, it returns only the columns from table A. Both IDs have the same data type (int). What could be the reason for this?
Sample Table A:
Customer_account_id target
12356 1
34245 0
12356 1
.... ..
Sample Table B:
Cust_id col1 col2 col3
12356 ..
12567 ..
24426 ..
...
Table A has about 1 million records, while table B has about 30 million. Both tables may contain duplicate IDs.

I'm a bit confused. Hive is returning the columns that you specify in the query:
select distinct a.customer_id, a.target
from mb.spt_201603 a left outer join
mb.temp b
on a.customer_id = b.cust_id;
If you want columns from the second table, you need to select them:
select distinct a.customer_id, a.target, b.col1, b.col2
from mb.spt_201603 a left outer join
mb.temp b
on a.customer_id = b.cust_id;
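To see the point above in action, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the example runs anywhere; the join semantics shown are standard SQL). The table names table_a and table_b and the sample values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (customer_id INTEGER, target INTEGER);
    CREATE TABLE table_b (cust_id INTEGER, col1 TEXT);
    INSERT INTO table_a VALUES (12356, 1), (34245, 0), (12356, 1);
    INSERT INTO table_b VALUES (12356, 'x'), (12567, 'y'), (24426, 'z');
""")

# Only columns from table_a are listed, so only those come back.
cur = conn.execute("""
    SELECT DISTINCT a.customer_id, a.target
    FROM table_a a LEFT OUTER JOIN table_b b ON a.customer_id = b.cust_id
    ORDER BY a.customer_id
""")
only_a_width = len(cur.description)
only_a_rows = cur.fetchall()

# Listing b.col1 as well adds it to the result; unmatched rows get NULL (None).
cur = conn.execute("""
    SELECT DISTINCT a.customer_id, a.target, b.col1
    FROM table_a a LEFT OUTER JOIN table_b b ON a.customer_id = b.cust_id
    ORDER BY a.customer_id
""")
with_b_width = len(cur.description)
with_b_rows = cur.fetchall()

print(only_a_width, only_a_rows)  # 2 [(12356, 1), (34245, 0)]
print(with_b_width, with_b_rows)  # 3 [(12356, 1, 'x'), (34245, 0, None)]
```

The join itself works in both queries; the only difference is the select list.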

Related

Almost equal table with different running time

I'm using Oracle. I have two tables, A and B
Table A has 8000 rows and 5 columns
Table B has 5500 rows and the same 5 columns
All 5500 rows in table B are also contained in table A, and they are identical
I have a query like
With t1 as (select distinct(id) from table A/B)
,T2 as (select a.id, c.value, d.value from Table A/B a
Join table c c on “conditions”
Join table d d on “conditions”
) select * from t2
So the query with table A works fine, but with table B it hangs indefinitely.
Data types and other properties are the same in table A and table B.
Where should I look for the problem?
I compared the explain plans, but the only difference is the row “PX PARTITION HASH JOIN-FILTER” for table A versus “PX BLOCK ITERATOR ADAPTIVE” for table B

Hive join understanding issue

I have created two tables in Hive as shown below
create table test1(id string);
create table test2(id string);
test1 has values as given below
1
1
test2 has values as given below
1
1
When I join these two tables, I get this output:
1
1
1
1
This is the query used :
select a.id from test1 a,test2 b where a.id=b.id;
Please help. I expected the output to be:
1
1
I am using the Cloudera distribution
Better use ANSI join syntax:
select a.id
from test1 a
inner join test2 b on a.id=b.id
The expected output cannot be the result of your join, because for each a.id every matching row from b is selected. The first row from a has two matching rows in b, and the second row from a also has two matching rows in b, so the result has four rows in total.
You can, for example, apply distinct to the second table before the join:
select a.id
from test1 a
inner join (select distinct b.id from test2 b) b on a.id=b.id
In this case each row in table a has a single matching row in table b.
See this lesson to understand JOINS better: https://www.coursera.org/learn/analytics-mysql/lecture/kydcf/joins-with-many-to-many-relationships-and-duplicates
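The row counts described above can be checked with a small sketch. It uses Python's built-in sqlite3 module in place of Hive (an assumption made only to keep the example runnable); the tables and values are the ones from the question.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (id TEXT);
    CREATE TABLE test2 (id TEXT);
    INSERT INTO test1 VALUES ('1'), ('1');
    INSERT INTO test2 VALUES ('1'), ('1');
""")

# Each of the 2 rows in test1 matches both rows in test2: 2 x 2 = 4 rows.
plain_join = conn.execute("""
    SELECT a.id
    FROM test1 a
    INNER JOIN test2 b ON a.id = b.id
""").fetchall()

# De-duplicating test2 first leaves one match per test1 row: 2 x 1 = 2 rows.
dedup_join = conn.execute("""
    SELECT a.id
    FROM test1 a
    INNER JOIN (SELECT DISTINCT id FROM test2) b ON a.id = b.id
""").fetchall()

print(len(plain_join))  # 4
print(len(dedup_join))  # 2
```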

How to take distinct values in hive join

I need to take only the distinct values from Table 2 when joining it with Table 1 in Hive, because Table 2 has duplicate records.
Given the join below, is it possible to take only the distinct key_col values from Table 2? I don't want to use select distinct * from ...
select * from Table_1 a left join Table_2 b on a.key_col = b.key_col
Note: This is in Hive
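Use a left semi join. This returns every record in Table_1 whose key_col exists in Table_2, without repeating it for each duplicate in Table_2. Note that, unlike the left join in the question, it drops Table_1 rows that have no match, and it cannot return columns from Table_2.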
select a.* from Table_1 a left semi join Table_2 b on a.key_col = b.key_col
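A rough way to compare the two behaviours is sketched below with Python's built-in sqlite3 module. SQLite has no LEFT SEMI JOIN keyword, so the sketch uses a WHERE EXISTS filter as a stand-in, on the assumption that it expresses the same semi-join semantics; the column payload and the sample values are hypothetical.

```python
import sqlite3

# In-memory SQLite database; EXISTS stands in for Hive's LEFT SEMI JOIN (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_1 (key_col INTEGER, payload TEXT);
    CREATE TABLE Table_2 (key_col INTEGER);
    INSERT INTO Table_1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table_2 VALUES (1), (1), (3);
""")

# Plain left join: key 1 is repeated once per duplicate in Table_2,
# and key 2 is kept with no match.
left_join_keys = conn.execute("""
    SELECT a.key_col
    FROM Table_1 a LEFT JOIN Table_2 b ON a.key_col = b.key_col
    ORDER BY a.key_col
""").fetchall()

# Semi-join style filter: each matching Table_1 row appears once,
# and unmatched key 2 is dropped.
semi_join_rows = conn.execute("""
    SELECT a.*
    FROM Table_1 a
    WHERE EXISTS (SELECT 1 FROM Table_2 b WHERE b.key_col = a.key_col)
    ORDER BY a.key_col
""").fetchall()

print(left_join_keys)  # [(1,), (1,), (2,), (3,)]
print(semi_join_rows)  # [(1, 'a'), (3, 'c')]
```

If the unmatched Table_1 rows must be kept, joining against a de-duplicated subquery of Table_2 may be a closer fit than a semi join.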

MS ACCESS Query for items not in table

I am a beginner with SQL and I have a question regarding finding a subset of data that does not exist in another table.
Currently I have 2 tables
Table A has a single column of OrderID containing about 300 records
Table B also has a single column containing 1000 records
How do I write a SQL query that helps me identify the 700 records not in Table A?
Thank you
You need to use NOT IN. Try this:
SELECT * FROM TableB
WHERE OrderID NOT IN (SELECT OrderID FROM TableA)
OR
Use a join.
SELECT B.*
FROM TableB B LEFT JOIN TableA A ON A.OrderID = B.OrderID
WHERE A.OrderID IS NULL
Try this:
SELECT TableB.* FROM TableB LEFT JOIN TableA ON TableB.OrderID = TableA.OrderID WHERE TableA.OrderID IS NULL;
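As a quick check that the NOT IN form and the LEFT JOIN ... IS NULL form pick out the same rows, here is a minimal sketch using Python's built-in sqlite3 module instead of MS Access (an assumption made only so the example is runnable). The sample OrderID values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MS Access (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (OrderID INTEGER);
    CREATE TABLE TableB (OrderID INTEGER);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (1), (2), (3), (4), (5);
""")

# Variant 1: filter out every OrderID that appears in TableA.
not_in_rows = conn.execute("""
    SELECT * FROM TableB
    WHERE OrderID NOT IN (SELECT OrderID FROM TableA)
    ORDER BY OrderID
""").fetchall()

# Variant 2: left join and keep only the rows that found no partner.
anti_join_rows = conn.execute("""
    SELECT B.*
    FROM TableB B LEFT JOIN TableA A ON A.OrderID = B.OrderID
    WHERE A.OrderID IS NULL
    ORDER BY B.OrderID
""").fetchall()

print(not_in_rows)     # [(4,), (5,)]
print(anti_join_rows)  # [(4,), (5,)]
```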

The Difference Inner Join Query

I'm just curious. Suppose I have table a and table b.
I write query 1:
SELECT * FROM table a INNER JOIN table b ON table a.id = table b.id
I write query 2:
SELECT * FROM table b INNER JOIN table a ON table b.id = table a.id
What is the difference between the two queries above?
Thank you
With INNER JOIN, there is no difference in the result set returned, except for the order of the columns when SELECT * is used, i.e. when the columns are not listed explicitly.
SELECT *
FROM table a
INNER JOIN table b
ON table a.id = table b.id
returns columns from tableA followed by columns from tableB
SELECT *
FROM table b
INNER JOIN table a
ON table b.id = table a.id
returns columns from tableB followed by columns from tableA
The second table is matched against the first one.
So it is better to put the smaller table second.
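The column-order point can be checked with a short sketch. It uses Python's built-in sqlite3 module (an assumption, since the question does not name a database), and the table names table_a and table_b and their columns are hypothetical. It only compares results and says nothing about performance.

```python
import sqlite3

# In-memory SQLite database; the question does not name an engine (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, a_val TEXT);
    CREATE TABLE table_b (id INTEGER, b_val TEXT);
    INSERT INTO table_a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO table_b VALUES (1, 'b1'), (3, 'b3');
""")

# table_a first: its columns lead in the SELECT * output.
a_then_b = conn.execute("""
    SELECT * FROM table_a a INNER JOIN table_b b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

# table_b first: same matching rows, columns in the opposite order.
b_then_a = conn.execute("""
    SELECT * FROM table_b b INNER JOIN table_a a ON b.id = a.id
    ORDER BY b.id
""").fetchall()

print(a_then_b)  # [(1, 'a1', 1, 'b1')]
print(b_then_a)  # [(1, 'b1', 1, 'a1')]

# Swapping the column halves of one result reproduces the other.
swapped = [(r[2], r[3], r[0], r[1]) for r in b_then_a]
print(swapped == a_then_b)  # True
```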