Hive - only 1 subquery expression is supported

I have 3 tables, T1, T2 and T3, each with an id column. I want to get the ids from T1 that are in neither T2 nor T3.
select id from T1 where id not in (select id from T2) and id not in (select id from T3);
But it throws an error: Only 1 subquery expression is supported for a single query.

You can use left join:
select T1.id
from T1
left join (select distinct id from T2) T2 on T2.id=T1.id
left join (select distinct id from T3) T3 on T3.id=T1.id
where T2.id is NULL
and T3.id is NULL;
distinct is used in the subqueries so that the joins do not multiply rows if id is not unique in T2 or T3.
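The double left join can be checked on a small dataset. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made so the example is runnable), and the sample rows are invented. The query is the one from the answer.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (id INTEGER);
    CREATE TABLE T2 (id INTEGER);
    CREATE TABLE T3 (id INTEGER);
    INSERT INTO T1 VALUES (1), (2), (3), (4);
    INSERT INTO T2 VALUES (2), (2);   -- duplicate id on purpose
    INSERT INTO T3 VALUES (3);
""")

query = """
    SELECT T1.id
    FROM T1
    LEFT JOIN (SELECT DISTINCT id FROM T2) T2 ON T2.id = T1.id
    LEFT JOIN (SELECT DISTINCT id FROM T3) T3 ON T3.id = T1.id
    WHERE T2.id IS NULL
      AND T3.id IS NULL
    ORDER BY T1.id
"""
# Keep only the ids that found no partner in either T2 or T3.
ids_not_in_t2_or_t3 = [row[0] for row in conn.execute(query)]
print(ids_not_in_t2_or_t3)
```

With this sample data, ids 2 and 3 are filtered out because they appear in T2 and T3 respectively.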

Related

Seeking suggestion about Left join

I have two tables, t1 and t2. t1 has 1641787 records and t2 has 33176007 records. I want to take two columns from t2 and keep everything from t1. When I left join t1 to t2, I get more records than t1 has. I would like the result to have the same number of records as t1. Please give me a suggestion. Here is my code:
SELECT t1.*,
t2.City
FROM t1
LEFT JOIN
t2
ON t1.ID = t2.ID;
The extra rows appear because some ids occur more than once in t2. You can aggregate t2 to one row per id and choose an arbitrary value:
select t1.*, t2.city
from t1 left join
(select t2.id, any_value(t2.city) as city
from t2
group by t2.id
) t2
on t1.id = t2.id;
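To see the effect on row counts, here is a small sketch using Python's sqlite3 module (a stand-in chosen only to make it runnable; the rows are invented). SQLite has no any_value, so MIN is used here as a substitute that also picks one city per id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, city TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'Oslo'), (1, 'Rome'), (2, 'Lima');  -- id 1 twice
""")

t1_rows = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]

# Plain left join: id 1 matches two t2 rows, so it is duplicated.
plain_join = conn.execute(
    "SELECT COUNT(*) FROM t1 LEFT JOIN t2 ON t1.id = t2.id"
).fetchone()[0]

# Join against one row per id: the row count of t1 is preserved.
# MIN(city) is a stand-in for any_value(city), which SQLite lacks.
dedup_join = conn.execute("""
    SELECT COUNT(*)
    FROM t1
    LEFT JOIN (SELECT id, MIN(city) AS city FROM t2 GROUP BY id) t2
      ON t1.id = t2.id
""").fetchone()[0]

print(t1_rows, plain_join, dedup_join)
```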

Select IDs that do not exist in any of several columns of another table in SQL

I have two tables: Table1 with ID and name, and Table2 with ID1, ID2, ID3, name1, name2, and name3.
I want to select the Table1.ID values that do not exist in any of Table2's ID1, ID2, and ID3:
select T1.ID,t1.name
from table1 t1
where not exists (
SELECT *
FROM table2 t2
where t1.ID=t2.ID1 or t1.ID=t2.ID2 or t1.ID=t2.ID3 )
I get an error message for this query.
After a little research, I could find this. Basically, you can't have multiple columns in a subquery used in an IN or NOT IN condition in the WHERE clause. This is why your query is currently failing: your subquery gets all the columns from Table2.
From my understanding of your question, you want a select where the results would be the elements that are not existing in Table2.
To do this, you can simply use a LEFT OUTER JOIN. In SQL, I would left join once with the three columns combined by OR, but it seems that Hive (at least in older versions) does not support OR conditions in the ON clause of a JOIN, so you can use the following alternative:
SELECT T1.ID, T1.name
FROM Table1 T1
LEFT JOIN Table2 T2_1 ON T2_1.ID1 = T1.ID
LEFT JOIN Table2 T2_2 ON T2_2.ID2 = T1.ID
WHERE (T2_1.ID1 IS NULL) AND -- no Table2 row has ID1 = T1.ID
(T2_2.ID2 IS NULL) -- no Table2 row has ID2 = T1.ID
Just add as many LEFT JOIN and conditions in the WHERE clause as you have columns to check.
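Extended to all three columns, the idea can be sketched as follows. It runs on SQLite through Python's sqlite3 module (an assumption made so it is runnable, not Hive itself), with invented rows and a simplified Table2 that has only the three ID columns. The alias T2_3 is introduced here for the third join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, name TEXT);
    CREATE TABLE Table2 (ID1 INTEGER, ID2 INTEGER, ID3 INTEGER);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
    INSERT INTO Table2 VALUES (1, 2, 3), (1, 9, 9);
""")

# One LEFT JOIN per Table2 column, each tested on its own join column.
query = """
    SELECT T1.ID, T1.name
    FROM Table1 T1
    LEFT JOIN Table2 T2_1 ON T2_1.ID1 = T1.ID
    LEFT JOIN Table2 T2_2 ON T2_2.ID2 = T1.ID
    LEFT JOIN Table2 T2_3 ON T2_3.ID3 = T1.ID
    WHERE T2_1.ID1 IS NULL
      AND T2_2.ID2 IS NULL
      AND T2_3.ID3 IS NULL
    ORDER BY T1.ID
"""
rows_not_in_table2 = conn.execute(query).fetchall()
print(rows_not_in_table2)
```

Only IDs 4 and 5 survive, since 1, 2 and 3 each appear in one of the Table2 columns.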
Here's a fiddle with this query concept (the data are not the same, but the concept is).
Join first on ID1, then join the resulting dataset on ID2, then join that result on ID3:
select p2.ID, p2.name --pass3
from
(select p1.ID, p1.name --pass2
from
(SELECT T1.ID, T1.name --pass1
FROM Table1 T1
LEFT JOIN Table2 T2 ON T2.ID1 = T1.ID
where T2.ID1 is null --not in ID1
) p1 LEFT JOIN Table2 T2 ON T2.ID2 = p1.ID
where T2.ID2 is null --also not in ID2
) p2 LEFT JOIN Table2 T2 ON T2.ID3 = p2.ID
where T2.ID3 is null --also not in ID3
The joins in steps 2 and 3 receive an already reduced dataset from T1, so this solution may work well for big tables.
SELECT ID, NAME
FROM
(SELECT T1.ID, T1.name
FROM Table1 T1
LEFT OUTER JOIN Table2 T2 ON T2.ID1 = T1.ID
where T2.ID1 is null
union all
SELECT T1.ID, T1.name
FROM Table1 T1
LEFT OUTER JOIN Table2 T2 ON T2.ID2 = T1.ID
where T2.ID2 is null
union all
SELECT T1.ID, T1.name
FROM Table1 T1
LEFT OUTER JOIN Table2 T2 ON T2.ID3 = T1.ID
where T2.ID3 is null)JO
GROUP BY ID, NAME
HAVING COUNT(*) = 3 -- keep only IDs missing from all three columns; assumes (ID, name) is unique in Table1

Convert to join query

select t.* from table1 t where t.id NOT IN(
select Id from t2 where usrId in
(select usrId from t3 where sId=value));
The result I need: if there are matching ids in t1 and t2, those ids should be omitted and only the remaining rows returned. I tried converting this into a join, but it is not giving me the result I wanted. Below is my join query.
SELECT t.* FROM table1 t JOIN table2 t2 ON t.Id <> t2.Id
JOIN table3 t3 ON t3.Id=t2.Id WHERE t3.sId= :value
This doesn't fetch the correct result. It returns all the rows, but I want to restrict the result based on the matching ids in t1 and t2: matching ids should be omitted from the result. I will be passing the value for sId.
I believe this to be an accurate refactor of your query using joins. I don't know if we can do away with the subquery, but in any case the logic appears to be the same.
select t1.*
from table1 t1
left join
(
select t2.Id
from table2 t2
inner join table3 t3
on t2.usrId = t3.usrId
where t3.sId = <value>
) t2
on t1.Id = t2.Id
where t2.Id is null
Let's break the problem down and solve it step by step.
So your query
select t.* from table1 t where t.id NOT IN(
select Id from t2 where usrId in
(select usrId from t3 where sId=value));
on converting the inner query to JOIN will yield
select t.* from table1 t where t.id NOT IN
(SELECT T2.ID FROM T2 JOIN T3 on T2.UsrID =T3.UsrID and T3.sID=value)
which on further converting to JOIN with outer table will be
select t.* from table1 t LEFT JOIN
(SELECT T2.ID FROM T2 JOIN T3 on T2.UsrID =T3.UsrID and T3.sID=value)t4
ON t.id =T4.ID
WHERE t4.ID is NULL
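As a quick sanity check, the original NOT IN query and the LEFT JOIN form can be compared on a tiny dataset. This sketch uses Python's sqlite3 module rather than the asker's database (an assumption for runnability). The rows and the value 7 for sId are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, usrId INTEGER);
    CREATE TABLE t3 (usrId INTEGER, sId INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 10), (2, 20);
    INSERT INTO t3 VALUES (10, 7), (20, 8);
""")
s_id = 7  # hypothetical value passed for sId

# Original form: nested NOT IN / IN subqueries.
not_in_rows = conn.execute("""
    SELECT t.id FROM table1 t
    WHERE t.id NOT IN (
        SELECT id FROM t2 WHERE usrId IN (SELECT usrId FROM t3 WHERE sId = ?)
    )
    ORDER BY t.id
""", (s_id,)).fetchall()

# Converted form: LEFT JOIN to the joined subquery, keep unmatched rows.
left_join_rows = conn.execute("""
    SELECT t.id FROM table1 t
    LEFT JOIN (
        SELECT t2.id FROM t2 JOIN t3 ON t2.usrId = t3.usrId AND t3.sId = ?
    ) t4 ON t.id = t4.id
    WHERE t4.id IS NULL
    ORDER BY t.id
""", (s_id,)).fetchall()

print(not_in_rows, left_join_rows)
```

Both forms drop id 1, the only id whose usrId has sId = 7 in t3.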
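If you want to remove the subquery completely, you can try the following. Note that it matches the queries above only when Id is unique in T2; if a table1 id has several T2 rows, it is returned whenever any one of them has no T3 match.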
SELECT t.*
FROM table1 t
LEFT JOIN T2
ON T.ID=T2.ID
LEFT JOIN T3
ON T3.UsrId=T2.UsrID AND T3.sId=value
WHERE T3.UsrID IS NULL

not in operator in SQL

There are 2 tables:
table1
ID
1
2
4
6
7
TABLE2
2
4
6
I want the numbers from table1 that are not in table2. How do I do this?
I tried this:
select id from table1 t1
inner join table2 t2 on t1.id=t2.id
where t1.id not in (select id from table2)
but this is not working.
SELECT t1.id
FROM table1 t1
LEFT JOIN table2 t2 ON t2.id = t1.id
WHERE t2.id IS NULL
Conceptually, we select all rows from table1 and for each row we attempt to find a row in table2 with the same value for the id column. If there is no such row, we just leave the table2 portion of our result empty for that row. Then we constrain our selection by picking only those rows in the result where the matching row does not exist. Finally, we ignore all fields from our result except for the id column (the one we are sure exists, from table1).
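The two steps described above can be made visible with the sample data from the question. This is a sketch on SQLite via Python's sqlite3 module (chosen only so that it runs as is): first the unfiltered left join, then the same join with the IS NULL filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (4), (6), (7);
    INSERT INTO table2 VALUES (2), (4), (6);
""")

# Step 1: every table1 row, with the table2 side empty (None) when unmatched.
joined = conn.execute("""
    SELECT t1.id, t2.id
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.id = t1.id
    ORDER BY t1.id
""").fetchall()

# Step 2: keep only the rows whose table2 side is empty.
only_in_table1 = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.id = t1.id
    WHERE t2.id IS NULL
    ORDER BY t1.id
""")]

print(joined)
print(only_in_table1)
```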
try this:
select id from table1 t1 where t1.id not in (select t2.id from table2 t2)
You don't need to join the two tables in this case. You could just do
select id from table1 A where A.id not in (select B.id from table2 B);
You could also just simply use the sql set difference EXCEPT operator to achieve this
(select id from table1) except (select id from table2);
Use NOT IN or NOT EXISTS:
select id from table1 t1
where t1.id not in (select id from table2);
select id from table1 t1
where not exists (select id from table2 where id = t1.id);
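For comparison, here is a sketch that runs the NOT IN, NOT EXISTS and EXCEPT variants side by side on the question's data. It assumes SQLite through Python's sqlite3 module, and the EXCEPT query is written without parentheses because SQLite does not accept them around the operands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (4), (6), (7);
    INSERT INTO table2 VALUES (2), (4), (6);
""")

queries = {
    "not_in": "SELECT id FROM table1 t1 "
              "WHERE t1.id NOT IN (SELECT id FROM table2) ORDER BY id",
    "not_exists": "SELECT id FROM table1 t1 "
                  "WHERE NOT EXISTS (SELECT id FROM table2 WHERE id = t1.id) "
                  "ORDER BY id",
    "except": "SELECT id FROM table1 EXCEPT SELECT id FROM table2 ORDER BY id",
}

# Run each variant and collect the returned ids as a plain list.
results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in queries.items()}
print(results)
```

The sample data contains no NULLs. If table2.id could be NULL, NOT IN would return no rows at all, while NOT EXISTS would be unaffected.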

Hive QL Except clause

How do I do an EXCEPT clause (like in SQL) in Hive QL?
I have 2 tables, and each table is a single column of unique ids.
I want to find the list of ids that are in table 1 but not in table 2.
Table 1
apple
orange
pear
Table 2
apple
orange
In SQL you can use an EXCEPT clause (http://en.wikipedia.org/wiki/Set_operations_%28SQL%29) but you can't do that in Hive QL.
I don't think there's any built-in way to do this but a LEFT OUTER JOIN should do the trick.
This selects all Ids from table1 that do not exist in table2:
SELECT t1.id FROM table1 t1 LEFT OUTER JOIN table2 t2 ON (t1.id=t2.id) WHERE t2.id IS NULL;
We can use a NOT EXISTS clause in Hive as the equivalent of MINUS.
SELECT t1.id FROM t1 WHERE NOT EXISTS (SELECT 1 from t2 WHERE t2.id = t1.id);
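Both suggestions can be tried on the fruit example from the question. This sketch assumes SQLite via Python's sqlite3 module instead of Hive and names the tables table1 and table2 with a single text column id, so it shows the logic rather than Hive's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id TEXT);
    CREATE TABLE table2 (id TEXT);
    INSERT INTO table1 VALUES ('apple'), ('orange'), ('pear');
    INSERT INTO table2 VALUES ('apple'), ('orange');
""")

# LEFT OUTER JOIN variant: keep table1 rows with no partner in table2.
left_join_ids = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM table1 t1
    LEFT OUTER JOIN table2 t2 ON (t1.id = t2.id)
    WHERE t2.id IS NULL
""")]

# NOT EXISTS variant: keep table1 rows for which the subquery finds nothing.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
""")]

print(left_join_ids, not_exists_ids)
```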
1:
select distinct id from table1 where id not in (select distinct id from table2)
2:
select t1.id
from table1 as t1
left join table2 as t2
on t1.id = t2.id
where t2.id is null