How do I do an EXCEPT clause (as in SQL) in HiveQL?
I have 2 tables, and each table consists of a single column of unique ids.
I want to find the list of ids that are in table 1 but not in table 2.
Table 1
apple
orange
pear
Table 2
apple
orange
In SQL you can use an EXCEPT clause (http://en.wikipedia.org/wiki/Set_operations_%28SQL%29), but you can't do that in HiveQL.
I don't think there's any built-in way to do this, but a LEFT OUTER JOIN should do the trick.
This selects all ids from table1 that do not exist in table2:
SELECT t1.id FROM table1 t1 LEFT OUTER JOIN table2 t2 ON (t1.id=t2.id) WHERE t2.id IS NULL;
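To check the anti-join locally, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. SQLite only stands in for Hive here (an assumption for local testing), and the rows are the ones from the question.

```python
import sqlite3

# In-memory stand-ins for the question's tables (SQLite, not Hive).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id TEXT);
    CREATE TABLE table2 (id TEXT);
    INSERT INTO table1 VALUES ('apple'), ('orange'), ('pear');
    INSERT INTO table2 VALUES ('apple'), ('orange');
""")

# Anti-join: table1 rows with no partner in table2 get NULL in every t2
# column, so "t2.id IS NULL" keeps exactly those rows.
query = """
    SELECT t1.id
    FROM table1 t1
    LEFT OUTER JOIN table2 t2 ON (t1.id = t2.id)
    WHERE t2.id IS NULL
"""
only_in_table1 = [row[0] for row in conn.execute(query)]
print(only_in_table1)  # ['pear']
```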
You can use a NOT EXISTS clause in Hive as the equivalent of MINUS:
SELECT t1.id FROM t1 WHERE NOT EXISTS (SELECT 1 from t2 WHERE t2.id = t1.id);
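A quick way to convince yourself that NOT EXISTS behaves like MINUS is to run it on the sample rows. This sketch uses SQLite through Python's sqlite3 module as a stand-in for Hive (an assumption); the tables are named t1 and t2 exactly as in the query above.

```python
import sqlite3

# Tables named t1/t2 to match the query; SQLite stands in for Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id TEXT);
    CREATE TABLE t2 (id TEXT);
    INSERT INTO t1 VALUES ('apple'), ('orange'), ('pear');
    INSERT INTO t2 VALUES ('apple'), ('orange');
""")

# Correlated NOT EXISTS: a t1 row survives only if the subquery finds
# no t2 row with the same id.
query = """
    SELECT t1.id FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id)
"""
minus_result = [row[0] for row in conn.execute(query)]
print(minus_result)  # ['pear']
```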
1:
select distinct id from table1 where id not in (select distinct id from table2)
2:
select t1.id
from table1 as t1
left join table2 as t2
on t1.id = t2.id
where t2.id is null
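On clean data the two queries agree, but they differ when table2 contains a NULL id: in standard SQL, `x NOT IN (..., NULL)` evaluates to NULL rather than true, so the NOT IN version returns nothing. The sketch below shows this with SQLite via Python (a stand-in for Hive, assumed here); the helper name `compare_not_in_and_left_join` is hypothetical.

```python
import sqlite3

def compare_not_in_and_left_join(table2_ids):
    """Run both queries from the answer; return (not_in_ids, left_join_ids)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id TEXT)")
    conn.execute("CREATE TABLE table2 (id TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?)",
                     [("apple",), ("orange",), ("pear",)])
    # None is stored as SQL NULL.
    conn.executemany("INSERT INTO table2 VALUES (?)", [(i,) for i in table2_ids])

    not_in = conn.execute(
        "SELECT DISTINCT id FROM table1 "
        "WHERE id NOT IN (SELECT DISTINCT id FROM table2)").fetchall()
    left_join = conn.execute(
        "SELECT t1.id FROM table1 AS t1 "
        "LEFT JOIN table2 AS t2 ON t1.id = t2.id "
        "WHERE t2.id IS NULL").fetchall()
    conn.close()
    return [r[0] for r in not_in], [r[0] for r in left_join]

print(compare_not_in_and_left_join(["apple", "orange"]))        # (['pear'], ['pear'])
print(compare_not_in_and_left_join(["apple", "orange", None]))  # ([], ['pear'])
```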
Related
I have 3 tables, T1, T2 and T3, each with an id column. I want to get the ids from T1 that are in neither T2 nor T3. I tried:
select id from T1 where id not in (select id from T2) and id not in (select id from T3);
But it throws the error "Only 1 subquery expression is supported for a single query."
You can use left joins instead:
select T1.id
from T1
left join (select distinct id from T2) T2 on T2.id=T1.id
left join (select distinct id from T3) T3 on T3.id=T1.id
where T2.id is NULL
and T3.id is NULL;
DISTINCT is used in the subqueries so that the joins do not multiply rows when an id is not unique in T2 or T3.
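Here is a minimal sketch of the double left join running on made-up rows (the values, including the duplicated id in T2, are hypothetical). SQLite via Python stands in for Hive, which is an assumption; the query text is the one from the answer plus an ORDER BY for stable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (id INTEGER);
    CREATE TABLE T2 (id INTEGER);
    CREATE TABLE T3 (id INTEGER);
    INSERT INTO T1 VALUES (1), (2), (3), (4);
    INSERT INTO T2 VALUES (2), (2);   -- hypothetical duplicate id
    INSERT INTO T3 VALUES (3);
""")

# Each LEFT JOIN leaves NULLs where T1.id has no match; requiring both
# sides to be NULL keeps ids found in neither T2 nor T3.
query = """
    SELECT T1.id
    FROM T1
    LEFT JOIN (SELECT DISTINCT id FROM T2) T2 ON T2.id = T1.id
    LEFT JOIN (SELECT DISTINCT id FROM T3) T3 ON T3.id = T1.id
    WHERE T2.id IS NULL
      AND T3.id IS NULL
    ORDER BY T1.id
"""
ids_in_neither = [row[0] for row in conn.execute(query)]
print(ids_in_neither)  # [1, 4]
```

Only one subquery appears per join input, so this shape avoids the multiple-subquery-expression restriction from the question.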
There are 2 tables:
table1
ID
1
2
4
6
7
table2
ID
2
4
6
I want the numbers from table1 that are not in table2. How do I do this?
I tried this:
select id from table1 t1
inner join table2 t2 on t1.id=t2.id
where t1.id not in (select id from table2)
but this is not working.
SELECT t1.id
FROM table1 t1
LEFT JOIN table2 t2 ON t2.id = t1.id
WHERE t2.id IS NULL
Conceptually, we select all rows from table1, and for each row we try to find a row in table2 with the same value in the id column. If there is no such row, the table2 portion of the result is left empty (NULL) for that row. We then restrict the result to only those rows where no matching row was found. Finally, we ignore every field except the id column (the one we know exists, from table1).
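The steps described above can be watched one at a time. This sketch (SQLite through Python, used here as an assumed stand-in for whatever database you run) first prints the raw LEFT JOIN, where unmatched rows carry NULL (None in Python), and then applies the IS NULL filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (4), (6), (7);
    INSERT INTO table2 VALUES (2), (4), (6);
""")

# Step 1: the raw LEFT JOIN. Unmatched table1 rows get NULL for t2.id.
joined = conn.execute("""
    SELECT t1.id, t2.id
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.id = t1.id
    ORDER BY t1.id
""").fetchall()
print(joined)  # [(1, None), (2, 2), (4, 4), (6, 6), (7, None)]

# Step 2: keep only the unmatched rows and project the table1 id.
missing = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.id = t1.id
    WHERE t2.id IS NULL
    ORDER BY t1.id
""")]
print(missing)  # [1, 7]
```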
Try this:
select id from table1 t1 where t1.id not in (select t2.id from table2 t2)
You don't need to join the two tables in this case. You could just do:
select id from table1 A where A.id not in (select B.id from table2 B);
You could also simply use the SQL set-difference operator EXCEPT to achieve this:
(select id from table1) except (select id from table2);
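Here is a small sketch of EXCEPT on the question's numbers, run through SQLite from Python (an assumed stand-in database). SQLite's compound-SELECT grammar does not accept parenthesized operands, so the query is written without the parentheses; the trailing ORDER BY applies to the whole compound result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (4), (6), (7);
    INSERT INTO table2 VALUES (2), (4), (6);
""")

# EXCEPT returns the distinct rows of the left SELECT that do not
# appear in the right SELECT.
query = "SELECT id FROM table1 EXCEPT SELECT id FROM table2 ORDER BY id"
difference = [row[0] for row in conn.execute(query)]
print(difference)  # [1, 7]
```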
Use NOT IN or NOT EXISTS:
select id from table1 t1
where t1.id not in (select id from table2)
select id from table1 t1
where not exists (select id from table2 where id = t1.id)
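Both forms give the same answer on the sample data. This sketch runs them side by side in SQLite via Python (a stand-in, assumed for local testing); inside the NOT EXISTS subquery the bare `id` resolves to table2.id, because the innermost table in scope wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (4), (6), (7);
    INSERT INTO table2 VALUES (2), (4), (6);
""")

not_in = [r[0] for r in conn.execute("""
    SELECT id FROM table1 t1
    WHERE t1.id NOT IN (SELECT id FROM table2)
    ORDER BY id
""")]

# Correlated subquery: "id" here is table2.id, compared with the outer t1.id.
not_exists = [r[0] for r in conn.execute("""
    SELECT id FROM table1 t1
    WHERE NOT EXISTS (SELECT id FROM table2 WHERE id = t1.id)
    ORDER BY id
""")]
print(not_in, not_exists)  # [1, 7] [1, 7]
```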
I have a set of records, and I need to select only those records from this set whose Id appears in either of two other tables.
Suppose table1 contains:
Id Name
----------
1 Name1
2 Name2
Now I need to select only those records from table1 whose id is in either table2 or table3.
I tried applying an OR operator within the inner join, like this:
select *
from table1
inner join table2 on table2.id = table1.id or
inner join table3 on table3.id = table1.id.
Is this possible? What is the best way to approach this? I am also not able to use:
if exist(select 1 from table2 where id=table1.id) then select from table1
Could someone help me with this?
Use left joins, then check whether at least one of the joins found a match:
select t1.*
from table1 t1
left join table2 t2 on t2.id = t1.id
left join table3 t3 on t3.id = t1.id
where t2.id is not null
or t3.id is not null
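Here is a minimal sketch of the left-join check, run in SQLite from Python (a stand-in database, assumed). The third table1 row, 'Name3', and the contents of table2 and table3 are hypothetical; the last condition tests `t3.id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    INSERT INTO table1 VALUES (1, 'Name1'), (2, 'Name2'), (3, 'Name3');
    INSERT INTO table2 VALUES (1);
    INSERT INTO table3 VALUES (2);
""")

# A row qualifies if at least one of the two LEFT JOINs found a partner.
query = """
    SELECT t1.*
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.id = t1.id
    LEFT JOIN table3 t3 ON t3.id = t1.id
    WHERE t2.id IS NOT NULL
       OR t3.id IS NOT NULL
    ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Name1'), (2, 'Name2')]
```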
I would be inclined to use exists:
select t1.*
from table1 t1
where exists (select 1 from table2 t2 where t2.id = t1.id) or
exists (select 1 from table3 t3 where t3.id = t1.id) ;
The advantage to using exists (or in) over a join involves duplicate rows. If table2 or table3 have multiple rows for a given id, then a version using join will produce multiple rows in the result set.
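The duplicate-row effect is easy to reproduce. In this sketch (SQLite via Python as an assumed stand-in, with hypothetical rows) id 1 appears twice in table2, so the join version returns it twice while the EXISTS version returns it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    INSERT INTO table1 VALUES (1, 'Name1'), (2, 'Name2'), (3, 'Name3');
    INSERT INTO table2 VALUES (1), (1);  -- id 1 appears twice
    INSERT INTO table3 VALUES (2);
""")

# Join version: one output row per matching table2 row.
join_ids = [r[0] for r in conn.execute("""
    SELECT t1.id
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.id = t1.id
    LEFT JOIN table3 t3 ON t3.id = t1.id
    WHERE t2.id IS NOT NULL OR t3.id IS NOT NULL
    ORDER BY t1.id
""")]

# EXISTS version: each table1 row is tested once, so no duplicates.
exists_ids = [r[0] for r in conn.execute("""
    SELECT t1.id
    FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
       OR EXISTS (SELECT 1 FROM table3 t3 WHERE t3.id = t1.id)
    ORDER BY t1.id
""")]
print(join_ids)    # [1, 1, 2]
print(exists_ids)  # [1, 2]
```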
I think the most efficient way is to UNION table2 and table3 and join to the result:
SELECT t1.*
FROM table1 t1
INNER JOIN(SELECT id FROM Table2
UNION
SELECT id FROM Table3) s
ON(t1.id = s.id)
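Here is a sketch of the UNION approach in SQLite via Python (assumed stand-in; the rows are hypothetical), joining on t1.id. Because UNION removes duplicates, id 1 appearing in both tables, and twice in table2, still yields a single result row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    INSERT INTO table1 VALUES (1, 'Name1'), (2, 'Name2'), (3, 'Name3');
    INSERT INTO table2 VALUES (1), (1);
    INSERT INTO table3 VALUES (1), (2);
""")

# The derived table s holds each id from table2/table3 exactly once.
query = """
    SELECT t1.*
    FROM table1 t1
    INNER JOIN (SELECT id FROM table2
                UNION
                SELECT id FROM table3) s
        ON (t1.id = s.id)
    ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Name1'), (2, 'Name2')]
```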
Alternatively, you can use the following SQL:
SELECT *
FROM dbo.Table1
WHERE Table1.id IN ( SELECT table2.id
FROM dbo.table2 )
OR Table1.id IN ( SELECT table3.id
FROM dbo.table3 )
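The IN ... OR IN form can be tried the same way. This sketch uses SQLite via Python (an assumed stand-in) with hypothetical rows, and drops the `dbo.` schema prefix because SQLite has no such schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    INSERT INTO table1 VALUES (1, 'Name1'), (2, 'Name2'), (3, 'Name3');
    INSERT INTO table2 VALUES (1);
    INSERT INTO table3 VALUES (2);
""")

# Each IN is a membership test, so duplicates in table2/table3 cannot
# multiply table1 rows.
query = """
    SELECT *
    FROM table1
    WHERE table1.id IN (SELECT table2.id FROM table2)
       OR table1.id IN (SELECT table3.id FROM table3)
    ORDER BY table1.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Name1'), (2, 'Name2')]
```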
I have a problem joining two tables:
table1
id name
1 aaa
2 bbb
3 ccc
table2
id table1_id name
1 1 x1
2 1 x2
3 2 s1
table1 is the main table; table2 contains its attributes.
I need to join and search both tables, but display distinct results from the first table.
When I use a JOIN, I get one row per matching table2 attribute, so the same table1 row appears multiple times.
The scenario: I need to search the main table (table1) and all of its attributes in table2, and return the table1 row if a match is found.
select distinct table1.name from table1 inner join table2 on table1.id = table2.table1_id where table2.name = 'x2';
This should do the trick.
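Here is a minimal sketch of the DISTINCT join on the question's data, run in SQLite from Python (an assumed stand-in). It qualifies `table1.name`, since both tables have a name column, and swaps `=` for a parameterized LIKE (my change, for illustration) so that one search can match several attributes of the same row; the helper `search_distinct_join` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, table1_id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
    INSERT INTO table2 VALUES (1, 1, 'x1'), (2, 1, 'x2'), (3, 2, 's1');
""")

def search_distinct_join(pattern):
    # The join yields one row per matching attribute; DISTINCT collapses
    # them back to one row per table1 name.
    return [r[0] for r in conn.execute("""
        SELECT DISTINCT table1.name
        FROM table1
        INNER JOIN table2 ON table1.id = table2.table1_id
        WHERE table2.name LIKE ?
        ORDER BY table1.name
    """, (pattern,))]

print(search_distinct_join("x2"))  # ['aaa']
print(search_distinct_join("x%"))  # ['aaa'] (x1 and x2 both match, one row kept)
```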
If you need entries that exist in both tables:
SELECT * from Table1 t1
WHERE YourConditionsHere
AND EXISTS (SELECT 1 from Table2 t2
WHERE t1.Id = t2.Table1_id
AND YourConditionsHere)
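To make the template concrete, this sketch fills `YourConditionsHere` with a hypothetical attribute-name pattern and runs it in SQLite via Python (an assumed stand-in). The table1 condition is left out, and the helper name `search_exists` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, table1_id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
    INSERT INTO table2 VALUES (1, 1, 'x1'), (2, 1, 'x2'), (3, 2, 's1');
""")

def search_exists(pattern):
    # EXISTS only asks "is there at least one match?", so a table1 row
    # with several matching attributes is still returned once.
    return [r[0] for r in conn.execute("""
        SELECT t1.name FROM table1 t1
        WHERE EXISTS (SELECT 1 FROM table2 t2
                      WHERE t1.id = t2.table1_id
                        AND t2.name LIKE ?)
        ORDER BY t1.name
    """, (pattern,))]

print(search_exists("x%"))  # ['aaa']
print(search_exists("%1"))  # ['aaa', 'bbb']
```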
If you need entries from Table1 for which no entries exist in Table2:
SELECT t1.* from Table1 t1
LEFT JOIN
(SELECT * from Table2
WHERE YourConditionsHere
) t2
ON (t1.Id = t2.Table1_id)
WHERE YourConditionsHereForTable1
AND t2.Table1_id IS NULL
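Here is a sketch of this anti-join with the template filled in, run in SQLite via Python (an assumed stand-in). The table2 condition is a hypothetical name pattern, there is no table1 condition, and the explicit `t2.Table1_id IS NULL` filter is what removes the rows that did find a match (without it, every table1 row comes back). The helper `without_matching_attribute` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER, name TEXT);
    CREATE TABLE Table2 (Id INTEGER, Table1_id INTEGER, name TEXT);
    INSERT INTO Table1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
    INSERT INTO Table2 VALUES (1, 1, 'x1'), (2, 1, 'x2'), (3, 2, 's1');
""")

def without_matching_attribute(pattern):
    # Filter Table2 first, LEFT JOIN it, then keep the unmatched rows.
    return [r[0] for r in conn.execute("""
        SELECT t1.name
        FROM Table1 t1
        LEFT JOIN (SELECT * FROM Table2 WHERE name LIKE ?) t2
            ON (t1.Id = t2.Table1_id)
        WHERE t2.Table1_id IS NULL
        ORDER BY t1.name
    """, (pattern,))]

print(without_matching_attribute("x%"))  # ['bbb', 'ccc']
print(without_matching_attribute("%"))   # ['ccc']
```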
Another option:
select * from table1 t1 where t1.id in (select table1_id from table2 t2 where t2.name = 'x1');
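Here is the IN variant on the question's data, run in SQLite via Python (an assumed stand-in) with the attribute name passed as a parameter. It returns only table1 names, and the helper `search_in` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, table1_id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
    INSERT INTO table2 VALUES (1, 1, 'x1'), (2, 1, 'x2'), (3, 2, 's1');
""")

def search_in(attr_name):
    # The subquery collects matching table1_ids; IN tests membership.
    return [r[0] for r in conn.execute("""
        SELECT t1.name FROM table1 t1
        WHERE t1.id IN (SELECT table1_id FROM table2 t2 WHERE t2.name = ?)
        ORDER BY t1.name
    """, (attr_name,))]

print(search_in("x1"))  # ['aaa']
print(search_in("s1"))  # ['bbb']
```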
It's probably best to check the query plans (i.e. EXPLAIN) of all the suggested queries and pick the one that performs best for your exact scenario.
Consider the following example.
I have to select all the records from table1 that also exist in table2, plus all the records from table2 that don't exist in table1 and have IsActive = 1 and a non-null status.
I initially tried it with a join, but how do I do the latter part, where I have to select the records that don't exist in table1? I have to do it in a single query, presumably in a SQL view.
Edit
I need to combine the results like a UNION of the 2 tables, so for rows absent from table1 but present in table2, the columns belonging to table1 would be blank.
Here's an example query:
select *
from Table2 t2
left join
Table1 t1
on t1.id = t2.id
where t1.id is not null
or (t2.isActive = 1 and t2.status is not null)
The first line of the where clause takes care of "all the records from table1 which exist in table2". The second line is for "don't exist in table1 but exist in table2 and have IsActive=1 and status not null".
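Here is a minimal sketch of that query on made-up rows (all values hypothetical), run in SQLite via Python as an assumed stand-in. Only the two ids are selected so you can see which table2 rows survive, and which ones have blank (NULL, shown as None) table1 columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER, name TEXT);
    CREATE TABLE Table2 (id INTEGER, isActive INTEGER, status TEXT);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO Table2 VALUES
        (1, 0, NULL),   -- also in Table1: kept by the first condition
        (3, 1, 'ok'),   -- not in Table1, active with status: kept
        (4, 1, NULL),   -- not in Table1, status is NULL: dropped
        (5, 0, 'ok');   -- not in Table1, inactive: dropped
""")

query = """
    SELECT t2.id, t1.id
    FROM Table2 t2
    LEFT JOIN Table1 t1 ON t1.id = t2.id
    WHERE t1.id IS NOT NULL
       OR (t2.isActive = 1 AND t2.status IS NOT NULL)
    ORDER BY t2.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1), (3, None)]
```

Table1 row 2 does not appear, since the query starts from Table2 and only keeps Table1 records that also exist there.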
You will need an outer join here.
http://msdn.microsoft.com/en-us/library/ms187518.aspx
Is this it? I'm not sure I understood what you want to do.
SELECT
*
FROM
Table1 t1
JOIN
Table2 t2 ON (t1.ID = t2.ID OR (t1.ID IS NULL AND t2.isActive = 1 AND t2.Status IS NOT NULL))