Subquery not in performance question - sql

I have this slow query
select * from table1 where id NOT IN ( select id from table2 )
Would this be faster by doing something like (not sure if this is possible):
select * from table1 where id not in ( select id from table2 where id = table1.id )
Or:
select * from table1 where NOT EXISTS ( select id from table2 where table2.id = table1.id )
Or:
select * from table1
left join table2 on table2.id = table1.id
WHERE table2.id is null
Or do something else? Like break it up into two queries ...

The question is - are the field(s) in the comparison nullable (meaning, can the column value be NULL)?
If they're nullable...
...in MySQL the NOT IN or NOT EXISTS perform better - see this link.
If they are NOT nullable...
... LEFT JOIN / IS NULL performs better - see this link.
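Separately from speed, nullability also changes the results: if table2.id contains a NULL, NOT IN returns no rows at all, while NOT EXISTS and LEFT JOIN / IS NULL still return the unmatched rows. A minimal sketch of that behaviour, assuming Python's built-in sqlite3 as a stand-in engine and made-up sample rows (it shows semantics only, not MySQL timings):

```python
import sqlite3

# In-memory database with illustrative data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);   -- nullable on purpose
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (NULL);
""")

# NOT IN: comparing against a NULL yields UNKNOWN, so no row passes.
not_in = conn.execute(
    "SELECT id FROM table1 "
    "WHERE id NOT IN (SELECT id FROM table2) ORDER BY id"
).fetchall()

# NOT EXISTS: only checks whether a matching row exists.
not_exists = conn.execute(
    "SELECT id FROM table1 WHERE NOT EXISTS "
    "(SELECT 1 FROM table2 WHERE table2.id = table1.id) ORDER BY id"
).fetchall()

# LEFT JOIN / IS NULL: keeps table1 rows that found no partner.
left_join = conn.execute(
    "SELECT table1.id FROM table1 "
    "LEFT JOIN table2 ON table2.id = table1.id "
    "WHERE table2.id IS NULL ORDER BY table1.id"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
print(left_join)   # [(2,), (3,)]
```

So when the column is nullable, the three forms are only interchangeable if the NULLs are filtered out of the NOT IN subquery.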

select table1.* from table1
LEFT JOIN table2 ON table1.id = table2.id
WHERE table2.id IS NULL
The objective is to get rid of NOT IN.

Related

Print all the data from table1 and matching data from table 2 but what if the table2 is null

As I am trying to learn SQL, I have come across this question.
Let's say I have two tables:
table1 which contains two columns
id and data1
and
table2 which also has two columns
p_id and data2.
How would I print out all the data from table1 and just the matching data from table2 in two different scenarios?
Scenario 1: table1.id = 1
Scenario 2: what if the table2 is null, (but I still want to print out all the data from table1)
So my approach was
Scenario 1:
Select table1.id, table1.data1, table2.data2
From table1
Inner Join table2 On table1.id=table2.p_id
Where table1.id = 1;
Scenario 2:
Select table1.id, table1.data1, table2.data2
From table1
Inner Join table2 On table1.id=table2.p_id
Where table1.id = 1 and table2.data2 is not null;
For the second scenario, you are describing a left join:
Select table1.id, table1.data1, table2.data2
From table1 left Join
table2
On table1.id = table2.p_id
where table1.id = 1 ;
Because you are filtering on the first table, the condition remains in the where clause. If you wanted to filter on the second table, you would presumably want something like:
Select table1.id, table1.data1, table2.data2
From table1 left Join
table2
On table1.id = table2.p_id and
table2.col = 'xyz';
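To see why the filter on the second table goes in the ON clause rather than the WHERE clause, here is a small sketch, assuming Python's built-in sqlite3 and invented sample rows (the col column comes from the example above, its values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, data1 TEXT);
    CREATE TABLE table2 (p_id INTEGER, data2 TEXT, col TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (1, 'x', 'xyz'), (2, 'y', 'other');
""")

# Filter in ON: every table1 row survives; non-matching table2 data is NULL.
filter_in_on = conn.execute("""
    SELECT table1.id, table1.data1, table2.data2
    FROM table1
    LEFT JOIN table2 ON table1.id = table2.p_id AND table2.col = 'xyz'
    ORDER BY table1.id
""").fetchall()

# Filter in WHERE: rows whose table2.col is NULL or different are dropped,
# which effectively turns the left join back into an inner join.
filter_in_where = conn.execute("""
    SELECT table1.id, table1.data1, table2.data2
    FROM table1
    LEFT JOIN table2 ON table1.id = table2.p_id
    WHERE table2.col = 'xyz'
    ORDER BY table1.id
""").fetchall()

print(filter_in_on)     # [(1, 'a', 'x'), (2, 'b', None)]
print(filter_in_where)  # [(1, 'a', 'x')]
```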

Using SELECT * from multiple tables in postgres/redshift

I have 3 tables with 2 common fields for all tables (id, timestamp). I want to do something like this:
SELECT *
FROM table1,
table2,
table3
WHERE id = '123'
and timestamp = '1704'
What I would expect is that it returns all the rows with the id and timestamp matching from the 3 tables. How can the query be modified to achieve this?
select /*your_columns*/
from table1
inner join table2 on table1.id = table2.id and table1.timestamp = table2.timestamp
inner join table3 on table2.id = table3.id and table2.timestamp = table3.timestamp
where table1.id = '123' and table1.timestamp = '1704'
See more here about JOINs

SQL Join tables with empty values

I have 2 tables
Let's say Table1 and Table2.
They both have one shared value (id).
What I'm looking for is whether there is any function to combine them both based on that key. However, if table2 has more elements, I want the columns of table1 to be empty, and if table1 has more elements, I want the table2 columns to be empty.
I tried a lot of different joins, but most of the time I end up with a lot of duplicate values as it tries to fill in both sides.
Tried Full outer join, Full join, etc
You are looking for full join:
select t1.*, t2.*
from t1 full join
t2
on t1.id = t2.id;
The above code from Gordon is right. However, since you have not specified the database and its version, I will post an alternate version for MySQL, which should also work for other databases.
Without duplicates:
SELECT * FROM Table1
LEFT JOIN Table2 ON Table1.id = Table2.id
UNION
SELECT * FROM Table1
RIGHT JOIN Table2 ON Table1.id = Table2.id
With duplicates:
SELECT * FROM Table1
LEFT JOIN Table2 ON Table1.id = Table2.id
UNION ALL
SELECT * FROM Table1
RIGHT JOIN Table2 ON Table1.id = Table2.id
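For anyone who wants to try the difference between the two variants, here is a rough sketch using Python's built-in sqlite3. Two things are my own assumptions: the sample rows and the a/b columns are invented, and the second half is written as a LEFT JOIN with the tables swapped (equivalent to the RIGHT JOIN above) because older SQLite versions have no RIGHT JOIN. Columns are listed explicitly so both halves line up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER, a TEXT);
    CREATE TABLE Table2 (id INTEGER, b TEXT);
    INSERT INTO Table1 VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO Table2 VALUES (2, 'b2'), (3, 'b3');
""")

def emulate_full_join(union_keyword):
    """Run the emulation with either 'UNION' or 'UNION ALL'."""
    query = f"""
        SELECT Table1.id, Table1.a, Table2.id, Table2.b
        FROM Table1
        LEFT JOIN Table2 ON Table1.id = Table2.id
        {union_keyword}
        SELECT Table1.id, Table1.a, Table2.id, Table2.b
        FROM Table2
        LEFT JOIN Table1 ON Table1.id = Table2.id
    """
    return conn.execute(query).fetchall()

distinct_rows = emulate_full_join("UNION")    # matched row appears once
all_rows = emulate_full_join("UNION ALL")     # matched row appears twice

print(len(distinct_rows))  # 3
print(len(all_rows))       # 4
```

Note that UNION also collapses rows that were genuinely duplicated in the source tables, so it is not a perfect substitute for a real full join when the data itself contains duplicates.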

subquery in SELECT without JOIN in Hive?

I know Hive doesn't support this
SELECT (CASE WHEN table1.id in (SELECT table1.id
from table1,table2
where table1.id = table2.id and table2.company like '%My Company%')
THEN table1.email
ELSE regexp_replace(table1.email, substr(table1.email, 1), 'XXXX')
END) as email, table1.id
FROM table1
Hive cannot do SELECT within SELECT (subquery in SELECT).
But let's say that, because of some restriction, I cannot do a JOIN after the FROM clause. Is there a "creative" way to do this? I was thinking about parsing and passing a "static list" from SELECT table1.id from table1,table2 where table1.id = table2.id and table2.company like '%My Company%' in a separate query. But this could go up to thousands.
If you can use a subquery in a join, then you could use a left join and check for a null value:
SELECT case when t.id is null
then regexp_replace(table1.email, substr(table1.email, 1), 'XXXX')
else table1.email
end
, table1.id
FROM table1
left join (
SELECT table1.id
from table1,table2
where table1.id = table2.id
and table2.company like '%My Company%'
) t on table1.id = t.id

Applying joins conditionally in SQL Server

I have some set of records, but now I have to select only those records from this set which have their Id in either of the two tables.
Suppose I have table1 which contains
Id Name
----------
1 Name1
2 Name2
Now I need to select only those records from table one
which have either their id in table2 or in table3
I was trying to apply the or operator within an inner join like:
select *
from table1
inner join table2 on table2.id = table1.id or
inner join table3 on table3.id = table1.id.
Is it possible? What is the best method to approach this? Actually I am also not able to use
if exist(select 1 from table2 where id=table1.id) then select from table1
Could someone help me to get over this?
Use left join and then check if at least one of the joins has found a relation
select t1.*
from table1 t1
left join table2 t2 on t2.id = t1.id
left join table3 t3 on t3.id = t1.id
where t2.id is not null
or t3.id is not null
I would be inclined to use exists:
select t1.*
from table1 t1
where exists (select 1 from table2 t2 where t2.id = t1.id) or
exists (select 1 from table3 t3 where t3.id = t1.id) ;
The advantage to using exists (or in) over a join involves duplicate rows. If table2 or table3 have multiple rows for a given id, then a version using join will produce multiple rows in the result set.
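A quick way to see the duplicate-row effect, sketched with Python's built-in sqlite3. The sample rows are invented (including a third table1 row and a repeated id in table2), so treat it as an illustration rather than the asker's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    INSERT INTO table1 VALUES (1, 'Name1'), (2, 'Name2'), (3, 'Name3');
    INSERT INTO table2 VALUES (1), (1);   -- id 1 appears twice
    INSERT INTO table3 VALUES (2);
""")

# Join version: the repeated id in table2 multiplies the table1 row.
with_joins = conn.execute("""
    SELECT t1.id
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.id = t1.id
    LEFT JOIN table3 t3 ON t3.id = t1.id
    WHERE t2.id IS NOT NULL OR t3.id IS NOT NULL
    ORDER BY t1.id
""").fetchall()

# EXISTS version: each table1 row is tested once, so it appears once.
with_exists = conn.execute("""
    SELECT t1.id
    FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
       OR EXISTS (SELECT 1 FROM table3 t3 WHERE t3.id = t1.id)
    ORDER BY t1.id
""").fetchall()

print(with_joins)   # [(1,), (1,), (2,)]
print(with_exists)  # [(1,), (2,)]
```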
I think the most efficient way is to use UNION on table2 and table3 and join to it :
SELECT t1.*
FROM table1 t1
INNER JOIN(SELECT id FROM Table2
UNION
SELECT id FROM Table3) s
ON (t1.id = s.id)
Alternatively, you can use the SQL below as well:
SELECT *
FROM dbo.Table1
WHERE Table1.id IN ( SELECT table2.id
FROM dbo.table2 )
OR Table1.id IN ( SELECT table3.id
FROM Table3 )