Select in where condition is slow - sql

It appears that filtering with a subquery in a WHERE ... IN condition (where the subquery is very fast on its own) is slower than writing the values explicitly as string literals. Here is a dummy example:
The fast version:
select *
FROM db.dbo.A
left join db.dbo.B on A.id = B.id
where A.selected_variable in ('XXX','YYY')
The slow version:
select
selected_variable
into #t_temp
from db.dbo.some_table
where some_condition = 'X'
select *
FROM db.dbo.A
left join db.dbo.B on A.id = B.id
where A.selected_variable in (select selected_variable from #t_temp) -- returns ('XXX','YYY')
Does anyone know why, and what would be the best practice to avoid the performance drop?

From a performance perspective, I think EXISTS can be better than IN, since an IN list is internally expanded into OR conditions (resulting in more conditions to evaluate).
Try this script and compare the query execution plan and execution time:
select *
FROM db.dbo.A
left join db.dbo.B on A.id = B.id
where EXISTS (select 1
from db.dbo.some_table t
WHERE A.selected_variable = t.selected_variable AND some_condition = 'X')

You can try the version below, which avoids the additional TempDB I/O that is reducing performance.
Also, check whether selected_variable and some_condition in some_table have proper indexes.
SELECT A.*
FROM dbo.A AS A
INNER JOIN DBO.some_table AS s
ON A.Selected_Variable = s.selected_variable
LEFT JOIN dbo.B AS B ON A.id = B.id
WHERE s.some_condition = 'X'
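One caveat with the join rewrite: if some_table holds the same selected_variable more than once among the rows matching the condition, the INNER JOIN repeats rows of A, whereas IN and EXISTS do not. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for SQL Server (the sample rows are made up for illustration):

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, selected_variable TEXT);
    CREATE TABLE some_table (selected_variable TEXT, some_condition TEXT);
    INSERT INTO A VALUES (1, 'XXX'), (2, 'YYY'), (3, 'ZZZ');
    -- 'XXX' appears twice with some_condition = 'X'
    INSERT INTO some_table VALUES
        ('XXX', 'X'), ('XXX', 'X'), ('YYY', 'X'), ('ZZZ', 'N');
""")

# Filtering with IN: each row of A is returned at most once.
in_ids = [row[0] for row in conn.execute("""
    SELECT A.id FROM A
    WHERE A.selected_variable IN (
        SELECT selected_variable FROM some_table WHERE some_condition = 'X')
    ORDER BY A.id
""")]

# Filtering with INNER JOIN: one output row per matching pair.
join_ids = [row[0] for row in conn.execute("""
    SELECT A.id FROM A
    INNER JOIN some_table AS s ON A.selected_variable = s.selected_variable
    WHERE s.some_condition = 'X'
    ORDER BY A.id
""")]

print(in_ids)    # [1, 2]
print(join_ids)  # [1, 1, 2]
```

If duplicates are possible, join against a DISTINCT subquery or stay with EXISTS.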

Related

Query Efficiency in Redshift

I have a question about query efficiency in Redshift. I have two sample queries as below
Query A
select a.*, b.*
from a
left outer join b
on a.id=b.id
where a.market_id = 1
and a.dataset_date = to_date('{RUN_DATE_YYYY-MM-DD}', 'YYYY-MM-DD')
and b.market_id = 1
and b.user_group in ('X');
Query B
select a.*, b.*
from (
select *
from a
where market_id = 1
and dataset_date = to_date('{RUN_DATE_YYYY-MM-DD}', 'YYYY-MM-DD')
)a
left outer join
(select *
from b
where market_id = 1
and user_group in ('X')
)b
on a.id=b.id
where b.market_id = 1
and b.user_group in ('X');
I used to think that Query B would be more efficient, as the datasets being joined are filtered first and therefore smaller. But recently I found that Redshift does some query optimization automatically before it runs each query. In that case, the efficiency of Query A and Query B should be very close, and Query A is easier to understand and maintain.
I'm not sure how the Redshift optimization works, so I am posting this question here. I would really appreciate any knowledge shared!

SQL (sybase) query using TOP N performs very badly when inserted into table

I have a performance issue in Sybase ASE when inserting the result of the following query into a table or temp table:
INSERT INTO #temp (Id)
SELECT TOP 100 a.Id
FROM TableA a
INNER JOIN TableB b ON a.Id = b.Id
WHERE a.SomeColumn = 'blah' and b.SomeColumn = 'Blah'
ORDER BY a.Id
The WHERE clause isn't that important. The important thing is that the SELECT query runs in a split second on its own, but as soon as you try to insert its result into a table, it takes 2 minutes!
Looking at the query plan, the optimiser does not seem to take into account that the estimated row count should be 100, and it does a table scan of TableB. The SELECT statement on its own seems to produce a sensible plan where the TOP 100 is taken into account, but the INSERT seems to make the optimiser take a very inefficient route. I have tried many permutations of this query, to no avail. Tables A and B are very large and the TOP N is a must. I have also tried set rowcount 100, with the same result.
Can anyone suggest a workaround for this?
Thanks
Have you tried:
INSERT INTO #temp (Id)
SELECT * FROM
(
SELECT TOP 100 a.Id
FROM TableA a
INNER JOIN TableB b ON a.Id = b.Id
WHERE a.SomeColumn = 'blah' and b.SomeColumn = 'Blah'
ORDER BY a.Id
) AS t
It may produce a slightly different execution plan.
Why don't you try this and see if you still have issues? In principle, the INSERT INTO should also be quite fast if the SELECT statement itself is fast.
--INSERT INTO #temp (Id)
SELECT TOP 100 a.Id
INTO #temp
FROM TableA a
INNER JOIN TableB b ON a.Id = b.Id
WHERE a.SomeColumn = 'blah' and b.SomeColumn = 'Blah'
ORDER BY a.Id

Is it possible to use IF or CASE in sql FROM statement

I have a long stored procedure and I would like to make a slight modification to it without having to create a new one (for maintenance purposes).
Is it possible to use an IF or CASE in the FROM clause of the select statement to join other tables?
Like this:
from tableA a
join tableB b on a.indexed = b.indexed
IF @Param='Y'
BEGIN
join tableC c on a.indexed = c.indexed
END
It didn't seem to work for me. But I am wondering if this is even possible and/or if this even makes sense to do.
Thanks.
No, it is not possible. You can only accomplish this through the use of dynamic SQL.
The Curse and Blessings of Dynamic SQL
An Intro to Dynamic SQL
I would not advise using dynamic SQL. There are most likely better ways to perform this operation, but you would have to provide more info.
You can achieve something like it with a left outer join.
Consider:
declare @param bit = 1
select a.*, b.*, c.* from a
inner join b on a.id = b.a_id
left outer join c on b.id = c.b_id and @param = 1
This will return all columns from a, b and c.
Now try with:
declare @param bit = 0
This will return all columns from a and b, and NULLs for the columns of c.
It won't work if both joins are inner.
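The effect of putting the parameter into the ON condition can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the real server, a bound parameter named :param in place of the T-SQL variable, and made-up sample rows; all of these are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, a_id INTEGER);
    CREATE TABLE c (id INTEGER, b_id INTEGER, label TEXT);
    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (10, 1);
    INSERT INTO c VALUES (100, 10, 'from c');
""")

# The parameter sits in the ON condition of the outer join, so switching it
# off leaves the rows of a and b in place and only blanks the columns of c.
QUERY = """
    SELECT a.id, b.id, c.label
    FROM a
    INNER JOIN b ON a.id = b.a_id
    LEFT OUTER JOIN c ON b.id = c.b_id AND :param = 1
"""

def run(param):
    """Run the query with the join to c switched on (1) or off (0)."""
    return conn.execute(QUERY, {"param": param}).fetchall()

print(run(1))  # [(1, 10, 'from c')]
print(run(0))  # [(1, 10, None)]
```

The same predicate in a WHERE clause would remove the rows entirely when the parameter is 0, which is why it has to live in the ON condition.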
No, this is not possible. Your best bet would probably be to select from both tables and only include the data you care about. If you provide an example of what you are trying to do, I can provide a better answer.
Attempt at an example:
SELECT t1.id, COALESCE(t2.name, t3.name)
FROM Table1 as t1
LEFT JOIN Table2 as t2
ON t1.id = t2.id
LEFT JOIN Table3 as t3
ON t1.id = t3.id
While what you proposed is not possible, you can play with your join conditions:
from tableA a
inner join tableB b ON a.indexed = b.indexed
left join tableC c ON a.indexed = c.indexed AND 1 = CASE @Param WHEN 'Y' THEN 1 ELSE 0 END
It would be more performant to branch with one big IF and write the statement twice:
IF @Param = 'Y'
from tableA a
inner join tableB b ON a.indexed = b.indexed
left join tableC c ON a.indexed = c.indexed
ELSE
from tableA a
inner join tableB b ON a.indexed = b.indexed
You haven't revealed your SELECT clause. The essence of what you want is as follows:
SELECT indexed
FROM tableA
INTERSECT
SELECT indexed
FROM tableB
INTERSECT
SELECT indexed
FROM tableC
WHERE @Param = 'Y'
Then use this table expression as dictated by your SELECT clause e.g. say you only want to project tableA:
WITH T
AS
(
SELECT indexed
FROM tableA
INTERSECT
SELECT indexed
FROM tableB
INTERSECT
SELECT indexed
FROM tableC
WHERE @Param = 'Y'
)
SELECT *
FROM tableA
WHERE indexed IN ( SELECT indexed FROM T );

PostgreSQL: alternative to WHERE IN and WHERE NOT IN

I have several statements which access very large PostgreSQL tables, e.g. with:
SELECT a.id FROM a WHERE a.id IN ( SELECT b.id FROM b );
SELECT a.id FROM a WHERE a.id NOT IN ( SELECT b.id FROM b );
Some of them access even more tables in that way. What is the best approach to improve performance? Should I switch to joins, for example?
Many thanks!
A JOIN will often be more efficient, or you can use EXISTS:
SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
EXISTS only needs to find one matching row, so the subquery can stop at the first match for each row of a.
Here's a way to filter rows with an INNER JOIN:
SELECT a.id
FROM a
INNER JOIN b ON a.id = b.id
Note that each version can perform differently; sometimes IN is faster, sometimes EXISTS, and sometimes the INNER JOIN.
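To compare the variants on your own data, it helps to first confirm that they return the same rows. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption, as are the sample ids); it covers both the IN and the NOT IN family, with b.id unique and not nullable so that all forms agree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (2), (4), (5);
""")

def ids(sql):
    """Return the first column of the result as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Rows of a that have a match in b.
semi = {
    "in": "SELECT a.id FROM a WHERE a.id IN (SELECT b.id FROM b)",
    "exists": "SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)",
    "join": "SELECT a.id FROM a INNER JOIN b ON a.id = b.id",
}
# Rows of a that have no match in b.
anti = {
    "not_in": "SELECT a.id FROM a WHERE a.id NOT IN (SELECT b.id FROM b)",
    "not_exists": "SELECT a.id FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id)",
    "left_join": "SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL",
}

semi_results = {name: ids(sql) for name, sql in semi.items()}
anti_results = {name: ids(sql) for name, sql in anti.items()}
print(semi_results)  # every variant gives [2, 4]
print(anti_results)  # every variant gives [1, 3]
```

Once the results agree, running each statement under EXPLAIN ANALYZE in PostgreSQL shows which plan is actually cheapest for your tables.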
Yes, I would recommend switching to joins. It will speed up the select statements.

How do I find records that are not joined?

I have two tables that are joined together.
A has many B
Normally you would do:
select * from a,b where b.a_id = a.id
To get all of the records from a that have a record in b.
How do I get just the records in a that do not have anything in b?
select * from a where id not in (select a_id from b)
Or, as some other people on this thread say:
select a.* from a
left outer join b on a.id = b.a_id
where b.a_id is null
select * from a
left outer join b on a.id = b.a_id
where b.a_id is null
Another approach:
select * from a where not exists (select * from b where b.a_id = a.id)
The "exists" approach is useful if there is some other "where" clause you need to attach to the inner query.
SELECT id FROM a
EXCEPT
SELECT a_id FROM b;
You will probably get a lot better performance (than using 'not in') if you use an outer join:
select * from a left outer join b on a.id = b.a_id where b.a_id is null;
SELECT <columns>
FROM a WHERE id NOT IN (SELECT a_id FROM b)
With a single join it is pretty fast, but when we are removing records from a database which has about 50 million records and 4 or more joins due to foreign keys, it takes a few minutes.
It is much faster to use a WHERE NOT IN condition like this:
select a.* from a
where a.id NOT IN(SELECT DISTINCT a_id FROM b where a_id IS NOT NULL)
-- And for more joins
AND a.id NOT IN(SELECT DISTINCT a_id FROM c where a_id IS NOT NULL)
I can also recommend this approach for deleting when cascade delete is not configured.
This query takes only a few seconds.
The first approach is
select a.* from a where a.id not in (select b.ida from b)
the second approach is
select a.*
from a left outer join b on a.id = b.ida
where b.ida is null
The first approach is very expensive. The second approach is better.
With PostgreSQL 9.4, I ran EXPLAIN on both queries: the first query has a cost of cost=0.00..1982043603.32.
The join query, by contrast, has a cost of cost=45946.77..45946.78.
For example, I search for all products that are not compatible with any vehicle. I have 100k products and more than 1m compatibilities.
select count(*) from product a left outer join compatible c on a.id=c.idprod where c.idprod is null
The join query took about 5 seconds, whereas the subquery version had still not finished after 3 minutes.
Another way of writing it
select a.*
from a
left outer join b
on a.id = b.id
where b.id is null
Ouch, beaten by Nathan :)
This will protect you from nulls in the IN clause, which can cause unexpected behavior.
select * from a where id not in (select [a id] from b where [a id] is not null)
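The "unexpected behavior" is that a single NULL in the subquery result makes NOT IN return no rows at all. A minimal sketch, using Python's built-in sqlite3 module (an assumption, as are the sample rows; the column is named a_id here instead of [a id]):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (NULL);  -- one real reference, one NULL
""")

def ids(sql):
    """Return the first column of the result as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Unguarded NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, not TRUE,
# so no row passes the filter.
naive = ids("SELECT id FROM a WHERE id NOT IN (SELECT a_id FROM b)")

# Guarded NOT IN: NULLs are filtered out of the subquery first.
guarded = ids(
    "SELECT id FROM a WHERE id NOT IN "
    "(SELECT a_id FROM b WHERE a_id IS NOT NULL)"
)

# NOT EXISTS is not affected by NULLs in b.a_id.
not_exists = ids(
    "SELECT id FROM a WHERE NOT EXISTS "
    "(SELECT 1 FROM b WHERE b.a_id = a.id)"
)

print(naive)       # []
print(guarded)     # [2, 3]
print(not_exists)  # [2, 3]
```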