performance comparison on coalesce vs 'is null' - sql

select * from zzz t1
INNER JOIN yyy T2
on (T1.col1 = T2.col1 )
and (T1.col2 = T2.col2 or (T1.col2 is null and T2.col2 is null ) )
VS
select * from zzz t1
INNER JOIN yyy T2
on (T1.col1 = T2.col1 )
and (coalesce(T1.col2, '\0') = coalesce(T2.col2, '\0'))
If there's a third, better way to do this, I'd appreciate that too. I find myself doing this constantly because half of our databases allow NULLs and half of them don't, so comparisons suck.
The only constraint is that I need to avoid anything non-standard. I use too many different databases, and I'm trying hard to get away from anything that isn't supported by all of them unless the performance gain is utterly magical. (DB2, Oracle, and SQL Server are the main ones I'm using.)

If a database has NULL safe comparisons, then the best approach is:
select *
from zzz t1 join
yyy T2
on (T1.col1 = T2.col1 ) and
(T1.col2 is not distinct from T2.col2);
(DB2 supports this. Oracle and SQL Server did not when this was written; SQL Server added it in SQL Server 2022.)
Otherwise, I think your two versions are going to perform equivalently, in most databases at least. Either way, wrapping col2 in a function or combining conditions with OR limits the optimizer's ability to use indexes on col2.
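To make the comparison concrete, here is a minimal sketch that runs the plain `=` join and both NULL-tolerant versions against a few made-up rows. It uses Python's built-in sqlite3 purely as a stand-in (SQLite is not one of the databases in the question), and the sample data and the '<null>' sentinel are assumptions chosen for illustration. It also shows one thing to watch for with the COALESCE form: if the sentinel can occur as a real value, it will match a NULL on the other side.

```python
import sqlite3

# Hypothetical sample rows. SQLite is only a stand-in here for the
# DB2 / Oracle / SQL Server systems discussed in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zzz (col1 INTEGER, col2 TEXT);
    CREATE TABLE yyy (col1 INTEGER, col2 TEXT);
    INSERT INTO zzz VALUES (1, 'a'), (2, NULL), (3, 'b'), (4, '<null>');
    INSERT INTO yyy VALUES (1, 'a'), (2, NULL), (3, NULL), (4, NULL);
""")

# Plain equality: NULL = NULL is unknown, so the NULL/NULL pair is dropped.
PLAIN_EQUALS = """
    SELECT t1.col1 FROM zzz t1 JOIN yyy t2
      ON t1.col1 = t2.col1 AND t1.col2 = t2.col2
    ORDER BY t1.col1"""

# First version from the question.
OR_IS_NULL = """
    SELECT t1.col1 FROM zzz t1 JOIN yyy t2
      ON t1.col1 = t2.col1
     AND (t1.col2 = t2.col2 OR (t1.col2 IS NULL AND t2.col2 IS NULL))
    ORDER BY t1.col1"""

# Second version from the question, with an assumed sentinel value.
COALESCE_SENTINEL = """
    SELECT t1.col1 FROM zzz t1 JOIN yyy t2
      ON t1.col1 = t2.col1
     AND COALESCE(t1.col2, '<null>') = COALESCE(t2.col2, '<null>')
    ORDER BY t1.col1"""

def matched_ids(sql):
    """Return the col1 values of the rows that survive the join."""
    return [row[0] for row in conn.execute(sql)]

for name in ("PLAIN_EQUALS", "OR_IS_NULL", "COALESCE_SENTINEL"):
    print(name, matched_ids(globals()[name]))
```

Row 4 is the interesting one: zzz holds the literal string '<null>' and yyy holds a real NULL, so only the COALESCE version treats them as equal. The two versions agree only as long as the sentinel never appears in the data.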

Related

How can this query be optimized please?

I have performance issues with the following query:
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2
ON (((T1.E IS NULL OR T2.E IS NULL) AND T1.F= T2.F)
OR((T1.E IS NOT NULL OR T2.E IS NOT NULL) AND T1.E = T2.E))
It takes more than 30 minutes to return about 1000 rows.
I've tried this:
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2
ON (((COALESCE(T1.E,-1) = COALESCE(T2.E,-1)
AND ((T1.F= T2.F)
OR(T1.E = T2.E)))))
but it returns fewer rows than the first one.
Can you help me find another way to write it in order to reduce the execution time, please?
I'm using SQL Server 2016
Try this:
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.F = T2.F
WHERE T1.E IS NULL OR T2.E IS NULL
UNION
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.E = T2.E
WHERE COALESCE(T1.E, T2.E) IS NOT NULL
You might want a UNION ALL, but this should match the original.
This also exposes an interesting quirk in the original logic you may want to reconsider. If the E field from one table is NULL, but not the other, the original code would make checks on both the E and F fields. Which is interesting, because for the E field we know one side is null, but the other is not, so that case can't ever be true... but the logic says to still make the comparison.
It's hard to know what you're doing with the generic names, but there's definitely room to clean up that conditional check. Before worrying about matching to your first results, go back and make sure those first results clearly and accurately state what you want to accomplish, even if that means making the query even slower or longer.
Then, only when you are sure you have a query that both produces accurate results and describes them in an understandable way, you can start looking for different or clever ways to express the same logic that might perform better. But if you don't first take the step of better-defining your logic, you won't be able to validate your optimizations and you'll risk quickly producing incorrect data.
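One way to validate a rewrite before trusting it is to run both queries on a small data set that covers the NULL cases and compare the result sets. The sketch below does that for the original join condition and the UNION version above. It is only an illustration: it uses Python's sqlite3 rather than SQL Server, the tables have a hypothetical id column in place of the unspecified A to D columns, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: only E and F come from the question.
    CREATE TABLE TABLE1 (id INTEGER, E INTEGER, F INTEGER);
    CREATE TABLE TABLE2 (id INTEGER, E INTEGER, F INTEGER);
    INSERT INTO TABLE1 VALUES (1, NULL, 10), (2, 5, 20), (3, 7, 30), (4, NULL, NULL);
    INSERT INTO TABLE2 VALUES (1, 5, 10), (2, 5, 99), (3, NULL, 30),
                              (4, NULL, NULL), (5, 7, 30);
""")

# Join condition copied from the question.
ORIGINAL = """
    SELECT T1.id AS t1_id, T2.id AS t2_id
    FROM TABLE1 T1
    INNER JOIN TABLE2 T2
      ON ((T1.E IS NULL OR T2.E IS NULL) AND T1.F = T2.F)
      OR ((T1.E IS NOT NULL OR T2.E IS NOT NULL) AND T1.E = T2.E)"""

# UNION rewrite from the answer.
UNION_REWRITE = """
    SELECT T1.id AS t1_id, T2.id AS t2_id
    FROM TABLE1 T1
    INNER JOIN TABLE2 T2 ON T1.F = T2.F
    WHERE T1.E IS NULL OR T2.E IS NULL
    UNION
    SELECT T1.id AS t1_id, T2.id AS t2_id
    FROM TABLE1 T1
    INNER JOIN TABLE2 T2 ON T1.E = T2.E
    WHERE COALESCE(T1.E, T2.E) IS NOT NULL"""

def pairs(sql):
    """Return the matched (TABLE1.id, TABLE2.id) pairs in sorted order."""
    return sorted(conn.execute(sql).fetchall())

print(pairs(ORIGINAL) == pairs(UNION_REWRITE))
```

Agreement on a handful of rows proves nothing in general, but a mismatch on a small, hand-checkable data set is a cheap way to catch a rewrite that changes the logic.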
Non-equality conditions -- such as OR -- pretty much kill JOIN performance, especially in databases such as SQL Server that do not use indexes in such cases.
I would recommend a two-join approach, but you are going to have to fix the SELECT because it is not clear where the columns come from.
SELECT --A, B, C, D, E, F,
T1.A,
COALESCE(T2_1.B, T2_2.B) as B,
. . .
FROM TABLE1 T1 LEFT JOIN
TABLE2 T2_1
ON T2_1.F = T1.F AND
(T1.E IS NULL OR T2_1.E IS NULL) LEFT JOIN
TABLE2 T2_2
ON T2_2.E = T1.E -- E cannot be NULL
WHERE T2_1.F IS NOT NULL OR T2_2.E IS NOT NULL; -- checks for a match for either condition
Then for performance, you want indexes on TABLE2(F, E) and TABLE2(E).
An OR in the join condition can severely slow down execution. Try to get rid of it. Maybe something like this would do:
SELECT A,B,C,D,E,F
FROM TABLE1 T1
LEFT JOIN TABLE2 T2
ON T1.E = T2.E
LEFT JOIN TABLE2 T22
ON T1.F= T22.F
AND T2.E IS NULL
WHERE NOT (T2.E IS NULL AND T22.F IS NULL)

SQL Query Performance Join with condition

Calling all SQL experts. I have the following select statement:
SELECT 1
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
WHERE t1.field = xyz
I'm a little bit worried about the performance here. Is the WHERE clause evaluated before or after the join? If it's evaluated after, is there a way to evaluate the WHERE clause first?
The whole table could easily contain more than a million entries, but after the WHERE clause there may be only 1-10 entries left, so in my opinion it makes a big performance difference when the WHERE clause is evaluated.
Thanks in advance.
Dimi
You could rewrite your query like this:
SELECT 1
FROM (SELECT * FROM table1 WHERE field = xyz) t1
JOIN table2 t2 ON t1.id = t2.id
But depending on the database product the optimiser might still decide that the best way to do this is to JOIN table1 to table2 and then apply the constraint.
For this query:
SELECT 1
FROM table1 t1 JOIN
table2 t2
ON t1.id = t2.id
WHERE t1.field = xyz;
The optimal indexes are table1(field, id), table2(id).
How the query is executed depends on the optimizer. It is tasked with choosing the best execution plan, given the table statistics and environment.
Each DBMS has its own query optimizer. In a case like yours, a sensible optimizer will usually apply the WHERE filter first and then perform the JOIN part of the query.
As mentioned in the comments and other answers, with performance the answer is always "it depends". Depending on your DBMS and the indexing of the base tables, the query may be fine as is, and the optimizer may evaluate the WHERE first. Or the join may be efficient anyway if the indexes cover the join requirements.
Alternatively, you can try to get the behavior you want by reducing the dataset of t1 before you do the join, either with a nested select as Richard suggested or by adding t1.field = xyz to the join condition, for example:
ON t1.field = xyz AND t1.id = t2.id
Personally, if I needed to reduce the dataset before the join, I would use a CTE:
With T1 AS
(
SELECT * FROM table1
WHERE Field = 'xyz'
)
SELECT 1
FROM T1
JOIN Table2 T2
ON T1.Id = T2.Id
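For an inner join, all of the forms suggested in this thread describe the same result; what differs, if anything, is the plan the optimizer picks. The sketch below checks that on made-up data and shows one way to look at the plan. It is an illustration under assumptions: it uses Python's sqlite3 and SQLite's EXPLAIN QUERY PLAN, whereas your DBMS will have its own plan viewer, and it selects t1.id rather than the constant 1 so the results can be compared. The index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, field TEXT);
    CREATE TABLE table2 (id INTEGER);
    -- Indexes suggested in the answer above; the names are made up.
    CREATE INDEX ix_table1_field_id ON table1 (field, id);
    CREATE INDEX ix_table2_id ON table2 (id);
""")
# 2000 rows in table1, only four of them with field = 'xyz'.
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(i, "xyz" if i % 500 == 0 else "other") for i in range(1, 2001)])
conn.executemany("INSERT INTO table2 VALUES (?)",
                 [(i,) for i in range(0, 2001, 1000)])

WHERE_FORM = """
    SELECT t1.id FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id
    WHERE t1.field = 'xyz' ORDER BY t1.id"""
DERIVED_FORM = """
    SELECT t1.id FROM (SELECT * FROM table1 WHERE field = 'xyz') t1
    JOIN table2 t2 ON t1.id = t2.id ORDER BY t1.id"""
ON_FORM = """
    SELECT t1.id FROM table1 t1
    JOIN table2 t2 ON t1.field = 'xyz' AND t1.id = t2.id ORDER BY t1.id"""
CTE_FORM = """
    WITH filtered AS (SELECT * FROM table1 WHERE field = 'xyz')
    SELECT filtered.id FROM filtered
    JOIN table2 t2 ON filtered.id = t2.id ORDER BY filtered.id"""

def ids(sql):
    """Return the joined ids as a plain list."""
    return [row[0] for row in conn.execute(sql)]

def plan(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

for step in plan(WHERE_FORM):
    print(step)
```

The plan text is engine- and version-specific, so the useful habit is comparing plans for each form on your own system rather than relying on any particular output.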

SQL select returns extra result

I guess there are similar questions and the answer might be easy, but I can't figure it out myself, which is why I'm asking you guys.
I have some data in a DB (Centura/Gupta SQLBase 7).
No LEFT/RIGHT JOIN is possible; apparently it's not implemented in SQLBase SQL.
Here is my select:
SELECT
I.IARTNR,
L.ARTNAME
FROM
INVENTUR I,
LAGER L
WHERE
L.ARTSTR = I.IARTNR
AND
I.AB = '2015-81';
It returns 20 rows, not 18 as expected.
There are 18 INVENTUR rows with AB set to 2015-81, and LAGER has fewer than 3000 rows. What I'm trying to do is select all articles from INVENTUR and add the article name that's stored in LAGER.
What's wrong with my select? I've been chasing this mystery for 3 days.
ANSI join syntax for outer/inner joins was added in v8.5 (now up to v12.1).
Before v8.5, you can use the native Gupta outer/inner join syntax, e.g.:
SELECT t1.col1,t2.col1,t1.col2,t2.col2
FROM t1,t2
WHERE t1.col1 = t2.col1(+)
AND t1.col2 = t2.col2(+)
The next example lists customer names and their order numbers,including customers who have made no orders:
SELECT CUSTOMER.CUSTNO,NAME
FROM CUSTOMER,ORDERS
WHERE CUSTOMER.CUSTNO = ORDERS.CUSTNO(+)
The same query using ANSI syntax in SQLBase v8.5 onwards is:
SELECT CUSTOMER.CUSTNO,NAME
FROM CUSTOMER LEFT OUTER JOIN ORDERS ON CUSTOMER.CUSTNO = ORDERS.CUSTNO
Use explicit joins.
SELECT I.IARTNR, L.ARTNAME
FROM INVENTUR I
INNER JOIN LAGER L ON I.IARTNR = L.ARTSTR
WHERE I.AB = '2015-81';
And, if need be, add DISTINCT:
SELECT DISTINCT I.IARTNR, L.ARTNAME
FROM INVENTUR I
INNER JOIN LAGER L ON I.IARTNR = L.ARTSTR
WHERE I.AB = '2015-81';
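The thread does not confirm why two extra rows appear, but one plausible cause is that some ARTSTR values occur more than once in LAGER, so a single INVENTUR row matches several LAGER rows. A quick check is to group LAGER by the join key and look for counts above one. The sketch below illustrates this with invented rows in Python's sqlite3 (not SQLBase); the article numbers and names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE INVENTUR (IARTNR TEXT, AB TEXT);
    CREATE TABLE LAGER (ARTSTR TEXT, ARTNAME TEXT);
    -- Hypothetical data: article A2 appears twice in LAGER.
    INSERT INTO INVENTUR VALUES ('A1', '2015-81'), ('A2', '2015-81'), ('A3', '2015-81');
    INSERT INTO LAGER VALUES ('A1', 'Bolt'), ('A2', 'Nut'),
                             ('A2', 'Nut (old)'), ('A3', 'Washer');
""")

# The join from the question, in its original comma-join form.
JOINED = """
    SELECT I.IARTNR, L.ARTNAME
    FROM INVENTUR I, LAGER L
    WHERE L.ARTSTR = I.IARTNR AND I.AB = '2015-81'"""

# Diagnostic: which join keys occur more than once in LAGER?
DUPLICATE_KEYS = """
    SELECT ARTSTR, COUNT(*)
    FROM LAGER
    GROUP BY ARTSTR
    HAVING COUNT(*) > 1"""

joined_rows = conn.execute(JOINED).fetchall()
duplicate_keys = conn.execute(DUPLICATE_KEYS).fetchall()
distinct_rows = conn.execute(JOINED.replace("SELECT", "SELECT DISTINCT", 1)).fetchall()

print(len(joined_rows), duplicate_keys, len(distinct_rows))
```

In this made-up data the duplicates carry different names, so DISTINCT does not collapse them. If the diagnostic query finds duplicate keys, it is worth deciding which LAGER row is the right one rather than relying on DISTINCT.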
Of course SQLBase has inner / outer joins !
Either Native syntax ( using (+) ) or ANSI .
Here's the syntax:
NATIVE:
SELECT t1.col1, t2.col1, t1.col2, t2.col2
FROM t1, t2
WHERE t1.col1 = t2.col1 (+) AND t1.col2 = t2.col2 (+);
ANSI:
SELECT t1.col1, t2.col1, t1.col2, t2.col2
FROM t2 RIGHT OUTER JOIN t1
ON t1.col1 = t2.col1
AND t1.col2 = t2.col2 ;
P.S. SQLBase is no 'weird' database. v12, recently released, will outstrip SQL Server every time in terms of performance, footprint and overall cost of ownership. Please be more aware of your facts before broadcasting nonsense.

Most Efficiently Written Query in SQL

I have two tables, Table1 and Table2 and am trying to select values from Table1 based on values in Table2. I am currently writing my query as follows:
SELECT Value From Table1
WHERE
(Key1 in
(SELECT KEY1 FROM Table2 WHERE Foo = Bar))
AND
(Key2 in
(SELECT KEY2 FROM Table2 WHERE Foo = Bar))
This seems a very inefficient way to code the query. Is there a better way to write this?
It depends on how the table(s) are indexed. And it depends on what SQL implementation you're using (SQL Server? MySQL? Oracle? MS Access? something else?). It also depends on table size (if the table(s) are small, a table scan may be faster than something more advanced). It matters, too, whether or not the indices are covering indices (meaning that the test can be satisfied with data in the index itself, rather than requiring an additional lookup to fetch the corresponding data page). Unless you look at the execution plan, you can't really say that technique X is "better" than technique Y.
However, in general, for this case, you're better off using correlated subqueries, thus:
select *
from table1 t1
where exists( select *
from table2 t2
where t2.key1 = t1.key1
)
and exists( select *
from table2 t2
where t2.key2 = t1.key2
)
A join is a possibility, too:
select t1.*
from table1 t1
join table2 t2a on t2a.key1 = t1.key1 ...
join table2 t2b on t2b.key2 = t1.key2 ...
though that will give you one row for every matching combination, which can be alleviated by using the DISTINCT keyword. Note that a join is not necessarily more efficient than other techniques, especially if you have to use DISTINCT, as that requires additional work to ensure distinctness.
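The difference in row counts between these forms is easy to see on a tiny example. The sketch below is an illustration only: it uses Python's sqlite3 with invented rows, gives Table1 a hypothetical value column, and leaves out the Foo = Bar filter, as the answer above does. It compares the IN version, the EXISTS version, and the two-join version with and without DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (key1 INTEGER, key2 INTEGER, value TEXT);
    CREATE TABLE table2 (key1 INTEGER, key2 INTEGER);
    -- Hypothetical rows; key1 = 1 appears twice in table2.
    INSERT INTO table1 VALUES (1, 10, 'a'), (2, 20, 'b'), (3, 30, 'c');
    INSERT INTO table2 VALUES (1, 10), (1, 20), (2, 99);
""")

IN_FORM = """
    SELECT value FROM table1
    WHERE key1 IN (SELECT key1 FROM table2)
      AND key2 IN (SELECT key2 FROM table2)
    ORDER BY value"""

EXISTS_FORM = """
    SELECT t1.value FROM table1 t1
    WHERE EXISTS (SELECT * FROM table2 t2 WHERE t2.key1 = t1.key1)
      AND EXISTS (SELECT * FROM table2 t2 WHERE t2.key2 = t1.key2)
    ORDER BY t1.value"""

JOIN_FORM = """
    SELECT t1.value FROM table1 t1
    JOIN table2 t2a ON t2a.key1 = t1.key1
    JOIN table2 t2b ON t2b.key2 = t1.key2
    ORDER BY t1.value"""

JOIN_DISTINCT_FORM = JOIN_FORM.replace("SELECT", "SELECT DISTINCT", 1)

def values(sql):
    """Return the selected value column as a plain list."""
    return [row[0] for row in conn.execute(sql)]

for name in ("IN_FORM", "EXISTS_FORM", "JOIN_FORM", "JOIN_DISTINCT_FORM"):
    print(name, values(globals()[name]))
```

Note that row 'b' qualifies even though no single table2 row has both key1 = 2 and key2 = 20: all of these forms test the two keys independently, exactly as the query in the question does.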

How to improve performance of SQL query with parameters?

I'm using SQL Server 2005.
I have a problem executing SQL statements like this:
DECLARE @Param1 BIT
SET @Param1 = 1
SELECT
t1.Col1,
t1.Col2
FROM
Table1 t1
WHERE
@Param1=0 OR
(t1.Col2 in
(SELECT t2.Col4
FROM
Table2 t2
WHERE
t2.Col1 = t1.Col1 AND
t2.Col2 = 'AAA' AND
t2.t3 <> 0)
)
This query takes a very long time to execute.
But if I replace @Param1 with 1, then the query execution time is ~2 seconds.
Any information how to resolve the problem would be greatly appreciated.
Well, the explanation seems simple enough. For your current condition, since @Param1=0 is false (you set the parameter to 1 previously), it needs to evaluate your second condition, which has a subquery and might take a long time. If you change your first filter to @Param1=1, then you are saying that it is true and there is no need to evaluate your second filter, hence making your query faster.
You seem to be confusing the optimiser with your OR statement. If you remove it, you should find it generates two different execution plans for the SELECT statement - one with a filter, and the other without:
DECLARE @Param1 BIT
SET @Param1 = 1
if @Param1=0
begin
SELECT
t1.Col1,
t1.Col2
FROM
Table1 t1
end
else
begin
SELECT
t1.Col1,
t1.Col2
FROM
Table1 t1
WHERE
(t1.Col2 in
(SELECT t2.Col4
FROM
Table2 t2
WHERE
t2.Col1 = t1.Col1 AND
t2.Col2 = 'AAA' AND
t2.t3 <> 0)
)
end
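The same branching can also be done in application code, so that each branch sends a simpler statement with no parameter-dependent OR in it. The sketch below shows the idea only: it uses Python's sqlite3 instead of SQL Server 2005, the function name fetch_rows is hypothetical, and the rows are invented to exercise both branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Col1 INTEGER, Col2 TEXT);
    CREATE TABLE Table2 (Col1 INTEGER, Col2 TEXT, Col4 TEXT, t3 INTEGER);
    -- Hypothetical rows: only Table1 row 1 passes the subquery filter.
    INSERT INTO Table1 VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO Table2 VALUES (1, 'AAA', 'x', 1), (2, 'AAA', 'y', 0), (3, 'BBB', 'z', 1);
""")

UNFILTERED = """
    SELECT t1.Col1, t1.Col2
    FROM Table1 t1
    ORDER BY t1.Col1"""

FILTERED = """
    SELECT t1.Col1, t1.Col2
    FROM Table1 t1
    WHERE t1.Col2 IN (SELECT t2.Col4
                      FROM Table2 t2
                      WHERE t2.Col1 = t1.Col1
                        AND t2.Col2 = 'AAA'
                        AND t2.t3 <> 0)
    ORDER BY t1.Col1"""

def fetch_rows(param1):
    """Pick the statement up front instead of writing 'param = 0 OR ...'."""
    sql = UNFILTERED if param1 == 0 else FILTERED
    return conn.execute(sql).fetchall()

print(fetch_rows(0))
print(fetch_rows(1))
```

Whether this helps on a given server depends on how it plans the combined statement, so it is worth comparing execution plans for the OR version and the split version before committing to either.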
This is commonly referred to as the N+1 problem. You're doing a select on table 1, and for each record you find you go and look for something in table 2.
By setting your @Param1 to a value which will never be found in your select, the SQL engine will skip the subquery.
To avoid this behavior you could use a JOIN to join both tables together and afterwards filter the results with a WHERE clause. The join will be a bit slower than a single subquery because you are matching two tables to each other, but because you only need to do the join once (vs. N times) you'll gain a serious performance boost.
Example code :
DECLARE @Param1 BIT
SET @Param1 = 1
SELECT t1.Col1,t1.Col2
FROM Table1 t1
INNER JOIN Table2 t2 on t1.Col1 = t2.Col1
WHERE @Param1=0
OR t2.Col2 = 'AAA'
AND t2.t3 <> 0