How can this query be optimized please? - sql

I have performance issues with the following query:
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2
ON (((T1.E IS NULL OR T2.E IS NULL) AND T1.F= T2.F)
OR((T1.E IS NOT NULL OR T2.E IS NOT NULL) AND T1.E = T2.E))
It takes more than 30 minutes to return about 1000 rows.
I've tried this:
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2
ON (((COALESCE(T1.E,-1) = COALESCE(T2.E,-1)
AND ((T1.F= T2.F)
OR(T1.E = T2.E)))))
but it returns fewer rows than the first one.
Can you help me find another way to write it in order to reduce the execution time, please?
I'm using SQL Server 2016

Try this:
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.F = T2.F
WHERE T1.E IS NULL OR T2.E IS NULL
UNION
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.E = T2.E
WHERE COALESCE(T1.E, T2.E) IS NOT NULL
You might want a UNION ALL, but this should match the original.
This also exposes an interesting quirk in the original logic you may want to reconsider. If the E field from one table is NULL, but not the other, the original code would make checks on both the E and F fields. Which is interesting, because for the E field we know one side is null, but the other is not, so that case can't ever be true... but the logic says to still make the comparison.
It's hard to know what you're doing with the generic names, but there's definitely room to clean up that conditional check. Before worrying about matching to your first results, go back and make sure those first results clearly and accurately state what you want to accomplish, even if that means making the query even slower or longer.
Then, only when you are sure you have a query that both produces accurate results and describes them in an understandable way, you can start looking for different or clever ways to express the same logic that might perform better. But if you don't first take the step of better-defining your logic, you won't be able to validate your optimizations and you'll risk quickly producing incorrect data.
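One way to do that validation is to run the original join and a candidate rewrite against the same small data set and compare the rows as multisets. The sketch below is only an illustration: it uses Python's sqlite3 module instead of SQL Server, the column names A and B and all the values are invented, and it uses the UNION ALL variant, since the first branch needs a NULL E on one side and the second needs both E values to be equal (so non-NULL), which means the two branches cannot return the same pair.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data: table names follow the question, values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (A INTEGER, E INTEGER, F INTEGER);
    CREATE TABLE TABLE2 (B INTEGER, E INTEGER, F INTEGER);
    INSERT INTO TABLE1 VALUES (1, 10, 100), (2, NULL, 200), (3, 30, 300), (4, NULL, NULL);
    INSERT INTO TABLE2 VALUES (1, 10, 999), (2, 20, 200), (3, NULL, 300),
                              (4, NULL, NULL), (5, 10, 100);
""")

# The join condition exactly as written in the question.
ORIGINAL = """
    SELECT T1.A, T2.B
    FROM TABLE1 T1
    INNER JOIN TABLE2 T2
      ON ((T1.E IS NULL OR T2.E IS NULL) AND T1.F = T2.F)
      OR ((T1.E IS NOT NULL OR T2.E IS NOT NULL) AND T1.E = T2.E)
"""

# The two-branch rewrite, with UNION ALL so duplicate rows are kept.
REWRITE = """
    SELECT T1.A, T2.B
    FROM TABLE1 T1
    INNER JOIN TABLE2 T2 ON T1.F = T2.F
    WHERE T1.E IS NULL OR T2.E IS NULL
    UNION ALL
    SELECT T1.A, T2.B
    FROM TABLE1 T1
    INNER JOIN TABLE2 T2 ON T1.E = T2.E
    WHERE COALESCE(T1.E, T2.E) IS NOT NULL
"""

def rows_as_multiset(sql):
    """Run a query and count each returned row, ignoring row order."""
    return Counter(conn.execute(sql).fetchall())

original_rows = rows_as_multiset(ORIGINAL)
rewrite_rows = rows_as_multiset(REWRITE)
print("same rows:", original_rows == rewrite_rows)
```

A check like this only shows agreement on the data you feed it, so it is worth including rows with NULL in E, NULL in F, and duplicate keys.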

Non-equality conditions -- such as OR -- pretty much kill JOIN performance, especially in databases such as SQL Server that do not use indexes in such cases.
I would recommend a two-join approach, but you are going to have to fix the SELECT because it is not clear where the columns come from.
SELECT --A, B, C, D, E, F,
T1.A,
COALESCE(T2_1.B, T2_2.B) as B,
. . .
FROM TABLE1 T1 LEFT JOIN
TABLE2 T2_1
ON T2_1.F = T1.F AND
(T1.E IS NULL OR T2_1.E IS NULL) LEFT JOIN
TABLE2 T2_2
ON T2_2.E = T1.E -- E cannot be NULL
WHERE T2_1.F IS NOT NULL OR T2_2.E IS NOT NULL; -- checks for a match for either condition
Then for performance, you want indexes on TABLE2(F, E) and TABLE2(E).

An OR in a join condition can badly hurt performance. Try to get rid of it. Maybe something like this would do:
SELECT A,B,C,D,E,F
FROM TABLE1 T1
LEFT JOIN TABLE2 T2
ON T1.E = T2.E
LEFT JOIN TABLE2 T22
ON T1.F= T22.F
AND T2.E IS NULL
WHERE NOT (T2.E IS NULL AND T22.F IS NULL)

Related

SQL Query Performance Join with condition

Calling all SQL experts. I have the following select statement:
SELECT 1
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
WHERE t1.field = xyz
I'm a little bit worried about the performance here. Is the WHERE clause evaluated before or after the join? If it's evaluated after, is there a way to evaluate the WHERE clause first?
The whole table could easily contain more than a million entries but after the where clause it may be only 1-10 entries left so in my opinion it really is a big performance difference depending on when the where clause is evaluated.
Thanks in advance.
Dimi
You could rewrite your query like this:
SELECT 1
FROM (SELECT * FROM table1 WHERE field = xyz) t1
JOIN table2 t2 ON t1.id = t2.id
But depending on the database product the optimiser might still decide that the best way to do this is to JOIN table1 to table2 and then apply the constraint.
For this query:
SELECT 1
FROM table1 t1 JOIN
table2 t2
ON t1.id = t2.id
WHERE t1.field = xyz;
The optimal indexes are table1(field, id), table2(id).
How the query is executed depends on the optimizer. It is tasked with choosing the best execution plan, given the table statistics and environment.
Each DBMS has its own query optimizer. Logically, in a case like yours, the WHERE filter should be applied first and then the JOIN part of the query.
As mentioned in the comments and other answers, with performance the answer is always "it depends". Depending on your DBMS and the indexing of the base tables, the query may be fine as is, and the optimizer may evaluate the WHERE first. Or the join may be efficient anyway if the indexes cover the join requirements.
Alternatively, you can try to force the behavior you require by reducing the dataset of t1 before you do the join, using a nested select as Richard suggested, or by adding t1.field = xyz to the join condition, for example:
ON t1.field = xyz AND t1.id = t2.id
Personally, if I needed to reduce the dataset before the join, I would use a CTE:
WITH T1 AS
(
SELECT * FROM table1
WHERE Field = 'xyz' -- unqualified: the name T1 is not visible inside its own definition
)
SELECT 1
FROM T1
JOIN Table2 T2
ON T1.Id = T2.Id
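The variants discussed above (plain WHERE, derived table, filter in the ON clause, CTE) are different spellings of the same inner join, so they should return the same rows. What can differ is the plan the optimizer picks. Here is a small sketch using Python's sqlite3 module that checks this on invented data and prints SQLite's plan for each form. The index names are made up, the queries select t1.id instead of 1 so the results can be compared, and SQLite's plans say nothing definite about what another DBMS will do.

```python
import sqlite3

# Hypothetical data and index names; the indexes follow the advice above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, field TEXT);
    CREATE TABLE table2 (id INTEGER);
    CREATE INDEX idx_table1_field_id ON table1 (field, id);
    CREATE INDEX idx_table2_id ON table2 (id);
    INSERT INTO table1 VALUES (1, 'xyz'), (2, 'abc'), (3, 'xyz'), (4, 'xyz');
    INSERT INTO table2 VALUES (1), (2), (3), (3), (5);
""")

QUERIES = {
    "plain WHERE": """
        SELECT t1.id FROM table1 t1
        JOIN table2 t2 ON t1.id = t2.id
        WHERE t1.field = 'xyz'""",
    "derived table": """
        SELECT t1.id FROM (SELECT * FROM table1 WHERE field = 'xyz') t1
        JOIN table2 t2 ON t1.id = t2.id""",
    "filter in ON": """
        SELECT t1.id FROM table1 t1
        JOIN table2 t2 ON t1.field = 'xyz' AND t1.id = t2.id""",
    "CTE": """
        WITH filtered AS (SELECT * FROM table1 WHERE field = 'xyz')
        SELECT filtered.id FROM filtered
        JOIN table2 t2 ON filtered.id = t2.id""",
}

# Same rows from every form; sorting makes the comparison order-independent.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}

# Print the plan SQLite chose for each form (last column is the description).
for name, sql in QUERIES.items():
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(f"{name}: {row[-1]}")
```

If the printed plans are the same for every form, the rewrite bought nothing on that engine. The equivalent check on your own database is to compare its EXPLAIN output for each variant.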

Ora SQL Query: joining without references

I am trying to achieve some logic on Oracle by using a simple query and am feeling stuck. The thing is that I cannot use PL/SQL, and this is giving me some headaches.
I have three tables with below values
I am trying to get something like:
SELECT T1.CODE,T2.CODE,T3.VALUE
FROM TABLE1 T1
JOIN TABLE2 T2 ON T1.REF = T2.CODE
JOIN TABLE3 T3 ON T2.REF = T3.CODE
WHERE T1.CODE = XXXXX
Result for XXXX = 98
98,2,CCC
Whenever the parameter XXXXX is 99, 98, 96 or 95 it returns what I was expecting, but the logic I need doesn't work for 97.
My requirement says that in case I cannot find a link in Table2, I should always use DEF in Table3 and leave unlinked values as NULL. Something like:
Result for XXXX = 97
97,NULL,AAA
I think it could be achieved in a not very "clean" way by using CASE statements, but this is an example in which the number of columns shown is minimal. In my real case it is much bigger, so I want to avoid CASE statements as they would add a lot of complexity.
I tried different methods, but my limited experience with Oracle doesn't go deep enough :)
Is there any way to achieve this without using PL/SQL or CASE?
If I'm understanding correctly, you need to use an outer join instead. You can then use COALESCE to return the value associated with "DEF" if T2.REF is NULL:
SELECT T1.CODE,
T2.CODE,
T3.VALUE
FROM TABLE1 T1
LEFT JOIN TABLE2 T2 ON T1.REF = T2.CODE
LEFT JOIN TABLE3 T3 ON COALESCE(T2.REF,'DEF') = T3.CODE
WHERE T1.CODE = XXXXX
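The effect of the outer joins plus COALESCE can be tried out on a tiny data set. The table contents from the question were not preserved, so the rows below are invented purely to reproduce the two expected outputs (98,2,CCC and 97,NULL,AAA). The sketch runs on SQLite through Python's sqlite3 module, not Oracle, and the lookup helper is a hypothetical name.

```python
import sqlite3

# Invented rows: 98 links through TABLE2, 97 has no matching row in TABLE2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (CODE INTEGER, REF INTEGER);
    CREATE TABLE TABLE2 (CODE INTEGER, REF TEXT);
    CREATE TABLE TABLE3 (CODE TEXT, VALUE TEXT);
    INSERT INTO TABLE1 VALUES (98, 2), (97, 9);
    INSERT INTO TABLE2 VALUES (2, 'X3');
    INSERT INTO TABLE3 VALUES ('X3', 'CCC'), ('DEF', 'AAA');
""")

LOOKUP = """
    SELECT T1.CODE, T2.CODE, T3.VALUE
    FROM TABLE1 T1
    LEFT JOIN TABLE2 T2 ON T1.REF = T2.CODE
    LEFT JOIN TABLE3 T3 ON COALESCE(T2.REF, 'DEF') = T3.CODE
    WHERE T1.CODE = ?
"""

def lookup(code):
    """Return the joined rows for one TABLE1 code."""
    return conn.execute(LOOKUP, (code,)).fetchall()

print(lookup(98))  # linked row: TABLE2 supplies the reference into TABLE3
print(lookup(97))  # unlinked row: falls back to the 'DEF' entry in TABLE3
```

When TABLE2 has no match, T2.REF is NULL, so COALESCE substitutes 'DEF' and the second join picks up the default row while T2.CODE stays NULL.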

Optimization of DB2 query which uses joins and takes 1.5 hours to execute

When I run a SELECT statement on my view it takes around 1.5 hours to run. What can I do to optimize it?
Below is the sample structure of how my view looks like
CREATE VIEW SCHEMANAME.VIEWNAME
(
COL, COL1, COL2, COL3 )
AS SELECT
COST.ETA,
CASE
WHEN VOL.CURR IS NOT NULL
THEN COALESCE(VOL.COMM,0)
END,
CASE
WHEN...
END
FROM TABLE1 t1 inner join TABLE2 t2 ON t1.ETA=t2.ETA
INNER JOIN TABLE3 t3 on t2.ETA=t3.ETA
LEFT OUTER JOIN TABLE4 t4 on t2.ETA=t4.ETA
This is your query:
SELECT COST.ETA,
(CASE WHEN VOL.CURR IS NOT NULL THEN COALESCE(VOL.COMM,0)
END) as ??,
. . .
FROM TABLE1 t1 inner join
TABLE2 t2
ON t1.ETA = t2.ETA INNER JOIN
TABLE3 t3
on t2.ETA = t3.ETA LEFT OUTER JOIN
TABLE4 t4
on t2.ETA = t4.ETA;
First, I will ignore the fact that the select clause references tables that are not in the from clause. I assume this is a typo.
Second, you should be able to use indexes to improve this query: table1(eta), table2(eta), table3(eta), and table4(eta).
Third, I am highly suspicious on seeing the same column used for joining so many tables. I suspect that you might have cartesian products occurring, because there are multiple values of any given eta in several tables. If that is the case, you need to fix the query to better reflect what you really need. If so, ask another question with sample data and desired results, because your query is probably not correct.
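To see how quickly repeated join keys multiply rows, here is a small sketch using Python's sqlite3 module with invented data: three tables that each hold the value 1 twice and the value 2 once in ETA. It also shows one simple way to look for repeated keys in a table. The variable names are made up for the illustration.

```python
import sqlite3

# Invented data: ETA = 1 appears twice in every table, ETA = 2 once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (ETA INTEGER);
    CREATE TABLE TABLE2 (ETA INTEGER);
    CREATE TABLE TABLE3 (ETA INTEGER);
    INSERT INTO TABLE1 VALUES (1), (1), (2);
    INSERT INTO TABLE2 VALUES (1), (1), (2);
    INSERT INTO TABLE3 VALUES (1), (1), (2);
""")

# Each table has 3 rows, but the join returns 2*2*2 rows for ETA = 1
# plus 1*1*1 row for ETA = 2.
joined_row_count = conn.execute("""
    SELECT COUNT(*)
    FROM TABLE1 t1
    INNER JOIN TABLE2 t2 ON t1.ETA = t2.ETA
    INNER JOIN TABLE3 t3 ON t2.ETA = t3.ETA
""").fetchone()[0]

# Diagnostic: which ETA values occur more than once in a given table?
repeated_keys = conn.execute("""
    SELECT ETA, COUNT(*) AS occurrences
    FROM TABLE1
    GROUP BY ETA
    HAVING COUNT(*) > 1
""").fetchall()

print("rows after join:", joined_row_count)
print("repeated ETA values in TABLE1:", repeated_keys)
```

Running the diagnostic query against each joined table is a cheap first check before touching indexes: if several tables show repeated values, the join itself is the likely cause of the long run time.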

Access Removing CERTAIN PARTS of Duplicates in Union Query

I'm working in Access 2007 and know nothing about SQL and very, very little VBA. I am trying to do a union query to join two tables, and delete the duplicates.
BUT, a lot of my duplicates have info in one entry that's not in the other. It's not a 100% exact duplicate.
Example,
Row 1: A, B, BLANK
Row 2: A, BLANK, C
I want it to MERGE both of these to end up as one row of A, B, C.
I found a similar question on here but I don't understand the answer at all. Any help would be greatly appreciated.
I would suggest a query like this:
select
coalesce(t1.a, t2.a) as a,
coalesce(t1.b, t2.b) as b,
coalesce(t1.c, t2.c) as c
from
table1 t1
inner join table2 t2 on t1.key = t2.key
Here, I have used the coalesce function, which returns the first non-null value in a list of values. Access itself may not accept coalesce; its Nz function covers the two-argument case. Also note that I have used key to indicate the column that is the same between the two rows. From your example it looks like A, but I cannot be sure.
If your first table has all the key values, then you can do:
select t1.a, nz(t1.b, t2.b), nz(t1.c, t2.c) as c
from table1 as t1 left join
table2 as t2
on t1.a = t2.a;
If this isn't the case, you can use this rather arcane looking construct:
select t1.a, nz(t1.b, t2.b), nz(t1.c, t2.c) as c
from table1 as t1 left join
table2 as t2
on t1.a = t2.a
union all
select t2.a, t2.b, t2.c
from table2 as t2
where not exists (select 1 from table1 as t1 where t1.a = t2.a)
The first part of the union gets the rows where there is a key value in the first table. The second gets the rows where the key value is in the second but not the first.
Note this is much harder in Access than in other (dare I say "real") databases. MS Access doesn't support common table expressions (CTEs), unions in subqueries, or full outer join -- all of which would help simplify the query.
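The merge logic itself can be tried outside Access. The sketch below runs the same left join plus UNION ALL shape on SQLite via Python's sqlite3 module, with COALESCE standing in for Access's Nz. It assumes the blanks are stored as NULL (not empty strings) and that column a is the matching key. The sample rows are invented, apart from the A/B/C pair from the question.

```python
import sqlite3

# Invented rows: 'A' is split across both tables, 'M' exists only in table1,
# 'Z' exists only in table2. Blanks are assumed to be NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a TEXT, b TEXT, c TEXT);
    CREATE TABLE table2 (a TEXT, b TEXT, c TEXT);
    INSERT INTO table1 VALUES ('A', 'B', NULL), ('M', 'N', 'O');
    INSERT INTO table2 VALUES ('A', NULL, 'C'), ('Z', 'Y', 'X');
""")

MERGE = """
    SELECT t1.a, COALESCE(t1.b, t2.b) AS b, COALESCE(t1.c, t2.c) AS c
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t1.a = t2.a
    UNION ALL
    SELECT t2.a, t2.b, t2.c
    FROM table2 AS t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 AS t1 WHERE t1.a = t2.a)
"""

# Sort in Python so the printed order does not depend on the engine.
merged = sorted(conn.execute(MERGE).fetchall())
for row in merged:
    print(row)
```

The first branch fills each gap in a table1 row from the matching table2 row, and the second branch adds the table2 rows that have no partner, so every key appears once as long as the key is unique within each table.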

How can I speed up MySQL query with multiple joins

Here is my issue, I am selecting and doing multiple joins to get the correct items...it pulls in a fair amount of rows, above 100,000. This query takes more than 5mins when the date range is set to 1 year.
I don't know if it's possible but I am afraid that the user might extend the date range to like ten years and crash it.
Anyone know how I can speed this up? Here is the query.
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t1.subscribe =1
AND t1.Cdate >= $startDate
AND t1.Cdate <= $endDate
AND t5.store =2
I am not the greatest with mysql so any help would be appreciated!
Thanks in advance!
UPDATE
Here is the explain you asked for
id  select_type  table  type    possible_keys                             key             key_len  ref                 rows  Extra
1   SIMPLE       t5     ref     PRIMARY,C_store_type,C_id,C_store_type_2  C_store_type_2  1        const               101   Using temporary
1   SIMPLE       t4     ref     PRIMARY,P_cat                             P_cat           5        alphacom.t5.C_id    326   Using where
1   SIMPLE       t3     ref     I_pid,I_oref                              I_pid           4        alphacom.t4.P_id    31
1   SIMPLE       t2     eq_ref  O_ref,O_cid                               O_ref           28       alphacom.t3.I_oref  1
1   SIMPLE       t1     eq_ref  PRIMARY                                   PRIMARY         4        alphacom.t2.O_cid   1     Using where
Also I added an index to table5 rows and table4 rows because they don't really change, however the other tables get around 500-1000 entries a month... I heard you should add an index to a table that has that many new entries....is this true?
I'd try the following:
First, ensure there are indexes on the following tables and columns (each set of columns in parentheses should be a separate index):
table1 : (subscribe, CDate)
(CU_id)
table2 : (O_cid)
(O_ref)
table3 : (I_oref)
(I_pid)
table4 : (P_id)
(P_cat)
table5 : (C_id, store)
Second, if adding the above indexes didn't improve things as much as you'd like, try rewriting the query as
SELECT DISTINCT t1.first_name, t1.last_name, t1.email FROM
(SELECT CU_id, first_name, last_name, email
FROM table1
WHERE subscribe = 1 AND
CDate >= $startDate AND
CDate <= $endDate) AS t1
INNER JOIN table2 AS t2
ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3
ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4
ON t3.I_pid = t4.P_id
INNER JOIN (SELECT C_id FROM table5 WHERE store = 2) AS t5
ON t4.P_cat = t5.C_id
I'm hoping here that the first sub-select would cut down significantly on the number of rows to be considered for joining, hopefully making the subsequent joins do less work. Ditto the reasoning behind the second sub-select on table5.
In any case, mess with it. I mean, ultimately it's just a SELECT - you can't really hurt anything with it. Examine the plans that are generated by each different permutation and try to figure out what's good or bad about each.
Share and enjoy.
Make sure your date columns and all the columns you are joining on are indexed.
A range (inequality) comparison on your dates generally has to touch more rows than an equality match, so it tends to be slower. Without an index on the date column it means checking every row.
Also, using DISTINCT adds an extra comparison to the logic that your optimizer is running behind the scenes. Eliminate that if possible.
Well, first, make a subquery to decimate table1 down to just the records you actually want to go to all the trouble of joining...
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM (
SELECT first_name, last_name, email, CU_id FROM table1 WHERE
table1.subscribe = 1
AND table1.Cdate >= $startDate
AND table1.Cdate <= $endDate
) AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t5.store = 2
Then start looking at modifying the directionality of the joins.
Additionally, if t5.store is only very rarely 2, then flip this idea around: construct the t5 subquery, then join it back and back and back.
At present, your query is returning all matching rows on table2-table5, just to establish whether t5.store = 2. If any of table2-table5 have a significantly higher row count than table1, this may be greatly increasing the number of rows processed - consequently, the following query may perform significantly better:
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
WHERE t1.subscribe =1
AND t1.Cdate >= $startDate
AND t1.Cdate <= $endDate
AND EXISTS
(SELECT NULL FROM table2 AS t2
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id AND t5.store =2
WHERE t1.CU_id = t2.O_cid);
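To make the difference concrete, the sketch below builds a tiny version of the five tables in SQLite (via Python's sqlite3 module, not MySQL) and runs both the original join and the EXISTS form. All rows, names and dates are invented, and the date bounds are passed as bound parameters instead of the $startDate and $endDate placeholders. It also counts how many rows the join produces before DISTINCT collapses them.

```python
import sqlite3

# Invented rows. Ann has two orders with three items in store 2, Bob's only
# item is in store 1, Cy is not subscribed, Di is outside the date range.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (CU_id INTEGER, first_name TEXT, last_name TEXT,
                         email TEXT, subscribe INTEGER, Cdate TEXT);
    CREATE TABLE table2 (O_cid INTEGER, O_ref TEXT);
    CREATE TABLE table3 (I_oref TEXT, I_pid INTEGER);
    CREATE TABLE table4 (P_id INTEGER, P_cat INTEGER);
    CREATE TABLE table5 (C_id INTEGER, store INTEGER);
    INSERT INTO table1 VALUES
        (1, 'Ann', 'Lee', 'ann@example.com', 1, '2020-03-01'),
        (2, 'Bob', 'Ray', 'bob@example.com', 1, '2020-04-01'),
        (3, 'Cy',  'Poe', 'cy@example.com',  0, '2020-05-01'),
        (4, 'Di',  'Fox', 'di@example.com',  1, '2019-01-01');
    INSERT INTO table2 VALUES (1, 'r1'), (1, 'r2'), (2, 'r3'), (3, 'r4'), (4, 'r5');
    INSERT INTO table3 VALUES ('r1', 10), ('r1', 11), ('r2', 10),
                              ('r3', 12), ('r4', 10), ('r5', 10);
    INSERT INTO table4 VALUES (10, 100), (11, 100), (12, 200);
    INSERT INTO table5 VALUES (100, 2), (200, 1);
""")

JOIN_BODY = """
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
    INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
    INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
    INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
    WHERE t1.subscribe = 1 AND t1.Cdate >= ? AND t1.Cdate <= ? AND t5.store = 2
"""
JOIN_QUERY = "SELECT DISTINCT t1.first_name, t1.last_name, t1.email" + JOIN_BODY
JOIN_ROW_COUNT = "SELECT COUNT(*)" + JOIN_BODY

EXISTS_QUERY = """
    SELECT DISTINCT t1.first_name, t1.last_name, t1.email
    FROM table1 AS t1
    WHERE t1.subscribe = 1 AND t1.Cdate >= ? AND t1.Cdate <= ?
      AND EXISTS (SELECT NULL FROM table2 AS t2
                  INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
                  INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
                  INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id AND t5.store = 2
                  WHERE t1.CU_id = t2.O_cid)
"""

date_range = ("2020-01-01", "2020-12-31")
join_result = sorted(conn.execute(JOIN_QUERY, date_range).fetchall())
exists_result = sorted(conn.execute(EXISTS_QUERY, date_range).fetchall())
rows_before_distinct = conn.execute(JOIN_ROW_COUNT, date_range).fetchone()[0]

print("join result:  ", join_result)
print("exists result:", exists_result)
print("rows the join built before DISTINCT:", rows_before_distinct)
```

Both forms return the same customers here. The join has to build one row per matching item before DISTINCT removes the repeats, while EXISTS can stop at the first match for each customer. Whether that translates into a faster query on MySQL depends on the version and the plan it chooses.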
Try adding indexes on the fields that you join. It may or may not improve the performance.
Moreover, it also depends on the engine that you are using. If you are using InnoDB, check your configuration parameters. I faced a similar problem, as InnoDB's default configuration won't scale as well as MyISAM's default configuration.
As everyone says, make sure you have indexes.
You can also check if your server is set up properly so it can hold more of, or maybe the entire, dataset in memory.
Without an EXPLAIN, there's not much to work by. Also keep in mind that MySQL will look at your JOIN, and iterate through all possible solutions before executing the query, which can take time. Once you have the optimal JOIN order from the EXPLAIN, you could try and force this order in your query, eliminating this step from the optimizer.
It sounds like you should think about delivering subsets (paging) or limit the results some other way unless there is a reason that the users need every row possible all at once. Typically 100K rows is more than the average person can digest.
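A minimal sketch of delivering results in pages, assuming a hypothetical subscribers table with an increasing integer id. It uses Python's sqlite3 module, and each page is fetched by asking for rows after the last id already seen, so no query ever returns more than one page. The table, column and function names are all invented for the illustration.

```python
import sqlite3

# Hypothetical table with 25 invented rows; ids are assigned 1..25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO subscribers (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1, 26)],
)

def fetch_page(last_seen_id, page_size):
    """Return up to page_size rows whose id is greater than last_seen_id."""
    return conn.execute(
        "SELECT id, email FROM subscribers WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()

def iter_pages(page_size):
    """Yield pages one at a time until the table is exhausted."""
    last_seen_id = 0
    while True:
        page = fetch_page(last_seen_id, page_size)
        if not page:
            break
        yield page
        last_seen_id = page[-1][0]  # continue after the last row of this page

page_sizes = [len(page) for page in iter_pages(10)]
print(page_sizes)
```

Seeking on the key keeps every page equally cheap, whereas LIMIT with a growing OFFSET makes the database skip over all earlier rows each time.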