I don't understand what's functionally different about these two queries that would make their performance so different. First, my initial query:
SELECT * FROM XSales_Code SC
WHERE SC.Status = 1
AND SC.SCode NOT IN
(
SELECT DISTINCT SCode FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
)
AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc
This took 13 minutes and 6 seconds to execute. Since I'm used to simple queries like this taking seconds rather than minutes, I played around with it and came up with the following query, which is, at least in my eyes, equivalent:
SELECT DISTINCT SCode INTO #TEMP1 FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
SELECT * FROM XSales_Code SC
WHERE SC.Status = 1
AND SC.SCode NOT IN
(
SELECT Scode FROM #TEMP1
)
AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc
DROP TABLE #TEMP1
The difference is that this query takes 2 seconds to execute versus the 13 minutes above. What's going on here?
In both cases you're effectively getting a "correlated subquery": one that is executed for every row in XSales_Code that passes the Status = 1 AND Last_Mdt < '2014-01-01' conditions.
Think of it like this: XSales_Code is filtered by Status = 1 AND Last_Mdt < '2014-01-01', then SQL Server scans each row of this intermediate result, and for every single row it executes your SELECT DISTINCT SCode FROM XTransactions_01... query to see if the row should be included.
Your second query executes the correlated subquery the same number of times, but it's faster because it runs against #TEMP1, which is far smaller than XTransactions_01.
Generally, the fastest way to do a NOT IN query is to left join to the "not in" subquery and then omit any rows where the left-joined column is null. This gets rid of the correlated subquery.
SELECT * FROM XSales_Code SC
LEFT JOIN (
SELECT DISTINCT SCode FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
) whatevs ON SC.SCode = whatevs.SCode
WHERE SC.Status = 1
AND SC.Last_Mdt < '2014-01-01'
AND whatevs.SCode IS NULL
ORDER BY Last_Mdt desc
This is hard to explain, but try running the query above without the second-to-last line (AND whatevs.SCode IS NULL) and you'll see that whatevs.SCode has a value when the condition is "IN" and is NULL when the condition is "NOT IN".
Finally, I want to stress that correlated subqueries aren't inherently evil. Generally they work just fine for an IN condition and plenty of other use cases, but for a NOT IN condition they tend to be slow.
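Not shown in the answer above, but another common rewrite worth knowing is NOT EXISTS; the optimizer can usually turn it into the same kind of anti-join, and it is NULL-safe, so the inner SCode IS NOT NULL filter stops being necessary for correctness. A sketch against the same tables:
-- Sketch: NOT EXISTS variant of the first query (same tables as above).
-- NOT EXISTS ignores inner NULL SCode values on its own, so the
-- SCode IS NOT NULL filter can be dropped (it may still help the plan).
-- Note: if SC.SCode itself can be NULL, NOT IN and NOT EXISTS treat
-- those outer rows differently.
SELECT *
FROM XSales_Code SC
WHERE SC.Status = 1
  AND SC.Last_Mdt < '2014-01-01'
  AND NOT EXISTS (
      SELECT 1
      FROM XTransactions_01 T
      WHERE T.SCode = SC.SCode
        AND T.Last_Mdt > '2012-01-01'
  )
ORDER BY Last_Mdt DESC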
We have a table called PROTOKOLL, with the following definition:
(Image: PROTOKOLL table definition)
The table contains about 10 million records.
SELECT *
FROM (SELECT /*+ FIRST_ROWS */ a.*, ROWNUM rnum
FROM (SELECT t0.*, t1.*
FROM PROTOKOLL t0
, PROTOKOLL t1
            WHERE t0.BENUTZER_ID = 'A07BU0006'
              AND t0.TYP = 'E'
              AND t1.UUID = t0.ANDERES_PROTOKOLL_UUID
              AND t1.TYP = 'A'
ORDER BY t0.ZEITPUNKT DESC
) a
WHERE ROWNUM <= 4999) WHERE rnum > 0;
So in practice we join the table with itself through the ANDERES_PROTOKOLL_UUID field and apply some simple filters. The results are sorted by creation time (ZEITPUNKT) and the result set is capped at 4,999 rows.
The elapsed time of the query is about 10 minutes, which is not acceptable. ☹
I already have the execution plan and statistics information and am trying to figure out how to speed up the query; please find them attached.
My first observation is that the optimizer adds the condition "P"."ANDERES_PROTOKOLL_UUID" IS NOT NULL to the WHERE clause, but I do not know why. Is that a problem?
Where is the bottleneck in this query, and how can I avoid it? Any suggestion is welcome.
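For context, the extra IS NOT NULL predicate is harmless: the optimizer is only making explicit that rows with a NULL ANDERES_PROTOKOLL_UUID can never satisfy the inner join. A self-join like this one typically benefits from indexes covering the t0 filter and sort and the lookup into t1; a speculative sketch (index names are hypothetical, column names taken from the query):
-- Speculative indexes only; names are hypothetical.
-- Supports the t0 predicates plus ORDER BY ZEITPUNKT DESC:
CREATE INDEX protokoll_user_typ_zeit ON PROTOKOLL (BENUTZER_ID, TYP, ZEITPUNKT DESC);
-- Supports the lookup of t1 by UUID with the TYP filter:
CREATE INDEX protokoll_uuid_typ ON PROTOKOLL (UUID, TYP);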
The following SQL:
SELECT *
FROM Transaction_Auth_Series t
WHERE t.Auth_ID =
(
SELECT MAX(p.Session_ID)
FROM Clone_Db_Derective p
WHERE p.Date = trunc(sysdate)
AND p.Regularity = 'THEME'
);
is very slow when the tables involved contain about 300 million rows, but it takes just a few seconds when the SQL is rewritten as two cursors, i.e.
CURSOR GetMaxValue IS
SELECT MAX(p.Session_ID) AS aaa
FROM Clone_Db_Derective p
WHERE p.Date = trunc(sysdate)
AND p.Regularity = 'THEME';

CURSOR GetAllItems(temp VARCHAR2) IS
SELECT *
FROM Transaction_Auth_Series t
WHERE t.Auth_ID = temp;
and
........
FOR item IN GetMaxValue LOOP
    FOR itemx IN GetAllItems(item.aaa) LOOP.......
Joins won't work, as the tables are not related. How can we optimize the main SQL above, please?
For this query:
SELECT t.*
FROM Transaction_Auth_Series t
WHERE t.Auth_ID = (SELECT MAX(p.Session_ID)
FROM Clone_Db_Derective p
WHERE p.Date = trunc(sysdate) AND p.Regularity = 'THEME'
);
I would recommend indexes on Clone_Db_Derective(Regularity, Date, Session_ID) and Transaction_Auth_Series(Auth_Id).
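In DDL form, that recommendation would look roughly like this (the index names here are made up):
-- Hypothetical index names; column lists mirror the recommendation above.
-- (If Date is a reserved word in your dialect, it may need quoting.)
CREATE INDEX clone_db_derective_ix ON Clone_Db_Derective (Regularity, Date, Session_ID);
CREATE INDEX transaction_auth_series_ix ON Transaction_Auth_Series (Auth_ID);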
The optimization for this query (assuming the tables are not views) seems pretty simple. I am surprised that the cursor version is so much faster.
WITH max_session
AS (SELECT MAX (p.Session_ID) id
FROM Clone_Db_Derective p
WHERE p.Date = TRUNC (SYSDATE) AND p.Regularity = 'THEME')
SELECT *
FROM Transaction_Auth_Series t
WHERE t.Auth_ID = (SELECT id FROM max_session)
A WITH clause is most valuable when the result of the WITH query is needed more than once in the body of the main query, for example when an averaged value must be compared against two or three times. The point is to minimize the number of accesses to a table that is joined multiple times into a single query.
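To illustrate that reuse point, here is a sketch in which max_session is computed once but referenced twice; Parent_Auth_ID is a purely hypothetical column, used only to show the second reference:
-- Illustrative sketch: max_session computed once, referenced twice.
-- Parent_Auth_ID is hypothetical and exists only for this example.
WITH max_session
     AS (SELECT MAX (p.Session_ID) id
           FROM Clone_Db_Derective p
          WHERE p.Date = TRUNC (SYSDATE) AND p.Regularity = 'THEME')
SELECT *
  FROM Transaction_Auth_Series t
 WHERE t.Auth_ID = (SELECT id FROM max_session)
    OR t.Parent_Auth_ID = (SELECT id FROM max_session)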
I have 2 tables: Documents and DocumentAttributes.
Documents, with relevant columns DocID and DelFlag
DocumentAttributes: DocID, aID, aValue
Now I want all DocIDs and aValues with the following restrictions:
SELECT
[o1].[docid]
, [o1].[aValue]
FROM [DocumentAttributes] [o1]
WHERE [o1].[aID] = 9
AND [o1].[DocID] >= 2356
AND [o1].[DocID] < 90000000
AND [o1].[DocID] NOT IN
(
SELECT
[o].[DocID]
FROM [DocumentAttributes] [o]
WHERE [o].[aID] = 2
)
AND [o1].[DocID] IN
(
SELECT
[d].[DocID]
FROM [DOCUMENTS] [d]
WHERE [d].[DELFLAG] != 2
);
So I want the IDs of all documents that have no attribute with aID = 2 and that are not marked as deleted.
The SQL statement above works, but it's too slow, since I have about 1 million documents, each with at least about 10 attributes.
The three SELECTs by themselves each cost less than 1 second, so I guess the NOT IN is the problem.
Does anyone have an idea how to make it faster?
Thanks in advance
Rewrite the NOT IN as a LEFT JOIN:
select o1.docid, o1.aValue
from DocumentAttributes o1
left join DocumentAttributes o on o1.DocID = o.DocID and o.aID = 2
where o1.aID = 9 and o1.DocID >= 2356 and o1.DocID < 90000000
and o.DocID is null
and o1.DocID in (
select d.DocID
from DOCUMENTS d
where d.DELFLAG != 2)
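If DocID is unique in DOCUMENTS (an assumption, though a natural one for a key column), the remaining IN can also be folded into a plain join:
-- Sketch: assumes DocID is unique in DOCUMENTS, so the inner join
-- cannot duplicate rows from DocumentAttributes.
select o1.docid, o1.aValue
from DocumentAttributes o1
join DOCUMENTS d on d.DocID = o1.DocID and d.DELFLAG != 2
left join DocumentAttributes o on o1.DocID = o.DocID and o.aID = 2
where o1.aID = 9 and o1.DocID >= 2356 and o1.DocID < 90000000
  and o.DocID is null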
Oracle supports the MINUS keyword (the SQL Server equivalent is EXCEPT). That means you can replace this sort of thing
where myField not in (
select someField
from etc
)
with this sort of thing:
where myField in (
select someField
from wherever
where they are available
minus
select someField
from the same tables
where I want to exclude them
)
I suggest trying both this and the left join method to see which performs better.
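Applied to the question's tables, the same idea might look like this (a sketch, using SQL Server's EXCEPT in place of Oracle's MINUS):
-- Sketch: set difference applied to the original query's tables.
SELECT o1.DocID, o1.aValue
FROM DocumentAttributes o1
WHERE o1.aID = 9
  AND o1.DocID >= 2356
  AND o1.DocID < 90000000
  AND o1.DocID IN
  (
      SELECT d.DocID FROM DOCUMENTS d WHERE d.DELFLAG != 2    -- docs not deleted
      EXCEPT
      SELECT o.DocID FROM DocumentAttributes o WHERE o.aID = 2 -- docs to exclude
  );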
Subselects in NOT IN conditions may be executed for every row, so maybe that's one of your causes.
I have 2 queries that I would like to make work together:
1) one query sums the number of geometry points within a certain distance of another point and returns results only where the count is at least 6 points;
2) the other query returns the unique ID for all points within that distance (without a count, and so also without a minimum number of records).
I would like to generate a single query that returns the new_ref from table t2 for all (and only) the records that are summed in the first query. (Ideally the output would be as columns in a single row, but at the moment I can't even get the records listed in a single column against multiple rows, so that is my first aim; I could leave the pivoting until later.)
Obviously, the system is identifying the records to count them, so I would think it should be possible to ask which records they are…
Adding the SUM expression to the second query wipes out the results. Should I structure this as a sub-query and, if so, how would I do that?
Query 1 is:
DECLARE @radius as float = 50
SELECT
t1.new_ref,
t1.hatrisref,
SUM
(CASE WHEN t1.geolocation.STDistance(t2.Geolocation) <= @radius
 THEN 1 ELSE 0
 END) AS [Group size]
FROM table1 as t1,
table1 as t2
WHERE
[t1].[new_ref] != [t2].[new_ref]
GROUP BY
[t1].[new_ref],
[t1].[hatrisref]
HAVING
SUM(CASE WHEN
    t1.geolocation.STDistance(t2.Geolocation) <= @radius
    THEN 1 ELSE 0
    END) > 5
ORDER BY
[t1].[new_ref],
[t1].[hatrisref]
Query 2 is:
DECLARE @radius as float = 50
SELECT
t1.hatrisref,
t1.new_ref,
t2.new_ref
FROM table1 as t1,
table1 as t2
WHERE
[t1].[new_ref] != [t2].[new_ref]
and
t1.geolocation.STDistance(t2.Geolocation) <= @radius
GROUP BY
[t1].[new_ref],
[t1].[hatrisref],
t2.new_ref
ORDER BY
[t1].[hatrisref],
[t1].[new_ref],
t2.new_ref
Yes, a sub-query would work:
SELECT ...
FROM table1 as t1, table1 as t2
WHERE t1.new_ref != t2.new_ref
and t1.geolocation.STDistance(t2.Geolocation) <= @radius
and 5 < (select count(*)
from table1 as t3
WHERE t1.new_ref != t3.new_ref
and t1.geolocation.STDistance(t3.Geolocation) <= @radius
)
See this SQL Fiddle for a simplified example.
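Spelled out with the column list from query 2, the combined query might look like this (a sketch, untested):
-- Sketch: query 2's select list combined with the count sub-query above.
DECLARE @radius as float = 50
SELECT
    t1.hatrisref,
    t1.new_ref,
    t2.new_ref
FROM table1 as t1, table1 as t2
WHERE t1.new_ref != t2.new_ref
  AND t1.geolocation.STDistance(t2.Geolocation) <= @radius
  AND 5 < (SELECT COUNT(*)                 -- only keep groups of at least 6
           FROM table1 as t3
           WHERE t1.new_ref != t3.new_ref
             AND t1.geolocation.STDistance(t3.Geolocation) <= @radius)
ORDER BY t1.hatrisref, t1.new_ref, t2.new_ref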
I have a SQL statement like following:
SELECT {SOME CASE WHEN STATEMENTS}
FROM {SUBQUERY 1} A, {SUBQUERY 2} B
WHERE {SOME JOIN CONDITIONS}
Background:
Both subqueries A and B execute in about 15 seconds and return fewer than 20 rows each.
The join conditions are just three simple common fields compared for equality.
The execution plan looks fine, with a cost of only 25.
The problem is that the whole join operation takes 4 to 5 minutes to run.
Could anyone tell me what might cause this?
Try using the NO_PUSH_PRED hint:
http://docs.oracle.com/cd/E11882_01/server.112/e41084/sql_elements006.htm#BABGJBJC
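For example, applied to the query shape above (a sketch; A and B are the inline-view aliases, and the hint placement follows the linked documentation):
-- Sketch: NO_PUSH_PRED applied to the inline views A and B.
SELECT /*+ NO_PUSH_PRED(A) NO_PUSH_PRED(B) */
       {SOME CASE WHEN STATEMENTS}
FROM   ({SUBQUERY 1}) A,
       ({SUBQUERY 2}) B
WHERE  {SOME JOIN CONDITIONS}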
Alternatively, try rewriting the query into something like:
SELECT {SOME CASE WHEN STATEMENTS}
FROM (
SELECT * FROM (
{ SUBQUERY 1}
)
WHERE rownum > 0
) A,
(
SELECT * FROM (
{SUBQUERY 2}
)
WHERE rownum > 0
) B
WHERE {SOME JOIN CONDITIONS}
This will prevent the optimizer from pushing join predicates into the nested subqueries. (Referencing ROWNUM, as in WHERE rownum > 0, makes an inline view non-mergeable, which is why the trick works.)
Then both subqueries should be executed using their "old" plans, and the total time should be 15 + 15 seconds plus a small amount of time to join the roughly 40 rows from the two subqueries.