Query Performance Optimization using NOT IN (Oracle SQL Developer) - sql

I have been trying to optimize the performance of the following query, and I would appreciate any suggestions from people experienced in this area.
I have approximately 70k records and my requirement is to remove duplicates. I need to improve the performance of the query below.
select *
from x.vw_records
where id not in
(select distinct id
from x.vw_datarecords
where effective_date >= trunc(sysdate - 30)
and book in (select book_shortname from x.vw_datarecords))
union
select distinct id
from x.vw_historyrecords
where effective_date >= trunc(sysdate - 30)
and book in (select book_shortname from x.vw_datarecords)
union
select distinct id
from x.vw_transactiondata
where effective_date >= trunc(sysdate - 30)
and book in (select book_shortname from x.vw_datarecords)
union
select distinct id
from x.vw_cashdata
where effective_date >= trunc(sysdate - 30)
and book in (select book_shortname from x.vw_datarecords);
Currently it takes about ten minutes just to count the rows with count(*). Please suggest any ideas for tuning the performance of this query.
Thanks in advance.

I've always found better performance swapping out a NOT IN (query) with a LEFT JOIN + WHERE ... IS NULL.
For example, instead of:
select *
from x.vw_records
where id not in (
select distinct id
from x.vw_datarecords
where effective_date >= trunc(sysdate - 30)
and book in (
select book_shortname from x.vw_datarecords
)
)
use:
select *
from x.vw_records vr
left join x.vw_datarecords vdr on vr.id = vdr.id
and vdr.effective_date >= trunc(sysdate - 30)
and vdr.book in (
select book_shortname from x.vw_datarecords
)
where vdr.id IS NULL
Additionally, you can sometimes get noticeably better performance by doing a GROUP BY rather than a DISTINCT.
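For example, the DISTINCT subquery above could be written with a GROUP BY like this (an untested sketch, reusing the views and columns from the question):
select id
from x.vw_datarecords
where effective_date >= trunc(sysdate - 30)
and book in (select book_shortname from x.vw_datarecords)
group by id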

I suspect you need indexes.
What indexes do you have on the tables involved in your query?
Also, it's time to learn how to use an explain plan, which is an essential tool for query optimization. It isn't that hard to get one; it may be a bit harder to understand, however. Please include the explain plan output with your question.
EXPLAIN PLAN FOR
<<Your SQL_Statement here>>
;
SET LINESIZE 130
SET PAGESIZE 0
SELECT * FROM table(DBMS_XPLAN.DISPLAY);
There is absolutely zero benefit from using SELECT DISTINCT when you are using UNION, since UNION already removes duplicates; do not do both, just do one.
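In other words (a minimal sketch with made-up names a and b), these return the same rows, but the second avoids the redundant de-duplication:
select distinct id from a
union
select distinct id from b
versus simply:
select id from a
union
select id from b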

Try to use an EXISTS/NOT EXISTS clause in place of IN/NOT IN (http://www.techonthenet.com/sql/exists.php). That generally runs much faster.
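For example, the first branch of the query in the question could be rewritten with NOT EXISTS roughly like this (an untested sketch, reusing the views and columns from the question):
select *
from x.vw_records vr
where not exists
(select 1
from x.vw_datarecords vdr
where vdr.id = vr.id
and vdr.effective_date >= trunc(sysdate - 30)
and vdr.book in (select book_shortname from x.vw_datarecords))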

Related

Speed up SQL simple Query

We have a table called PROTOKOLL, with the following definition:
PROTOKOLL TableDefinition
The table has 10 million pcs of records.
SELECT *
FROM (SELECT /*+ FIRST_ROWS */ a.*, ROWNUM rnum
FROM (SELECT t0.*, t1.*
FROM PROTOKOLL t0
, PROTOKOLL t1
WHERE (
(
(t0.BENUTZER_ID = 'A07BU0006')
AND (t0.TYP = 'E')
) AND
(
(t1.UUID = t0.ANDERES_PROTOKOLL_UUID)
AND
(t1.TYP = 'A')
)
)
ORDER BY t0.ZEITPUNKT DESC
) a
WHERE ROWNUM <= 4999) WHERE rnum > 0;
So practically we join the table with itself through the ANDERES_PROTOKOLL_UUID field and apply simple filters. The results are sorted by creation time and the result set is limited to about 5000 rows.
The elapsed time of the query is about 10 minutes, which is not acceptable ☹
I already have the execution plan and statistics in place and am trying to figure out how to speed up the query; please find them attached.
My first observation is that the optimizer additionally puts the condition "P"."ANDERES_PROTOKOLL_UUID" IS NOT NULL into the WHERE clause, but I do not know why. Is that a problem?
Or where is the bottleneck of the query?
How can I avoid it? Any suggestion is welcome.

Oracle subquery's SELECT MAX() on large dataset is very slow

The following SQL :
SELECT *
FROM Transaction_Auth_Series t
WHERE t.Auth_ID =
(
SELECT MAX(p.Session_ID)
FROM Clone_Db_Derective p
WHERE p.Date = trunc(sysdate)
AND p.Regularity = 'THEME'
);
is very slow when the tables involved contain about 300 million rows. But it takes just a few seconds when the SQL is written with two cursors, i.e.
CURSOR GetMaxValue IS
SELECT MAX(p.Session_ID)
FROM Clone_Db_Derective p
WHERE p.Date = trunc(sysdate)
AND p.Regularity = 'THEME';
CURSOR GetAllItems(temp VARCHAR2) IS
SELECT *
FROM Transaction_Auth_Series t
WHERE t.Auth_ID = temp;
and
........
FOR item IN GetMaxValue LOOP
FOR itemx IN GetAllItems(item.aaa) LOOP.......
Joins won't work as the tables are not related. How can we optimize the above main SQL please?
For this query:
SELECT t.*
FROM Transaction_Auth_Series t
WHERE t.Auth_ID = (SELECT MAX(p.Session_ID)
FROM Clone_Db_Derective p
WHERE p.Date = trunc(sysdate) AND p.Regularity = 'THEME'
);
I would recommend indexes on Clone_Db_Derective(Regularity, Date, Session_ID) and Transaction_Auth_Series(Auth_Id).
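For example (a sketch; the index names are made up and the column names are copied from the question, so Date may need quoting if it was created as a quoted identifier):
create index clone_db_derective_ix1 on Clone_Db_Derective (Regularity, Date, Session_ID);
create index transaction_auth_series_ix1 on Transaction_Auth_Series (Auth_ID);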
The optimization for this query (assuming the tables are not views) seems pretty simple. I am surprised that the cursor version is so much faster.
WITH max_session
AS (SELECT MAX (p.Session_ID) id
FROM Clone_Db_Derective p
WHERE p.Date = TRUNC (SYSDATE) AND p.Regularity = 'THEME')
SELECT *
FROM Transaction_Auth_Series t
WHERE t.Auth_ID = (SELECT id FROM max_session)
A WITH clause is most valuable when the result of the WITH query is required more than one time in the body of the main query, such as when one averaged value needs to be compared against two or three times. The point is to minimize the number of accesses to a table joined multiple times into a single query.
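For example (a made-up sketch, not taken from the question), here the averaged value is computed once and compared against twice:
WITH avg_session AS
(SELECT AVG(p.Session_ID) avg_id
FROM Clone_Db_Derective p
WHERE p.Regularity = 'THEME')
SELECT *
FROM Transaction_Auth_Series t
WHERE t.Auth_ID > (SELECT avg_id FROM avg_session)
AND t.Auth_ID < 2 * (SELECT avg_id FROM avg_session)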

Minor change to SQL SERVER query causes extremely slow execution time

I don't understand what's functionally different about these two queries that would make them perform so differently. First, my initial query:
SELECT * FROM XSales_Code SC
WHERE SC.Status = 1
AND SC.SCode NOT IN
(
SELECT DISTINCT SCode FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
)
AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc
This took 13 minutes and 6 seconds to execute. Since I'm used to simple queries like this taking several seconds rather than several minutes, I played around with it and made this query which is, at least in my eyes, equivalent:
SELECT DISTINCT SCode INTO #TEMP1 FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
SELECT * FROM XSales_Code SC
WHERE SC.Status = 1
AND SC.SCode NOT IN
(
SELECT Scode FROM #TEMP1
)
AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc
DROP TABLE #TEMP1
The difference is this query takes 2 seconds to execute vs the 13 minutes above. What's going on here?
In both cases you're effectively running a "correlated subquery", which executes for every row in XSales_Code that passes the Status = 1 AND Last_Mdt < '2014-01-01' conditions.
Think of it like this: XSales_Code is filtered by Status = 1 AND Last_Mdt < '2014-01-01', then SQL Server scans each row of this intermediate result, and for every single row it executes your SELECT DISTINCT SCode FROM XTransactions_01... query to see whether the row should be included.
Your second query executes the subquery the same number of times, but it's faster because it runs against a much smaller table.
Generally, the fastest way to do a NOT IN query is to left join to the "not in" subquery and then omit any rows where the left-joined column is null. This gets rid of the correlated subquery.
SELECT * FROM XSales_Code SC
LEFT JOIN (
SELECT DISTINCT SCode FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
) whatevs ON SC.SCode = whatevs.SCode
WHERE SC.Status = 1
AND SC.Last_Mdt < '2014-01-01'
AND whatevs.SCode IS NULL
ORDER BY Last_Mdt desc
This is hard to explain, but try running the query above without the second-to-last line (AND whatevs.SCode IS NULL) and you'll see how whatevs.SCode has a value when the condition is "IN" and is null when the condition is "NOT IN".
Finally, I want to stress that correlated subqueries aren't inherently evil. Generally they work just fine for an IN condition and plenty of other use cases, but for a NOT IN condition they tend to be slow.
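A NOT EXISTS version is also worth trying for the NOT IN case (a rough, untested sketch using the same tables and columns as above):
SELECT * FROM XSales_Code SC
WHERE SC.Status = 1
AND SC.Last_Mdt < '2014-01-01'
AND NOT EXISTS
(
SELECT 1 FROM XTransactions_01 T
WHERE T.SCode = SC.SCode
AND T.Last_Mdt > '2012-01-01'
)
ORDER BY SC.Last_Mdt DESC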

Fetch data from table using SQL

I have a table named "Orders" with rows 1-1000 and 3 columns (S.no, Order and Status). I need to fetch the Orders from 50-1000 which have their Status as "Cancelled". How can I do this in SQL Server?
Logic operators:
SELECT [Order]
FROM Orders
WHERE Status = 'Cancelled'
AND (S.no >= 50 AND S.no <= 1000)
BETWEEN:
SELECT [Order]
FROM Orders
WHERE Status = 'Cancelled'
AND (S.no BETWEEN 50 AND 1000)
select *
from orders
where no between 50 and 1000
and status = 'Cancelled'
Assuming you meant to say that the column was named "no". S.no would not be a valid column name.
You can try something like this:
SELECT *
FROM Orders
WHERE (S.no BETWEEN 50 AND 1000) AND (Status = 'Cancelled')
Hope this helps
If you're using SQL Server, you don't have access to LIMIT and OFFSET (the OFFSET ... FETCH syntax only exists in SQL Server 2012 and later).
There's a really nice generalizable solution discussed here: Equivalent of LIMIT and OFFSET for SQL Server?
I'd definitely take a look at that. If indeed your S.no values range from 1-1000, then the solution above by Notulysses should work just fine. But if you don't have S.no between 1-1000 (or some other easy way to filter) then check out the solution linked above. If you can do what Notulysses recommended, go for it. If you need a generalizable solution, the one above is very good. I've also copied it below, for reference:
;WITH Results_CTE AS
(
SELECT
Col1, Col2, ...,
ROW_NUMBER() OVER (ORDER BY SortCol1, SortCol2, ...) AS RowNum
FROM Table
WHERE <whatever>
)
SELECT *
FROM Results_CTE
WHERE RowNum >= @Offset
AND RowNum < @Offset + @Limit
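And if you are on SQL Server 2012 or later, the OFFSET ... FETCH syntax mentioned above covers this directly (a sketch using the same placeholder names as the query just shown):
SELECT Col1, Col2, ...
FROM Table
WHERE <whatever>
ORDER BY SortCol1, SortCol2, ...
OFFSET @Offset ROWS
FETCH NEXT @Limit ROWS ONLY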

Oracle/SQL - Need help optimizing this union/group/count query

I'm trying to optimize this query however possible. In my test tables it does exactly what I want it to, but on the live tables it takes a VERY long time to run.
select THING_,
count(case STATUS_ when '_Good_' then 1 end) as GOOD,
count(case STATUS_ when '_Bad_' then 1 end) as BAD,
count(case STATUS_ when '_Bad_' then 1 end) / count(case STATUS_ when '_Good_' then 1 end) * 100 as FAIL_PERCENT
from
(
select THING_,
STATUS_
from <good table>
where TIMESTAMP_ > (sysdate - 1) and
STATUS_ = '_Good_' and
upper(THING_) like '%TEST%'
UNION ALL
select THING_,
STATUS_
from <bad table>
where TIMESTAMP_ > (sysdate - 1) and
STATUS_ = '_Bad_' and
THING_ like '%TEST%'
) u
group by THING_
I think by looking at the query it should be self-explanatory what I want to do, but if not, or if additional info is needed, please let me know and I will post some sample tables.
Thanks!
Create composite indexes on (STATUS_, TIMESTAMP_) in both tables.
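For example (a sketch; <good table> and <bad table> are the placeholders from the question, and the index names are made up):
create index good_table_ix1 on <good table> (STATUS_, TIMESTAMP_);
create index bad_table_ix1 on <bad table> (STATUS_, TIMESTAMP_);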
(1) Looking at the execution plan should always be your first step in diagnosing SQL performance issues
(2) A possible problem with the query as written is that, because SYSDATE is a function that is not evaluated until execution time (i.e. after the execution plan is determined), the optimizer cannot make use of histograms on the timestamp column to evaluate the utility of an index. I have seen that lead to bad optimizer decisions. If you can work out a way to calculate the date first then feed it into the query as a bind or a literal, that may help, although this is really just a guess.
(3) Maybe a better overall way to structure the query would be as a join (possibly full outer join) between aggregate queries on each of the tables.
SELECT COALESCE(g.thing_,b.thing_), COALESCE(good_count,0), COALESCE(bad_count,0)
FROM (SELECT thing_,count(*) good_count from good_table WHERE ... GROUP BY thing_) g
FULL OUTER JOIN
(SELECT thing_,count(*) bad_count from bad_table WHERE ... GROUP BY thing_) b
ON b.thing_ = g.thing_
(Have to say, it seems kind of weird that you have two separate tables when you also have a status column to indicate "good" or "bad". But maybe I am overinterpreting.)
Have you tried using analytic functions? They might decrease the execution time. Here is an example:
select distinct col1, col2, col3
from (select col1,
count(col2) over (partition by col1) col2,
count(col3) over (partition by col1) col3
from table
)
It's something like that.