Optimise a Query Which is taking too long to Run - sql

Im Using toad for Oracle to run a query which is taking much too long to run, sometimes over 15 minutes.
The query is pulling memos which are left to be approved by managers. The query is not bringing back alot of rows. Typically when it is run it will return about 30 or 40 rows. The query needs to access a few tables for its information so I'm using alot of joins to get this information.
I have attached my query below.
If anyone can help with optimising this query I would be very greatfull.
Query:
SELECT (e.error_Description || DECODE(t.trans_Comment, 'N', '', '','', ' - ' || t.trans_Comment)) AS Title,
t.Date_Time_Recorded AS Date_Recorded,
DECODE(t.user_ID,0,'System',(SELECT Full_Name FROM employee WHERE t.user_Id = user_id)) AS Recorded_by,
DECODE(t.user_ID,0, Dm_General.getCalendarShiftName(t.Date_Time_Recorded), (SELECT shift FROM employee WHERE t.user_Id = user_id)) AS Shift,
l.Lot_Number AS entity_number,
ms.Line_Num,
'L' AS Entity_Type,
t.entity_id, l.lot_Id AS Lot_Id
FROM DAT_TRANSACTION t
JOIN ADM_ERRORCODES e ON e.error_id = t.error_id
JOIN ADM_ACTIONS a ON a.action_id = t.action_id,
DAT_LOT l
INNER JOIN Status s ON l.Lot_Status_ID = s.Status_ID,
DAT_MASTER ms
INNER JOIN ADM_LINE LN ON ms.Line_Num = LN.Line_Num
WHERE
(e.memo_req = 'Y' OR a.memo_req = 'Y')
AND ms.Run_type_Id = Constants.Runtype_Production_Run --Production Run type
AND s.completed_type NOT IN ('D', 'C', 'R') -- Destroyed /closed / Released
AND LN.GEN = '2GT'
AND (NOT EXISTS (SELECT 1 FROM LNK_MEMO_TRANS lnk, DAT_MEMO m
WHERE lnk.Trans_ID = t.trans_id AND lnk.Memo_ID = m.Memo_ID
AND NVL(m.approve, 'Y') = 'Y'))--If it's null, it's
been created and is awaiting approval
AND l.Master_ID = ms.Master_ID
AND t.Entity_ID = l.Lot_ID
AND t.Entity_Type IN ('L', 'G');

The usual cause for bad performance of queries is that Oracle can't find an appropriate index. Use EXPLAIN PLAN with TOAD so Oracle can tell you what it thinks the best way to execute the query. That should give you some idea when it uses indexes and when not.
For general pointers, see http://www.orafaq.com/wiki/Oracle_database_Performance_Tuning_FAQ
See here for EXPLAIN PLAN.

You have some function calls in your SQL:
dm_general.getcalendarshiftname(t.date_time_recorded)
constants.runtype_production_run
Function calls are slow in SQL, and depending on the query plan may get called redundantly many times - e.g. computing dm_general.getcalendarshiftname for rows that end up being filtered out of the results.
To see if this is a significant factor, try replacing the function calls with literal constants temporarily and see if the performance improves.
The number of function calls can sometimes be reduced by restructuring the query like this:
select /*+ no_merge(v) */ a, b, c, myfunction(d)
from
( select a, b, c, d
from my_table
where ...
) v;
This ensures that myfunction is only called for rows that will appear in the results.

I have replaced function calls with literal constants and this speeds it up by only a second or 2. The query is still taking about about 50 seconds to run.
Is there anything I can do around the Joins to help spped this up. Have a used the INNER JOIN function correctly here.
Im not really sure I understand what you mean about the below or how to use it.
I get the error d invalid identifier when I try to call the function in the second select
select /*+ no_merge(v) */ a, b, c, myfunction(d)
from
( select a, b, c, d
from my_table
where ...
) v;
Any other views would be greatly appreciated

Before we can say anything sensible, we have to take a look at where time is being spent. And that means you have to collect some information first.
Therefore, my standard reaction to a question like this, is this one: http://forums.oracle.com/forums/thread.jspa?threadID=501834
Regards,
Rob.

Related

Runtime of stored procedure with temporary tables varies intermittently

We are facing a performance issue while executing a stored procedure. It usually takes between 10-15 minutes to run, but sometimes it takes up to more than 30 minutes to execute.
We captured visualize plan execute files for the Normal run and Long run cases.
By checking the visualized plan we came to know that, one particular Insert block of code takes extra time in the long run. And by checking
"EXPLAIN PLAN FOR SQL PLAN CACHE ENTRY <plan_id> "
the table we found that the order of execution differs in the long run.
This is the block which takes extra time to run sometimes.
INSERT INTO #TMP_DATI_SALDI_LORDI_BASE (
"COD_SCENARIO","COD_PERIODO","COD_CONTO","COD_DEST1","COD_DEST2","COD_DEST3","COD_DEST4","COD_DEST5"
,"IMPORTO","COD_VALUTA","IMPORTO_VALUTA_ORIGINARIA","COD_VALUTA_ORIGINARIA","NOTE"
)
( SELECT
SCEN_P.SCENARIO
,SCEN_P.PERIOD
,ACCOUT_ADJ.ATTRIBUTO1 AS "COD_CONTO"
,DATAS_rev.COD_DEST1
,DATAS_rev.COD_DEST2
,DATAS_rev.COD_DEST3
,__typed_NString__($1, 50)
,'RPT_NON'
,SUM(
CASE WHEN INFO.INCOT = 'FOB' THEN
CASE ACCOUT_rev.ATTRIBUTO1 WHEN 'CalcInsurance' THEN
0
ELSE
DATAS_rev.IMPORTO
END
ELSE
DATAS_rev.IMPORTO
END
* (DATAS_ADJ.IMPORTO - DATAS.IMPORTO)
)
,DATAS_rev.COD_VALUTA
,SUM(
CASE WHEN INFO.INCOT = 'FOB' THEN
CASE ACCOUT_rev.ATTRIBUTO1 WHEN 'CalcInsurance' THEN
0
ELSE
DATAS_rev.IMPORTO_VALUTA_ORIGINARIA
END
ELSE
DATAS_rev.IMPORTO_VALUTA_ORIGINARIA
END
* (DATAS_ADJ.IMPORTO_VALUTA_ORIGINARIA - DATAS.IMPORTO_VALUTA_ORIGINARIA)
)
,DATAS_rev.COD_VALUTA_ORIGINARIA
,'CPM_SP_CACL_FY_E3 Parts Option ADJ'
FROM #TMP_TAGERT_SCEN_P SCEN_P
INNER JOIN #TMP_DATI_SALDI_LORDI_BASE DATAS_rev
ON DATAS_rev.COD_SCENARIO = SCEN_P.SCENARIO
AND DATAS_rev.COD_PERIODO = SCEN_P.PERIOD
AND LEFT(DATAS_rev.COD_DEST3, 1) = 'O'
INNER JOIN CONTO ACCOUT_rev
ON ACCOUT_rev.COD_CONTO = DATAS_rev.COD_CONTO
AND ACCOUT_rev.ATTRIBUTO1 IN ('CalcFOB','CalcInsurance') --FOB,Insurance(Ocean freight is Nothing by Option)
INNER JOIN #DSL DATAS
ON DATAS.COD_SCENARIO = 'LAUNCH'
AND DATAS.COD_PERIODO = 12
AND DATAS.COD_DEST1 = 'NC'
AND DATAS.COD_DEST2 = 'NC'
AND DATAS.COD_DEST3 = 'F001'
AND DATAS.COD_DEST4 = DATAS_rev.COD_DEST4
AND DATAS.COD_DEST5 = 'INP'
INNER JOIN CONTO ACCOUT
ON ACCOUT.COD_CONTO = DATAS.COD_CONTO
AND ACCOUT.ATTRIBUTO2 = 'E3'
INNER JOIN CONTO ACCOUT_ADJ
ON ACCOUT_ADJ.ATTRIBUTO3 = DATAS.COD_CONTO
AND ACCOUT_ADJ.ATTRIBUTO2 = 'HE3'
INNER JOIN #DSL DATAS_ADJ
ON LEFT(DATAS_ADJ.COD_SCENARIO,4) = LEFT(SCEN_P.SCENARIO,4)
AND DATAS_ADJ.COD_PERIODO = 12
AND DATAS_ADJ.COD_DEST1 = DATAS.COD_DEST1
AND DATAS_ADJ.COD_DEST2 = DATAS.COD_DEST2
AND DATAS_ADJ.COD_DEST3 = DATAS.COD_DEST3
AND DATAS_ADJ.COD_DEST4 = DATAS.COD_DEST4
AND DATAS_ADJ.COD_DEST5 = DATAS.COD_DEST5
AND DATAS_ADJ.COD_CONTO = ACCOUT_ADJ.COD_CONTO
LEFT OUTER JOIN #TMP_KDPWT_INCOTERMS INFO
ON INFO.P_CODE = DATAS.COD_DEST4
GROUP BY
SCEN_P.SCENARIO,SCEN_P.PERIOD,ACCOUT_ADJ.ATTRIBUTO1,DATAS_rev.COD_DEST1,DATAS_rev.COD_DEST2
,DATAS_rev.COD_DEST3, DATAS.COD_DEST4,DATAS_rev.COD_VALUTA,DATAS_rev.COD_VALUTA_ORIGINARIA,INFO.INCOT
)
I will share the order of execution details also for normal and long run case.
Could someone please help us to overcome this issue? And also we don't know how to fix the order of the join execution. Is there any way to fix the join order execution, Please guide us.
Thanks in advance
Vinothkumar
Without a lot more detailed information, there is no way to tell exactly why your INSERT statement shows this alternating runtime behaviour.
Based on my experience, such an analysis can take quite some time and there are only few people available that are capable to perform it. If you can get someone like that to look at this, make sure to understand and learn.
What I can tell from the information shared is this
using temporary tables to structure a multi-stage data flow is the wrong thing to do on SAP HANA. Instead, use table variables in SQLScript.
if you insist on using the temporary tables, make them at least column tables; this will allow to avoid a need for some internal data materialisation.
when using joins make sure that the joined columns are of the same data type. The explain plan is full of TO_INT(), TO_DECIMAL(), and other conversion functions. Those take time, memory, and make it hard for the optimiser(s) to estimate cardinalities.
as the statement uses a lot of temporary tables, the different join orders can easily result from different volumes of data that was present when the SQL was parsed, prepared and optimised. One option to avoid this is to have HANA ignore any cached plans for the statement. The documentation has the HINTS for that.
And that is about what I can say about this with the available information.

the below select statement takes a long in running

This select statement takes a long time running, after my investigation I found that the problem un subquery, stored procedure, please I appreciate your help.
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
AND COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
AND COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Well there are a few issues with your SELECT statement that you should address:
First let's look at this condition:
COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
First you select DISTINCT cheque numbers with a not delivered status then you say you don't want this. Rather than saying I don't want non delivered it is much more readable to say I want delivered ones. However this is not really an issue but rather it would make your SELECT easier to read and understand.
Second let's look at your second cheque condition:
COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Here you want to exclude all cheques that have an entry in Q_COKE_AP_CHECKS_DELIVERY_ST_V. This makes your first DISTINCT condition redundant as whatever cheques numbers will bring back would be rejected by this second condition of yours. I do't know if Oracle SQL engine is clever enough to work out this redundancy but this could cause your slowness as SELECT distinct can take longer to run
In addition to this if you don't have them already I would recommend adding the following indexes:
CREATE INDEX index_1 ON q_coke_ap_checks_sign_status_v(coke_chq_number, coke_pay_supplier);
CREATE INDEX index_2 ON q_coke_ap_checks_sign_status_v(plan_id, coke_signature__a, coke_signature__b, coke_audit);
CREATE INDEX index_3 ON q_coke_ap_checks_delivery_st_v(coke_chq_number_deliver);
I called the index_1,2,3 for easy to read obviously not a good naming convention.
With this in place your select should be optimized to retrieve you your data in an acceptable performance. But of course it all depends on the size and the distribution of your data which is hard to control without performing specific data analysis.
looking to you code .. seems you have redundant where condition the second NOT IN implies the firts so you could avoid
you could also transform you NOT IN clause in a MINUS clause .. join the same query with INNER join of you not in subquery
and last be careful you have proper composite index on table
Q_COKE_AP_CHECKS_SIGN_STATUS_V
cols (plan_id,COKE_SIGNATURE__A , COKE_SIGNATURE__B, COKE_AUDIT, COKE_CHQ_NUMBER, COKE_PAY_SUPPLIER)
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
MINUS
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
INNER JOIN (
SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
) T ON T.COKE_CHQ_NUMBER_DELIVER = apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'

Sybase DB - How to store a SELECT query result in a variable?

I need to store the result of the SELECT query below in a variable to cut down on computation time. The results are of the form 'X', 'Y', 'Z', ...
WHERE
PEL.kuerzel in (SELECT KL.kuerzel from ictq.KLE KL WHERE FachgruppeKuerzel=526)
Right now the SELECT query gets executed 3 times for each of ~2000 entries. If I was able to store the result locally, I would have to run it only once.
I'm working on a Sybase 11 database. How can I achieve this or anything similar?
The subquery where the snippet was taken from, out of a 150 line query alltogether:
SELECT list(PEL.kuerzel) from ictq.PEpisode PE
INNER JOIN ictq.PEpisodeLeistung PEL ON(PE.IDPATIENTEPISODE = PEL.IDPATIENTEPISODE)
WHERE
PE.IDPATIENTKLINIK = P.IDPATIENTKLINIK and
PEL.Datum between dateadd(month, -12, #startdatum) and #startdatum and
PEL.kuerzel in (SELECT kl.kuerzel from ictq.KLE kl where FachgruppeKuerzel=526)
I have no control over the structure and cannot add anything. The query in itself is legacy work and I'm happy it works as it is now. The slow computation, however, needs an overhaul.
It is often more efficient to use exists rather than in, or to move the in subquery to the from clause:
FROM PEL JOIN
(SELECT DISTINCT KL.kuerzel
FROM ictq.KLE KL
WHERE FachgruppeKuerzel = 526
) KL
ON PEL.kuerzel = KL.kuerzel;
For performance for this query, you want an index on ictq.KLE(FachgruppeKuerzel, kuerzel) and PEL(kuerzel).

Oracle Sub-select taking a long time

I have a SQL Query that comprise of two level sub-select. This is taking too much time.
The Query goes like:
select * from DALDBO.V_COUNTRY_DERIV_SUMMARY_XREF
where calculation_context_key = 130205268077
and DERIV_POSITION_KEY in
(select ctry_risk_derivs_psn_key
from DALDBO.V_COUNTRY_DERIV_PSN
where calculation_context_key = 130111216755
--and ctry_risk_derivs_psn_key = 76296412
and CREDIT_PRODUCT_TYPE = 'SWP OP'
and CALC_OBLIGOR_COUNTRY_OF_ASSETS in
(select ctry_cd
from DALDBO.V_PSN_COUNTRY
where calculation_context_key = 130134216755
--and ctry_risk_derivs_psn_key = 76296412
)
)
These tables are huge! Is there any optimizations available?
Without knowing anything about your table or view definitions, indexing, etc. I would start by looking at the sub-selects and ensuring that they are performing optimally. I would also want to know how many values are being returned by each sub-select as this can impact performance.
How is calculation_context_key used to retrieve rows from V_COUNTRY_DERIV_PSN and V_PSN_COUNTRY? Is it an optimal execution plan?
How is DERIV_POSITION_KEY and CALC_OBLIGOR_COUNTRY_OF_ASSETS used in V_COUNTRY_DERIV_SUMMARY_XREF to retrieve rows? Again, look at the explain plan.
first of all, can you write this query using inner joins (and not subselect) ??
select A.*
from DALDBO.V_COUNTRY_DERIV_SUMMARY_XREF a,
DALDBO.V_COUNTRY_DERIV_PSN b,
DALDBO.V_PSN_COUNTRY c
where calculation_context_key = 130205268077
and a.DERIV_POSITION_KEY = b.ctry_risk_derivs_psn_key
and b.calculation_context_key = 130111216755
--and b.ctry_risk_derivs_psn_key = 76296412
and b.CREDIT_PRODUCT_TYPE = 'SWP OP'
and b.CALC_OBLIGOR_COUNTRY_OF_ASSETS = c.ctry_cd
and c.calculation_context_key = 130134216755
--and c.ctry_risk_derivs_psn_key = 76296412
second, best practice says that when you don't query any data from the tables in the subselect you better of using an EXISTS instead of IN. new versions of oracle does that automatically and actually rewrite the whole thing as an inner join.
last, without any knowledge on you data and of what you are trying to do i would suggest you to try and use views as less as you can - if you can query the underling tables it would be best and you will probably see immediate performance improvement.

Validating optimization of an Oracle query

Ok, so I'm working on this (rather old) project at work, which uses loads of queries towards an Oracle database. I recently stumbled upon this gem, which takes about 6-7 hours to run and returns ~1400 rows. The table/view in question contains ~200'000 rows. I thought this felt like it was taking maybe a little longer than seemed reasonable, so I started having a closer look at it. Now I can't, for security/proprietary reasons, share the exact query, but this should show what the query does in more general terms:
SELECT
some_field,
some_other_field
FROM (
SELECT
*
FROM
some_view a
WHERE
some_criteria AND
a.client_no || ':' || a.engagement_no || ':' || a.registered_date = (
SELECT
b.client_no || ':' || b.engagement_no || ':' || MAX(b.registered_date)
FROM
some_view b
JOIN some_engagement_view e
ON e.client_no = b.client_no AND e.engagement_no = b.engagement_no
JOIN some_client_view c
ON c.client_no = b.client_no
WHERE
some_other_criteria AND
b.client_no = a.client_no AND
b.engagement_no = a.engagement_no
GROUP BY
b.client_no,
b.engagement_no
)
);
Basically what it is supposed to do, as far as I've managed to figure out, is to from some_view (which contains evaluations of customers/engagements) fetch the latest evaluation for every unique client/engagement.
The two joins are there to ensure that the client and engagement exists in another system, where they are primarily handled after you have done the evaluation in this system.
Notice how it concatenates two numbers and a date, and then compares that to a sub-query? "Interesting" design-choice. So I thought that if you replace the concatenation with a proper comparison you might get some kind of performance gain at least. Please notice that I primarily develop .NET and for the web, and am far from an expert when it comes to databases, but I rewrote it as follows:
SELECT
some_field,
some_other_filed
FROM
some_view a
WHERE
some_criteria AND
(a.client_no, a.engagement_no, a.registered_date) = (
SELECT
b.client_no,
b.engagement_no,
MAX(b.registered_date)
FROM
some_view b
JOIN some_engagement_view e
ON e.client_no = b.client_no AND e.engagement_no = b.engagement_no
JOIN some_client_view c
ON c.client_no = b.client_no
WHERE
some_other_criteria AND
b.client_no = a.client_no AND
b.engagement_no = a.engagement_no
GROUP BY
b.client_no,
b.engagement_no
)
);
Now if I replace the fields in the very first select with a COUNT(1), I get exactly the same number of rows with both queries, so a good start. The new query fetches data just as fast as it counts, < 10 seconds. The old query gets the count in ~20 seconds, and as I mentioned before, the data takes close to 6-7 hours. It is currently running so that I can do some kind of analysis to see if the new query is valid, but I thought that I'd ask here as well to see if there is anything apparently wrong that I have done?
EDIT Also removed the outer-most query, which did not seem to fulfill any kind of purpose, except maybe making the query look cooler.. or something.. I dunno..
Expanding on my comment... if I try to replicate your query structure using built-in views it also runs for a long time. For example, getting the most recently created table for each owner (purely for demo purposes, it can be done more simply) like this takes several minutes, with either version:
SELECT
owner,
object_name
FROM
all_objects a
WHERE
(a.owner, a.object_type, TRUNC(a.created)) = (
SELECT
b.owner, b.object_type, TRUNC(MAX(b.created))
FROM
all_objects b
JOIN all_tables e
ON e.owner = b.owner and e.table_name = b.object_name
JOIN all_users c
ON c.username = b.owner
WHERE
b.owner = a.owner AND
b.object_type = a.object_type
GROUP BY
b.owner,
b.object_type
);
If I rewrite that to avoid the self-join on all_objects (equivalent to some_view in your example) by using an analytic function instead:
SELECT
owner,
object_name
FROM (
SELECT
a.owner,
a.object_name,
row_number() over (partition by a.owner, a.object_type
order by a.created desc) as rn
FROM
all_objects a
JOIN all_tables e
ON e.owner = a.owner and e.table_name = a.object_name
JOIN all_users c
ON c.username = a.owner
)
WHERE
rn = 1;
... then it takes a few seconds.
Now, in this case I don't get exactly the same output because I have multiple objects created at the same time (within the same second as far as created is concerned).
I don't know the precision of the values stored in your registered_date of course. So you might need to look at different functions, possibly rank rather than row_number, or adjust the ordering to deal with ties if necessary.
rank() over (partition by a.owner, a.object_type
order by trunc(a.created) desc) as rn
...
WHERE
rn = 1;
gives me the same results (well, almost; the join to all_tables is also skewing things, as I seem to have tables listed in all_objects that aren't in all_tables, but that's a side issue). Or max could work too:
max(created) over (partition by a.owner, a.object_type) as mx
...
WHERE
TRUNC(created) = TRUNC(mx)
In both of those I'm using trunc to get everything on the same day; you may not need to if your registered_date doesn't have a time component.
But of course, check you do actually get the same results.