Oracle SQL taking a long time on second invocation

I am running a SQL query against an Oracle 12.1.0 database which takes a drastically different amount of time between its first and subsequent invocations.
The second and subsequent invocations take almost 10 times as long as the first did (too bad that I don't have a cold backup).
According to the developer, no changes have been made to the application or the underlying tables.
We are truncating and analyzing (using DBMS_STATS) all relevant tables before each invocation, but we are unable to reproduce the performance of the first invocation.
My gut feeling is that it has something to do with parsing and a bad query plan.
The statement is as follows:
SELECT CASE input_val
WHEN 'DEL'
THEN
(SELECT a_key
FROM d_ad, ad_xr
WHERE d_ad.m_id = ad_xr.m_id
AND d_ad.source = ad_xr.source
AND ad_xr.c_id = :1
AND ad_xr.source = :3 )
WHEN 'WRITE'
THEN
(SELECT w_key
FROM wp_xr x, d_wl w
WHERE w.m_id = x.m_id
AND w.source = x.source
AND x.client_id = :4
AND x.source = :5 )
WHEN 'APPEND'
THEN
(SELECT a_key
FROM F0_A
WHERE a_id = :5 AND source = :7 )
END
FROM DUAL
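One way to test the bad-plan theory is to compare the child cursors and plans Oracle produced for the statement. This is a generic diagnostic sketch, assuming access to V$SQL and DBMS_XPLAN; the <sql_id> placeholder must be filled in from the first query:
SELECT sql_id, child_number, plan_hash_value, executions,
       ROUND(elapsed_time / NULLIF(executions, 0)) AS avg_elapsed_us
  FROM v$sql
 WHERE sql_text LIKE 'SELECT CASE input_val%';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<sql_id>', NULL, 'ALLSTATS LAST'));
Different PLAN_HASH_VALUE values between the first and later runs would point at bind peeking or 12c adaptive plans.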

Related

Runtime of stored procedure with temporary tables varies intermittently

We are facing a performance issue while executing a stored procedure. It usually takes 10-15 minutes to run, but sometimes it takes more than 30 minutes.
We captured visualized-plan execution files for the normal-run and the long-run cases.
From the visualized plan we learned that one particular INSERT block takes the extra time in the long run, and by checking
EXPLAIN PLAN FOR SQL PLAN CACHE ENTRY <plan_id>
we found that the join execution order differs in the long run.
This is the block which sometimes takes the extra time to run.
INSERT INTO #TMP_DATI_SALDI_LORDI_BASE (
"COD_SCENARIO","COD_PERIODO","COD_CONTO","COD_DEST1","COD_DEST2","COD_DEST3","COD_DEST4","COD_DEST5"
,"IMPORTO","COD_VALUTA","IMPORTO_VALUTA_ORIGINARIA","COD_VALUTA_ORIGINARIA","NOTE"
)
( SELECT
SCEN_P.SCENARIO
,SCEN_P.PERIOD
,ACCOUT_ADJ.ATTRIBUTO1 AS "COD_CONTO"
,DATAS_rev.COD_DEST1
,DATAS_rev.COD_DEST2
,DATAS_rev.COD_DEST3
,__typed_NString__($1, 50)
,'RPT_NON'
,SUM(
CASE WHEN INFO.INCOT = 'FOB' THEN
CASE ACCOUT_rev.ATTRIBUTO1 WHEN 'CalcInsurance' THEN
0
ELSE
DATAS_rev.IMPORTO
END
ELSE
DATAS_rev.IMPORTO
END
* (DATAS_ADJ.IMPORTO - DATAS.IMPORTO)
)
,DATAS_rev.COD_VALUTA
,SUM(
CASE WHEN INFO.INCOT = 'FOB' THEN
CASE ACCOUT_rev.ATTRIBUTO1 WHEN 'CalcInsurance' THEN
0
ELSE
DATAS_rev.IMPORTO_VALUTA_ORIGINARIA
END
ELSE
DATAS_rev.IMPORTO_VALUTA_ORIGINARIA
END
* (DATAS_ADJ.IMPORTO_VALUTA_ORIGINARIA - DATAS.IMPORTO_VALUTA_ORIGINARIA)
)
,DATAS_rev.COD_VALUTA_ORIGINARIA
,'CPM_SP_CACL_FY_E3 Parts Option ADJ'
FROM #TMP_TAGERT_SCEN_P SCEN_P
INNER JOIN #TMP_DATI_SALDI_LORDI_BASE DATAS_rev
ON DATAS_rev.COD_SCENARIO = SCEN_P.SCENARIO
AND DATAS_rev.COD_PERIODO = SCEN_P.PERIOD
AND LEFT(DATAS_rev.COD_DEST3, 1) = 'O'
INNER JOIN CONTO ACCOUT_rev
ON ACCOUT_rev.COD_CONTO = DATAS_rev.COD_CONTO
AND ACCOUT_rev.ATTRIBUTO1 IN ('CalcFOB','CalcInsurance') --FOB,Insurance(Ocean freight is Nothing by Option)
INNER JOIN #DSL DATAS
ON DATAS.COD_SCENARIO = 'LAUNCH'
AND DATAS.COD_PERIODO = 12
AND DATAS.COD_DEST1 = 'NC'
AND DATAS.COD_DEST2 = 'NC'
AND DATAS.COD_DEST3 = 'F001'
AND DATAS.COD_DEST4 = DATAS_rev.COD_DEST4
AND DATAS.COD_DEST5 = 'INP'
INNER JOIN CONTO ACCOUT
ON ACCOUT.COD_CONTO = DATAS.COD_CONTO
AND ACCOUT.ATTRIBUTO2 = 'E3'
INNER JOIN CONTO ACCOUT_ADJ
ON ACCOUT_ADJ.ATTRIBUTO3 = DATAS.COD_CONTO
AND ACCOUT_ADJ.ATTRIBUTO2 = 'HE3'
INNER JOIN #DSL DATAS_ADJ
ON LEFT(DATAS_ADJ.COD_SCENARIO,4) = LEFT(SCEN_P.SCENARIO,4)
AND DATAS_ADJ.COD_PERIODO = 12
AND DATAS_ADJ.COD_DEST1 = DATAS.COD_DEST1
AND DATAS_ADJ.COD_DEST2 = DATAS.COD_DEST2
AND DATAS_ADJ.COD_DEST3 = DATAS.COD_DEST3
AND DATAS_ADJ.COD_DEST4 = DATAS.COD_DEST4
AND DATAS_ADJ.COD_DEST5 = DATAS.COD_DEST5
AND DATAS_ADJ.COD_CONTO = ACCOUT_ADJ.COD_CONTO
LEFT OUTER JOIN #TMP_KDPWT_INCOTERMS INFO
ON INFO.P_CODE = DATAS.COD_DEST4
GROUP BY
SCEN_P.SCENARIO,SCEN_P.PERIOD,ACCOUT_ADJ.ATTRIBUTO1,DATAS_rev.COD_DEST1,DATAS_rev.COD_DEST2
,DATAS_rev.COD_DEST3, DATAS.COD_DEST4,DATAS_rev.COD_VALUTA,DATAS_rev.COD_VALUTA_ORIGINARIA,INFO.INCOT
)
I will also share the order-of-execution details for the normal-run and long-run cases.
Could someone please help us overcome this issue? We also don't know how to control the order of join execution; is there any way to fix the join order? Please guide us.
Thanks in advance,
Vinothkumar
Without a lot more detailed information, there is no way to tell exactly why your INSERT statement shows this alternating runtime behaviour.
Based on my experience, such an analysis can take quite some time, and only a few people are available who are capable of performing it. If you can get someone like that to look at this, make sure to understand and learn from what they do.
What I can tell from the information shared is this:
Using temporary tables to structure a multi-stage data flow is the wrong thing to do on SAP HANA. Instead, use table variables in SQLScript (a minimal sketch follows this list).
If you insist on using temporary tables, at least make them column tables; this avoids some internal data materialisation.
When using joins, make sure that the joined columns are of the same data type. The explain plan is full of TO_INT(), TO_DECIMAL(), and other conversion functions; those take time and memory, and they make it hard for the optimiser(s) to estimate cardinalities.
As the statement uses a lot of temporary tables, the different join orders can easily result from different volumes of data being present when the SQL was parsed, prepared, and optimised. One option to avoid this is to have HANA ignore any cached plans for the statement; the documentation covers the HINT clauses for that.
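To illustrate the first point, here is a minimal SQLScript sketch of the table-variable style; the procedure name, table, and columns are illustrative, not taken from the original code:
CREATE PROCEDURE calc_saldi_lordi_base()
LANGUAGE SQLSCRIPT AS
BEGIN
  -- a table variable replaces a #TMP_... local temporary table
  lt_base = SELECT cod_scenario, cod_periodo, importo
              FROM dati_saldi_lordi
             WHERE cod_dest5 = 'INP';

  -- later steps read the variable via the :var syntax; the engine can
  -- inline such steps instead of materialising intermediate tables
  SELECT cod_scenario, SUM(importo) AS importo
    FROM :lt_base
   GROUP BY cod_scenario;
END;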
And that is about what I can say about this with the available information.

Query taking long time to execute in different database instance

My application is used with different database instances. A particular query executes in 1 second on all database instances except one, where it takes more than 30 minutes, even though the data volume is almost the same. What can be the reason? My database is Oracle 11g.
Here is the query:
SELECT b.VC_CUSTOMER_NAME customer,
TO_CHAR( sum(c.INV_VALUE), '999,999,999,999') value,
ROUND(
(SUM (c.inv_value) / (SELECT SUM (c.inv_value)
FROM mks_mst_customer b,
sls_temp_invoice_ticket c,
sls_dt_invoice_ticket d
WHERE c.vc_comp_code = b.vc_comp_code
AND b.vc_comp_code = '01'
AND INV_LABEL LIKE 'COLLECT FROM CUSTOMER%'
AND d.vc_ticket_no=c.vc_ticket_no
AND d.dt_invoice_date BETWEEN '01-Dec-2021' AND '07-Dec-2021'
AND b.nu_account_code=c.nu_account_code)
)* 100
) PERCENT
FROM mks_mst_customer b,
sls_temp_invoice_ticket c,
sls_dt_invoice_ticket d
WHERE c.vc_comp_code = b.vc_comp_code
AND b.vc_comp_code = '01'
AND INV_LABEL like 'COLLECT FROM CUSTOMER%'
AND b.nu_account_code=c.nu_account_code
AND d.vc_ticket_no=c.vc_ticket_no
AND d.dt_invoice_date BETWEEN '01-Dec-2021' AND '07-Dec-2021'
GROUP BY b.VC_CUSTOMER_NAME
ORDER BY SUM(c.INV_VALUE) DESC
The most obvious step would be to check the indexes; on the slow instance they might not be configured. A little more demanding would be to check the optimizer statistics.
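A sketch of both checks, to be run on the slow instance and compared with a fast one (the table names come from the query; adjust the owner as needed):
SELECT table_name, index_name, column_name, column_position
  FROM all_ind_columns
 WHERE table_name IN ('MKS_MST_CUSTOMER', 'SLS_TEMP_INVOICE_TICKET', 'SLS_DT_INVOICE_TICKET')
 ORDER BY table_name, index_name, column_position;

-- if the statistics turn out to be stale or missing on the slow instance:
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'SLS_DT_INVOICE_TICKET')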

How to iterate many Hive scripts over Spark

I have many Hive scripts (some 20-25), each containing multiple queries. I want to run these scripts using Spark so that the process runs faster: a Hive MapReduce job takes a long time to execute, and running the queries from Spark should be much faster. Below is the code I have written; it works for 3-4 files, but when given many files with multiple queries it fails.
Please help me optimize it if possible.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("yarn").appName("my app").enableHiveSupport().getOrCreate()

// collect every .hql script in the directory
val scripts = new java.io.File("/mapr/tmp/validation_script/")
  .listFiles
  .filter(_.getName.endsWith(".hql"))
  .toList

// each non-empty line is executed as one SQL statement, so every
// statement in a script must sit on a single line
for (script <- scripts) {
  scala.io.Source.fromFile(script).getLines()
    .filterNot(_.isEmpty)
    .foreach(query => spark.sql(query))
}
Some of the errors I am getting look like this:
ERROR SparkSubmit: Job aborted.
org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:224)
ERROR FileFormatWriter: Aborting job null.
org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 12 (sql at validationtest.scala:67) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: failed to allocate 16777216 byte(s) of direct memory (used: 1023410176, max: 1029177344) at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:528)
I get many different kinds of errors when I run the same code multiple times.
Below is how one of the HQL files looks. Its name is xyz.hql and it contains:
drop table pontis_analyst.daydiff_log_sms_distribution
create table pontis_analyst.daydiff_log_sms_distribution as select round(datediff(date_sub(current_date(),cast(date_format(CURRENT_DATE ,'u') as int) ),cast(subscriberActivationDate as date))/7,4) as daydiff,subscriberkey as key from pontis_analytics.prepaidsubscriptionauditlog
drop table pontis_analyst.weekly_sms_usage_distribution
create table pontis_analyst.weekly_sms_usage_distribution as select sum(event_count_ge) as eventsum,subscriber_key from pontis_analytics.factadhprepaidsubscriptionsmsevent where effective_date_ge_prt < date_sub(current_date(),cast(date_format(CURRENT_DATE ,'u') as int) - 1 ) and effective_date_ge_prt >= date_sub(date_sub(current_date(),cast(date_format(CURRENT_DATE ,'u') as int) ),84) group by subscriber_key;
drop table pontis_analyst.daydiff_sms_distribution
create table pontis_analyst.daydiff_sms_distribution as select day.daydiff,sms.subscriber_key,sms.eventsum from pontis_analyst.daydiff_log_sms_distribution day inner join pontis_analyst.weekly_sms_usage_distribution sms on day.key=sms.subscriber_key
drop table pontis_analyst.weekly_sms_usage_final_distribution
create table pontis_analyst.weekly_sms_usage_final_distribution as select spp.subscriberkey as key, case when spp.tenure < 3 then round((lb.eventsum )/dayDiff,4) when spp.tenure >= 3 then round(lb.eventsum/12,4)end as result from pontis_analyst.daydiff_sms_distribution lb inner join pontis_analytics.prepaidsubscriptionsubscriberprofilepanel spp on spp.subscriberkey = lb.subscriber_key
INSERT INTO TABLE pontis_analyst.validatedfinalResult select 'prepaidsubscriptionsubscriberprofilepanel' as fileName, 'average_weekly_sms_last_12_weeks' as attributeName, tbl1_1.isEqual as isEqual, tbl1_1.isEqualCount as isEqualCount, tbl1_2.countAll as countAll, (tbl1_1.isEqualCount/tbl1_2.countAll)* 100 as percentage from (select tbl1_0.isEqual as isEqual, count(isEqual) as isEqualCount from (select case when round(aal.result) = round(srctbl.average_weekly_sms_last_12_weeks) then 1 when aal.result is null then 1 when aal.result = 'NULL' and srctbl.average_weekly_sms_last_12_weeks = '' then 1 when aal.result = '' and srctbl.average_weekly_sms_last_12_weeks = '' then 1 when aal.result is null and srctbl.average_weekly_sms_last_12_weeks = '' then 1 when aal.result is null and srctbl.average_weekly_sms_last_12_weeks is null then 1 else 0 end as isEqual from pontis_analytics.prepaidsubscriptionsubscriberprofilepanel srctbl left join pontis_analyst.weekly_sms_usage_final_distribution aal on srctbl.subscriberkey = aal.key) tbl1_0 group by tbl1_0.isEqual) tbl1_1 inner join (select count(*) as countAll from pontis_analytics.prepaidsubscriptionsubscriberprofilepanel) tbl1_2 on 1=1
Your issue is that your code is running out of memory, as shown below:
failed to allocate 16777216 byte(s) of direct memory (used: 1023410176, max: 1029177344)
What you are trying to do is not the optimal way of doing things in Spark, but I would recommend that you remove the memory serialization, as it will not help in any way. You should cache data only if it is going to be used in multiple transformations; if it is used only once, there is no reason to put it in the cache.

Optimize a SQL select query in a loop

I want to optimize a SELECT query inside a loop in a SQL procedure. The loop iterates around 10,000 times, and the SELECT takes approx. 30 ms per iteration, which drives up the overall execution time of the procedure.
SELECT *
FROM BANKACCOUNTS B,
MAPPING M,
UPL_DTR_UPLOAD UP,
(SELECT * FROM MAPPING WHERE SOURCE = 'KARVY_BANK_CODE') M1
WHERE B.SCHEME_CODE = M.INTERNALCODE
AND M1.INTERNALCODE = B.BANK_CODE
AND M.SOURCE = 'R0'
AND B.AC_TYPE = 'FUNDING'
AND M.EXTERNALCODE IS NOT NULL
AND UPPER(TRIM(M.EXTERNALCODE || M1.EXTERNALCODE || B.AC_NO)) =
Upper(UP.Scheme || UP.Fundingbnk || UP.fundingacc);
There are a lot of possible solutions, but first:
Use modern, explicit joins.
The inline view M1 selects *; select only the columns you actually need.
Check the explain plan and the use of indexes.
Code:
SELECT *
FROM bankaccounts B
JOIN mapping M ON B.scheme_code = M.internalcode
JOIN
(SELECT internalcode, externalcode
FROM mapping
WHERE source = 'KARVY_BANK_CODE') M1 ON M1.internalcode = B.bank_code
JOIN upl_dtr_upload UP ON UPPER(TRIM(M.externalcode || M1.externalcode || B.ac_no)) = UPPER(UP.scheme || UP.fundingbnk || UP.fundingacc)
WHERE
M.source = 'R0'
AND B.ac_type = 'FUNDING'
AND M.externalcode IS NOT NULL;
As #LoztInSpace mentions, you can almost certainly replace the PL/SQL loop that iterates "about 10,000 times" with a single driving query. That is, if the outer loop is itself driven by another query, nest your query (well, Kedar's version of it) inside that outer query instead of executing it once per iteration.
Each execution of the SELECT inside the PL/SQL loop has to invoke the SQL engine, forcing a context switch; that is probably 10 ms of your 30 ms, if not more. Search https://asktom.oracle.com with the keywords PL/SQL "nested loop" for examples.
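For illustration, a hedged sketch of that shape, using Kedar's rewrite as the driving cursor (the loop body is a placeholder for whatever the procedure does per row):
BEGIN
  FOR r IN (SELECT B.ac_no, UP.scheme
              FROM bankaccounts B
              JOIN mapping M ON B.scheme_code = M.internalcode
              JOIN (SELECT internalcode, externalcode
                      FROM mapping
                     WHERE source = 'KARVY_BANK_CODE') M1 ON M1.internalcode = B.bank_code
              JOIN upl_dtr_upload UP
                ON UPPER(TRIM(M.externalcode || M1.externalcode || B.ac_no)) =
                   UPPER(UP.scheme || UP.fundingbnk || UP.fundingacc)
             WHERE M.source = 'R0'
               AND B.ac_type = 'FUNDING'
               AND M.externalcode IS NOT NULL)
  LOOP
    NULL; -- per-iteration work goes here; the query is parsed and opened once
  END LOOP;
END;
/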
You can also look at PL/SQL bulk processing statements FORALL and BULK COLLECT for possible improvements.
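A minimal BULK COLLECT sketch, assuming the working set fits in memory (the table and filter are illustrative, borrowed from the query above):
DECLARE
  TYPE t_acc_tab IS TABLE OF bankaccounts%ROWTYPE;
  l_accs t_acc_tab;
BEGIN
  SELECT * BULK COLLECT INTO l_accs
    FROM bankaccounts
   WHERE ac_type = 'FUNDING';

  FOR i IN 1 .. l_accs.COUNT LOOP
    NULL; -- process each fetched row without another SQL round trip
  END LOOP;
END;
/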

Optimise a Query Which is taking too long to Run

I'm using Toad for Oracle to run a query which is taking much too long, sometimes over 15 minutes.
The query pulls memos which are left to be approved by managers. It does not bring back a lot of rows; a typical run returns about 30 or 40. The query needs to access a few tables for its information, so I'm using a lot of joins.
I have attached my query below.
If anyone can help with optimising this query I would be very grateful.
Query:
SELECT (e.error_Description || DECODE(t.trans_Comment, 'N', '', '','', ' - ' || t.trans_Comment)) AS Title,
t.Date_Time_Recorded AS Date_Recorded,
DECODE(t.user_ID,0,'System',(SELECT Full_Name FROM employee WHERE t.user_Id = user_id)) AS Recorded_by,
DECODE(t.user_ID,0, Dm_General.getCalendarShiftName(t.Date_Time_Recorded), (SELECT shift FROM employee WHERE t.user_Id = user_id)) AS Shift,
l.Lot_Number AS entity_number,
ms.Line_Num,
'L' AS Entity_Type,
t.entity_id, l.lot_Id AS Lot_Id
FROM DAT_TRANSACTION t
JOIN ADM_ERRORCODES e ON e.error_id = t.error_id
JOIN ADM_ACTIONS a ON a.action_id = t.action_id,
DAT_LOT l
INNER JOIN Status s ON l.Lot_Status_ID = s.Status_ID,
DAT_MASTER ms
INNER JOIN ADM_LINE LN ON ms.Line_Num = LN.Line_Num
WHERE
(e.memo_req = 'Y' OR a.memo_req = 'Y')
AND ms.Run_type_Id = Constants.Runtype_Production_Run --Production Run type
AND s.completed_type NOT IN ('D', 'C', 'R') -- Destroyed /closed / Released
AND LN.GEN = '2GT'
AND (NOT EXISTS (SELECT 1 FROM LNK_MEMO_TRANS lnk, DAT_MEMO m
WHERE lnk.Trans_ID = t.trans_id AND lnk.Memo_ID = m.Memo_ID
AND NVL(m.approve, 'Y') = 'Y')) -- if approve is NULL, the memo has been created and is awaiting approval
AND l.Master_ID = ms.Master_ID
AND t.Entity_ID = l.Lot_ID
AND t.Entity_Type IN ('L', 'G');
The usual cause of bad query performance is that Oracle can't find an appropriate index. Use EXPLAIN PLAN with Toad so Oracle can tell you how it intends to execute the query. That should give you some idea of when it uses indexes and when it does not.
For general pointers, see http://www.orafaq.com/wiki/Oracle_database_Performance_Tuning_FAQ
See here for EXPLAIN PLAN.
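For reference, the plain-SQL form of that check looks like this (the one-table statement here is only a stand-in; substitute the full memo query after FOR):
EXPLAIN PLAN FOR
SELECT * FROM dat_transaction WHERE entity_type IN ('L', 'G');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);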
You have some function calls in your SQL:
dm_general.getcalendarshiftname(t.date_time_recorded)
constants.runtype_production_run
Function calls are slow in SQL, and depending on the query plan may get called redundantly many times - e.g. computing dm_general.getcalendarshiftname for rows that end up being filtered out of the results.
To see if this is a significant factor, try replacing the function calls with literal constants temporarily and see if the performance improves.
The number of function calls can sometimes be reduced by restructuring the query like this:
select /*+ no_merge(v) */ a, b, c, myfunction(d)
from
( select a, b, c, d
from my_table
where ...
) v;
This ensures that myfunction is only called for rows that will appear in the results.
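A related trick, not from this answer but standard Oracle practice, is scalar-subquery caching: wrapping the call in a scalar subquery lets Oracle cache results for repeated values of d (same hypothetical myfunction and my_table as above):
select a, b, c, (select myfunction(d) from dual) as fd
from my_table;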
I have replaced the function calls with literal constants and this speeds it up by only a second or two. The query still takes about 50 seconds to run.
Is there anything I can do around the joins to help speed this up? Have I used INNER JOIN correctly here?
I'm not really sure I understand what you mean about the snippet below or how to use it.
I get an "invalid identifier" error on d when I try to call the function in the outer select:
select /*+ no_merge(v) */ a, b, c, myfunction(d)
from
( select a, b, c, d
from my_table
where ...
) v;
Any other views would be greatly appreciated.
Before we can say anything sensible, we have to take a look at where time is being spent. And that means you have to collect some information first.
Therefore, my standard reaction to a question like this, is this one: http://forums.oracle.com/forums/thread.jspa?threadID=501834
Regards,
Rob.