Query taking a long time to execute in a different database instance - SQL

My application is used with several different database instances. A particular query executes in 1 second on all of them except one, where it takes more than 30 minutes, even though the data volume is almost the same. What could be the reason? My database is Oracle 11g.
Here is the query:
SELECT b.VC_CUSTOMER_NAME customer,
       TO_CHAR(SUM(c.INV_VALUE), '999,999,999,999') value,
       ROUND((SUM(c.inv_value) /
                 (SELECT SUM(c.inv_value)
                  FROM mks_mst_customer b,
                       sls_temp_invoice_ticket c,
                       sls_dt_invoice_ticket d
                  WHERE c.vc_comp_code = b.vc_comp_code
                    AND b.vc_comp_code = '01'
                    AND INV_LABEL LIKE 'COLLECT FROM CUSTOMER%'
                    AND d.vc_ticket_no = c.vc_ticket_no
                    AND d.dt_invoice_date BETWEEN '01-Dec-2021' AND '07-Dec-2021'
                    AND b.nu_account_code = c.nu_account_code)
             ) * 100
       ) PERCENT
FROM mks_mst_customer b,
     sls_temp_invoice_ticket c,
     sls_dt_invoice_ticket d
WHERE c.vc_comp_code = b.vc_comp_code
  AND b.vc_comp_code = '01'
  AND INV_LABEL LIKE 'COLLECT FROM CUSTOMER%'
  AND b.nu_account_code = c.nu_account_code
  AND d.vc_ticket_no = c.vc_ticket_no
  AND d.dt_invoice_date BETWEEN '01-Dec-2021' AND '07-Dec-2021'
GROUP BY b.VC_CUSTOMER_NAME
ORDER BY SUM(c.INV_VALUE) DESC

The most obvious step would be to check indexes; on the slow instance they might not be configured the same way as on the fast ones.
A little more demanding would be to gather fresh optimizer statistics.
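A minimal sketch of gathering statistics with the standard DBMS_STATS package ('APP_SCHEMA' is a placeholder for the actual table owner):

BEGIN
  -- refresh optimizer statistics for the three tables used by the query
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'APP_SCHEMA', tabname => 'MKS_MST_CUSTOMER');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'APP_SCHEMA', tabname => 'SLS_TEMP_INVOICE_TICKET');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'APP_SCHEMA', tabname => 'SLS_DT_INVOICE_TICKET');
END;
/

After that, comparing the EXPLAIN PLAN output from the fast and slow instances should show where the plans diverge.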

Related

TPC-DS Query 6: Why do we need the 'where j.i_category = i.i_category' condition?

I'm going through TPC-DS for Amazon Athena.
Everything was fine until query 5, but I ran into a problem with query 6 (shown below):
select a.ca_state state, count(*) cnt
from customer_address a
,customer c
,store_sales s
,date_dim d
,item i
where a.ca_address_sk = c.c_current_addr_sk
and c.c_customer_sk = s.ss_customer_sk
and s.ss_sold_date_sk = d.d_date_sk
and s.ss_item_sk = i.i_item_sk
and d.d_month_seq =
(select distinct (d_month_seq)
from date_dim
where d_year = 2002
and d_moy = 3 )
and i.i_current_price > 1.2 *
(select avg(j.i_current_price)
from item j
where j.i_category = i.i_category)
group by a.ca_state
having count(*) >= 10
order by cnt, a.ca_state
limit 100;
It took more than 30 minutes, so it failed with a timeout.
I tried to find which part caused the problem, so I checked the where conditions, and I found where j.i_category = i.i_category in the last part of the where clause.
I didn't know why this condition was needed, so I deleted it, and the query then ran OK.
Can you tell me why this part is needed?
The j.i_category = i.i_category predicate is the subquery's correlation condition.
If you remove it from the subquery
select avg(j.i_current_price)
from item j
the subquery becomes uncorrelated and turns into a global aggregate over the item table, which is easy to calculate and which the query engine only needs to do once. That is why the query runs fast without it. But the condition is what makes each item's price compare against the average price of its own category rather than the average over all items, so removing it changes the meaning of the query.
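If the correlated form times out, a common workaround is to decorrelate it by hand: compute the per-category averages once in a derived table and join on category. A hedged sketch of that rewrite (it should return the same rows, since items with a NULL category are filtered out in both forms):

select a.ca_state state, count(*) cnt
from customer_address a
join customer c on a.ca_address_sk = c.c_current_addr_sk
join store_sales s on c.c_customer_sk = s.ss_customer_sk
join date_dim d on s.ss_sold_date_sk = d.d_date_sk
join item i on s.ss_item_sk = i.i_item_sk
join (select i_category, avg(i_current_price) as avg_price
      from item
      group by i_category) cat_avg
  on i.i_category = cat_avg.i_category
where d.d_month_seq =
      (select distinct d_month_seq
       from date_dim
       where d_year = 2002
       and d_moy = 3)
  and i.i_current_price > 1.2 * cat_avg.avg_price
group by a.ca_state
having count(*) >= 10
order by cnt, a.ca_state
limit 100;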
If you want a fast, performant query engine on AWS, I can recommend Starburst Presto (disclaimer: I am from Starburst). See https://www.concurrencylabs.com/blog/starburst-presto-vs-aws-redshift/ for a related comparison (note: this is not a comparison with Athena).
If it doesn't have to be that fast, you can use PrestoSQL on EMR (note that the "PrestoSQL" and "Presto" components on EMR are not the same thing).

SQL Filtering duplicate rows due to bad ETL

The database is Postgres, but any SQL logic should help.
I am retrieving the set of sales quotations that contain a given product within the bill of materials. I'm doing that in two steps: step 1, retrieve all DISTINCT quote numbers which contain a given product (by product number); step 2, retrieve the full quote, with all products listed, for each unique quote number.
So far, so good. Now the tough bit. Some rows are duplicates and some are not. The duplicates (same quote number, quote version, and line number) may or may not have maintenance on them. Where there are duplicates, I want to pick the row whose maintenance is greater than 0 and exclude the ones with 0 maintenance. The problem is that some rows which have no duplicates also have 0 maintenance, so I can't simply filter on maintenance.
To make this exciting, the database holds quotes over 20+ years. And the data science guys have just admitted that the ETL process may have some bugs...
--- step 0
--- cleanup the workspace
SET CLIENT_ENCODING TO 'UTF8';
DROP TABLE IF EXISTS product_quotes;

--- step 1
--- get list of Product Quotes
CREATE TEMPORARY TABLE product_quotes AS (
    SELECT DISTINCT master_quote_number
    FROM w_quote_line_d
    WHERE item_number IN ( << model numbers >> )
);

--- step 2
--- Now join on that list
SELECT
    d.quote_line_number,
    d.item_number,
    d.item_description,
    d.item_quantity,
    d.unit_of_measure,
    f.ref_list_price_amount,
    f.quote_amount_entered,
    f.negtd_discount,
    --- need to calculate discount rate based on list price and negtd discount (%)
    CASE
        WHEN ref_list_price_amount > 0
        THEN 100 - (ref_list_price_amount + negtd_discount) / ref_list_price_amount * 100
        ELSE 0
    END AS discount_percent,
    f.warranty_months,
    f.master_quote_number,
    f.quote_version_number,
    f.maintenance_months,
    f.territory_wid,
    f.district_wid,
    f.sales_rep_wid,
    f.sales_organization_wid,
    f.install_at_customer_wid,
    f.ship_to_customer_wid,
    f.bill_to_customer_wid,
    f.sold_to_customer_wid,
    d.net_value,
    d.deal_score,
    f.transaction_date,
    f.reporting_date
FROM w_quote_line_d d
INNER JOIN product_quotes pq
    ON (pq.master_quote_number = d.master_quote_number)
INNER JOIN w_quote_f f
    ON (f.quote_line_number = d.quote_line_number
        AND f.master_quote_number = d.master_quote_number
        AND f.quote_version_number = d.quote_version_number)
WHERE d.net_value >= 0 AND item_quantity > 0
ORDER BY f.master_quote_number, f.quote_version_number, d.quote_line_number
The logic to filter the duplicate rows is this: for each master_quote_number / version_number pair, check whether there are duplicate line numbers; if so, pick the one with maintenance > 0.
Even with a CASE expression, I'm not sure how to write that.
Thoughts? The database is Postgres, but any SQL logic should help.
I think you will want to use Window Functions. They are, in a word, awesome.
Here is a query that would "dedupe" based on your criteria:
select *
from (
    select
        * -- simplifying here to show the important parts
        ,row_number() over (
            partition by d.master_quote_number, d.quote_version_number, d.quote_line_number
            order by f.maintenance_months desc) as seqnum
    from w_quote_line_d d
    inner join product_quotes pq
        on (pq.master_quote_number = d.master_quote_number)
    inner join w_quote_f f
        on (f.quote_line_number = d.quote_line_number
            and f.master_quote_number = d.master_quote_number
            and f.quote_version_number = d.quote_version_number)
) x
where seqnum = 1
The use of row_number() with that partition by / order by guarantees that exactly ONE row for each combination of quote number / version number / line number gets seqnum = 1, and it will be the one with the highest maintenance value (if your colleagues are right, there would only be one with a value > 0 anyway). Note that the partition key has to include the line number, since that is the level at which your duplicates occur.
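Since the database is Postgres, here is a hedged alternative sketch using its DISTINCT ON extension (assuming, as in your query, that maintenance lives in w_quote_f as maintenance_months):

select distinct on (d.master_quote_number, d.quote_version_number, d.quote_line_number)
       d.*, f.maintenance_months
from w_quote_line_d d
inner join product_quotes pq
    on pq.master_quote_number = d.master_quote_number
inner join w_quote_f f
    on f.quote_line_number = d.quote_line_number
   and f.master_quote_number = d.master_quote_number
   and f.quote_version_number = d.quote_version_number
order by d.master_quote_number, d.quote_version_number, d.quote_line_number,
         f.maintenance_months desc;

DISTINCT ON keeps the first row of each group in ORDER BY order, so sorting maintenance_months descending picks the row with maintenance whenever one exists.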
Can you do something like...
select
*
from
w_quote_line_d d
inner join
(
select
...
,max(maintenance)
from
w_quote_line_d
group by
...
) d1
on
d1.id = d.id
and d1.maintenance = d.maintenance;
Am I understanding your problem correctly?
Edit: Forgot the group by!
I'm not sure, but maybe you could Group By all other columns and use MAX(Maintenance) to get only the greatest.
What do you think?

SQL query getting data very slowly from different tables

I am writing a SQL query to get data from several different tables, but it retrieves the data very slowly, taking more than 2 minutes to complete.
What I am doing here is:
1. getting date differences, and on the basis of the date difference, getting account numbers
2. comparing tables to get the exact data I need
Here is my query:
select T.AccountNo,
       MAX(T.DateTxn) as MxDt,
       datediff(MM, MAX(T.DateTxn), '2011-6-30') as Diffs,
       max(P.Name) as POName
from Account_skd A,
     AccountTxn_skd T,
     POName P
where A.AccountNo = T.AccountNo
  and GPOCode = A.OfficeCode
  and Code = A.POCode
  and A.ServiceCode = T.ServiceCode
group by T.AccountNo
order by len(T.AccountNo) DESC
Please help: how can I use joins, or any other approach, to get the data in much less time, say 5-10 seconds?
Since it appears you are getting EVERY account and performance is slow, I would try creating a prequery that aggregates by account alone, then doing a single join to the other tables, something like:
select
    T.AccountNo,
    T.MxDt,
    datediff(MM, T.MxDt, '2011-6-30') as Diffs,
    P.Name as POName
from
    ( select T1.AccountNo,
             max( T1.DateTxn ) MxDt
      from AccountTxn_skd T1
      group by T1.AccountNo ) T
JOIN Account_skd A
    on T.AccountNo = A.AccountNo
JOIN POName P
    on A.POCode = P.Code          -- GUESSING, as you didn't qualify alias.field
   and A.OfficeCode = P.GPOCode   -- in your query for these two fields
order by
    len(T.AccountNo) DESC
You had other elements based on the T.ServiceCode matching, but since you are only grouping by the account number anyhow, does it matter which service code was used? Otherwise, you would need to group by both the account AND the service code; in that case I would add the service code into the prequery and as a join condition to the account table too, as in the sketch below.
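A hedged sketch of that variant, reusing the A.ServiceCode = T.ServiceCode condition from your original query:

select
    T.AccountNo,
    T.ServiceCode,
    T.MxDt,
    datediff(MM, T.MxDt, '2011-6-30') as Diffs,
    P.Name as POName
from
    ( select T1.AccountNo,
             T1.ServiceCode,
             max( T1.DateTxn ) MxDt
      from AccountTxn_skd T1
      group by T1.AccountNo, T1.ServiceCode ) T
JOIN Account_skd A
    on T.AccountNo = A.AccountNo
   and T.ServiceCode = A.ServiceCode
JOIN POName P
    on A.POCode = P.Code
   and A.OfficeCode = P.GPOCode
order by
    len(T.AccountNo) DESC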

SQL "Count (Distinct...)" returns 1 less than actual data shows?

I have some data that doesn't appear to be counting correctly. When I look at the raw data I see 5 distinct values in a given column, but when I run COUNT(DISTINCT ColA) it reports 4. This is true for all of the categories I am grouping by, too, not just one: e.g. a 2nd value in the column reports 2 when there are 3, a 3rd value reports 1 when there are 2, etc.
Table A: ID, Type
Table B: ID_FK, WorkID, Date
Here is my query that summarizes:
SELECT COUNT (DISTINCT B.ID_FK), A.Type
FROM A INNER JOIN B ON B.ID_FK = A.ID
WHERE Date > 5/1/2013 and Date < 5/2/2013
GROUP BY Type
ORDER BY Type
And a snippet of the results:
4|Business
2|Design
2|Developer
Here is a sample of my data, non-summarized. Pipe is the separator; I just removed the 'COUNT...' and 'GROUP BY...' parts of the query above to get this:
4507|Business
4515|Business
7882|Business
7889|Business
7889|Business
8004|Business
4761|Design
5594|Design
5594|Design
5594|Design
7736|Design
7736|Design
7736|Design
3132|Developer
3132|Developer
3132|Developer
4826|Developer
5403|Developer
As you can see from the data, Business should be 5, not 4, etc. At least that is what my eyes tell me. :)
I am running this inside a FileMaker 12 solution using its internal ExecuteSQL call. Don't be too concerned by that, though: the code should be the same as nearly anywhere else. :)
Any help would be appreciated.
Thanks,
J
Try using a subquery:
SELECT COUNT(*), Type
FROM (SELECT DISTINCT B.ID_FK, A.Type Type
FROM A
INNER JOIN B ON B.ID_FK = A.ID
WHERE Date > 5/1/2013 and Date < 5/2/2013) x
GROUP BY Type
ORDER BY Type
This could be a FileMaker issue. Have you seen this post on the FileMaker forum? It describes the same issue (a count distinct smaller by 1) with 11v3 back in 03/2012 with a plugin, then updated with the same issue with 12v3 in 11/2012 with ExecuteSQL. It didn't seem to be resolved in either case.
Other considerations might be whether there are any referential integrity constraints on the joined tables. Also, if you can get a query execution plan, you might find the query is being executed differently than expected; I'm not sure whether FileMaker can produce one.
I like Barmar's suggestion; it would sort twice.
If you are dealing with a bug, structuring the query so that the COUNT DISTINCT, the join, and/or the GROUP BY happen at different times might work around it:
SELECT COUNT (DISTINCT x.ID), x.Type
FROM (SELECT A.ID ID, A.Type Type
FROM A
INNER JOIN B ON B.ID_FK = A.ID
WHERE B.Date > 5/1/2013 and B.Date < 5/2/2013) x
GROUP BY Type
ORDER BY Type
You might also try replacing B.ID_FK with A.ID (who knows in what context the bug applies), such as:
SELECT COUNT (DISTINCT A.ID), A.Type

Optimise a Query Which is taking too long to Run

I'm using Toad for Oracle to run a query which is taking much too long to run, sometimes over 15 minutes.
The query pulls memos that are left to be approved by managers. It does not bring back a lot of rows; typically it returns about 30 or 40. The query needs to access a few tables for its information, so I'm using a lot of joins.
I have attached my query below.
If anyone can help with optimising this query, I would be very grateful.
Query:
SELECT (e.error_Description || DECODE(t.trans_Comment, 'N', '', '', '', ' - ' || t.trans_Comment)) AS Title,
       t.Date_Time_Recorded AS Date_Recorded,
       DECODE(t.user_ID, 0, 'System',
              (SELECT Full_Name FROM employee WHERE t.user_Id = user_id)) AS Recorded_by,
       DECODE(t.user_ID, 0, Dm_General.getCalendarShiftName(t.Date_Time_Recorded),
              (SELECT shift FROM employee WHERE t.user_Id = user_id)) AS Shift,
       l.Lot_Number AS entity_number,
       ms.Line_Num,
       'L' AS Entity_Type,
       t.entity_id,
       l.lot_Id AS Lot_Id
FROM DAT_TRANSACTION t
JOIN ADM_ERRORCODES e ON e.error_id = t.error_id
JOIN ADM_ACTIONS a ON a.action_id = t.action_id,
     DAT_LOT l
INNER JOIN Status s ON l.Lot_Status_ID = s.Status_ID,
     DAT_MASTER ms
INNER JOIN ADM_LINE LN ON ms.Line_Num = LN.Line_Num
WHERE (e.memo_req = 'Y' OR a.memo_req = 'Y')
  AND ms.Run_type_Id = Constants.Runtype_Production_Run -- Production Run type
  AND s.completed_type NOT IN ('D', 'C', 'R') -- Destroyed / Closed / Released
  AND LN.GEN = '2GT'
  AND NOT EXISTS (SELECT 1
                  FROM LNK_MEMO_TRANS lnk, DAT_MEMO m
                  WHERE lnk.Trans_ID = t.trans_id
                    AND lnk.Memo_ID = m.Memo_ID
                    AND NVL(m.approve, 'Y') = 'Y') -- if it's null, it's been created and is awaiting approval
  AND l.Master_ID = ms.Master_ID
  AND t.Entity_ID = l.Lot_ID
  AND t.Entity_Type IN ('L', 'G');
The usual cause of bad query performance is that Oracle can't find an appropriate index. Use EXPLAIN PLAN with Toad so Oracle can tell you how it intends to execute the query. That should give you some idea of when it uses indexes and when it doesn't.
For general pointers, see http://www.orafaq.com/wiki/Oracle_database_Performance_Tuning_FAQ
See here for EXPLAIN PLAN.
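A minimal sketch of getting a plan in Toad's editor or SQL*Plus, using the standard DBMS_XPLAN package:

-- explain the slow statement (paste the full query after FOR)
EXPLAIN PLAN FOR
SELECT ... ;

-- then display the plan Oracle chose
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Full table scans against the large tables in the plan output are a hint that an index is missing or not being used.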
You have some function calls in your SQL:
dm_general.getcalendarshiftname(t.date_time_recorded)
constants.runtype_production_run
Function calls are slow in SQL, and depending on the query plan they may be made redundantly many times, e.g. computing dm_general.getcalendarshiftname for rows that end up being filtered out of the results.
To see whether this is a significant factor, try temporarily replacing the function calls with literal constants and check if performance improves.
The number of function calls can sometimes be reduced by restructuring the query like this:
select /*+ no_merge(v) */ a, b, c, myfunction(d)
from
( select a, b, c, d
from my_table
where ...
) v;
This ensures that myfunction is only called for rows that will appear in the results.
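Applied to your query, the pattern would look roughly like this (a hedged sketch, not a drop-in replacement; note that the outer query can only reference columns the inner query selects and aliases):

select /*+ no_merge(v) */
       decode(v.user_id, 0, Dm_General.getCalendarShiftName(v.date_recorded),
              (select shift from employee where v.user_id = user_id)) as shift
       -- ... the remaining output columns ...
from ( select t.user_ID as user_id,
              t.Date_Time_Recorded as date_recorded
              -- ... the other raw columns ...
       from DAT_TRANSACTION t
       -- ... the joins and WHERE filters from the original query ...
     ) v;

This way the function and the scalar subquery against employee are evaluated only for the 30-40 rows that survive the filtering.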
I have replaced the function calls with literal constants, and this speeds it up by only a second or two; the query is still taking about 50 seconds to run.
Is there anything I can do around the joins to help speed this up? Have I used INNER JOIN correctly here?
I'm not really sure I understand what you mean by the below or how to use it.
I get the error "d: invalid identifier" when I try to call the function in the second select:
select /*+ no_merge(v) */ a, b, c, myfunction(d)
from
( select a, b, c, d
from my_table
where ...
) v;
Any other views would be greatly appreciated
Before we can say anything sensible, we have to take a look at where the time is being spent. And that means you have to collect some information first.
Therefore, my standard reaction to a question like this is this one: http://forums.oracle.com/forums/thread.jspa?threadID=501834
Regards,
Rob.